1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

122 commits

Author SHA1 Message Date
Mukul Joshi
d69a3b762d drm/amdkfd: Cleanup kfd_dev struct
Cleanup kfd_dev struct by removing ddev and pdev as both
drm_device and pci_dev can be fetched from amdgpu_device.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Tested-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27 15:12:09 -04:00
Daniel Phillips
1ac354beec drm/amdgpu: Pessimistic availability based on rounded up allocations
Separately accumulate a statistic of rounded up allocations to use
to report availability, with a view to increasing the likelihood a
buffer object can be successfully allocated at exactly the size
reported by the availability API.

Signed-off-by: Daniel Phillips <daniel.phillips@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-10 14:58:57 -04:00
Alex Sierra
3d2af401cf drm/amdgpu: add debugfs for kfd system and ttm mem used
This keeps track of kfd system mem used and kfd ttm mem used.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:05:16 -04:00
Alex Sierra
f9af3c16bf drm/amdkfd: track unified memory reservation with xnack off
[WHY]
Unified memory with xnack off should be tracked, as userptr mappings
and legacy allocations do. To avoid oversuscribe system memory when
xnack off.
[How]
Exposing functions reserve_mem_limit and unreserve_mem_limit to SVM
API and call them on every prange creation and free.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:05:16 -04:00
Philip Yang
c7f21978fa drm/amdkfd: Add user queue eviction restore SMI event
Output user queue eviction and restore event. User queue eviction may be
triggered by svm or userptr MMU notifier, TTM eviction, device suspend
and CRIU checkpoint and restore.

User queue restore may be rescheduled if eviction happens again while
restore.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-30 15:31:14 -04:00
Graham Sider
e77a541f5d drm/amdkfd: Enable GFX11 usermode queue oversubscription
Starting with GFX11, MES requires wptr BOs to be GTT allocated/mapped to
GART for usermode queues in order to support oversubscription. In the
case that work is submitted to an unmapped queue, MES must have a GART
wptr address to determine whether the queue should be mapped.

This change is accompanied with changes in MES and is applicable for
MES_API_VERSION >= 2.

v3:
- Use amdgpu_vm_bo_lookup_mapping for wptr_bo mapping lookup
- Move wptr_bo refcount increment to amdgpu_amdkfd_map_gtt_bo_to_gart
- Remove list_del_init from amdgpu_amdkfd_map_gtt_bo_to_gart
- Cleanup/fix create_queue wptr_bo error handling
v4:
- Add MES version shift/mask defines to amdgpu_mes.h
- Change version check from MES_VERSION to MES_API_VERSION
- Add check in kfd_ioctl_create_queue before wptr bo pin/GART map to
ensure bo is a single page.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-23 17:22:12 -04:00
Daniel Phillips
9731dd4cad drm/amdkfd: Add available memory ioctl
Add a new KFD ioctl to return the largest possible memory size that
can be allocated as a buffer object using
kfd_ioctl_alloc_memory_of_gpu. It attempts to use exactly the same
accept/reject criteria as that function so that allocating a new
buffer object of the size returned by this new ioctl is guaranteed to
succeed, barring races with other allocating tasks.

This IOCTL will be used by libhsakmt:
https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg75743.html

Signed-off-by: Daniel Phillips <Daniel.Phillips@amd.com>
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-14 21:38:40 -04:00
Andrey Grodzovsky
b5fd0cf3ea drm/amdgpu: Add work_struct for GPU reset from kfd.
We need to have a work_struct to cancel this reset if another
already in progress.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-10 15:26:07 -04:00
Felix Kuehling
4e2d104435 drm/amdkfd: Document and fix GTT BO kmap API
Removed an unused parameter from two functions and added kernel-doc
comments.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-08 11:40:59 -04:00
Ramesh Errabolu
08a2fd23c6 drm/amdgpu: Add peer-to-peer support among PCIe connected AMD GPUs
Add support for peer-to-peer communication among AMD GPUs over PCIe
bus. Support REQUIRES enablement of config HSA_AMD_P2P.

Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-08 11:40:12 -04:00
Christian König
4d30a83c74 drm/amdkfd: use tlb_seq from the VM subsystem for SVM as well v2
Instead of hand rolling the table_freed parameter.

v2: add some changes suggested by Philip

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Philip Yang<Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-25 12:40:52 -04:00
Tao Zhou
6475ae2b74 drm/amdgpu: add UTCL2 RAS poison query for Aldebaran (v2)
Add help functions to query and reset RAS UTCL2 poison status.

v2: implement it on amdgpu side and kfd only calls it.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-25 12:40:26 -04:00
Rajneesh Bhardwaj
011bbb0302 drm/amdkfd: CRIU Implement KFD resume ioctl
This adds support to create userptr BOs on restore and introduces a new
ioctl op to restart memory notifiers for the restored userptr BOs.
When doing CRIU restore MMU notifications can happen anytime after we call
amdgpu_mn_register. Prevent MMU notifications until we reach stage-4 of the
restore process i.e. criu_resume ioctl op is received, and the process is
ready to be resumed. This ioctl is different from other KFD CRIU ioctls
since its called by CRIU master restore process for all the target
processes being resumed by CRIU.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07 17:59:41 -05:00
Rajneesh Bhardwaj
5ccbb057c0 drm/amdkfd: CRIU Implement KFD checkpoint ioctl
This adds support to discover the  buffer objects that belong to a
process being checkpointed. The data corresponding to these buffer
objects is returned to user space plugin running under criu master
context which then stores this info to recreate these buffer objects
during a restore operation.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07 17:59:29 -05:00
Nirmoy Das
ffb378fb30 drm/amdkfd: remove unused function
Remove unused amdgpu_amdkfd_get_vram_usage()

CC: Felix.Kuehling@amd.com

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: dfcbe6d5f4 ("drm/amdgpu: Remove unused function pointers")
2022-01-11 15:44:26 -05:00
Tao Zhou
b6485bed40 drm/amdkfd: reset queue which consumes RAS poison (v2)
CP supports unmap queue with reset mode which only destroys specific queue without affecting others.
Replacing whole gpu reset with reset queue mode for RAS poison consumption
saves much time, and we can also fallback to gpu reset solution if reset
queue fails.

v2: Return directly if process is NULL;
    Reset queue solution is not applicable to SDMA, fallback to legacy
way;
    Call kfd_unref_process after lookup process.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:02:59 -05:00
Ramesh Errabolu
f441dd33db drm/amdgpu: Update BO memory accounting to rely on allocation flag
Accounting system to track amount of available memory (system, TTM
and VRAM of a device) relies on BO's domain. The change is to rely
instead on allocation flag indicating BO type - VRAM, GTT, USERPTR,
MMIO or DOORBELL

Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:01 -05:00
Graham Sider
02274fc0f6 drm/amdkfd: replace trivial funcs with direct access
These get funcs simply return an adev field. Replace funcs/calls with
direct field accesses instead.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17 16:58:11 -05:00
Graham Sider
b5d1d755c1 drm/amdkfd: remove kgd_dev declaration and initialization
Completes removal of kgd_dev. Direct references to amdgpu_device objects
should now be used instead.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17 16:58:03 -05:00
Graham Sider
dff63da93e drm/amdkfd: replace kgd_dev in gpuvm amdgpu_amdkfd funcs
Modified definitions:

- amdgpu_amdkfd_gpuvm_acquire_process_vm
- amdgpu_amdkfd_gpuvm_release_process_vm
- amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu
- amdgpu_amdkfd_gpuvm_free_memory_of_gpu
- amdgpu_amdkfd_gpuvm_map_memory_to_gpu
- amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu
- amdgpu_amdkfd_gpuvm_sync_memory
- amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel
- amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel
- amdgpu_amdkfd_gpuvm_get_vm_fault_info
- amdgpu_amdkfd_gpuvm_import_dmabuf
- amdgpu_amdkfd_get_tile_config

Removed:

- get_amdgpu_device

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17 16:58:02 -05:00
Graham Sider
574c4183ef drm/amdkfd: replace kgd_dev in get amdgpu_amdkfd funcs
Modified definitions:

- amdgpu_amdkfd_get_fw_version
- amdgpu_amdkfd_get_local_mem_info
- amdgpu_amdkfd_get_gpu_clock_counter
- amdgpu_amdkfd_get_max_engine_clock_in_mhz
- amdgpu_amdkfd_get_cu_info
- amdgpu_amdkfd_get_dmabuf_info
- amdgpu_amdkfd_get_vram_usage
- amdgpu_amdkfd_get_hive_id
- amdgpu_amdkfd_get_unique_id
- amdgpu_amdkfd_get_mmio_remap_phys_addr
- amdgpu_amdkfd_get_num_gws
- amdgpu_amdkfd_get_asic_rev_id
- amdgpu_amdkfd_get_noretry
- amdgpu_amdkfd_get_xgmi_hops_count
- amdgpu_amdkfd_get_xgmi_bandwidth_mbytes
- amdgpu_amdkfd_get_pcie_bandwidth_mbytes

Also replaces kfd_device_by_kgd with kfd_device_by_adev, now
searching via adev rather than kgd.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17 16:58:02 -05:00
Graham Sider
6bfc7c7e17 drm/amdkfd: replace kgd_dev in various amgpu_amdkfd funcs
Modified definitions:

- amdgpu_amdkfd_submit_ib
- amdgpu_amdkfd_set_compute_idle
- amdgpu_amdkfd_have_atomics_support
- amdgpu_amdkfd_flush_gpu_tlb_pasid
- amdgpu_amdkfd_flush_gpu_tlb_pasid
- amdgpu_amdkfd_gpu_reset
- amdgpu_amdkfd_alloc_gtt_mem
- amdgpu_amdkfd_free_gtt_mem
- amdgpu_amdkfd_alloc_gws
- amdgpu_amdkfd_free_gws
- amdgpu_amdkfd_ras_poison_consumption_handler

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17 16:58:01 -05:00
Felix Kuehling
5702d05295 drm/amdgpu: Fix dangling kfd_bo pointer for shared BOs
If a kfd_bo was shared (e.g. a dmabuf export), the original kfd_bo may be
freed when the amdgpu_bo still lives on. Free the kfd_bo struct in the
release_notify callback then the amdgpu_bo is freed.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-By: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-05 14:12:45 -04:00
Lang Yu
68df0f195a drm/amdkfd: Separate pinned BOs destruction from general routine
Currently, all kfd BOs use same destruction routine. But pinned
BOs are not unpinned properly. Separate them from general routine.

v2 (Felix):
Add safeguard to prevent user space from freeing signal BO.
Kunmap signal BO in the event of setting event page error.
Just kunmap signal BO to avoid duplicating the code.

Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28 14:26:12 -04:00
Alex Deucher
5b983db8c3 drm/amdkfd: clean up parameters in kgd2kfd_probe
We can get the pdev and asic type from the adev.  No need
to pass them explicitly.

v2: squash in build fix for !CONFIG_HSA_AMD from Anson

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-04 15:23:02 -04:00
Tao Zhou
c749094923 amd/amdkfd: add ras page retirement handling for sq/sdma (v3)
In ras poison mode, page retirement will be handled by the irq handler of the
module which consumes corrupted data.

v2: rename ras_process_cb to ras_poison_consumption_handler.
    move the handler's implementation from ASIC specific file to common
file.

v3: call gpu reset for xGMI connected mode.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-04 15:22:57 -04:00
James Zhu
8066008482 drm/amdgpu: add amdgpu_amdkfd_resume_iommu
Add amdgpu_amdkfd_resume_iommu for amdgpu.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-09-16 09:56:24 -04:00
James Zhu
fefc01f042 drm/amdkfd: separate kfd_iommu_resume from kfd_resume
Separate kfd_iommu_resume from kfd_resume for fine-tuning
of amdgpu device init/resume/reset/recovery sequence.

v2: squash in fix for !CONFIG_HSA_AMD

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-09-16 09:56:24 -04:00
Eric Huang
fce1a7eb35 Revert "Revert "drm/amdkfd: Make TLB flush conditional on mapping""
This reverts commit 7ed9876c97.

Revert reason: The issue has been resolved.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-02 17:21:24 -04:00
Dave Airlie
04d505de7f Merge tag 'amd-drm-next-5.15-2021-07-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.15-2021-07-29:

amdgpu:
- VCN/JPEG power down sequencing fixes
- Various navi pcie link handling fixes
- Clockgating fixes
- Yellow Carp fixes
- Beige Goby fixes
- Misc code cleanups
- S0ix fixes
- SMU i2c bus rework
- EEPROM handling rework
- PSP ucode handling cleanup
- SMU error handling rework
- AMD HDMI freesync fixes
- USB PD firmware update rework
- MMIO based vram access rework
- Misc display fixes
- Backlight fixes
- Add initial Cyan Skillfish support
- Overclocking fixes suspend/resume

amdkfd:
- Sysfs leak fix
- Add counters for vm faults and migration
- GPUVM TLB optimizations

radeon:
- Misc fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210730033455.3852-1-alexander.deucher@amd.com
2021-07-30 16:48:35 +10:00
Eric Huang
8f0e2d5c99 Revert "Revert "drm/amdkfd: Make TLB flush conditional on mapping""
This reverts commit 7ed9876c97.

Revert reason: The issue has been resolved.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-28 16:37:18 -04:00
Graham Sider
410e302ea5 drm/amdkfd: Update SMI throttle event bitmask
Update Arcturus/Aldebaran thermal throttle SMI event path to use
ASIC-independent throttler bits when logging.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-23 10:08:00 -04:00
Jonathan Kim
9330481038 drm/amdkfd: report pcie bandwidth to the kfd
Similar to xGMI reporting the min/max bandwidth between direct peers, PCIe
will report the min/max bandwidth to the KFD.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-23 10:07:59 -04:00
Jonathan Kim
3f46c4e9ce drm/amdkfd: report xgmi bandwidth between direct peers to the kfd
Report the min/max bandwidth in megabytes to the kfd for direct
xgmi connections only.  Indirect peers will report 0 since
indirect route is unknown.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-23 10:07:59 -04:00
Eric Huang
c37387c354 Revert "drm/amdkfd: Make TLB flush conditional on mapping"
This reverts commit 31f3324378.

Reason for revert: it causes regressions on several Asics.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-13 11:59:22 -04:00
Eric Huang
7ed9876c97 Revert "drm/amdkfd: Make TLB flush conditional on mapping"
This reverts commit 31f3324378.

Reason for revert: it causes regressions on several Asics.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-13 11:38:55 -04:00
Eric Huang
31f3324378 drm/amdkfd: Make TLB flush conditional on mapping
It is to optimize memory mapping latency, and also aviod
a page fault in a corner case of changing valid PDE into
PTE.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-06-04 12:40:01 -04:00
Thomas Zimmermann
304ba5dca4 Merge drm/drm-next into drm-misc-next
Backmerging from drm/drm-next to the patches for AMD devices
for v5.14.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2021-05-22 07:17:05 +02:00
Andrey Grodzovsky
e9669fb782 drm/amdgpu: Add early fini callback
Use it to call disply code dependent on device->drv_data
before it's set to NULL on device unplug

v5:
Move HW finilization into this callback to prevent MMIO accesses
post cpi remove.

v7:
Split kfd suspend from device exit to expdite HW related
stuff to amdgpu_pci_remove

v8:
Squash previous KFD commit into this commit to avoid compile break.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210520032057.497334-1-andrey.grodzovsky@amd.com
2021-05-19 23:48:50 -04:00
Felix Kuehling
5ac3c3e45f drm/amdgpu: Add DMA mapping of GTT BOs
Use DMABufs with dynamic attachment to DMA-map GTT BOs on other GPUs.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:44:06 -04:00
Felix Kuehling
264fb4d332 drm/amdgpu: Add multi-GPU DMA mapping helpers
Add BO-type specific helpers functions to DMA-map and unmap
kfd_mem_attachments. Implement this functionality for userptrs by creating
one SG BO per GPU and filling it with a DMA mapping of the pages from the
original mem->bo.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:43:56 -04:00
Felix Kuehling
c780b2eedb drm/amdgpu: Rename kfd_bo_va_list to kfd_mem_attachment
This name is more fitting, especially for the changes coming next to
support multi-GPU systems with proper DMA mappings. Cleaned up the code
and renamed some related functions and variables to improve readability.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:43:46 -04:00
Philip Yang
765385ec00 drm/amdkfd: heavy-weight flush TLB after unmap
Need do a heavy-weight TLB flush to make sure we have no more dirty data
in the cache for the unmapped pages.

Define enum TLB_FLUSH_TYPE, add flush_type parameter to
amdgpu_amdkfd_flush_gpu_tlb_pasid.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19 22:37:50 -04:00
Alex Sierra
eb2cec5537 drm/amdkfd: add svm_bo reference for eviction fence
[why]
As part of the SVM functionality, the eviction mechanism used for
SVM_BOs is different. This mechanism uses one eviction fence per prange,
instead of one fence per kfd_process.

[how]
A svm_bo reference to amdgpu_amdkfd_fence to allow differentiate between
SVM_BO or regular BO evictions. This also include modifications to set the
reference at the fence creation call.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:49:22 -04:00
Philip Yang
c46ebb6a6d drm/amdkfd: set memory limit to avoid OOM with HMM enabled
HMM migration alloc sizeof(struct page) on system memory for each VRAM
page, it is 1GB system memory reserved for 64GB VRAM. To avoid
application OOM, increase system memory used size based on VRAM size of
all GPUs, then application alloc memory will fail if system memory usage
reach the limit.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:48:00 -04:00
Felix Kuehling
f80fe9d3c1 drm/amdkfd: map svm range to GPUs
Use amdgpu_vm_bo_update_mapping to update GPU page table to map or unmap
svm range system memory pages address to GPUs.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:47:21 -04:00
Felix Kuehling
d4ec4bdc0b drm/amdkfd: Allow access for mmapping KFD BOs
DRM render node file handles are used for CPU mapping of BOs using mmap
by the Thunk. It uses the DRM render node of the GPU where the BO was
allocated.

DRM allows mmap access automatically when it creates a GEM handle for a
BO. KFD BOs don't have GEM handles, so KFD needs to manage access
manually. Use drm_vma_node_allow to allow user mode to mmap BOs allocated
with kfd_ioctl_alloc_memory_of_gpu through the DRM render node that was
used in the kfd_ioctl_acquire_vm call for the same GPU.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:45:54 -04:00
Felix Kuehling
b40a6ab2cf drm/amdkfd: Use drm_priv to pass VM from KFD to amdgpu
amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu needs the drm_priv to allow mmap
to access the BO through the corresponding file descriptor. The VM can
also be extracted from drm_priv, so drm_priv can replace the vm parameter
in the kfd2kgd interface.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:45:45 -04:00
Felix Kuehling
f45e6b9d03 drm/amdkfd: Remove legacy code not acquiring VMs
ROCm user mode has acquired VMs from DRM file descriptors for as long
as it supported the upstream KFD. Legacy code to support older versions
of ROCm is not needed any more.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:32:44 -04:00
shaoyunl
8e2712e71b drm/amdgpu: Add kfd init_complete flag to check from amdgpu side
amdgpu driver may be in reset state during init which will not initialize the kfd,
driver need to initialize the KFD after reset by check the flag

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23 23:10:28 -04:00