1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

165 commits

Author SHA1 Message Date
Sunil Khatri
40356542c3 drm/amdgpu: add protype for print ip state
Add the protoype for print ip state to be used
to print the registers in devcoredump during
a gpu reset.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:39 -04:00
Sunil Khatri
e21d253bd7 drm/amdgpu: add prototype for ip dump
Add the prototype to dump ip registers
for all ips of different asics and set
them to NULL for now. Based on the
requirement add a function pointer for
each of them.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:39 -04:00
Srinivasan Shanmugam
fa8a91b0e5 drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'
Return 0 for success scenairos in 'gmc_v6/7/8/9_0_hw_init()'

Fixes the below:
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing error code? 'r'

Fixes: fac4ebd79f ("drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 14:05:19 -05:00
Arunpravin Paneer Selvam
00a11f977b drm/amdgpu: Enable seq64 manager and fix bugs
- Enable the seq64 mapping sequence.
- Fix wflinfo va conflict and other bugs.

v1:
  - The seq64 area needs to be included in the AMDGPU_VA_RESERVED_SIZE
    otherwise the areas will conflict with user space allocations (Alex)

  - It needs to be mapped read only in the user VM (Alex)

v2:
  - Instead of just one define for TOP/BOTTOM
    reserved space separate them into two (Christian)

  - Fix the CPU and VA calculations and while at it
    also cleanup error handling and kerneldoc (Christian)

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2024-01-22 17:13:18 -05:00
Alex Deucher
bff3315ba8 drm/amdgpu: fix AGP init order
The default AGP settings were overwriting the IP selected
ones since the default was getting set after the IP ones
were selected.

Fixes: de59b69932 ("drm/amdgpu/gmc: set a default disable value for AGP")
Link: https://lists.freedesktop.org/archives/amd-gfx/2023-November/100966.html
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
2023-11-09 16:59:46 -05:00
Alex Deucher
8dbf1ba867 drm/amdgpu: cache gpuvm fault information for gmc7+
Cache the current fault info in the vm struct.  This can be queried
by userspace later to help debug UMDs.

Cc: samuel.pitoiset@gmail.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-04 18:37:07 -04:00
Alex Deucher
917f91d8d8 drm/amdgpu/gmc: add a way to force a particular placement for GART
We normally place GART based on the location of VRAM and the
available address space around that, but provide an option
to force a particular location for hardware that needs it.

v2: Switch to passing the placement via parameter

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-04 18:36:07 -04:00
Alex Deucher
de59b69932 drm/amdgpu/gmc: set a default disable value for AGP
To disable AGP, the start needs to be set to a higher
value than the end.  Set a default disable value for
the AGP aperture and allow the IP specific GMC code
to enable it selectively be calling amdgpu_gmc_agp_location().

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 17:00:22 -04:00
Alex Deucher
29495d8145 drm/amdgpu/gmc6-8: properly disable the AGP aperture
The BOT register needs to be larger than the TOP register
for this to be properly disabled.  The lower 22 bits
of the BOT address are always 0 and the lower 22 bits of
the TOP register are always 1 so you need to make
the upper bits of BOT larger than the upper bits of BOT.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 17:00:22 -04:00
Christian König
e2e3788850 drm/amdgpu: rework lock handling for flush_tlb v2
Instead of each implementation doing this more or less correctly
move taking the reset lock at a higher level.

v2: fix typo

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:55:09 -04:00
Christian König
3983c9fd2d drm/amdgpu: drop error return from flush_gpu_tlb_pasid
That function never fails, drop the error return.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:55:09 -04:00
Christian König
fb4c52db69 drm/amdgpu: fix and cleanup gmc_v7_0_flush_gpu_tlb_pasid
Testing for reset is pointless since the reset can start right after the
test. Grab the reset semaphore instead.

The same PASID can be used by more than once VMID, build a mask of VMIDs
to invalidate instead of just restting the first one.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:55:09 -04:00
Srinivasan Shanmugam
a62e702ee1 drm/amdgpu: Avoid possiblity of kernel crash in 'gmc_v8_0, gmc_v7_0_init_microcode()'
If the function 'gmc_v8_0_ or gmc_v7_0_init_microcode()' fails, the
driver will just fail to load, hence return -EINVAL rather having BUG(),
fixes WARNING: Do not crash the kernel unless it is absolutely
unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead
of BUG() or variants

Fixes: 2f77b5931f ("drm/amdgpu: Fix error & warnings in gmc_v8_0.c")
Fixes: 0cfc1d6830 ("drm/amdgpu: Fix errors & warnings in gmc_ v6_0, v7_0.c")
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 11:09:30 -04:00
Srinivasan Shanmugam
0cfc1d6830 drm/amdgpu: Fix errors & warnings in gmc_ v6_0, v7_0.c
Fix below checkpatch errors & warnings:

ERROR: trailing statements should be on next line
+       default: BUG();
ERROR: trailing statements should be on next line

WARNING: braces {} are not necessary for single statement blocks
WARNING: braces {} are not necessary for any arm of this statement
WARNING: Block comments use * on subsequent lines
WARNING: Missing a blank line after declarations
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-10 09:02:36 -04:00
Mukul Joshi
e77673d14f drm/amdgpu: Update invalid PTE flag setting
Update the invalid PTE flag setting with TF enabled.
This is to ensure, in addition to transitioning the
retry fault to a no-retry fault, it also causes the
wavefront to enter the trap handler. With the current
setting, the fault only transitions to a no-retry fault.
Additionally, have 2 sets of invalid PTE settings, one for
TF enabled, the other for TF disabled. The setting with
TF disabled, doesn't work with TF enabled.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07 13:51:47 -04:00
Srinivasan Shanmugam
1bae03aab2 drm/amdgpu: Fix up missing parameter in kdoc for 'inst' in gmc_ v7, v8, v9, v10, v11.c
Fix these warnings by adding 'inst' arguments to kdocs.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:428: warning: Function parameter or member 'inst' not described in 'gmc_v7_0_flush_gpu_tlb_pasid'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:626: warning: Function parameter or member 'inst' not described in 'gmc_v8_0_flush_gpu_tlb_pasid'
drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c:423: warning: Function parameter or member 'inst' not described in 'gmc_v10_0_flush_gpu_tlb_pasid'
drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c:328: warning: Function parameter or member 'inst' not described in 'gmc_v11_0_flush_gpu_tlb_pasid'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:950: warning: Function parameter or member 'inst' not described in 'gmc_v9_0_flush_gpu_tlb_pasid'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:32:35 -04:00
Mukul Joshi
f87f686482 drm/amdgpu: Add XCC inst to PASID TLB flushing
Add XCC instance to select the correct KIQ ring when
flushing TLBs on a multi-XCC setup.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Tested-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:42:46 -04:00
Le Ma
d9426c3d9b drm/amdgpu: add bitmask to iterate vmhubs
As the layout of VMHUB definition has been changed to cover multiple
XCD/AID case, the original num_vmhubs is not appropriate to do vmhub
iteration any more.

Drop num_vmhubs and introduce vmhubs_mask instead.

v2: switch to the new VMHUB layout
v3: use DECLARE_BITMAP to define vmhubs_mask

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:40:17 -04:00
Mario Limonciello
ee138d86ef drm/amd: Use amdgpu_ucode_* helpers for GMC7
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:57 -05:00
Christian König
da2f992091 drm/amdgpu: cleanup visible vram size handling
Centralize the limit handling and validation in one place instead
of spreading that around in different hw generations.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-03 16:50:14 -05:00
Christian König
7ccfd79fdd drm/amdgpu: rename vram_scratch into mem_scratch
Rename vram_scratch into mem_scratch and allow allocating it into GTT as
well.

The only problem with that is that we won't have a default page for the
system aperture any more.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-03 16:50:03 -05:00
Alex Deucher
b818a5d374 drm/amdgpu/gmc: use PCI BARs for APUs in passthrough
If the GPU is passed through to a guest VM, use the PCI
BAR for CPU FB access rather than the physical address of
carve out.  The physical address is not valid in a guest.

v2: Fix HDP handing as suggested by Michel

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-25 12:40:24 -04:00
Christian König
1b08dfb889 drm/amdgpu: remove gart.ready flag
That's just a leftover from old radeon days and was preventing CS and GART
bindings before the hardware was initialized. But nowdays that is
perfectly valid.

The only thing we need to warn about are GART binding before the table
is even allocated.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-19 22:32:47 -05:00
Xiaojian Du
479e3b02b7 drm/amdgpu: add vram check function for GMC
This patch will add vram check function for GMC block.
It will write pattern data to the vram and then read back from the vram,
so that to verify the work status of vram.
This patch  will cover gmc v6/7/8/9/10.

Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-19 22:31:51 -05:00
Nirmoy Das
575e55ee4f drm/amdgpu: recover gart table at resume
Get rid off pin/unpin of gart BO at resume/suspend and
instead pin only once and try to recover gart content
at resume time. This is much more stable in case there
is OOM situation at 2nd call to amdgpu_device_evict_resources()
while evicting GART table.

v3: remove gart recovery from other places
v2: pin gart at amdgpu_gart_table_vram_alloc()

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-11 15:44:26 -05:00
Dave Airlie
5745d647d5 Merge tag 'amd-drm-next-5.14-2021-06-02' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.14-2021-06-02:

amdgpu:
- GC/MM register access macro clean up for SR-IOV
- Beige Goby updates
- W=1 Fixes
- Aldebaran fixes
- Misc display fixes
- ACPI ATCS/ATIF handling rework
- SR-IOV fixes
- RAS fixes
- 16bpc fixed point format support
- Initial smartshift support
- RV/PCO power tuning fixes for suspend/resume
- More buffer object subclassing work
- Add new INFO query for additional vbios information
- Add new placement for preemptable SG buffers

amdkfd:
- Misc fixes

radeon:
- W=1 Fixes
- Misc cleanups

UAPI:
- Add new INFO query for additional vbios information
  Useful for debugging vbios related issues.  Proposed umr patch:
  https://patchwork.freedesktop.org/patch/433297/
- 16bpc fixed point format support
  IGT test:
  https://lists.freedesktop.org/archives/igt-dev/2021-May/031507.html
  Proposed Vulkan patch:
  a25d480207
- Add a new GEM flag which is only used internally in the kernel driver.  Userspace
  is not allowed to set it.

drm:
- 16bpc fixed point format fourcc

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210602214009.4553-1-alexander.deucher@amd.com
2021-06-04 06:13:57 +10:00
Lee Jones
9f7724f088 drm/amd/amdgpu/gmc_v7_0: Fix potential copy/paste issue
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:526: warning: expecting prototype for gmc_v8_0_set_fault_enable_default(). Prototype was for gmc_v7_0_set_fault_enable_default() instead

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-21 10:32:16 -04:00
Andrey Grodzovsky
d10d0daa20 drm/amdgpu: Handle IOMMU enabled case.
Problem:
Handle all DMA IOMMU group related dependencies before the
group is removed. Those manifest themself in that when IOMMU
enabled DMA map/unmap is dependent on the presence of IOMMU
group the device belongs to but, this group is released once
the device is removed from PCI topology.

Fix:
Expedite all such unmap operations to pci remove driver callback.

v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate
v6: Drop the BO unamp list
v7:
Drop amdgpu_gart_fini
In amdgpu_ih_ring_fini do uncinditional  check (!ih->ring)
to avoid freeing uniniitalized rings.
Call amdgpu_ih_ring_fini unconditionally.
v8: Add deatiled explanation

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210517143851.475058-1-andrey.grodzovsky@amd.com
2021-05-19 23:50:27 -04:00
Oak Zeng
f1dc12ca56 drm/amdgpu: Moved gart_size calculation to mc_init functions
In amdgpu_gmc_gart_location function, gart_size is adjusted
by a smu_prv_buffer_size. This logic shouldn't belong to
this function. Move the logic to the mc_init functions

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23 22:58:30 -04:00
Alex Deucher
64f2c15892 drm/amdgpu: remove amdgpu_ttm_late_init and amdgpu_bo_late_init
No longer used.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-12-08 23:06:33 -05:00
Nirmoy Das
68fce5f07c drm/amdgpu: use AMDGPU_NUM_VMID when possible
Replace hardcoded vmid number with AMDGPU_NUM_VMID macro.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-12-08 23:05:40 -05:00
Lee Jones
c39b7992cc drm/amd/amdgpu/gmc_v7_0: Add some missing kernel-doc descriptions
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:433: warning: Function parameter or member 'flush_type' not described in 'gmc_v7_0_flush_gpu_tlb_pasid'
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:433: warning: Function parameter or member 'all_hub' not described in 'gmc_v7_0_flush_gpu_tlb_pasid'
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:471: warning: Function parameter or member 'vmhub' not described in 'gmc_v7_0_flush_gpu_tlb'
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:471: warning: Function parameter or member 'flush_type' not described in 'gmc_v7_0_flush_gpu_tlb'
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:771: warning: Function parameter or member 'pasid' not described in 'gmc_v7_0_vm_decode_fault'

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-11-24 12:09:53 -05:00
Dennis Li
53b3f8f40e drm/amdgpu: refine codes to avoid reentering GPU recovery
if other threads have holden the reset lock, recovery will
fail to try_lock. Therefore we introduce atomic hive->in_reset
and adev->in_gpu_reset, to avoid reentering GPU recovery.

v2:
drop "? true : false" in the definition of amdgpu_in_reset

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24 12:22:56 -04:00
Christian König
f1403342eb drm/amdgpu: revert "fix system hang issue during GPU reset"
The whole approach wasn't thought through till the end.

We already had a reset lock like this in the past and it caused the same problems like this one.

Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary.

This reverts commit df9c8d1aa2.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:40 -04:00
Alex Deucher
71755699b5 drm/amdgpu/gmc7: switch to using amdgpu_gmc_get_vbios_allocations
The new helper centralizes the logic in one place.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-04 17:29:28 -04:00
Alex Deucher
fcbc92e2e1 drm/amdgpu: move stolen vga bo from amdgpu to amdgpu.gmc
Since that is where we store the other data related to
the stolen vga memory.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-04 17:29:28 -04:00
Alex Deucher
81b54fb7a2 drm/amdgpu: use a define for the memory size of the vga emulator
Rather than open coding it everywhere.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-04 17:29:28 -04:00
Dennis Li
df9c8d1aa2 drm/amdgpu: fix system hang issue during GPU reset
when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover,
the atomic adev->in_gpu_reset and hive->in_reset are used to avoid
re-entering GPU recovery.

During GPU reset and resume, it is unsafe that other threads access GPU,
which maybe cause GPU reset failed. Therefore the new rw_semaphore
adev->reset_sem is introduced, which protect GPU from being accessed by
external threads during recovery.

v2:
1. add rwlock for some ioctls, debugfs and file-close function.
2. change to use dqm->is_resetting and dqm_lock for protection in kfd
driver.
3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid
re-enter GPU recovery for the same GPU hang.

v3:
1. change back to use adev->reset_sem to protect kfd callback
functions, because dqm_lock couldn't protect all codes, for example:
free_mqd must be called outside of dqm_lock;

[ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[ 1230.177221] Call Trace:
[ 1230.178249]  dump_stack+0x98/0xd5
[ 1230.179443]  amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
[ 1230.180673]  gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
[ 1230.181882]  amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
[ 1230.183098]  amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
[ 1230.184239]  ? ttm_bo_put+0x171/0x5f0 [ttm]
[ 1230.185394]  ttm_tt_unbind+0x21/0x40 [ttm]
[ 1230.186558]  ttm_tt_destroy.part.12+0x12/0x60 [ttm]
[ 1230.187707]  ttm_tt_destroy+0x13/0x20 [ttm]
[ 1230.188832]  ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
[ 1230.189979]  ttm_bo_put+0x1be/0x5f0 [ttm]
[ 1230.191230]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
[ 1230.192522]  amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
[ 1230.193833]  free_mqd+0x25/0x40 [amdgpu]
[ 1230.195143]  destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
[ 1230.196475]  pqm_destroy_queue+0x105/0x260 [amdgpu]
[ 1230.197819]  kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
[ 1230.199154]  kfd_ioctl+0x277/0x500 [amdgpu]
[ 1230.200458]  ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
[ 1230.201656]  ? tomoyo_file_ioctl+0x19/0x20
[ 1230.202831]  ksys_ioctl+0x98/0xb0
[ 1230.204004]  __x64_sys_ioctl+0x1a/0x20
[ 1230.205174]  do_syscall_64+0x5f/0x250
[ 1230.206339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

2. remove try_lock and introduce atomic hive->in_reset, to avoid
re-enter GPU recovery.

v4:
1. remove an unnecessary whitespace change in kfd_chardev.c
2. remove comment codes in amdgpu_device.c
3. add more detailed comment in commit message
4. define a wrap function amdgpu_in_reset

v5:
1. Fix some style issues.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Suggested-by: Luben Tukov <luben.tuikov@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-27 16:21:37 -04:00
Felix Kuehling
40111ec229 drm/amdgpu: Clean up KFD VMID assignment
The KFD VMID assignment was hard-coded in a few places. Consolidate that in
a single variable adev->vm_manager.first_kfd_vmid. The value is still
assigned in gmc-ip-version-specific code.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-02 12:02:50 -04:00
Felix Kuehling
6e696ae36e drm/amdgpu: Add missing parameter description in comments
Kerneldoc comments should describe all function parameters.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-01 10:00:26 -04:00
Aurabindo Pillai
dd4fa6c1b8 drm/amd/amdgpu: remove hardcoded module name in prints
Let format prefixes take care of printing the module name
through pr_fmt and dev_fmt definitions.

Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:40 -04:00
Christian König
77171eade8 drm/amdgpu: add coreboot workaround for KV/KB
Coreboot seems to have a problem correctly setting up access to the stolen VRAM
on KV/KB. Use the direct access only when necessary.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reported-and-tested-by: Fredrik Bruhn <fredrik.bruhn@unibap.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-27 16:46:45 -05:00
Alex Sierra
ea930000a6 drm/amdgpu: export function to flush TLB via pasid
This can be used directly from amdgpu and amdkfd to invalidate
TLB through pasid.
It supports gmc v7, v8, v9 and v10.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-16 13:34:27 -05:00
Christian König
cbfae36cea drm/amdgpu: cleanup PTE flag generation v3
Move the ASIC specific code into a new callback function.

v2: mask the flags for SI and CIK instead of a BUG_ON().
v3: remove last missed BUG_ON().

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-16 09:59:29 -05:00
Christian König
71776b6dae drm/amdgpu: cleanup mtype mapping
Unify how we map the UAPI flags to the PTE hardware flags for a mapping.

Only the MTYPE is actually ASIC dependent, all other flags should be
copied over 1 to 1 and ASIC differences are handled later on.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-16 09:59:21 -05:00
Alex Deucher
bad4c3e665 drm/amdgpu: set adev->num_vmhubs for gmc6,7,8
So that we properly handle them on older asics.

Fixes: 3ff985485b ("drm/amdgpu: Export function to flush TLB of specific vm hub")
Tested-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-23 11:35:25 -05:00
Oak Zeng
3ff985485b drm/amdgpu: Export function to flush TLB of specific vm hub
This is for kfd to reuse amdgpu TLB invalidation function.
On gfx10, kfd only needs to flush TLB on gfx hub but not
on mm hub. So export a function for KFD flush TLB only on
specific hub.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-15 10:57:48 -05:00
Christoph Hellwig
244511f386 drm/amdgpu: simplify and cleanup setting the dma mask
Use dma_set_mask_and_coherent to set both masks in one go, and remove
the no longer required fallback, as the kernel now always accepts
larger than required DMA masks.  Fail the driver probe if we can't
set the DMA mask, as that means the system can only support a larger
mask.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-15 10:51:01 -05:00
Sam Ravnborg
47b757fba2 drm/amd: drop use of drmP.h in remaining files
With this commit drm/amd/ has no longer any uses of
the deprecated drmP.h header file.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20190609220757.10862-11-sam@ravnborg.org
2019-06-10 23:04:34 +02:00
Dave Airlie
457109829f Merge branch 'drm-next-5.2' of git://people.freedesktop.org/~agd5f/linux into drm-next
amdgpu:
- Switch to HMM for userptr (reverted until HMM fixes land)
- New experimental SMU 11 replacement for powerplay for vega20 (not enabled by default)
- Initial RAS support for vega20
- BACO support for vega12
- BACO fixes for vega20
- Rework IH handling for page fault and retry interrupts
- Cleanly split CPU and GPU paths for GPUVM updates
- Powerplay fixes
- XGMI fixes
- Rework how DC interacts with atomic for planes
- Clean up and simplify DC/Powerplay interfaces
- Misc cleanups and bug fixes

amdkfd:
- Switch to HMM for userptr (reverted until HMM fixes land)
- Add initial RAS support
- MQD fixes

ttm:
- Unify DRM_FILE_PAGE_OFFSET handling
- Account for kernel allocations in kernel zone only
- Misc cleanups

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190402170820.22197-1-alexander.deucher@amd.com
2019-04-03 13:26:11 +10:00