1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

9967 commits

Author SHA1 Message Date
Dave Airlie
951bad0bd9 Merge tag 'amd-drm-fixes-5.16-2021-11-10' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-fixes-5.16-2021-11-10:

amdgpu:
- Don't allow partial copy from user for DC debugfs
- SRIOV fixes
- GFX9 CSB pin count fix
- Various IP version check fixes
- DP 2.0 fixes
- Limit DCN1 MPO fix to DCN1

amdkfd:
- SVM fixes
- Reset fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211110222536.7527-1-alexander.deucher@amd.com
2021-11-11 10:14:44 +10:00
Dave Airlie
f8ca7b7419 Removed the TTM Huge Page functionnality to address a crash, a timeout
fix for udl, CONFIG_FB dependency improvements, a fix for a circular
 locking depency in imx, a NULL pointer dereference fix for virtio, and a
 naming collision fix for drm/locking.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCYYuA2wAKCRDj7w1vZxhR
 xQdZAP4oqiWpCO1JfnZdvEJ/lOULqvdzYkbUZexshGLdbb4ECwEA83TzIbQvXP8p
 jsC1hPNAIsOBkQ+nGZwJkTOtDcpEAQ4=
 =N2pK
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-fixes-2021-11-10' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

Removed the TTM Huge Page functionnality to address a crash, a timeout
fix for udl, CONFIG_FB dependency improvements, a fix for a circular
locking depency in imx, a NULL pointer dereference fix for virtio, and a
naming collision fix for drm/locking.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20211110082114.vfpkpnecwdfg27lk@gilmour
2021-11-11 08:14:19 +10:00
Guchun Chen
4d395f938a drm/amdgpu: add missed support for UVD IP_VERSION(3, 0, 64)
Fixes: 96b8dd4423 ("drm/amdgpu/amdgpu_vcn: convert to IP version checking")
Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-10 12:03:41 -05:00
Guchun Chen
b45a36032d drm/amdgpu: drop jpeg IP initialization in SRIOV case
Fixes: b05b9c591f ("drm/amdgpu: clean up set IP function")
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-10 12:00:08 -05:00
shaoyunl
9f4f2c1a35 drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov
The KFD pre_reset should be called before reset been executed, it will
hold the lock to prevent other rocm process to sent the packlage to hiq
during host execute the real reset on the HW

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-09 17:08:00 -05:00
Evan Quan
4fc30ea780 drm/amdgpu: fix uvd crash on Polaris12 during driver unloading
There was a change(below) target for such issue:
d82e2c249c ("drm/amdgpu: Fix crash on device remove/driver unload")
But the fix for VI ASICs was missing there. This is a supplement for
that.

Fixes: d82e2c249c ("drm/amdgpu: Fix crash on device remove/driver unload")

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-09 17:06:15 -05:00
Alex Deucher
2d32ffd6e9 drm/amdgpu: fix SI handling in amdgpu_device_asic_has_dc_support()
Properly handle SI DC support when CONFIG_DRM_AMD_DC_SI is not
set.

Fixes: f7f12b2582 ("drm/amdgpu: default to true in amdgpu_device_asic_has_dc_support")
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-05 14:12:53 -04:00
Felix Kuehling
5702d05295 drm/amdgpu: Fix dangling kfd_bo pointer for shared BOs
If a kfd_bo was shared (e.g. a dmabuf export), the original kfd_bo may be
freed when the amdgpu_bo still lives on. Free the kfd_bo struct in the
release_notify callback then the amdgpu_bo is freed.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-By: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-05 14:12:45 -04:00
Evan Quan
e6ef9b396b drm/amdgpu: correctly toggle gfx on/off around RLC_SPM_* register access
As part of the ib padding process, accessing the RLC_SPM_* register may
trigger gfx hang. Since gfxoff may be already kicked during the whole period.
To address that, we manually toggle gfx on/off around the RLC_SPM_*
register access.

This can resolve the gfx hang issue observed on running Talos with RDP launched
in parallel.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-05 14:12:29 -04:00
Tao Zhou
7513c9ff44 drm/amdgpu: correct xgmi ras error count reset
The error count reset for xgmi3x16 pcs is missed.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-05 14:12:21 -04:00
YuBiao Wang
6ddc0eb7a2 drm/amd/amdgpu: Fix csb.bo pin_count leak on gfx 9
[Why]
csb bo is not unpinned in gfx 9. It will lead to pin_count leak on
driver unload.

[How]
Call bo_free_kernel corresponding to bo_create_kernel in
gfx_rlc_init_csb. This will also unify the code path with other gfx
versions.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-05 14:11:32 -04:00
YuBiao Wang
c4fc13b581 drm/amd/amdgpu: Avoid writing GMC registers under sriov in gmc9
[Why]
For Vega10, disabling gart of gfxhub could mess up KIQ and PSP
under sriov mode, and lead to DMAR on host side.

[How]
Skip writing GMC registers under sriov.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-05 14:11:20 -04:00
Kent Russell
7ef6b7f844 drm/amdgpu: Make sure to reserve BOs before adding or removing
BOs need to be reserved before they are added or removed, so ensure that
they are reserved during kfd_mem_attach and kfd_mem_detach

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-05 14:11:05 -04:00
Jason Gunthorpe
0d97950953 drm/ttm: remove ttm_bo_vm_insert_huge()
The huge page functionality in TTM does not work safely because PUD and
PMD entries do not have a special bit.

get_user_pages_fast() considers any page that passed pmd_huge() as
usable:

	if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd) ||
		     pmd_devmap(pmd))) {

And vmf_insert_pfn_pmd_prot() unconditionally sets

	entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));

eg on x86 the page will be _PAGE_PRESENT | PAGE_PSE.

As such gup_huge_pmd() will try to deref a struct page:

	head = try_grab_compound_head(pmd_page(orig), refs, flags);

and thus crash.

Thomas further notices that the drivers are not expecting the struct page
to be used by anything - in particular the refcount incr above will cause
them to malfunction.

Thus everything about this is not able to fully work correctly considering
GUP_fast. Delete it entirely. It can return someday along with a proper
PMD/PUD_SPECIAL bit in the page table itself to gate GUP_fast.

Fixes: 314b6580ad ("drm/ttm, drm/vmwgfx: Support huge TTM pagefaults")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Thomas Hellström <thomas.helllstrom@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
[danvet: Update subject per Thomas' &Christian's review]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/0-v2-a44694790652+4ac-ttm_pmd_jgg@nvidia.com
2021-11-05 11:13:19 +01:00
James Zhu
93cec18478 drm/amdgpu: remove duplicated kfd_resume_iommu
Remove duplicated kfd_resume_iommu which already runs
in mdgpu_amdkfd_device_init.

Tested-By: Ken Moffat <zarniwhoop@ntlworld.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-03 12:22:08 -04:00
Aaron Liu
e8a423c589 drm/amdgpu: update RLC_PG_DELAY_3 Value to 200us for yellow carp
For yellow carp, the desired CGPG hysteresis value is 0x4E20.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-03 12:22:08 -04:00
Mario Limonciello
c92f909614 drm/amdgpu: Convert SMU version to decimal in debugfs
This is more useful when talking to the SMU team to have the information
in this format, save one less step to manually do it.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-03 12:22:07 -04:00
Oak Zeng
72c148d776 drm/amdgpu: use correct register mask to extract field
Aldebaran has different register mask definitions for
regiter MC_VM_XGMI_LFB_CNTL. Use the correct masks
to interpret fields of this register.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-03 12:22:07 -04:00
Jingwen Chen
38d4e4638e drm/amd/amdgpu: fix bad job hw_fence use after free in advance tdr
[Why]
In advance tdr mode, the real bad job will be resubmitted twice, while
in drm_sched_resubmit_jobs_ext, there's a dma_fence_put, so the bad job
is put one more time than other jobs.

[How]
Adding dma_fence_get before resbumit job in
amdgpu_device_recheck_guilty_jobs and put the fence for normal jobs

Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-03 12:22:07 -04:00
Alex Deucher
403475be6d drm/amdgpu/gmc6: fix DMA mask from 44 to 40 bits
The DMA mask on SI parts is 40 bits not 44.  Copy
paste typo.

Fixes: 244511f386 ("drm/amdgpu: simplify and cleanup setting the dma mask")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1762
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28 14:27:00 -04:00
Alex Deucher
58f8c7fa88 drm/amdgpu/discovery: add SDMA IP instance info for soc15 parts
Add secondary instance version info for soc15 parts.

Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28 14:26:59 -04:00
Alex Deucher
074b2092d9 drm/amdgpu/discovery: add UVD/VCN IP instance info for soc15 parts
Add secondary instance version info for vega20, arcturure, and
aldebaran.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28 14:26:59 -04:00
Tao Zhou
3d1a8d950d drm/amdgpu: remove GPRs init for ALDEBARAN in gpu reset (v3)
Remove GPRs init for ALDEBARAN in gpu reset temporarily, will add the init once the
algorithm is stable.

v2: Only remove GPRs init in gpu reset.
v3: Suspend needs it, only skip it in gpu reset.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28 14:26:13 -04:00
Tao Zhou
f7e053435c drm/amdgpu: skip GPRs init for some CU settings on ALDEBARAN
Skip GPRs init in specific condition since current GPRs init algorithm only works for some CU settings.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28 14:26:13 -04:00
Candice Li
4320e6f86d drm/amdgpu: Update TA version output in driver
TA version should only be displayed in firmware version column.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28 14:26:12 -04:00
Lang Yu
a5c5d8d50e drm/amdgpu: fix a potential memory leak in amdgpu_device_fini_sw()
amdgpu_fence_driver_sw_fini() should be executed before
amdgpu_device_ip_fini(), otherwise fence driver resource
won't be properly freed as adev->rings have been tore down.

Fixes: 72c8c97b15 ("drm/amdgpu: Split amdgpu_device_fini into early and late")

Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28 14:26:12 -04:00
Lang Yu
68df0f195a drm/amdkfd: Separate pinned BOs destruction from general routine
Currently, all kfd BOs use same destruction routine. But pinned
BOs are not unpinned properly. Separate them from general routine.

v2 (Felix):
Add safeguard to prevent user space from freeing signal BO.
Kunmap signal BO in the event of setting event page error.
Just kunmap signal BO to avoid duplicating the code.

Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28 14:26:12 -04:00
Philip Yang
3b8a23ae52 drm/amdkfd: restore userptr ignore bad address error
The userptr can be unmapped by application and still registered to
driver, restore userptr work return user pages will get -EFAULT bad
address error. Pretend this error as succeed. GPU access this userptr
will have VM fault later, it is better than application soft hangs with
stalled user mode queues.

v2: squash in warning fix (Alex)

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28 14:26:12 -04:00
Kent Russell
68daadf3d6 drm/amdgpu: Add kernel parameter support for ignoring bad page threshold
When a GPU hits the bad_page_threshold, it will not be initialized by
the amdgpu driver. This means that the table cannot be cleared, nor can
information gathering be performed (getting serial number, BDF, etc).

If the bad_page_threshold kernel parameter is set to -2,
continue to initialize the GPU, while printing a warning to dmesg that
this action has been done

v2: squash in Luben's fix to restore RAS info reporting

Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Mukul Joshi <Mukul.Joshi@amd.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28 14:26:12 -04:00
Kent Russell
8483fdfea7 drm/amdgpu: Warn when bad pages approaches 90% threshold
dmesg doesn't warn when the number of bad pages approaches the
threshold for page retirement. WARN when the number of bad pages
is at 90% or greater for easier checks and planning, instead of waiting
until the GPU is full of bad pages.

Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Mukul Joshi <Mukul.Joshi@amd.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28 14:26:12 -04:00
Dave Airlie
970eae1560 Linux 5.15-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmF298ceHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGIJYH/1rsEFQQ6caeQdy1
 z9eFIe48DNM4l7bFk+qEj2UAbzPdahVJ299Mg5fW0n2CDemOc9/n0b9TxQ37YObi
 mOzu0xwJVupIxkyFMPQSSc2q8aLm67NSpJy08DsmaNses5hSvu8x15RPHLQTybjt
 SwtKns+jpCq79P1GWbrB5e5UkLb0VNoxNp4L1U4pMrYGcEkJUXbaxNY2V/JcXdM7
 Vtn+qN0T/J6V6QVftv0t8Ecj3bjEnmL3kZHaTaNg3dGeKRpCGyHc5lcBQ0cNFG6t
 vjZ9VbuhBzGI3TN2tHH5hpA1UXo7HPBBCwQqxF1jeGLGHULikYwZ3TAPWqL3QZqC
 9cxr9SY=
 =p75d
 -----END PGP SIGNATURE-----

BackMerge tag 'v5.15-rc7' into drm-next

The msm next tree is based on rc3, so let's just backmerge rc7 before pulling it in.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2021-10-28 14:59:38 +10:00
Alex Deucher
df9feb1a69 drm/amdgpu/nbio7.4: use original HDP_FLUSH bits
The extended bits were not available for use on vega20 and
presumably arcturus as well.

Fixes: a0f9f85466 ("drm/amdgpu/nbio7.4: don't use GPU_HDP_FLUSH bit 12")
Reviewed-and-tested-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-22 10:11:41 -04:00
Alex Deucher
8cbc52c207 drm/amdgpu: Workaround harvesting info for some navy flounder boards
Some navy flounder boards do not properly mark harvested
VCN instances.  Fix that here.

v2: use IP versions

Fixes: 1b592d00b4 ("drm/amdgpu/vcn: remove manual instance setting")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1743
Reviewed-and-tested-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-21 23:39:00 -04:00
Alex Deucher
47be978be0 drm/amdgpu/vcn3.0: remove intermediate variable
No need to use the id variable, just use the constant
plus instance offset directly.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-21 23:38:57 -04:00
Alex Deucher
7876c7ea14 drm/amdgpu/vcn2.0: remove intermediate variable
No need to use the tmp variable, just use the constant
directly.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-21 23:38:53 -04:00
Alex Deucher
c5dd5667f4 drm/amdgpu: Consolidate VCN firmware setup code
Roughly the same code was present in all VCN versions.
Consolidate it into a single function.

v2: use AMDGPU_UCODE_ID_VCN + i, check if num_inst >= 2

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
2021-10-21 23:38:46 -04:00
Alex Deucher
e8ac9e93b4 drm/amdgpu/vcn3.0: handle harvesting in firmware setup
Only enable firmware for the instance that is enabled.

v2: use AMDGPU_UCODE_ID_VCN + i

Fixes: 1b592d00b4 ("drm/amdgpu/vcn: remove manual instance setting")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1743
Reviewed-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-21 23:38:41 -04:00
Jingwen Chen
e77f0f5c6a drm/amd/amdgpu: add dummy_page_addr to sriov msg
Add dummy_page_addr to sriov msg for host driver to set
GCVM_L2_PROTECTION_DEFAULT_ADDR* registers correctly.

v2:
should update vf2pf msg instead

Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Reviewed-by: Horace Chen <horace.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-21 23:38:16 -04:00
Huang Rui
a61794bd2f drm/amdgpu: remove grbm cam index/data operations for gfx v10
PSP firmware will be responsible for applying the GRBM CAM remapping in
the production. And the GRBM_CAM_INDEX / GRBM_CAM_DATA registers will be
protected by PSP under security policy. So remove it according to the
new security policy.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-21 23:38:10 -04:00
Kent Russell
dcd5ea9f94 drm/amdgpu: Clarify error when hitting bad page threshold
Change the error message when the bad_page_threshold is reached,
explicitly stating that the GPU will not be initialized.

Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Mukul Joshi <Mukul.Joshi@amd.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-20 11:43:57 -04:00
Alex Deucher
0d055f09e1 drm/amdgpu: drop navi reg init functions
No longer used since IP enumeration is driven by the IP
discovery table now.

Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-20 11:43:57 -04:00
Alex Deucher
bf99b9b032 drm/amdgpu: drop nv_set_ip_blocks()
No longer used since IP enumeration is now driven by
amdgpu IP discovery code.

Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-20 11:43:57 -04:00
Alex Deucher
7092432e3c drm/amdgpu: drop soc15_set_ip_blocks()
No longer used since IP enumeration is now driven by
amdgpu IP discovery code.

Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-20 11:43:57 -04:00
Alex Deucher
c9c7d18045 drm/amdgpu/gfx10: fix typo in gfx_v10_0_update_gfx_clock_gating()
Check was incorrectly converted to IP version checking.

Fixes: 4b0ad84254 ("drm/amdgpu/gfx10: convert to IP version checking")
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-20 11:43:56 -04:00
Qing Wang
40320159f0 drm/amdgpu: replace snprintf in show functions with sysfs_emit
show() must not use snprintf() when formatting the value to be
returned to user space.

Fix the following coccicheck warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c:427:
WARNING: use scnprintf or sprintf.

Signed-off-by: Qing Wang <wangqing@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-20 11:43:56 -04:00
Aaron Liu
5efacdf072 drm/amdgpu: support B0&B1 external revision id for yellow carp
B0 internal rev_id is 0x01, B1 internal rev_id is 0x02.
The external rev_id for B0 and B1 is 0x20.
The original expression is not suitable for B1.

v2: squash in fix for display code (Alex)

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-20 11:43:56 -04:00
Guchun Chen
dac35c4239 drm/amdgpu/discovery: parse hw_id_name for SDMA instance 2 and 3
Otherwise, hw_id_name string is NULL for SDMA 2 and 3 when dumping
ip version from VBIOS.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-19 17:32:52 -04:00
Tao Zhou
42f88ab772 drm/amdgpu: output warning for unsupported ras error inject (v2)
Output a warning message if RAS TA returns UNSUPPORTED_ERROR_INJ status.

v2: implement it in psp_ras_ta_check_status function.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-19 17:32:52 -04:00
Tao Zhou
1b5254e8d9 drm/amdgpu: centralize checking for RAS TA status
Create new function to check status returned by RAS TA.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-19 17:32:52 -04:00
YuBiao Wang
a3848df60b drm/amd/amdgpu: Do irq_fini_hw after ip_fini_early
[Why]
drm_irq_uninstall is called in irq_fini_hw so that irq is disabled in sw
stage. SMU (and maybe other IP blocks) fini_hw will call irq_put for
cleanup and the whole cleanup process will be skipped because of
drm->irq_enable = false.

[How]
Move ip_fini_early before irq_fini_hw.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-19 17:16:50 -04:00