1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

13994 commits

Author SHA1 Message Date
Luben Tuikov
56e449603f drm/sched: Convert the GPU scheduler to variable number of run-queues
The GPU scheduler has now a variable number of run-queues, which are set up at
drm_sched_init() time. This way, each driver announces how many run-queues it
requires (supports) per each GPU scheduler it creates. Note, that run-queues
correspond to scheduler "priorities", thus if the number of run-queues is set
to 1 at drm_sched_init(), then that scheduler supports a single run-queue,
i.e. single "priority". If a driver further sets a single entity per
run-queue, then this creates a 1-to-1 correspondence between a scheduler and
a scheduled entity.

Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Qiang Yu <yuq825@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Danilo Krummrich <dakr@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Emma Anholt <emma@anholt.net>
Cc: etnaviv@lists.freedesktop.org
Cc: lima@lists.freedesktop.org
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20231023032251.164775-1-luben.tuikov@amd.com
2023-10-26 12:03:47 -04:00
Mario Limonciello
64ffd2f1d0 drm/amd: Disable ASPM for VI w/ all Intel systems
Originally we were quirking ASPM disabled specifically for VI when
used with Alder Lake, but it appears to have problems with Rocket
Lake as well.

Like we've done in the case of dpm for newer platforms, disable
ASPM for all Intel systems.

Cc: stable@vger.kernel.org # 5.15+
Fixes: 0064b0ce85 ("drm/amd/pm: enable ASPM by default")
Reported-and-tested-by: Paolo Gentili <paolo.gentili@canonical.com>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036742
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-25 09:53:17 -04:00
Dave Airlie
0ecf4aa32b amd-drm-next-6.7-2023-10-20:
amdgpu:
 - SMU 13 updates
 - UMSCH updates
 - DC MPO fixes
 - RAS updates
 - MES 11 fixes
 - Fix possible memory leaks in error pathes
 - GC 11.5 fixes
 - Kernel doc updates
 - PSP updates
 - APU IMU fixes
 - Misc code cleanups
 - SMU 11 fixes
 - OD fix
 - Frame size warning fixes
 - SR-IOV fixes
 - NBIO 7.11 updates
 - NBIO 7.7 updates
 - XGMI fixes
 - devcoredump updates
 
 amdkfd:
 - Misc code cleanups
 - SVM fixes
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZTLX4QAKCRC93/aFa7yZ
 2ORJAP9lnoyvUPIP63Hx5TADQtZHiA+ShkATGQmDia94ABtCxwEAlo88TipxAo7c
 tRX8Mn+rix3M739FDFxV0bp7hCXsbgQ=
 =fY9I
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.7-2023-10-20' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.7-2023-10-20:

amdgpu:
- SMU 13 updates
- UMSCH updates
- DC MPO fixes
- RAS updates
- MES 11 fixes
- Fix possible memory leaks in error pathes
- GC 11.5 fixes
- Kernel doc updates
- PSP updates
- APU IMU fixes
- Misc code cleanups
- SMU 11 fixes
- OD fix
- Frame size warning fixes
- SR-IOV fixes
- NBIO 7.11 updates
- NBIO 7.7 updates
- XGMI fixes
- devcoredump updates

amdkfd:
- Misc code cleanups
- SVM fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231020195043.4937-1-alexander.deucher@amd.com
2023-10-25 10:54:22 +10:00
Christian König
4984fc578a drm/amdkfd: reserve a fence slot while locking the BO
Looks like the KFD still needs this.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 8abc1eb298 ("drm/amdkfd: switch over to using drm_exec v3")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231020123306.43978-1-christian.koenig@amd.com
2023-10-23 14:48:47 +02:00
Dave Airlie
7cd62eab9b Linux 6.6-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmU1ngkeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGrsIH/0k/+gdBBYFFdEym
 foRhKir9WV3ZX4oIozJjA1f7T+qVYclKs6kaYm3gNepRBb6AoG8pdgv4MMAqhYsf
 QMe2XHi0MrO/qKBgfNfivxEa9jq+0QK5uvTbqCRqCAB8LfwVyDqapCmg3EuiZcPW
 UbMITmnwLIfXgPxvp9rabmCsTqO6FLbf0GDOVIkNSAIDBXMpcO1iffjrWUbhRa7n
 oIoiJmWJLcXLxPWDsRKbpJwzw2cIG08YhfQYAiQnC3YaeRm1FKLDIICRBsmfYzja
 rWv9r4dn4TDfV4/AnjggQnsZvz2yPCxNaFSQIT88nIeiLvyuUTJ9j8aidsSfMZQf
 xZAbzbA=
 =NoQv
 -----END PGP SIGNATURE-----

BackMerge tag 'v6.6-rc7' into drm-next

This is needed to add the msm pr which is based on a higher base.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2023-10-23 18:20:06 +10:00
Luben Tuikov
d3df66fd98 drm/amdgpu: Remove redundant call to priority_is_valid()
Remove a redundant call to amdgpu_ctx_priority_is_valid() from
amdgpu_ctx_priority_permit(), which is called from amdgpu_ctx_init() which is
called from amdgpu_ctx_alloc() which is called from amdgpu_ctx_ioctl(), where
we've called amdgpu_ctx_priority_is_valid() already first thing in the
function.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20231018010359.30393-1-luben.tuikov@amd.com
2023-10-21 20:27:15 -04:00
André Almeida
de009982c6 drm/amdgpu: Create version number for coredumps
Even if there's nothing currently parsing amdgpu's coredump files, if
we eventually have such tools they will be glad to find a version field
to properly read the file.

Create a version number to be displayed on top of coredump file, to be
incremented when the file format or content get changed.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:29 -04:00
André Almeida
69619868d3 drm/amdgpu: Move coredump code to amdgpu_reset file
Giving that we use codedump just for device resets, move it's functions
and structs to a more semantic file, the amdgpu_reset.{c, h}.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:29 -04:00
André Almeida
2d6a2a28cd drm/amdgpu: Encapsulate all device reset info
To better organize struct amdgpu_device, keep all reset information
related fields together in a separated struct.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Shiwu Zhang
723fac64d0 drm/amdgpu: support the port num info based on the capability flag
XGMI TA will set the capability flag to indicate whether the port_num
info is supported or not. KGD checks the flag and accordingly picks up
the right buffer format and send the right command to TA to retrieve
the info.

v2: simplify the code by reusing the same statement (lijo)

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Shiwu Zhang
e8a5ded36b drm/amdgpu: prepare the output buffer for GET_PEER_LINKS command
Per the xgmi ta implementation, KGD needs to fill in node_ids
in concern into the shared command output buffer rather than the
command input buffer.

Input buffer is not used for GET_PEER_LINKS command execution.

In this way, xgmi ta can reuse the node info in the output buffer
just filled in and populate the same buffer with link info directly.

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Tao Zhou
d9443ac4f9 drm/amdgpu: drop status query/reset for GCEA 9.4.3 and MMEA 1.8
PMFW will be responsible for them.

v2: remove query interfaces.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Shiwu Zhang
626121fce4 drm/amdgpu: update the xgmi ta interface header
Update the header file to the v20.00.00.13

v1: rename TA_COMMAND_XGMI__GET_GET_TOPOLOGY_INFO to
TA_COMMAND_XGMI__GET_TOPOLOGY_INFO

And also rename struct ta_xgmi_cmd_get_peer_link_info_output to
ta_xgmi_cmd_get_peer_link_info accordingly

v2: add structs to support xgmi GET_EXTEND_PEER_LINK command

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Tao Zhou
8096df7664 drm/amdgpu: add set/get mca debug mode operations
Record the debug mode status in RAS.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Tao Zhou
21226f02d7 drm/amdgpu: replace reset_error_count with amdgpu_ras_reset_error_count
Simplify the code.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Li Ma
9d7a965e22 drm/amdgpu: add clockgating support for NBIO v7.7.1
add clockgating support for NBIO ip 7.7.1

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Li Ma
fa9dd7a285 drm/amdgpu: fix missing stuff in NBIO v7.11
add get_clockgating_state, update_medium_grain_light_sleep and
update_medium_grain_clock_gating in nbio_v7_11_funcs
v1:
add missing funcs in nbio_v7_11.c
v2:
modify the if condition and add spport for nbio v7.11 clockgating.

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Stanley.Yang
66d64e4e03 drm/amdgpu: Enable RAS feature by default for APU
Enable RAS feature by default for aqua vanjaram on apu
platform.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Yang Wang
49c260bef3 drm/amdgpu: fix typo for amdgpu ras error data print
typo fix.

Fixes: 5b1270beb3 ("drm/amdgpu: add ras_err_info to identify RAS error source")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Candice Li <candice.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Bokun Zhang
017634a68d drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P4
- In VCN 4 SRIOV code path, add code to enable RB decouple feature

Signed-off-by: Bokun Zhang <bokun.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:27 -04:00
Bokun Zhang
eb9d6256b9 drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P3
- Update VCN header for RB decouple feature
- Add metadata struct, metadata will be placed after each RB

Signed-off-by: Bokun Zhang <bokun.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:27 -04:00
Bokun Zhang
fc3136730b drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P2
- Add function to check if RB decouple is enabled under SRIOV

Signed-off-by: Bokun Zhang <bokun.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:27 -04:00
Bokun Zhang
97b2821643 drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P1
- Update SRIOV header with RB decouple flag

Signed-off-by: Bokun Zhang <bokun.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:27 -04:00
Stanley.Yang
8a65661114 drm/amdgpu: Fix delete nodes that have been relesed
Fix delete nodes that it has been freed.

Fixes: 5b1270beb3 ("drm/amdgpu: add ras_err_info to identify RAS error source")
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:27 -04:00
Hawking Zhang
f2176d7063 drm/amdgpu: Add UVD_VCPU_INT_EN2 to dpg sram
Add RAS sepcifc programming to dpg sram.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:27 -04:00
Hawking Zhang
9248462d7e drm/amdgpu: Enable software RAS in vcn v4_0_3
Set VCN/JPEG RAS masks to enable software RAS for
VCN and JPEG.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:26 -04:00
Tao Zhou
472c5fb297 drm/amdgpu: define ras_reset_error_count function
Make the code architecture more simple.

v2: reuse ras_reset_error_count in ras_reset_error_status.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:26 -04:00
Candice Li
afcf949cf3 drm/amdgpu: Log UE corrected by replay as correctable error
Support replay mode where UE could be converted to CE.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:26 -04:00
Dave Airlie
d43c76c820 Short summary of fixes pull:
amdgpu:
 - Disable AMD_CTX_PRIORITY_UNSET
 
 bridge:
 - ti-sn65dsi86: Fix device lifetime
 
 edid:
 - Add quirk for BenQ GW2765
 
 ivpu:
 - Extend address range for MMU mmap
 
 nouveau:
 - DP-connector fixes
 - Documentation fixes
 
 panel:
 - Move AUX B116XW03 into panel-simple
 
 scheduler:
 - Eliminate DRM_SCHED_PRIORITY_UNSET
 
 ttm:
 - Fix possible NULL-ptr deref in cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmUxFsoACgkQaA3BHVML
 eiNkkwf/YORvSPLN2LEqh4P8to7X92moFR7XwkA1GVofKOC3C04wDBq+Ezdy5pnw
 7T/uvEnmrnDP5Iucerly8bE5IdJGkafIB8+oBIcGhGvPgtYNGSV8NWoeA/lhBmsS
 av56+zBPpBF3s5fZa4rB/WciGRuCsLhbcYUN/LGGVBXdhLgb8LRb/seTv/Ah9Sra
 LQtFmrDNNE0+FIWuKkUsY/CQYysbMzuHYFiLumtY59lE1R5kyvlzE8OrdcqlDXhC
 HCFt0KuA+4onzdHKnge/as5T3jBJucGV9mTcgal3158n94U/ODrXEJqVD/KFKle6
 052vhpApOL3+RyDKGjXQLm/GumEc9w==
 =LJB1
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2023-10-19' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Short summary of fixes pull:

amdgpu:
- Disable AMD_CTX_PRIORITY_UNSET

bridge:
- ti-sn65dsi86: Fix device lifetime

edid:
- Add quirk for BenQ GW2765

ivpu:
- Extend address range for MMU mmap

nouveau:
- DP-connector fixes
- Documentation fixes

panel:
- Move AUX B116XW03 into panel-simple

scheduler:
- Eliminate DRM_SCHED_PRIORITY_UNSET

ttm:
- Fix possible NULL-ptr deref in cleanup

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20231019114605.GA22540@linux-uq9g
2023-10-20 14:07:58 +10:00
Felix Kuehling
316baf09d3 drm/amdgpu: Reserve fences for VM update
In amdgpu_dma_buf_move_notify reserve fences for the page table updates
in amdgpu_vm_clear_freed and amdgpu_vm_handle_moved. This fixes a BUG_ON
in dma_resv_add_fence when using SDMA for page table updates.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:56:57 -04:00
Felix Kuehling
51b79f3381 drm/amdgpu: Fix possible null pointer dereference
abo->tbo.resource may be NULL in amdgpu_vm_bo_update.

Fixes: 1802537820 ("drm/ttm: stop allocating dummy resources during BO creation")
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:56:50 -04:00
Felix Kuehling
207430b76a drm/amdgpu: Reserve fences for VM update
In amdgpu_dma_buf_move_notify reserve fences for the page table updates
in amdgpu_vm_clear_freed and amdgpu_vm_handle_moved. This fixes a BUG_ON
in dma_resv_add_fence when using SDMA for page table updates.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:52 -04:00
Felix Kuehling
e6f8588733 drm/amdgpu: Fix possible null pointer dereference
abo->tbo.resource may be NULL in amdgpu_vm_bo_update.

Fixes: 1802537820 ("drm/ttm: stop allocating dummy resources during BO creation")
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:52 -04:00
Stanley.Yang
b1338a8e71 drm/amdgpu: Workaround to skip kiq ring test during ras gpu recovery
This is workaround, kiq ring test failed in suspend stage when do ras
recovery.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:52 -04:00
Mario Limonciello
e56690bb37 drm/amd: Read IMU FW version from scratch register during hw_init
If the IMU version wasn't discovered from the header, such as when
the firmware was directly loaded by PSP then there is no firmware
version to show to userspace from sysfs or IOCTL.

The IMU F/W stores the version in the first scratch register though,
so fetch it in these cases to let the driver export.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:51 -04:00
Mario Limonciello
4916615fe9 drm/amd: Don't parse IMU ucode version if it won't be loaded
When the IMU ucode is loaded by the PSP parsing the version that comes from
Linux will vary. Rather than showing the wrong data to kernel interface
consumers, avoid populating it in this case.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:51 -04:00
Mario Limonciello
d757dfd667 drm/amd: Move microcode init step to early_init()
The intention for early init is to find any missing microcode early
and fail the driver load if it's missing.  Move this step to earlier
in driver init to match other IP blocks.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:51 -04:00
Asad Kamal
d8c1925ba8 drm/amdgpu: update retry times for psp BL wait
Increase retry time for PSP BL wait, to compensate
for longer time to set c2pmsg 35 ready bit during
mode1 with RAS

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:51 -04:00
Alex Deucher
28ab9a02b6 drm/amdgpu/mes11: remove aggregated doorbell code
It's not enabled in hardware so the code is dead.
Remove it.

Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:51 -04:00
Asad Kamal
53dd920c1f drm/amdgpu : Add hive ras recovery check
If one of the devices in the hive detects a
fatal error, need to send ras recovery reset
message to PMFW of all devices in the hive.
For that add a flag in hive to indicate that
it's undergoing ras recovery

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:51 -04:00
Mangesh Gadre
2d955a06a5 Revert "drm/amdgpu: Program xcp_ctl registers as needed"
This reverts commit 0bdebfef3f.

XCP_CTL register is programmed by firmware and
register access is protected.

Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:50 -04:00
Lang Yu
ab29ac57ad drm/amdgpu/umsch: add suspend and resume callback
Add missing IP callbacks.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:50 -04:00
Christian König
6b18ef481f drm/amdgpu: ignore duplicate BOs again
Looks like RADV is actually hitting this.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: ca6c1e210a ("drm/amdgpu: use the new drm_exec object for CS v3")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231017121015.1336786-1-christian.koenig@amd.com
2023-10-19 13:19:44 +02:00
Dave Airlie
27442758e9 amd-drm-next-6.7-2023-10-13:
amdgpu:
 - DC replay fixes
 - Misc code cleanups and spelling fixes
 - Documentation updates
 - RAS EEPROM Updates
 - FRU EEPROM Updates
 - IP discovery updates
 - SR-IOV fixes
 - RAS updates
 - DC PQ fixes
 - SMU 13.0.6 updates
 - GC 11.5 Support
 - NBIO 7.11 Support
 - GMC 11 Updates
 - Reset fixes
 - SMU 11.5 Updates
 - SMU 13.0 OD support
 - Use flexible arrays for bo list handling
 - W=1 Fixes
 - SubVP fixes
 - DPIA fixes
 - DCN 3.5 Support
 - Devcoredump fixes
 - VPE 6.1 support
 - VCN 4.0 Updates
 - S/G display fixes
 - DML fixes
 - DML2 Support
 - MST fixes
 - VRR fixes
 - Enable seamless boot in more cases
 - Enable content type property for HDMI
 - OLED fixes
 - Rework and clean up GPUVM TLB flushing
 - DC ODM fixes
 - DP 2.x fixes
 - AGP aperture fixes
 - SDMA firmware loading cleanups
 - Cyan Skillfish GPU clock counter fix
 - GC 11 GART fix
 - Cache GPU fault info for userspace queries
 - DC cursor check fixes
 - eDP fixes
 - DC FP handling fixes
 - Variable sized array fixes
 - SMU 13.0.x fixes
 - IB start and size alignment fixes for VCN
 - SMU 14 Support
 - Suspend and resume sequence rework
 - vkms fix
 
 amdkfd:
 - GC 11 fixes
 - GC 10 fixes
 - Doorbell fixes
 - CWSR fixes
 - SVM fixes
 - Clean up GC info enumeration
 - Rework memory limit handling
 - Coherent memory handling fixes
 - Use partial migrations in GPU faults
 - TLB flush fixes
 - DMA unmap fixes
 - GC 9.4.3 fixes
 - SQ interrupt fix
 - GTT mapping fix
 - GC 11.5 Support
 
 radeon:
 - Misc code cleanups
 - W=1 Fixes
 - Fix possible buffer overflow
 - Fix possible NULL pointer dereference
 
 UAPI:
 - Add EXT_COHERENT memory allocation flags.  These allow for system scope atomics.
   Proposed userspace: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/88
 - Add support for new VPE engine.  This is a memory to memory copy engine with advanced scaling, CSC, and color management features
   Proposed mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25713
 - Add INFO IOCTL interface to query GPU faults
   Proposed Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23238
   Proposed libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/298
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZSmDAQAKCRC93/aFa7yZ
 2EdeAQC2lkQ9IHLOon5kIZUK+r9IPYlgFsii+qfmMPLBaMcuwgEA8F4eJln/cc9V
 02EKhlapkggYXYa+uhOE2KTnWgMFJgI=
 =SEXq
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.7-2023-10-13' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.7-2023-10-13:

amdgpu:
- DC replay fixes
- Misc code cleanups and spelling fixes
- Documentation updates
- RAS EEPROM Updates
- FRU EEPROM Updates
- IP discovery updates
- SR-IOV fixes
- RAS updates
- DC PQ fixes
- SMU 13.0.6 updates
- GC 11.5 Support
- NBIO 7.11 Support
- GMC 11 Updates
- Reset fixes
- SMU 11.5 Updates
- SMU 13.0 OD support
- Use flexible arrays for bo list handling
- W=1 Fixes
- SubVP fixes
- DPIA fixes
- DCN 3.5 Support
- Devcoredump fixes
- VPE 6.1 support
- VCN 4.0 Updates
- S/G display fixes
- DML fixes
- DML2 Support
- MST fixes
- VRR fixes
- Enable seamless boot in more cases
- Enable content type property for HDMI
- OLED fixes
- Rework and clean up GPUVM TLB flushing
- DC ODM fixes
- DP 2.x fixes
- AGP aperture fixes
- SDMA firmware loading cleanups
- Cyan Skillfish GPU clock counter fix
- GC 11 GART fix
- Cache GPU fault info for userspace queries
- DC cursor check fixes
- eDP fixes
- DC FP handling fixes
- Variable sized array fixes
- SMU 13.0.x fixes
- IB start and size alignment fixes for VCN
- SMU 14 Support
- Suspend and resume sequence rework
- vkms fix

amdkfd:
- GC 11 fixes
- GC 10 fixes
- Doorbell fixes
- CWSR fixes
- SVM fixes
- Clean up GC info enumeration
- Rework memory limit handling
- Coherent memory handling fixes
- Use partial migrations in GPU faults
- TLB flush fixes
- DMA unmap fixes
- GC 9.4.3 fixes
- SQ interrupt fix
- GTT mapping fix
- GC 11.5 Support

radeon:
- Misc code cleanups
- W=1 Fixes
- Fix possible buffer overflow
- Fix possible NULL pointer dereference

UAPI:
- Add EXT_COHERENT memory allocation flags.  These allow for system scope atomics.
  Proposed userspace: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/88
- Add support for new VPE engine.  This is a memory to memory copy engine with advanced scaling, CSC, and color management features
  Proposed mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25713
- Add INFO IOCTL interface to query GPU faults
  Proposed Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23238
  Proposed libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/298

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231013175758.1735031-1-alexander.deucher@amd.com
2023-10-18 16:08:07 +10:00
Luben Tuikov
fa8391ad68 gpu/drm: Eliminate DRM_SCHED_PRIORITY_UNSET
Eliminate DRM_SCHED_PRIORITY_UNSET, value of -2, whose only user was
amdgpu. Furthermore, eliminate an index bug, in that when amdgpu boots, it
calls drm_sched_entity_init() with DRM_SCHED_PRIORITY_UNSET, which uses it to
index sched->sched_rq[].

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Alex Deucher <Alexander.Deucher@amd.com>
Link: https://lore.kernel.org/r/20231017035656.8211-2-luben.tuikov@amd.com
2023-10-17 20:35:38 -04:00
Luben Tuikov
eab0261967 drm/amdgpu: Unset context priority is now invalid
A context priority value of AMD_CTX_PRIORITY_UNSET is now invalid--instead of
carrying it around and passing it to the Direct Rendering Manager--and it
becomes AMD_CTX_PRIORITY_NORMAL in amdgpu_ctx_ioctl(), the gateway to context
creation.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Alex Deucher <Alexander.Deucher@amd.com>
Link: https://lore.kernel.org/r/20231017035656.8211-1-luben.tuikov@amd.com
2023-10-17 20:35:38 -04:00
Ma Ke
cd90511557 drm/amdgpu/vkms: fix a possible null pointer dereference
In amdgpu_vkms_conn_get_modes(), the return value of drm_cvt_mode()
is assigned to mode, which will lead to a NULL pointer dereference
on failure of drm_cvt_mode(). Add a check to avoid null pointer
dereference.

Signed-off-by: Ma Ke <make_ruc2021@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:36:25 -04:00
Yang Wang
3bba4bc6a0 drm/amdgpu: add RAS error info support for umc_v12_0
add RAS error info support for umc_v12_0.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:36:11 -04:00
Yang Wang
8736d17a7f drm/amdgpu: add RAS error info support for mmhub_v1_8
add RAS error info support for mmhub_v1_8.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:36:03 -04:00
Yang Wang
156c2814c2 drm/amdgpu: add RAS error info support for gfx_v9_4_3
add RAS error info support for gfx_v9_4_3.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:35:55 -04:00