1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

24380 commits

Author SHA1 Message Date
Yiqing Yao
226dcfad34 drm/amdgpu: Adjust MES polling timeout for sriov
[why]
MES response time in sriov may be longer than default value
due to reset or init in other VF. A timeout value specific
to sriov is needed.

[how]
When in sriov, adjust the timeout value to calculated
worst case scenario.

Signed-off-by: Yiqing Yao <yiqing.yao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-21 16:13:26 -04:00
Kenneth Feng
09aef0258a drm/amd/pm: update driver-if header for smu_v13_0_10
update driver-if header for smu_v13_0_10 and merge with smu_v13_0_0

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-21 16:12:59 -04:00
Chengming Gui
79610d3041 drm/amdgpu: fix pstate setting issue
[WHY]
0, original pstate X
1, ctx_A_create -> ctx_A->stable_pstate = X
2, ctx_A_set_pstate (Y) -> current pstate is Y (PEAK or STANDARD)
3, ctx_B_create -> ctx_B->stable_pstate =  Y
4, ctx_A_destroy -> restore pstate to X
5, ctx_B_destroy -> restore pstate to Y
Above sequence will cause final pstate is wrong (Y), should be original X.

[HOW]
When ctx_B create,
if  ctx_A touched pstate setting
(not auto, stable_pstate_ctx != NULL),
set ctx_B->stable_pstate the same value as ctx_A saved,
if stable_pstate_ctx == NULL,
fetch current pstate to fill
ctx_B->stable_pstate.

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-10-21 16:12:09 -04:00
Yiqing Yao
bb3c846ad2 drm/amdgpu: Adjust MES polling timeout for sriov
[why]
MES response time in sriov may be longer than default value
due to reset or init in other VF. A timeout value specific
to sriov is needed.

[how]
When in sriov, adjust the timeout value to calculated
worst case scenario.

Signed-off-by: Yiqing Yao <yiqing.yao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-21 15:45:43 -04:00
Kenneth Feng
7e5632cdf6 drm/amd/pm: update driver-if header for smu_v13_0_10
update driver-if header for smu_v13_0_10 and merge with smu_v13_0_0

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-21 15:45:34 -04:00
Chengming Gui
8a7a5b5f23 drm/amdgpu: fix pstate setting issue
[WHY]
0, original pstate X
1, ctx_A_create -> ctx_A->stable_pstate = X
2, ctx_A_set_pstate (Y) -> current pstate is Y (PEAK or STANDARD)
3, ctx_B_create -> ctx_B->stable_pstate =  Y
4, ctx_A_destroy -> restore pstate to X
5, ctx_B_destroy -> restore pstate to Y
Above sequence will cause final pstate is wrong (Y), should be original X.

[HOW]
When ctx_B create,
if  ctx_A touched pstate setting
(not auto, stable_pstate_ctx != NULL),
set ctx_B->stable_pstate the same value as ctx_A saved,
if stable_pstate_ctx == NULL,
fetch current pstate to fill
ctx_B->stable_pstate.

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-21 15:44:56 -04:00
Dave Airlie
cbc543c59e drm-misc-fixes for v6.1-rc2:
- Fix a buffer overflow in format_helper_test.
 - Set DDC pointer in drmm_connector_init.
 - Compiler fixes for panfrost.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAmNRMiEACgkQ/lWMcqZw
 E8MJExAAhVv/bEcg9Ib1e2gIirtOKB3w0V20rEvkvplzsfKALMH3oXHx0cIbk2sj
 YBICPiuc2x4sDQW9xlQmPa5Gv+6aoqjSOpu+Zpc5MQAb8qnO/vxDCOlYDJwcicjP
 DdxXwJcJ48+x3FFWmrg6gcU8fmH6Rckb2BgBsF3fyzOhIMF2ME6a02S+bYuHNOxK
 sQfo1Tlbibi5pfrxHFBE9V+Wjf3ohQwxQHmcpAC6wwgtGKOLQSxkU7o+wCFUNsn1
 dlWTVe9+2xgNQenTue091PAkFP5R4T3wv654EPudyUYjybQL8ci8aFjaFTQgR3BT
 s4LsViTViTbm5OrSga2hGA+GGH2OGkW70yN+tqXPVxWyPaMA9GuoTh9xKhY3zVsB
 69HlWBzfhAoyixp0rWmilZIGAvKXn8xf+HzOxQ/ihDk6a/VsyLbEdYP72AQTRCi8
 A+h6vBNxP6AVPwDCA//hpCupG75bI8h/Plf1R/W7uzVTKXCSPAGdzROI8dxjsFAX
 Y4OR9kd8yn+bpf6G6D0q+tJV8BqUAzs5AUVkXWU5i1XaAK6fNpqh066yFQt8xCHX
 hFRcr7StYqI9ZliGP1ZEjE8nsiaqPZfDLMqSQlHl392JEfmf5uwskKwVadGHVfuC
 L+iATevGiNsY4JHHk+YBXHMSWrX9XIRNDio+LZQgFttr/KTK4ac=
 =rzib
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2022-10-20' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

drm-misc-fixes for v6.1-rc2:
- Fix a buffer overflow in format_helper_test.
- Set DDC pointer in drmm_connector_init.
- Compiler fixes for panfrost.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/c4d05683-8ebe-93b8-d24c-d1d2c68f12c4@linux.intel.com
2022-10-21 09:56:14 +10:00
Alex Deucher
50b0e4d4da drm/amdgpu: fix sdma doorbell init ordering on APUs
Commit 8795e182b0 ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
uncovered a bug in amdgpu that required a reordering of the driver
init sequence to avoid accessing a special register on the GPU
before it was properly set up leading to an PCI AER error.  This
reordering uncovered a different hw programming ordering dependency
in some APUs where the SDMA doorbells need to be programmed before
the GFX doorbells. To fix this, move the SDMA doorbell programming
back into the soc15 common code, but use the actual doorbell range
values directly rather than the values stored in the ring structure
since those will not be initialized at this point.

This is a partial revert, but with the doorbell assignment
fixed so the proper doorbell index is set before it's used.

Fixes: e3163bc8ff ("drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: skhan@linuxfoundation.org
Cc: stable@vger.kernel.org
2022-10-20 09:35:51 -04:00
Thomas Zimmermann
1aca5ce036 Merge drm/drm-fixes into drm-misc-fixes
Backmerging to get v6.1-rc1.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2022-10-20 09:09:00 +02:00
Zack Rusin
7c99616e3f drm: Remove drm_mode_config::fb_base
The fb_base in struct drm_mode_config has been unused for a long time.
Some drivers set it and some don't leading to a very confusing state
where the variable can't be relied upon, because there's no indication
as to which driver sets it and which doesn't.

The only usage of fb_base is internal to two drivers so instead of trying
to force it into all the drivers to get it into a coherent state
completely remove it.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Thomas Zimmermann <tzimemrmann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221019024401.394617-1-zack@kde.org
2022-10-19 21:46:16 -04:00
Christian König
01f2cf5384 drm/amdgpu: use DRM_SCHED_FENCE_DONT_PIPELINE for VM updates
Make sure that we always have a CPU round trip to let the submission
code correctly decide if a TLB flush is necessary or not.

Signed-off-by: Christian König <christian.koenig@amd.com>
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2113#note_1579296
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014081553.114899-2-christian.koenig@amd.com
2022-10-19 12:45:00 +02:00
Arunpravin Paneer Selvam
8273b40486 drm/amdgpu: Fix for BO move issue
A user reported a bug on CAPE VERDE system where uvd_v3_1
IP component failed to initialize as there is an issue with
BO move code from one memory to other.

In function amdgpu_mem_visible() called by amdgpu_bo_move(),
when there are no blocks to compare or if we have a single
block then break the loop.

Fixes: 312b4dc11d ("drm/amdgpu: Fix VRAM BO swap issue")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:14:07 -04:00
YuBiao Wang
2abe92c7ad drm/amdgpu: dequeue mes scheduler during fini
[Why]
If mes is not dequeued during fini, mes will be in an uncleaned state
during reload, then mes couldn't receive some commands which leads to
reload failure.

[How]
Perform MES dequeue via MMIO after all the unmap jobs are done by mes
and before kiq fini.

v2: Move the dequeue operation inside kiq_hw_fini.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:13:59 -04:00
Kenneth Feng
5ce4726a13 drm/amd/pm: enable thermal alert on smu_v13_0_10
enable thermal alert on smu_v13_0_10

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:13:34 -04:00
Yifan Zha
97a3d6090f drm/amdgpu: Program GC registers through RLCG interface in gfx_v11/gmc_v11
[Why]
L1 blocks most of GC registers accessing by MMIO.

[How]
Use RLCG interface to program GC registers under SRIOV VF in full access time.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:13:24 -04:00
Nathan Chancellor
e688ba3e27 drm/amdkfd: Fix type of reset_type parameter in hqd_destroy() callback
When booting a kernel compiled with CONFIG_CFI_CLANG on a machine with
an RX 6700 XT, there is a CFI failure in kfd_destroy_mqd_cp():

  [   12.894543] CFI failure at kfd_destroy_mqd_cp+0x2a/0x40 [amdgpu] (target: hqd_destroy_v10_3+0x0/0x260 [amdgpu]; expected type: 0x8594d794)

Clang's kernel Control Flow Integrity (kCFI) makes sure that all
indirect call targets have a type that exactly matches the function
pointer prototype. In this case, hqd_destroy()'s third parameter,
reset_type, should have a type of 'uint32_t' but every implementation of
this callback has a third parameter type of 'enum kfd_preempt_type'.

Update the function pointer prototype to match reality so that there is
no more CFI violation.

Link: https://github.com/ClangBuiltLinux/linux/issues/1738
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:13:12 -04:00
Guenter Roeck
8a70b2d89e drm/amd/display: Increase frame size limit for display_mode_vba_util_32.o
Building 32-bit images may fail with the following error.

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_util_32.c:
	In function ‘dml32_UseMinimumDCFCLK’:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_util_32.c:3142:1:
	error: the frame size of 1096 bytes is larger than 1024 bytes

This is seen when building i386:allmodconfig with any of the following
compilers.

	gcc (Debian 12.2.0-3) 12.2.0
	gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

The problem is not seen if the compiler supports GCC_PLUGIN_LATENT_ENTROPY
because in that case CONFIG_FRAME_WARN is already set to 2048 even for
32-bit builds.

dml32_UseMinimumDCFCLK() was introduced with commit dda4fb85e4
("drm/amd/display: DML changes for DCN32/321"). It declares a large
number of local variables. Increase the frame size for the affected
file to 2048, similar to other files in the same directory, to enable
32-bit build tests with affected compilers.

Fixes: dda4fb85e4 ("drm/amd/display: DML changes for DCN32/321")
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reported-by: Łukasz Bartosik <ukaszb@google.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:13:03 -04:00
Tim Huang
31c261a7ff drm/amd/pm: add SMU IP v13.0.4 IF version define to V7
The pmfw has changed the driver interface version, so keep same with the
fw.

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:51 -04:00
Tim Huang
853fdb4916 drm/amd/pm: update SMU IP v13.0.4 driver interface version
Update the SMU driver interface version to V7.

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:44 -04:00
ZhenGuo Yin
5fa993737b drm/amd/pm: Init pm_attr_list when dpm is disabled
[Why]
In SRIOV multi-vf, dpm is always disabled, and pm_attr_list won't
be initialized. There will be a NULL pointer call trace after
removing the dpm check condition in amdgpu_pm_sysfs_fini.
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:amdgpu_device_attr_remove_groups+0x20/0x90 [amdgpu]
Call Trace:
  <TASK>
  amdgpu_pm_sysfs_fini+0x2f/0x40 [amdgpu]
  amdgpu_device_fini_hw+0xdf/0x290 [amdgpu]

[How]
List pm_attr_list should be initialized when dpm is disabled.

Fixes: a6ad27cec5 ("drm/amd/pm: Remove redundant check condition")
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:36 -04:00
Evan Quan
3059cd8c5f drm/amd/pm: disable cstate feature for gpu reset scenario
Suggested by PMFW team and same as what did for gfxoff feature.
This can address some Mode1Reset failures observed on SMU13.0.0.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:20 -04:00
Evan Quan
ba2f09960e drm/amd/pm: fulfill SMU13.0.7 cstate control interface
Fulfill the functionality for cstate control.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:14 -04:00
Evan Quan
528c0e66e0 drm/amd/pm: fulfill SMU13.0.0 cstate control interface
Fulfill the functionality for cstate control.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:01 -04:00
YiPeng Chai
3bd026c3e3 drm/amdgpu: Add sriov vf ras support in amdgpu_ras_asic_supported
V2:
Add sriov vf ras support in amdgpu_ras_asic_supported.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:11:48 -04:00
YiPeng Chai
001ebcf5b9 drm/amdgpu: Enable ras support for mp0 v13_0_0 and v13_0_10
V1:
Enable ras support for CHIP_IP_DISCOVERY asic type.

V2:
1. Change commit comment.
2. Enable ras support for mp0 v13_0_0 and v13_0_10.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:11:37 -04:00
YiPeng Chai
7cd3f6c3ac drm/amdgpu: Enable gmc soft reset on gmc_v11_0_3
Enable gmc soft reset on gmc_v11_0_3.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:48 -04:00
Likun Gao
b7a76a2914 drm/amdgpu: skip mes self test for gc 11.0.3
Temporary disable mes self teset for gc 11.0.3.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:41 -04:00
Kenneth Feng
f700486cd1 drm/amd/pm: skip loading pptable from driver on secure board for smu_v13_0_10
skip loading pptable from driver on secure board since it's loaded from psp.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Guan Yu <Guan.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:31 -04:00
Kenneth Feng
657e07221c drm/amd/amdgpu: enable gfx clock gating features on smu_v13_0_10
enable gfx clock gating features on smu_v13_0_10

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Jack Gui <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:23 -04:00
Kenneth Feng
4c7f9a3c15 drm/amd/pm: remove the pptable id override on smu_v13_0_10
remove the pptable id override on smu_v13_0_10,
and the id is fetched from vbios now.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:17 -04:00
Kenneth Feng
4d72a4e4fb drm/amd/pm: temporarily disable thermal alert on smu_v13_0_10
temporarily disable thermal alert on smu_v13_0_10 due to kfd test fail.
will enable it again after confirming the thermal hardware setting.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:10 -04:00
Asher Song
4545ae2ed3 drm/amdgpu: Revert "drm/amdgpu: getting fan speed pwm for vega10 properly"
This reverts commit 16fb4dca95.

Unfortunately, that commit causes fan monitors can't be read and written
properly.

Fixes: 16fb4dca95 ("drm/amdgpu: getting fan speed pwm for vega10 properly")
Signed-off-by: Asher Song <Asher.Song@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:08:49 -04:00
Victor Zhao
a31e62873f drm/amdgpu: Refactor mode2 reset logic for v11.0.7
- refactor mode2 on v11.0.7 to align with aldebaran
- comment out using mode2 reset as default for now, will introduce
another controller to replace previous reset_level_mask

v2: squash in unused variable removal (Alex)

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:08:40 -04:00
Victor Zhao
a340847b02 Revert "drm/amdgpu: let mode2 reset fallback to default when failure"
This reverts commit dac6b80818.

This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts with
the original design of reset handler. Will redesign it.

Fixes: dac6b80818 ("drm/amdgpu: let mode2 reset fallback to default when failure")
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:08:33 -04:00
Victor Zhao
afbaa15501 Revert "drm/amdgpu: add debugfs amdgpu_reset_level"
This reverts commit 5bd8d53f6f.

This commit breaks the reset logic for aldebaran, revert it for now.
Will move the mask inside the reset handler.

Fixes: 5bd8d53f6f ("drm/amdgpu: add debugfs amdgpu_reset_level")
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:08:25 -04:00
Danijel Slivka
65f8682b9a drm/amdgpu: set vm_update_mode=0 as default for Sienna Cichlid in SRIOV case
For asic with VF MMIO access protection avoid using CPU for VM table updates.
CPU pagetable updates have issues with HDP flush as VF MMIO access protection
blocks write to mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register
during sriov runtime.

v3: introduce virtualization capability flag AMDGPU_VF_MMIO_ACCESS_PROTECT
which indicates that VF MMIO write access is not allowed in sriov runtime

Signed-off-by: Danijel Slivka <danijel.slivka@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:07:58 -04:00
Rafael Mendonca
8f8033d566 drm/amdgpu/powerplay/psm: Fix memory leak in power state init
Commit 902bc65de0 ("drm/amdgpu/powerplay/psm: return an error in power
state init") made the power state init function return early in case of
failure to get an entry from the powerplay table, but it missed to clean up
the allocated memory for the current power state before returning.

Fixes: 902bc65de0 ("drm/amdgpu/powerplay/psm: return an error in power state init")
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 14:53:23 -04:00
Arunpravin Paneer Selvam
df768a9770 drm/amdgpu: Fix for BO move issue
A user reported a bug on CAPE VERDE system where uvd_v3_1
IP component failed to initialize as there is an issue with
BO move code from one memory to other.

In function amdgpu_mem_visible() called by amdgpu_bo_move(),
when there are no blocks to compare or if we have a single
block then break the loop.

Fixes: 312b4dc11d ("drm/amdgpu: Fix VRAM BO swap issue")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 14:52:49 -04:00
YuBiao Wang
c4dfad81e4 drm/amdgpu: dequeue mes scheduler during fini
[Why]
If mes is not dequeued during fini, mes will be in an uncleaned state
during reload, then mes couldn't receive some commands which leads to
reload failure.

[How]
Perform MES dequeue via MMIO after all the unmap jobs are done by mes
and before kiq fini.

v2: Move the dequeue operation inside kiq_hw_fini.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 14:52:40 -04:00
Kenneth Feng
c520ba3fad drm/amd/pm: enable thermal alert on smu_v13_0_10
enable thermal alert on smu_v13_0_10

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 14:52:24 -04:00
Yifan Zha
7e2c58320e drm/amdgpu: Program GC registers through RLCG interface in gfx_v11/gmc_v11
[Why]
L1 blocks most of GC registers accessing by MMIO.

[How]
Use RLCG interface to program GC registers under SRIOV VF in full access time.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 14:52:06 -04:00
Hamza Mahfooz
b3372fa74d drm/amd/display: add an ASSERT() to irq service functions
Currently, if we encounter unimplemented functions, it is difficult to
tell what caused them just by looking at dmesg and that is compounded by
the fact that it is often hard to reproduce said issues, for instance we
have had reports of this condition being triggered when removing a
secondary display that is setup in mirror mode and is connected using
usb-c. So, to have access to more detailed debugging information, add an
ASSERT() to dal_irq_service_ack() and dal_irq_service_set() that only
triggers when we encounter an unimplemented function.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:21 -04:00
Nathan Chancellor
e299b00adf drm/amdkfd: Fix type of reset_type parameter in hqd_destroy() callback
When booting a kernel compiled with CONFIG_CFI_CLANG on a machine with
an RX 6700 XT, there is a CFI failure in kfd_destroy_mqd_cp():

  [   12.894543] CFI failure at kfd_destroy_mqd_cp+0x2a/0x40 [amdgpu] (target: hqd_destroy_v10_3+0x0/0x260 [amdgpu]; expected type: 0x8594d794)

Clang's kernel Control Flow Integrity (kCFI) makes sure that all
indirect call targets have a type that exactly matches the function
pointer prototype. In this case, hqd_destroy()'s third parameter,
reset_type, should have a type of 'uint32_t' but every implementation of
this callback has a third parameter type of 'enum kfd_preempt_type'.

Update the function pointer prototype to match reality so that there is
no more CFI violation.

Link: https://github.com/ClangBuiltLinux/linux/issues/1738
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:21 -04:00
Fabio M. De Francesco
a2c554262d drm/amd/amdgpu: Replace kmap() with kmap_local_page()
kmap() is being deprecated in favor of kmap_local_page().

There are two main problems with kmap(): (1) It comes with an overhead as
mapping space is restricted and protected by a global lock for
synchronization and (2) it also requires global TLB invalidation when the
kmap’s pool wraps and it might block when the mapping space is fully
utilized until a slot becomes available.

With kmap_local_page() the mappings are per thread, CPU local, can take
page faults, and can be called from any context (including interrupts).
It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore,
the tasks can be preempted and, when they are scheduled to run again, the
kernel virtual addresses are restored and are still valid.

Since its use in amdgpu/amdgpu_ttm.c is safe, it should be preferred.

Therefore, replace kmap() with kmap_local_page() in amdgpu/amdgpu_ttm.c.

Suggested-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:21 -04:00
Guenter Roeck
45950d8870 drm/amd/display: Increase frame size limit for display_mode_vba_util_32.o
Building 32-bit images may fail with the following error.

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_util_32.c:
	In function ‘dml32_UseMinimumDCFCLK’:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_util_32.c:3142:1:
	error: the frame size of 1096 bytes is larger than 1024 bytes

This is seen when building i386:allmodconfig with any of the following
compilers.

	gcc (Debian 12.2.0-3) 12.2.0
	gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

The problem is not seen if the compiler supports GCC_PLUGIN_LATENT_ENTROPY
because in that case CONFIG_FRAME_WARN is already set to 2048 even for
32-bit builds.

dml32_UseMinimumDCFCLK() was introduced with commit dda4fb85e4
("drm/amd/display: DML changes for DCN32/321"). It declares a large
number of local variables. Increase the frame size for the affected
file to 2048, similar to other files in the same directory, to enable
32-bit build tests with affected compilers.

Fixes: dda4fb85e4 ("drm/amd/display: DML changes for DCN32/321")
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reported-by: Łukasz Bartosik <ukaszb@google.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:21 -04:00
Hawking Zhang
6c0ca74820 drm/amdgpu: move convert_error_address out of umc_ras
RAS error address translation algorithm is common
across dGPU and A + A platform as along as the SOC
integrates the same generation of UMC IP.

UMC RAS is managed by x86 MCA on A + A platform,
umc_ras in GPU driver is not initialized at all on
A + A platform. In such case, any umc_ras callback
implemented for dGPU config shouldn't be invoked
from A + A specific callback.

The change moves convert_error_address out of dGPU
umc_ras structure and makes it share between A + A
and dGPU config.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:21 -04:00
Tim Huang
027bf0cee8 drm/amd/pm: add SMU IP v13.0.4 IF version define to V7
The pmfw has changed the driver interface version, so keep same with the
fw.

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
2022-10-17 17:41:21 -04:00
Tim Huang
d1bb3afc05 drm/amd/pm: update SMU IP v13.0.4 driver interface version
Update the SMU driver interface version to V7.

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
2022-10-17 17:41:21 -04:00
ZhenGuo Yin
5af392a89b drm/amd/pm: Init pm_attr_list when dpm is disabled
[Why]
In SRIOV multi-vf, dpm is always disabled, and pm_attr_list won't
be initialized. There will be a NULL pointer call trace after
removing the dpm check condition in amdgpu_pm_sysfs_fini.
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:amdgpu_device_attr_remove_groups+0x20/0x90 [amdgpu]
Call Trace:
  <TASK>
  amdgpu_pm_sysfs_fini+0x2f/0x40 [amdgpu]
  amdgpu_device_fini_hw+0xdf/0x290 [amdgpu]

[How]
List pm_attr_list should be initialized when dpm is disabled.

Fixes: a6ad27cec5 ("drm/amd/pm: Remove redundant check condition")
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:21 -04:00
Evan Quan
b31d6ada83 drm/amd/pm: disable cstate feature for gpu reset scenario
Suggested by PMFW team and same as what did for gfxoff feature.
This can address some Mode1Reset failures observed on SMU13.0.0.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
2022-10-17 17:41:21 -04:00