1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

665 commits

Author SHA1 Message Date
Jim Cromie
f158936b60 drm: POC drm on dyndbg - use in core, 2 helpers, 3 drivers.
Use DECLARE_DYNDBG_CLASSMAP across DRM:

 - in .c files, since macro defines/initializes a record

 - in drivers, $mod_{drv,drm,param}.c
   ie where param setup is done, since a classmap is param related

 - in drm/drm_print.c
   since existing __drm_debug param is defined there,
   and we ifdef it, and provide an elaborated alternative.

 - in drm_*_helper modules:
   dp/drm_dp - 1st item in makefile target
   drivers/gpu/drm/drm_crtc_helper.c - random pick iirc.

Since these modules all use identical CLASSMAP declarations (ie: names
and .class_id's) they will all respond together to "class DRM_UT_*"
query-commands:

  :#> echo class DRM_UT_KMS +p > /proc/dynamic_debug/control

NOTES:

This changes __drm_debug from int to ulong, so BIT() is usable on it.

DRM's enum drm_debug_category values need to sync with the index of
their respective class-names here.  Then .class_id == category, and
dyndbg's class FOO mechanisms will enable drm_dbg(DRM_UT_KMS, ...).

Though DRM needs consistent categories across all modules, thats not
generally needed; modules X and Y could define FOO differently (ie a
different NAME => class_id mapping), changes are made according to
each module's private class-map.

No callsites are actually selected by this patch, since none are
class'd yet.

Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20220912052852.1123868-3-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-24 15:02:01 +02:00
Christian König
6974340554 drm/amdgpu: bump minor for gang submit
Since that has now landed bump the minor to let userspace know about it.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20 12:42:14 -04:00
YiPeng Chai
83d29a5f8a drm/amdgpu: Fixed psp fence and memory issues when removing amdgpu device
V3:
Fixed psp fence and memory issues for the asic
using smu v13_0_2 when removing amdgpu device.

[Why]:
1. psp_suspend->psp_free_shared_bufs->
       psp_ta_free_shared_buf->
           amdgpu_bo_free_kernel->
             ...->amdgpu_bo_release_notify->
                    amdgpu_fill_buffer
   psp will free vram memory used by psp when psp_suspend
   is called. But for the asic using smu v13_0_2, because
   psp_suspend is called before adev->shutdown is set to
   true when removing the first hive device, amdgpu fill_buffer
   will be called, which will cause fence issues when evicting
   all vram resources in amdgpu vram mgr_fini.
2. Since psp_hw_fini is not called after calling psp_suspend
   and psp_suspend only calls psp_ring_stop, the psp ring memory
   will not be released when amdgpu device is removed.

[How]:
1. Set shutdown to true before calling amdgpu_device_gpu_recover,
   then amdgpu_fill_buffer will not be called when psp_suspend is
   called.
2. Free psp ring memory in psp_sw_fini.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19 15:17:47 -04:00
YiPeng Chai
f5c7e77970 drm/amdgpu: Adjust removal control flow for smu v13_0_2
Adjust removal control flow for smu v13_0_2:
   During amdgpu uninstallation, when removing the first
device, the kernel needs to first send a mode1reset message
to all gpu devices. Otherwise, smu initialization will fail
the next time amdgpu is installed.

V2:
1. Update commit comments.
2. Remove the global variable amdgpu_device_remove_cnt
   and add a variable to the structure amdgpu_hive_info.
3. Use hive to detect the first removed device instead of
   a global variable.

V3:
 1. Update commit comments.
 2. Split a patch into multiple patches.
 3. The current patch does:
    a. Add a work mode of AMDGPU_RESET_FOR_DEVICE_REMOVE into
       the existing gpu recover path, which make all devices
       in hive list only have HW reset but no resume (except
       the base IP).
    b. Call AMDGPU_RESET_FOR_DEVICE_REMOVE and
       AMDGPU_NEED_FULL_RESET mode of amdgpu_device_gpu_recover
       in amdgpu_pci_remove when removing the first device in
       hive list.
    c. When removing the first device, the IP blocks keyword
       function call sequence is as follows:
.suspend->mode1reset->.resume(basic ip)->.hw_fini->.early_fini->.sw_fini.
   ^                           |
   |-<----------<---------<----|
	The first three sequences are because of a call to
        amdgpu_device_gpu_recover. The three sequences will be
        executed in a loop until all devices in the hive list
        are iterated.
        The sequences starting from .hw_fini only apply to the
        first device. Since .suspend has been called before,
        except the resumed phase1 basic ip blocks, all other ip
        blocks .hw_fini of current device will do nothing.
     d. When removing other devices, the calling sequences is the
        same as legacy:
	   .hw_fini -> .early_fini -> .sw_fini.
	Since .suspend has been called when removing the first device,
        except the resumed phase1 basic ip blocks, all of other ip
        blocks .hw_fini of current device will do nothing.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19 15:17:20 -04:00
YiPeng Chai
fac53471d0 drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled
V1:
  The psp_cmd_submit_buf function is called by psp_hw_fini to send
TA unload messages to psp to terminate ras, asd and tmr. But when
amdgpu is uninstalled, drm_dev_unplug is called earlier than
psp_hw_fini in amdgpu_pci_remove, the calling order as follows:
static void amdgpu_pci_remove(struct pci_dev *pdev) {
	drm_dev_unplug
	......
	amdgpu_driver_unload_kms->amdgpu_device_fini_hw->...
		->.hw_fini->psp_hw_fini->...
		->psp_ta_unload->psp_cmd_submit_buf
	......
}
The program will return when calling drm_dev_enter in psp_cmd_submit_buf.

So the call to drm_dev_enter in psp_cmd_submit_buf should be
removed, so that the TA unload messages can be sent to the psp
when amdgpu is uninstalled.

V2:
1. Restore psp_cmd_submit_buf to its original code.
2. Move drm_dev_unplug call after amdgpu_driver_unload_kms in
   amdgpu_pci_remove.
3. Since amdgpu_device_fini_hw is called by amdgpu_driver_unload_kms,
   remove the unplug check to release device mmio resource in
   amdgpu_device_fini_hw before calling drm_dev_unplug.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-07 22:21:00 -04:00
Yang Yingliang
6b11af6d1c drm/amdgpu: add missing pci_disable_device() in amdgpu_pmops_runtime_resume()
Add missing pci_disable_device() if amdgpu_device_resume() fails.

Fixes: 8e4d5d43cc ("drm/amdgpu: Handling of amdgpu_device_resume return value for graceful teardown")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29 17:44:15 -04:00
Leo Li
792a0cdde3 drm/amd/display: Add visualconfirm module parameter
[Why]

Being able to configure visual confirm at boot or in cmdline is helpful
when debugging.

[How]

Add a module parameter to configure DC visual confirm, which works the
same way as the equivalent debugfs entry.

Signed-off-by: Leo Li <sunpeng.li@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25 09:31:03 -04:00
Alex Deucher
465576ca48 drm/amdgpu: bump driver version for IP discovery info in HW INFO
So userspace knows when it is available.

Proposed mesa patch:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17411/diffs?commit_id=c8a63590dfd0d64e6e6a634dcfed993f135dd075

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25 09:31:03 -04:00
Jason Wang
c19a23fadd drm/amdgpu: Fix comment typo
The double `to' is duplicated in the comment, remove one.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25 09:31:03 -04:00
Guchun Chen
9c913f3803 drm/amdgpu: drop runpm from amdgpu_device structure
It's redundant, as now switching to rpm_mode to indicate
runtime power management mode.

Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 16:42:39 -04:00
Ramesh Errabolu
08a2fd23c6 drm/amdgpu: Add peer-to-peer support among PCIe connected AMD GPUs
Add support for peer-to-peer communication among AMD GPUs over PCIe
bus. Support REQUIRES enablement of config HSA_AMD_P2P.

Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-08 11:40:12 -04:00
Christian König
08cffb3eb7 drm/amdgpu: bump minor version number
Increase the minor version number to indicate that the new flags are
available.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-26 14:56:34 -04:00
Alex Deucher
62e9bd2003 drm/amdgpu: add beige goby PCI ID
Add a beige goby PCI ID.

Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-05-26 14:56:33 -04:00
Mario Limonciello
0223e51647 drm/amd: Don't reset dGPUs if the system is going to s2idle
An A+A configuration on ASUS ROG Strix G513QY proves that the ASIC
reset for handling aborted suspend can't work with s2idle.

This functionality was introduced in commit daf8de0874 ("drm/amdgpu:
always reset the asic in suspend (v2)").  A few other commits have
gone on top of the ASIC reset, but this still doesn't work on the A+A
configuration in s2idle.

Avoid doing the reset on dGPUs specifically when using s2idle.

Fixes: daf8de0874 ("drm/amdgpu: always reset the asic in suspend (v2)")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2008
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-18 15:20:18 -04:00
Alex Deucher
5a90c24ad0 Revert "drm/amdgpu: disable runpm if we are the primary adapter"
This reverts commit b95dc06af3.

This workaround is no longer necessary.  We have a better workaround
in commit f95af4a923 ("drm/amdgpu: don't runtime suspend if there are displays attached (v3)").

Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-05 16:50:00 -04:00
Chengming Gui
a76be7bbc3 drm/amd/amdgpu: add more fw load type to fit new ASICs
Align exported fw load types with internal used.

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-04 10:43:54 -04:00
Jack Xiao
928fe236c0 drm/amdgpu: add mes_kiq module parameter v2
mes_kiq parameter is used to enable mes kiq pipe.
This module parameter is unneccessary or enabled by default
in final version.

v2: reword commit message.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-04 10:43:49 -04:00
Alex Deucher
4020c22802 drm/amdgpu: don't runtime suspend if there are displays attached (v3)
We normally runtime suspend when there are displays attached if they
are in the DPMS off state, however, if something wakes the GPU
we send a hotplug event on resume (in case any displays were connected
while the GPU was in suspend) which can cause userspace to light
up the displays again soon after they were turned off.

Prior to
commit 087451f372 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's."),
the driver took a runtime pm reference when the fbdev emulation was
enabled because we didn't implement proper shadowing support for
vram access when the device was off so the device never runtime
suspended when there was a console bound.  Once that commit landed,
we now utilize the core fb helper implementation which properly
handles the emulation, so runtime pm now suspends in cases where it did
not before.  Ultimately, we need to sort out why runtime suspend in not
working in this case for some users, but this should restore similar
behavior to before.

v2: move check into runtime_suspend
v3: wake ups -> wakeups in comment, retain pm_runtime behavior in
    runtime_idle callback

Fixes: 087451f372 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.")
Link: https://lore.kernel.org/r/20220403132322.51c90903@darkstar.example.org/
Tested-by: Michele Ballabio <ballabio.m@gmail.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-21 15:59:37 -04:00
Evan Quan
25faeddcf3 drm/amdgpu: expand cg_flags from u32 to u64
With this, we can support more CG flags.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-08 17:24:24 -04:00
Kai-Heng Feng
9e051720f9 drm/amdgpu: Ensure HDA function is suspended before ASIC reset
DP/HDMI audio on AMD PRO VII stops working after S3:
[  149.450391] amdgpu 0000:63:00.0: amdgpu: MODE1 reset
[  149.450395] amdgpu 0000:63:00.0: amdgpu: GPU mode1 reset
[  149.450494] amdgpu 0000:63:00.0: amdgpu: GPU psp mode1 reset
[  149.983693] snd_hda_intel 0000:63:00.1: refused to change power state from D0 to D3hot
[  150.003439] amdgpu 0000:63:00.0: refused to change power state from D0 to D3hot
...
[  155.432975] snd_hda_intel 0000:63:00.1: CORB reset timeout#2, CORBRP = 65535

The offending commit is daf8de0874 ("drm/amdgpu: always reset the asic in
suspend (v2)"). Commit 34452ac303 ("drm/amdgpu: don't use BACO for
reset in S3 ") doesn't help, so the issue is something different.

Assuming that to make HDA resume to D0 fully realized, it needs to be
successfully put to D3 first. And this guesswork proves working, by
moving amdgpu_asic_reset() to noirq callback, so it's called after HDA
function is in D3.

Fixes: daf8de0874 ("drm/amdgpu: always reset the asic in suspend (v2)")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-07 16:35:48 -04:00
Tushar Patel
b7dfbd2e60 drm/amdkfd: Fix Incorrect VMIDs passed to HWS
Compute-only GPUs have more than 8 VMIDs allocated to KFD. Fix
this by passing correct number of VMIDs to HWS

v2: squash in warning fix (Alex)

Signed-off-by: Tushar Patel <tushar.patel@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-25 12:40:25 -04:00
Ruijing Dong
11eb648d01 drm/amdgpu/vcn: Add vcn firmware log
vcn fwlog is for debugging purpose only,
by default, it is disabled.

Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04 13:03:30 -05:00
Andrey Grodzovsky
5aa061474b drm/amdgpu: Bump minor version for hot plug tests enabling.
This will allow to enable the tests only after latest fix
after which the tests passed on my system.

I tested on NV21 standalone and Vega 10 and Polaris as
pair with DRI_PRIME.

It's possible there might be still issues on ASICs i don't
have at my posession but that that the point of enbling
the tests finally - if other people during testing will
encounter errors they will report and I will be able to fix.

The releated merge request for enabling libdrm tests suite  is in
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/227

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
Dave Airlie
38a15ad948 Merge tag 'amd-drm-next-5.18-2022-02-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.18-2022-02-25:

amdgpu:
- Raven2 suspend/resume fix
- SDMA 5.2.6 updates
- VCN 3.1.2 updates
- SMU 13.0.5 updates
- DCN 3.1.5 updates
- Virtual display fixes
- SMU code cleanup
- Harvest fixes
- Expose benchmark tests via debugfs
- Drop no longer relevant gart aperture tests
- More RAS restructuring
- W=1 fixes
- PSR rework
- DP/VGA adapter fixes
- DP MST fixes
- GPUVM eviction fix
- GPU reset debugfs register dumping support
- Misc display fixes
- SR-IOV fix
- Aldebaran mGPU fix
- Add module parameter to disable XGMI for testing

amdkfd:
- IH ring overflow logging fixes
- CRIU fixes
- Misc fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225183535.5907-1-alexander.deucher@amd.com
2022-03-01 16:19:02 +10:00
Dave Airlie
6c64ae228f Linux 5.17-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmIb/PEeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGugMH/R8icwG5gOjAuxBr
 fuz9032ECVFS36Cy3ps9Eqgf5EdS4G6yz3joM4aNtp4B7e5FzI9lzBaHS8OAguNL
 y7puFtBr9CywsnniJumZzciB9pEHmF/yyEKfMlRZA3JsRfLDacFstETp+duJnXoA
 +s49IWsy1ot5zoherhDXcFLqDoFhLVU4hYwE1xpLpW/cllqmSnW8SDZMtWW1Ui/B
 Of7zqINR4zBMmRjP4ymGq/ZrPWlFyWLdtOo0xxVoAQkeMgm33kfaaeGzfyK25FqR
 JDGUZ7lkKvwz3PYh2hqJ7dc5K+vhJ18I+F1UmOiL6QAAUF/k8jq7i3Qaf6MKqh2Z
 2v+A5k0=
 =Hiar
 -----END PGP SIGNATURE-----

Backmerge tag 'v5.17-rc6' into drm-next

This backmerges v5.17-rc6 so I can merge some amdgpu and some tegra changes on top.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2022-02-28 14:57:14 +10:00
Andrey Grodzovsky
2656fd230d drm/amdgpu: Exclude PCI reset method for now.
According to my investigation of the state of PCI
reset recently it's not working. The reason is
due to the fact the kernel PCI code rejects SBR
when there are more then one PF under same bridge
which we always have (at least AUDIO PF but usually
more) and that because SBR will reset all the PFS
and devices under the same bridge as you and you
cannot assume they support SBR.
Once we anble FLR support we can reenable this option as
FLR is doable on single PF and doens't have this
restriction.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-24 17:25:20 -05:00
Alex Sierra
158a05a0b8 drm/amdgpu: Add use_xgmi_p2p module parameter
This parameter controls xGMI p2p communication, which is enabled by
default. However, it can be disabled by setting it to 0. In case xGMI
p2p is disabled in a dGPU, PCIe p2p interface will be used instead.
This parameter is ignored in GPUs that do not support xGMI
p2p configuration.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-24 17:25:12 -05:00
Mario Limonciello
7294863a6f drm/amd: Check if ASPM is enabled from PCIe subsystem
commit 0064b0ce85 ("drm/amd/pm: enable ASPM by default") enabled ASPM
by default but a variety of hardware configurations it turns out that this
caused a regression.

* PPC64LE hardware does not support ASPM at a hardware level.
  CONFIG_PCIEASPM is often disabled on these architectures.
* Some dGPUs on ALD platforms don't work with ASPM enabled and PCIe subsystem
  disables it

Check with the PCIe subsystem to see that ASPM has been enabled
or not.

Fixes: 0064b0ce85 ("drm/amd/pm: enable ASPM by default")
Link: https://wiki.raptorcs.com/w/images/a/ad/P9_PHB_version1.0_27July2018_pub.pdf
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1723
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1739
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1907
Tested-by: koba.ko@canonical.com
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-02-23 16:31:06 -05:00
Alex Deucher
b784f42cf7 drm/amdgpu: drop testing module parameter
This test is not particularly useful now that GTT and GART
are decoupled in the driver.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-23 14:02:51 -05:00
Alex Deucher
0b1a63487b drm/amdgpu: drop benchmark module parameter
Now that we expose the benchmarks via debugfs, there is no
longer a need for the module parameter.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-23 14:02:51 -05:00
Paul Menzel
cec2cc7b1c drm/amdgpu: Fix typo in *whether* in comment
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-22 14:52:39 -05:00
Mario Limonciello
0ab5d711ec drm/amd: Refactor amdgpu_aspm to be evaluated per device
Evaluating `pcie_aspm_enabled` as part of driver probe has the implication
that if one PCIe bridge with an AMD GPU connected doesn't support ASPM
then none of them do.  This is an invalid assumption as the PCIe core will
configure ASPM for individual PCIe bridges.

Create a new helper function that can be called by individual dGPUs to
react to the `amdgpu_aspm` module parameter without having negative results
for other dGPUs on the PCIe bus.

Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-17 15:59:05 -05:00
Mario Limonciello
cba07cce39 drm/amd: Check if ASPM is enabled from PCIe subsystem
commit 0064b0ce85 ("drm/amd/pm: enable ASPM by default") enabled ASPM
by default but a variety of hardware configurations it turns out that this
caused a regression.

* PPC64LE hardware does not support ASPM at a hardware level.
  CONFIG_PCIEASPM is often disabled on these architectures.
* Some dGPUs on ALD platforms don't work with ASPM enabled and PCIe subsystem
  disables it

Check with the PCIe subsystem to see that ASPM has been enabled
or not.

Fixes: 0064b0ce85 ("drm/amd/pm: enable ASPM by default")
Link: https://wiki.raptorcs.com/w/images/a/ad/P9_PHB_version1.0_27July2018_pub.pdf
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1723
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1739
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1907
Tested-by: koba.ko@canonical.com
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-17 15:59:05 -05:00
Alex Deucher
dfcc3e8c24 drm/amdgpu: make cyan skillfish support code more consistent
Since this is an existing asic, adjust the code to follow
the same logic as previously so the driver state is consistent.

No functional change intended.

Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-16 17:30:02 -05:00
Dave Airlie
123db17ddf Merge tag 'amd-drm-next-5.18-2022-02-11-1' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.18-2022-02-11-1:

amdgpu:
- Clean up of power management code
- Enable freesync video mode by default
- Clean up of RAS code
- Improve VRAM access for debug using SDMA
- Coding style cleanups
- SR-IOV fixes
- More display FP reorg
- TLB flush fixes for Arcuturus, Vega20
- Misc display fixes
- Rework special register access methods for SR-IOV
- DP2 fixes
- DP tunneling fixes
- DSC fixes
- More IP discovery cleanups
- Misc RAS fixes
- Enable both SMU i2c buses where applicable
- s2idle improvements
- DPCS header cleanup
- Add new CAP firmware support for SR-IOV

amdkfd:
- Misc cleanups
- SVM fixes
- CRIU support
- Clean up MQD manager

UAPI:
- Add interface to amdgpu CTX ioctl to request a stable power state for profiling
  https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/207
- Add amdkfd support for CRIU
  https://github.com/checkpoint-restore/criu/pull/1709
- Remove old unused amdkfd debugger interface
  Was only implemented for Kaveri and was only ever used by an old HSA tool that was never open sourced

radeon:
- Fix error handling in radeon_driver_open_kms
- UVD suspend fix
- Misc fixes

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220211220706.5803-1-alexander.deucher@amd.com
2022-02-14 10:31:51 +10:00
Alex Deucher
3786a9bc04 drm/amdgpu: drop experimental flag on aldebaran
These have been at production level for a while. Drop
the flag.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07 18:03:50 -05:00
Mario Limonciello
e55a3aea41 drm/amd: avoid suspend on dGPUs w/ s2idle support when runtime PM enabled
dGPUs connected to Intel systems configured for suspend to idle
will not have the power rails cut at suspend and resetting the GPU
may lead to problematic behaviors.

Fixes: e25443d276 ("drm/amdgpu: add a dev_pm_ops prepare callback (v2)")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1879
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-02 18:35:00 -05:00
Mario Limonciello
04ef860469 drm/amd: Only run s3 or s0ix if system is configured properly
This will cause misconfigured systems to not run the GPU suspend
routines.

* In APUs that are properly configured system will go into s2idle.
* In APUs that are intended to be S3 but user selects
  s2idle the GPU will stay fully powered for the suspend.
* In APUs that are intended to be s2idle and system misconfigured
  the GPU will stay fully powered for the suspend.
* In systems that are intended to be s2idle, but AMD dGPU is also
  present, the dGPU will go through S3

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-02 18:35:00 -05:00
Mario Limonciello
9308a49d8e drm/amd: avoid suspend on dGPUs w/ s2idle support when runtime PM enabled
dGPUs connected to Intel systems configured for suspend to idle
will not have the power rails cut at suspend and resetting the GPU
may lead to problematic behaviors.

Fixes: e25443d276 ("drm/amdgpu: add a dev_pm_ops prepare callback (v2)")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1879
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-02 18:27:01 -05:00
Mario Limonciello
d2a197a45d drm/amd: Only run s3 or s0ix if system is configured properly
This will cause misconfigured systems to not run the GPU suspend
routines.

* In APUs that are properly configured system will go into s2idle.
* In APUs that are intended to be S3 but user selects
  s2idle the GPU will stay fully powered for the suspend.
* In APUs that are intended to be s2idle and system misconfigured
  the GPU will stay fully powered for the suspend.
* In systems that are intended to be s2idle, but AMD dGPU is also
  present, the dGPU will go through S3

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-02 18:26:30 -05:00
Alex Deucher
ded81d5b2b drm/amdgpu: bump driver version for new CTX OP to set/get stable pstates
So mesa and tools know when this is available.

Mesa MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/207

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-27 15:50:14 -05:00
Alex Deucher
243c719e87 drm/amdgpu: handle BACO synchronization with secondary funcs
Extend secondary function handling for runtime pm beyond audio
to USB and UCSI.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-25 18:00:36 -05:00
Alex Deucher
d0d66b8c66 drm/amdgpu: move runtime pm init after drm and fbdev init
Seems more logical to enable runtime pm at the end of
the init sequence so we don't end up entering runtime
suspend before init is finished.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-25 18:00:36 -05:00
Alex Deucher
25c6aefcee drm/amdgpu: filter out radeon secondary ids as well
Older radeon boards (r2xx-r5xx) had secondary PCI functions
which we solely there for supporting multi-head on OSs with
special requirements.  Add them to the unsupported list
as well so we don't attempt to bind to them.  The driver
would fail to bind to them anyway, but this does so
in a cleaner way that should not confuse the user.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-25 18:00:32 -05:00
Alex Deucher
9e5a14bce2 drm/amdgpu: filter out radeon secondary ids as well
Older radeon boards (r2xx-r5xx) had secondary PCI functions
which we solely there for supporting multi-head on OSs with
special requirements.  Add them to the unsupported list
as well so we don't attempt to bind to them.  The driver
would fail to bind to them anyway, but this does so
in a cleaner way that should not confuse the user.

Cc: stable@vger.kernel.org
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-25 17:50:13 -05:00
Linus Torvalds
c2c94b3b18 drm fixes for 5.17-rc1
amdgpu:
 - SR-IOV fix
 - VCN harvest fix
 - Suspend/resume fixes
 - Tahiti fix
 - Enable GPU recovery on yellow carp
 
 radeon:
 - Fix error handling regression in radeon_driver_open_kms
 
 i915:
 - Update EHL display voltage swing table
 - Fix programming the ADL-P display TC voltage swing
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmHp+xAACgkQDHTzWXnE
 hr47LQ//ZeXWUZSOxFiYa8mzRkUuBWCihj7xdGiKlHiSBz6FaGiiaMutqorG9V3O
 ktQKji16Q48vvvLZmRecigrZ3maOtisAgNWgdlKT1XbgMnVCmXcbhb57mNbLC2D/
 HcV6b5wKvLmTpNMyto6gRlPXyDMgczP76ChqyHb+MdUZfXEmAh6yAeP06sR9KaG6
 XF17SMI+KB9OLnnRrwg+ws+Lh6KCHZYVA8LGAapTTGUbn8yAS49/JrE2QjKTCDZo
 1v2i77dblnxHNvI4kPlrDJEndwa+VJdUoqseZTyRwwVBm3vrggNLvkclzCRH9AuI
 61p8RW6+w0xqfM73+5B+HEFb8dpVkts+E6JdYL9ZkQ+5/Hz1EamBDqKcZKd5f6Yd
 DC7yit07rzRPEV/YvAnJV0AMxLKy8RKjbxfB7Q6SapCENVp9kGc8mGJa5nlfbGBh
 3dz1Moop8/tiqf2WRYOY5yotcXBxySDKFzrW9QDABqBb8m8UVbsW9EO4iL+0fhvW
 hosbPWop6CvsvT2QSyHhpeVPhpkZwNmwPzrrONzjf+K6Q7jm9fDYqbbmFkQMrGeL
 c93Ii4OQRjSok/dKTWIH+YCPdQF9bmwtjae8ul6CDkWniBW/p0u5T9fXD2ylUGxW
 D0F0NPcV4G1S/MsrFzAmJXJE7n4Fjd39nnIRiOMg4d4cdRkAIUQ=
 =cNda
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2022-01-21' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Thanks to Daniel for taking care of things while I was out, just a set
  of merge window fixes that came in this week, two i915 display fixes
  and a bunch of misc amdgpu, along with a radeon regression fix.

  amdgpu:
   - SR-IOV fix
   - VCN harvest fix
   - Suspend/resume fixes
   - Tahiti fix
   - Enable GPU recovery on yellow carp

  radeon:
   - Fix error handling regression in radeon_driver_open_kms

  i915:
   - Update EHL display voltage swing table
   - Fix programming the ADL-P display TC voltage swing"

* tag 'drm-next-2022-01-21' of git://anongit.freedesktop.org/drm/drm:
  drm/radeon: fix error handling in radeon_driver_open_kms
  drm/amd/amdgpu: fixing read wrong pf2vf data in SRIOV
  drm/amdgpu: apply vcn harvest quirk
  drm/i915/display/adlp: Implement new step in the TC voltage swing prog sequence
  drm/i915/display/ehl: Update voltage swing table
  drm/amd/display: Revert W/A for hard hangs on DCN20/DCN21
  drm/amdgpu: drop flags check for CHIP_IP_DISCOVERY
  drm/amdgpu: Fix rejecting Tahiti GPUs
  drm/amdgpu: don't do resets on APUs which don't support it
  drm/amdgpu: invert the logic in amdgpu_device_should_recover_gpu()
  drm/amdgpu: Enable recovery on yellow carp
2022-01-21 09:25:38 +02:00
Linus Torvalds
59d41458f1 drm fixes for 5.17-rc1:
drivers fixes:
 - i915 fixes for ttm backend + one pm wakelock fix
 - amdgpu fixes, fairly big pile of small things all over. Note this
   doesn't yet containe the fixed version of the otg sync patch that
   blew up
 - small driver fixes: meson, sun4i, vga16fb probe fix
 
 drm core fixes:
 - cma-buf heap locking
 - ttm compilation
 - self refresh helper state check
 - wrong error message in atomic helpers
 - mipi-dbi buffer mapping
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEb4nG6jLu8Y5XI+PfTA9ye/CYqnEFAmHhrxUACgkQTA9ye/CY
 qnEowxAAgBPwGEobRGMbR3Me98vEKvcWqSxBe/k1VC4LhO5DrvbG5iW9cuxCJZM2
 wlGlGAtU7C7pcCP5Xp1UlMqZ5a0rSVhqMPPkMKO9+7033ofSlAQatnMI1EENH6Hn
 BkhXwTyuOBSN6zqskg8FKqzF+VPTt5ZV2U5qJzQweP/wFtZPAKI4tWE4oKiHactH
 fJHnAi7T6ytF6a7J21BsSEluk4z7BjmcmFF0tW6iuq7Y6TXDFXFq9QFDR041b2rI
 GYDUXl2mebp/L+2M3sPYuMiIiyJ8enh7crNIdmi+EstmzRADa7RMjnY3j2tJg/7M
 pqnuJZAVcpkCurb7NMr3ycmrxnhfUsZfbuXvm+k5yJYfQCaGNiKy1ObsFWH9zBDz
 XMuxcE+csSaX/7rjoyXrL2ZTRPXnVwJNJ8x1CuKn3giLxMSqnPnDMjyHmNLB8qa1
 R0wbPQbdx5+jWgs/ngUGFNo4vFBnNmqQP4G3LaWJ/Ku5cSrEM+Jt9GJOw5jjVgun
 gaOlKlUpMBlKmXOwkvhRW2AwHRcL7lrBuIw0oFOThdMzkSNlKSNBMpAkgvD/9C7g
 IDtJgA7a3MqzLjhPLmhUB3rFyXz5dg5rNZhoH8z4DFiJVpTKYYA5/UoU/RTXZGqW
 yFdrBkFmNJeKTZZAe+e/4OP/dm1vKVEKK6Ko/M9CrELzBKSussg=
 =h74X
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2022-01-14' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Daniel Vetter:
"drivers fixes:

   - i915 fixes for ttm backend + one pm wakelock fix

   - amdgpu fixes, fairly big pile of small things all over. Note this
     doesn't yet containe the fixed version of the otg sync patch that
     blew up

   - small driver fixes: meson, sun4i, vga16fb probe fix

  drm core fixes:

   - cma-buf heap locking

   - ttm compilation

   - self refresh helper state check

   - wrong error message in atomic helpers

   - mipi-dbi buffer mapping"

* tag 'drm-next-2022-01-14' of git://anongit.freedesktop.org/drm/drm: (49 commits)
  drm/mipi-dbi: Fix source-buffer address in mipi_dbi_buf_copy
  drm: fix error found in some cases after the patch d1af5cd86997
  drm/ttm: fix compilation on ARCH=um
  dma-buf: cma_heap: Fix mutex locking section
  video: vga16fb: Only probe for EGA and VGA 16 color graphic cards
  drm/amdkfd: Fix ASIC name typos
  drm/amdkfd: Fix DQM asserts on Hawaii
  drm/amdgpu: Use correct VIEWPORT_DIMENSION for DCN2
  drm/amd/pm: only send GmiPwrDnControl msg on master die (v3)
  drm/amdgpu: use spin_lock_irqsave to avoid deadlock by local interrupt
  drm/amdgpu: not return error on the init_apu_flags
  drm/amdkfd: Use prange->update_list head for remove_list
  drm/amdkfd: Use prange->list head for insert_list
  drm/amdkfd: make SPDX License expression more sound
  drm/amdkfd: Check for null pointer after calling kmemdup
  drm/amd/display: invalid parameter check in dmub_hpd_callback
  Revert "drm/amdgpu: Don't inherit GEM object VMAs in child process"
  drm/amd/display: reset dcn31 SMU mailbox on failures
  drm/amdkfd: use default_groups in kobj_type
  drm/amdgpu: use default_groups in kobj_type
  ...
2022-01-16 06:52:38 +02:00
Alex Deucher
a8e6398ffe drm/amdgpu: drop flags check for CHIP_IP_DISCOVERY
Support for IP based discovery is in place now so this
check is no longer required.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14 18:08:01 -05:00
Lukas Fink
5f0754ab27 drm/amdgpu: Fix rejecting Tahiti GPUs
eb4fd29afd ("drm/amdgpu: bind to any 0x1002 PCI diplay class device") added
generic bindings to amdgpu so that that it binds to all display class devices
with VID 0x1002 and then rejects those in amdgpu_pci_probe.

Unfortunately it reuses a driver_data value of 0 to detect those new bindings,
which is already used to denote CHIP_TAHITI ASICs.

The driver_data value given to those new bindings was changed in
dd0761fd24ea1 ("drm/amdgpu: set CHIP_IP_DISCOVERY as the asic type by default")
to CHIP_IP_DISCOVERY (=36), but it seems that the check in amdgpu_pci_probe
was forgotten to be changed. Therefore, it still rejects Tahiti GPUs.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1860
Fixes: eb4fd29afd ("drm/amdgpu: bind to any 0x1002 PCI diplay class device")

Signed-off-by: Lukas Fink <lukas.fink1@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14 18:07:55 -05:00
Alex Deucher
d82ce3cd30 drm/amdgpu: drop flags check for CHIP_IP_DISCOVERY
Support for IP based discovery is in place now so this
check is no longer required.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14 18:06:44 -05:00