1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

977 commits

Author SHA1 Message Date
Jack Xiao
928fe236c0 drm/amdgpu: add mes_kiq module parameter v2
mes_kiq parameter is used to enable mes kiq pipe.
This module parameter is unneccessary or enabled by default
in final version.

v2: reword commit message.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-04 10:43:49 -04:00
Jack Xiao
2bc956ef54 drm/amdgpu: add the per-context meta data v3
The per-context meta data is a per-context data structure associated
with a mes-managed hardware ring, which includes MCBP CSA, ring buffer
and etc.

v2: fix typo
v3: a. use structure instead of typedef
    b. move amdgpu_mes_ctx_get_offs_* to amdgpu_ring.h
    c. use __aligned to make alignement

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-04 10:03:17 -04:00
Jack Xiao
5405a52627 drm/amdgpu: define MQD abstract layer for hw ip
Define MQD abstract layer for hw ip, for the passing
mqd configuration not only from ring but more sources,
like user queue.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-04 10:03:11 -04:00
Likun Gao
7f318f4e30 drm/amdgpu: add tracking for the enablement of SCPM
Add parmeter to shows whether SCPM feature is enabled or not, and
whether is valid.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-04 09:56:51 -04:00
Likun Gao
563fcfbf31 drm/amdgpu: add hdp version 6 functions
Unify hdp related function into hdp structure for hdp version 6.
V2: Remove hdp invalidate function as hdp v6 doesn't have read cache.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-04 09:53:58 -04:00
Likun Gao
1d5eee7dd6 drm/amdgpu: add function to decode ip version
Add function to decode IP version.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-28 17:46:28 -04:00
Likun Gao
3202c7e782 drm/amdgpu: increase HWIP MAX INSTANCE
Extend HWIP MAX INSTANCE to 11.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-28 17:46:24 -04:00
Evan Quan
25faeddcf3 drm/amdgpu: expand cg_flags from u32 to u64
With this, we can support more CG flags.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-08 17:24:24 -04:00
Christian König
a190f8dc4a drm/amdgpu: header cleanup
No function change, just move a bunch of definitions from amdgpu.h into
separate header files.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04 13:03:30 -05:00
Ruijing Dong
11eb648d01 drm/amdgpu/vcn: Add vcn firmware log
vcn fwlog is for debugging purpose only,
by default, it is disabled.

Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04 13:03:30 -05:00
Dave Airlie
38a15ad948 Merge tag 'amd-drm-next-5.18-2022-02-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.18-2022-02-25:

amdgpu:
- Raven2 suspend/resume fix
- SDMA 5.2.6 updates
- VCN 3.1.2 updates
- SMU 13.0.5 updates
- DCN 3.1.5 updates
- Virtual display fixes
- SMU code cleanup
- Harvest fixes
- Expose benchmark tests via debugfs
- Drop no longer relevant gart aperture tests
- More RAS restructuring
- W=1 fixes
- PSR rework
- DP/VGA adapter fixes
- DP MST fixes
- GPUVM eviction fix
- GPU reset debugfs register dumping support
- Misc display fixes
- SR-IOV fix
- Aldebaran mGPU fix
- Add module parameter to disable XGMI for testing

amdkfd:
- IH ring overflow logging fixes
- CRIU fixes
- Misc fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225183535.5907-1-alexander.deucher@amd.com
2022-03-01 16:19:02 +10:00
Alex Sierra
158a05a0b8 drm/amdgpu: Add use_xgmi_p2p module parameter
This parameter controls xGMI p2p communication, which is enabled by
default. However, it can be disabled by setting it to 0. In case xGMI
p2p is disabled in a dGPU, PCIe p2p interface will be used instead.
This parameter is ignored in GPUs that do not support xGMI
p2p configuration.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-24 17:25:12 -05:00
Dave Airlie
54f43c17d6 drm-misc-next for v5.18:
UAPI Changes:
 
 Cross-subsystem Changes:
 - Split out panel-lvds and lvds dt bindings .
 - Put yes/no on/off disabled/enabled strings in linux/string_helpers.h
   and use it in drivers and tomoyo.
 - Clarify dma_fence_chain and dma_fence_array should never include eachother.
 - Flatten chains in syncobj's.
 - Don't double add in fbdev/defio when page is already enlisted.
 - Don't sort deferred-I/O pages by default in fbdev.
 
 Core Changes:
 - Fix missing pm_runtime_put_sync in bridge.
 - Set modifier support to only linear fb modifier if drivers don't
   advertise support.
 - As a result, we remove allow_fb_modifiers.
 - Add missing clear for EDID Deep Color Modes in drm_reset_display_info.
 - Assorted documentation updates.
 - Warn once in drm_clflush if there is no arch support.
 - Add missing select for dp helper in drm_panel_edp.
 - Assorted small fixes.
 - Improve fb-helper's clipping handling.
 - Don't dump shmem mmaps in a core dump.
 - Add accounting to ttm resource manager, and use it in amdgpu.
 - Allow querying the detected eDP panel through debugfs.
 - Add helpers for xrgb8888 to 8 and 1 bits gray.
 - Improve drm's buddy allocator.
 - Add selftests for the buddy allocator.
 
 Driver Changes:
 - Add support for nomodeset to a lot of drm drivers.
 - Use drm_module_*_driver in a lot of drm drivers.
 - Assorted small fixes to bridge/lt9611, v3d, vc4, vmwgfx, mxsfb, nouveau,
   bridge/dw-hdmi, panfrost, lima, ingenic, sprd, bridge/anx7625, ti-sn65dsi86.
 - Add bridge/it6505.
 - Create DP and DVI-I connectors in ast.
 - Assorted nouveau backlight fixes.
 - Rework amdgpu reset handling.
 - Add dt bindings for ingenic,jz4780-dw-hdmi.
 - Support reading edid through aux channel in ingenic.
 - Add a drm driver for Solomon SSD130x OLED displays.
 - Add simple support for sharp LQ140M1JW46.
 - Add more panels to nt35560.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAmIWLSEACgkQ/lWMcqZw
 E8OP7hAAjix94EX5fhFa7OAdqUbFtsiKhK/4zNtV9FWpFiEsDBz+dlbfDQWIx5an
 FIiiiQtSfWjpDv6pcMhoNf80w+dDbc/Cuauz6nNGO7Pkaerh2D/EPG74FD7f7nE3
 EIScVs1heYtzM9usKrFKupNYgIdgZxmeovClWuE0OTjLOes2PGvvxXK6iJqNqJMX
 VlDO5SR7GRqsDUDV6vmwl63uKL77xJXAahAXIx+BQ/1xrtEhlu6NwsgHIsmPmMSN
 YluX34zc1xD/6/uUqvEdp7u46/5/He1c5Q/ia1WV3wRxsO/eMZ+axXqCZP3XGZdt
 rMdGNtj1MWKkudYiowStWkCVSG/0fXJCFIAhvRmeZy+YqPdVlqZ2W7g4H1l9iJoo
 UVfT9cHrKoxHsukvIEckC5Ov9v1yr39Bd4wUuqaUTUSxY8VID5vjY63TsXl9Zke1
 SluTFe9qybbnRNz/hYRvwIS1eT8HvUauAfAhypGTLI5DYHTD7PawcfMJkNzCtJm4
 Ta4SC3rTpkpN+7oc8SoNgqRHQ8U9KL5oksP0wVa8vwHsMptSd3X4pUljc6TcfjLv
 GEo41D5AuJz3HRVcn9yqPbLoPE2FFB7bfwIMH77yNnoos4Izy/LGhKpN0YdImmI5
 W5XVFB0jltGSIhkzLe1mFpLrdJwdUTSUVeCK4H5PhZZQEHLkVtg=
 =HuwD
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2022-02-23' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for v5.18:

UAPI Changes:

Cross-subsystem Changes:
- Split out panel-lvds and lvds dt bindings .
- Put yes/no on/off disabled/enabled strings in linux/string_helpers.h
  and use it in drivers and tomoyo.
- Clarify dma_fence_chain and dma_fence_array should never include eachother.
- Flatten chains in syncobj's.
- Don't double add in fbdev/defio when page is already enlisted.
- Don't sort deferred-I/O pages by default in fbdev.

Core Changes:
- Fix missing pm_runtime_put_sync in bridge.
- Set modifier support to only linear fb modifier if drivers don't
  advertise support.
- As a result, we remove allow_fb_modifiers.
- Add missing clear for EDID Deep Color Modes in drm_reset_display_info.
- Assorted documentation updates.
- Warn once in drm_clflush if there is no arch support.
- Add missing select for dp helper in drm_panel_edp.
- Assorted small fixes.
- Improve fb-helper's clipping handling.
- Don't dump shmem mmaps in a core dump.
- Add accounting to ttm resource manager, and use it in amdgpu.
- Allow querying the detected eDP panel through debugfs.
- Add helpers for xrgb8888 to 8 and 1 bits gray.
- Improve drm's buddy allocator.
- Add selftests for the buddy allocator.

Driver Changes:
- Add support for nomodeset to a lot of drm drivers.
- Use drm_module_*_driver in a lot of drm drivers.
- Assorted small fixes to bridge/lt9611, v3d, vc4, vmwgfx, mxsfb, nouveau,
  bridge/dw-hdmi, panfrost, lima, ingenic, sprd, bridge/anx7625, ti-sn65dsi86.
- Add bridge/it6505.
- Create DP and DVI-I connectors in ast.
- Assorted nouveau backlight fixes.
- Rework amdgpu reset handling.
- Add dt bindings for ingenic,jz4780-dw-hdmi.
- Support reading edid through aux channel in ingenic.
- Add a drm driver for Solomon SSD130x OLED displays.
- Add simple support for sharp LQ140M1JW46.
- Add more panels to nt35560.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/686ec871-e77f-c230-22e5-9e3bb80f064a@linux.intel.com
2022-02-25 05:50:18 +10:00
Somalapuram Amaranath
5ce5a584cb drm/amdgpu: add debugfs for reset registers list
List of register populated for dump collection during the GPU reset.

Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-23 14:26:36 -05:00
Alex Deucher
b784f42cf7 drm/amdgpu: drop testing module parameter
This test is not particularly useful now that GTT and GART
are decoupled in the driver.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-23 14:02:51 -05:00
Alex Deucher
0b1a63487b drm/amdgpu: drop benchmark module parameter
Now that we expose the benchmarks via debugfs, there is no
longer a need for the module parameter.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-23 14:02:51 -05:00
Alex Deucher
f113cc32e3 drm/amdgpu: add a benchmark mutex
To avoid multiple runs in parallel to avoid mixing results.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-23 14:02:50 -05:00
Alex Deucher
e460f244fb drm/amdgpu: plumb error handling though amdgpu_benchmark()
So we can tell when this function fails.

v2: squash in error handling fix (Alex)

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-23 14:02:50 -05:00
Mario Limonciello
0ab5d711ec drm/amd: Refactor amdgpu_aspm to be evaluated per device
Evaluating `pcie_aspm_enabled` as part of driver probe has the implication
that if one PCIe bridge with an AMD GPU connected doesn't support ASPM
then none of them do.  This is an invalid assumption as the PCIe core will
configure ASPM for individual PCIe bridges.

Create a new helper function that can be called by individual dGPUs to
react to the `amdgpu_aspm` module parameter without having negative results
for other dGPUs on the PCIe bus.

Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-17 15:59:05 -05:00
Luben Tuikov
a6c40b1780 drm/amdgpu: Show IP discovery in sysfs
Add IP discovery data in sysfs. The format is:
/sys/class/drm/cardX/device/ip_discovery/die/D/B/I/<attrs>
where,
X is the card ID, an integer,
D is the die ID, an integer,
B is the IP HW ID, an integer, aka block type,
I is the IP HW ID instance, an integer.
<attrs> are the attributes of the block instance. At the moment these
include HW ID, instance number, major, minor, revision, number of base
addresses, and the base addresses themselves.

A symbolic link of the acronym HW ID is also created, under D/, if you
prefer to browse by something humanly accessible.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Tom StDenis <tom.stdenis@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-14 15:08:40 -05:00
Dave Airlie
123db17ddf Merge tag 'amd-drm-next-5.18-2022-02-11-1' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.18-2022-02-11-1:

amdgpu:
- Clean up of power management code
- Enable freesync video mode by default
- Clean up of RAS code
- Improve VRAM access for debug using SDMA
- Coding style cleanups
- SR-IOV fixes
- More display FP reorg
- TLB flush fixes for Arcuturus, Vega20
- Misc display fixes
- Rework special register access methods for SR-IOV
- DP2 fixes
- DP tunneling fixes
- DSC fixes
- More IP discovery cleanups
- Misc RAS fixes
- Enable both SMU i2c buses where applicable
- s2idle improvements
- DPCS header cleanup
- Add new CAP firmware support for SR-IOV

amdkfd:
- Misc cleanups
- SVM fixes
- CRIU support
- Clean up MQD manager

UAPI:
- Add interface to amdgpu CTX ioctl to request a stable power state for profiling
  https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/207
- Add amdkfd support for CRIU
  https://github.com/checkpoint-restore/criu/pull/1709
- Remove old unused amdkfd debugger interface
  Was only implemented for Kaveri and was only ever used by an old HSA tool that was never open sourced

radeon:
- Fix error handling in radeon_driver_open_kms
- UVD suspend fix
- Misc fixes

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220211220706.5803-1-alexander.deucher@amd.com
2022-02-14 10:31:51 +10:00
Andrey Grodzovsky
89a7a87093 drm/amdgpu: Move in_gpu_reset into reset_domain
We should have a single instance per entrire reset domain.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74116.html
2022-02-09 12:17:57 -05:00
Andrey Grodzovsky
d0fb18b535 drm/amdgpu: Move reset sem into reset_domain
We want single instance of reset sem across all
reset clients because in case of XGMI we should stop
access cross device MMIO because any of them could be
in a reset in the moment.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74117.html
2022-02-09 12:17:32 -05:00
Andrey Grodzovsky
cfbb6b0047 drm/amdgpu: Rework reset domain to be refcounted.
The reset domain contains register access semaphor
now and so needs to be present as long as each device
in a hive needs it and so it cannot be binded to XGMI
hive life cycle.
Adress this by making reset domain refcounted and pointed
by each member of the hive and the hive itself.

v4:

Fix crash on boot witrh XGMI hive by adding type to reset_domain.
XGMI will only create a new reset_domain if prevoius was of single
device type meaning it's first boot. Otherwsie it will take a
refocunt to exsiting reset_domain from the amdgou device.

Add a wrapper around reset_domain->refcount get/put
and a wrapper around send to reset wq (Lijo)

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74121.html
2022-02-09 12:17:09 -05:00
Andrey Grodzovsky
54f329cc7a drm/amdgpu: Serialize non TDR gpu recovery with TDRs
Use reset domain wq also for non TDR gpu recovery trigers
such as sysfs and RAS. We must serialize all possible
GPU recoveries to gurantee no concurrency there.
For TDR call the original recovery function directly since
it's already executed from within the wq. For others just
use a wrapper to qeueue work and wait on it to finish.

v2: Rename to amdgpu_recover_work_struct

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74113.html
2022-02-09 12:15:23 -05:00
Andrey Grodzovsky
a4c63cafa5 drm/amdgpu: Introduce reset domain
Defined a reset_domain struct such that
all the entities that go through reset
together will be serialized one against
another. Do it for both single device and
XGMI hive cases.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Suggested-by: Christian König <ckoenig.leichtzumerken@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74111.html
2022-02-09 12:14:32 -05:00
Mario Limonciello
18b66ace6b drm/amd: add support to check whether the system is set to s3
This will be used to help make decisions on what to do in
misconfigured systems.

v2: squash in semicolon fix from Stephen Rothwell

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-02 18:26:30 -05:00
Mario Limonciello
f588a1bbfc drm/amd: Warn users about potential s0ix problems
On some OEM setups users can configure the BIOS for S3 or S2idle.
When configured to S3 users can still choose 's2idle' in the kernel by
using `/sys/power/mem_sleep`.  Before commit 6dc8265f98 ("drm/amdgpu:
always reset the asic in suspend (v2)"), the GPU would crash.  Now when
configured this way, the system should resume but will use more power.

As such, adjust the `amdpu_acpi_is_s0ix function` to warn users about
potential power consumption issues during their first attempt at
suspending.

Reported-by: Bjoren Dasse <bjoern.daase@gmail.com>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1824
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-31 16:12:29 -05:00
Hawking Zhang
04022982fc drm/amdgpu: switch to common helper to read bios from rom
create a common helper function for soc15 and onwards
to read bios image from rom

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-25 18:00:33 -05:00
Xiaojian Du
86700a4026 drm/amdgpu: modify a pair of functions for the pcie port wreg/rreg
This patch will modify a pair of functions for pcie port wreg/rreg.
AMD GPU have had an independent NBIO block from SOC15 arch.
If the dirver wants to read/write the address space of the pcie devices,
it has to go through the NBIO block.
This patch will move the pcie port wreg/rreg functions to
"amdgpu_device.c", so that to reuse the functions on the
future GPU ASICs.

Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-19 22:32:11 -05:00
Jonathan Kim
400ef298f4 drm/amdgpu: cleanup ttm debug sdma vram access function
Some suggested cleanups to declutter ttm when doing debug VRAM access over
SDMA.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14 17:52:00 -05:00
yipechai
7cab212405 drm/amdgpu: Modify the compilation failed problem when other ras blocks' .h include amdgpu_ras.h
Modify the compilation failed problem when other ras blocks' .h include amdgpu_ras.h.

v2: squash in forward declaration warning fix (Alex)

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14 17:51:59 -05:00
yipechai
6492e1b07c drm/amdgpu: Unify ras block interface for each ras block
1. Define unified ops interface for each block.
2. Add ras_block_match function pointer in ops interface, each ras block can customize specail match function to identify itself.
3. Add amdgpu_ras_block_match_default new function. If a ras block doesn't define .ras_block_match, default execute amdgpu_ras_block_match_default to identify this ras block.
4. Define unified basic ras block data for each ras block.
5. Create dedicated amdgpu device ras block link list to manage all of the ras blocks.
6. Add amdgpu_ras_register_ras_block new function interface for each ras block to register itself to ras controlling block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14 17:51:59 -05:00
Evan Quan
ebfc253335 drm/amd/pm: do not expose the smu_context structure used internally in power
This can cover the power implementation details. And as what did for
powerplay framework, we hook the smu_context to adev->powerplay.pp_handle.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14 17:51:14 -05:00
Evan Quan
d698a2c485 drm/amd/pm: move pp_force_state_enabled member to amdgpu_pm structure
As it lables an internal pm state and amdgpu_pm structure is the more
proper place than amdgpu_device structure for it.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14 17:51:14 -05:00
Solomon Chiu
de05abe6b9 drm/amd/display: Enable Freesync Video Mode by default
[Why&How]
Freesync Video Mode is a experimental feature previously,
and need to be enabled by kernel parameter. We enable it
by default with removing module paramterter in amdgpu_dm.

v2: squash the patches together

Signed-off-by: Solomon Chiu <solomon.chiu@amd.com>
Reviewed-by: Aurabindo Jayamohanan Pillai <Aurabindo.Pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14 17:51:13 -05:00
Linus Torvalds
8d0749b4f8 drm for 5.17-rc1
core:
 - add privacy screen support
 - move nomodeset option into drm subsystem
 - clean up nomodeset handling in drivers
 - make drm_irq.c legacy
 - fix stack_depot name conflicts
 - remove DMA_BUF_SET_NAME ioctl restrictions
 - sysfs: send hotplug event
 - replace several DRM_* logging macros with drm_*
 - move hashtable to legacy code
 - add error return from gem_create_object
 - cma-helper: improve interfaces, drop CONFIG_DRM_KMS_CMA_HELPER
 - kernel.h related include cleanups
 - support XRGB2101010 source buffers
 
 ttm:
 - don't include drm hashtable
 - stop pruning fences after wait
 - documentation updates
 
 dma-buf:
 - add dma_resv selftest
 - add debugfs helpers
 - remove dma_resv_get_excl_unlocked
 - documentation
 - make fences mandatory in dma_resv_add_excl_fence
 
 dp:
 - add link training delay helpers
 
 gem:
 - link shmem/cma helpers into separate modules
 - use dma_resv iteratior
 - import dma-buf namespace into gem helper modules
 
 scheduler:
 - fence grab fix
 - lockdep fixes
 
 bridge:
 - switch to managed MIPI DSI helpers
 - register and attach during probe fixes
 - convert to YAML in several places.
 
 panel:
 - add bunch of new panesl
 
 simpledrm:
 - support FB_DAMAGE_CLIPS
 - support virtual screen sizes
 - add Apple M1 support
 
 amdgpu:
 - enable seamless boot for DCN 3.01
 - runtime PM fixes
 - use drm_kms_helper_connector_hotplug_event
 - get all fences at once
 - use generic drm fb helpers
 - PSR/DPCD/LTTPR/DSC/PM/RAS/OLED/SRIOV fixes
 - add smart trace buffer (STB) for supported GPUs
 - display debugfs entries
 - new SMU debug option
 - Documentation update
 
 amdkfd:
 - IP discovery enumeration refactor
 - interface between driver fixes
 - SVM fixes
 - kfd uapi header to define some sysfs bitfields.
 
 i915:
 - support VESA panel backlights
 - enable ADL-P by default
 - add eDP privacy screen support
 - add Raptor Lake S (RPL-S) support
 - DG2 page table support
 - lots of GuC/HuC fw refactoring
 - refactored i915->gt interfaces
 - CD clock squashing support
 - enable 10-bit gamma support
 - update ADL-P DMC fw to v2.14
 - enable runtime PM autosuspend by default
 - ADL-P DSI support
 - per-lane DP drive settings for ICL+
 - add support for pipe C/D DMC firmware
 - Atomic gamma LUT updates
 - remove CCS FB stride restrictions on ADL-P
 - VRR platform support for display 11
 - add support for display audio codec keepalive
 - lots of display refactoring
 - fix runtime PM handling during PXP suspend
 - improved eviction performance with async TTM moves
 - async VMA unbinding improvements
 - VMA locking refactoring
 - improved error capture robustness
 - use per device iommu checks
 - drop bits stealing from i915_sw_fence function ptr
 - remove dma_resv_prune
 - add IC cache invalidation on DG2
 
 nouveau:
 - crc fixes
 - validate LUTs in atomic check
 - set HDMI AVI RGB quant to full
 
 tegra:
 - buffer objects reworks for dma-buf compat
 - NVDEC driver uAPI support
 - power management improvements
 
 etnaviv:
 - IOMMU enabled system support
 - fix > 4GB command buffer mapping
 - close a DoS vector
 - fix spurious GPU resets
 
 ast:
 - fix i2c initialization
 
 rcar-du:
 - DSI output support
 
 exynos:
 - replace legacy gpio interface
 - implement generic GEM object mmap
 
 msm:
 - dpu plane state cleanup in prep for multirect
 - dpu debugfs cleanups
 - dp support for sc7280
 - a506 support
 - removal of struct_mutex
 - remove old eDP sub-driver
 
 anx7625:
 - support MIPI DSI input
 - support HDMI audio
 - fix reading EDID
 
 lvds:
 - fix bridge DT bindings
 
 megachips:
 - probe both bridges before registering
 
 dw-hdmi:
 - allow interlace on bridge
 
 ps8640:
 - enable runtime PM
 - support aux-bus
 
 tx358768:
 - enable reference clock
 - add pulse mode support
 
 ti-sn65dsi86:
 - use regmap bulk write
 - add PWM support
 
 etnaviv:
 - get all fences at once
 
 gma500:
 - gem object cleanups
 
 kmb:
 - enable fb console
 
 radeon:
 - use dma_resv_wait_timeout
 
 rockchip:
 - add DSP hold timeout
 - suspend/resume fixes
 - PLL clock fixes
 - implement mmap in GEM object functions
 - use generic fbdev emulation
 
 sun4i:
 - use CMA helpers without vmap support
 
 vc4:
 - fix HDMI-CEC hang with display is off
 - power on HDMI controller while disabling
 - support 4K@60Hz modes
 - support 10-bit YUV 4:2:0 output
 
 vmwgfx:
 - fix leak on probe errors
 - fail probing on broken hosts
 - new placement for MOB page tables
 - hide internal BOs from userspace
 - implement GEM support
 - implement GL 4.3 support
 
 virtio:
 - overflow fixes
 
 xen:
 - implement mmap as GEM object function
 
 omapdrm:
 - fix scatterlist export
 - support virtual planes
 
 mediatek:
 - MT8192 support
 - CMDQ refinement
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmHX1vMACgkQDHTzWXnE
 hr6rAw/9ES5RO5N3Ku9foFk1CI9bqy1Kh663KLkkEc+rDdhKpiZbBnAsrKkZ9sGu
 fNuHmWNN5nWXtDSOqHWuslt3F7Gh+qEBQtlkqC9mZsBm3bWB0aJK6E4QaJxfeSaK
 ta6AmyGx8DaV+C69i86dnemQurYSDVjROd7LDPKnCU0Fye/JxiXSXQmXksKMFVxd
 x5vmO9yfeDSg3EF+u1yB6nJNUYZBV0vhrAfjPqxPCRBXuQc7akuaglE/SFwlGnEk
 vn0GjVHEQcRTqYKrHr64xvQxIoKXcJP0pkDUyT7KYCsyj8GJkvxkb7/ls5pp5DvL
 SwyNg3J3vwUVP6w6GEvzf3ffG720qqUZvCbvLmE+A/t2DhGILiAm+HXSo43PTOW8
 uagT7Gxma8dy8EovjSxioS9HPX8Gcu+S+XYavgOsevOZ7oeEt4f4TLW7LXsw9d6y
 75FrMhiUpreab5hAh8Le0swuLYZHjdnJRdjSTqZJ/T6VdTdVftLT6IfwvSDx5CHy
 cWuufgcAjd7xVTXFquHWYXWLTQkiSMGf1M02jx9IWolTd4Cm41LNBhqMEDHZLHJD
 7ngGgoaREVDQ+MqjG90yfIwJFIpJPI3YOaHLi/Kznga+iDzFY6cyOQWW2vX7ZdY5
 7+LJWsgGT8Feb7/bzD5hX1mYqJLxh1pWUIaqIKMl+7LJL7gTVU8=
 =MByd
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2022-01-07' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "Highlights are support for privacy screens found in new laptops, a
  bunch of nomodeset refactoring, and i915 enables ADL-P systems by
  default, while starting to add RPL-S support.

  vmwgfx adds GEM and support for OpenGL 4.3 features in userspace.

  Lots of internal refactorings around dma reservations, and lots of
  driver refactoring as well.

  Summary:

  core:
   - add privacy screen support
   - move nomodeset option into drm subsystem
   - clean up nomodeset handling in drivers
   - make drm_irq.c legacy
   - fix stack_depot name conflicts
   - remove DMA_BUF_SET_NAME ioctl restrictions
   - sysfs: send hotplug event
   - replace several DRM_* logging macros with drm_*
   - move hashtable to legacy code
   - add error return from gem_create_object
   - cma-helper: improve interfaces, drop CONFIG_DRM_KMS_CMA_HELPER
   - kernel.h related include cleanups
   - support XRGB2101010 source buffers

  ttm:
   - don't include drm hashtable
   - stop pruning fences after wait
   - documentation updates

  dma-buf:
   - add dma_resv selftest
   - add debugfs helpers
   - remove dma_resv_get_excl_unlocked
   - documentation
   - make fences mandatory in dma_resv_add_excl_fence

  dp:
   - add link training delay helpers

  gem:
   - link shmem/cma helpers into separate modules
   - use dma_resv iteratior
   - import dma-buf namespace into gem helper modules

  scheduler:
   - fence grab fix
   - lockdep fixes

  bridge:
   - switch to managed MIPI DSI helpers
   - register and attach during probe fixes
   - convert to YAML in several places.

  panel:
   - add bunch of new panesl

  simpledrm:
   - support FB_DAMAGE_CLIPS
   - support virtual screen sizes
   - add Apple M1 support

  amdgpu:
   - enable seamless boot for DCN 3.01
   - runtime PM fixes
   - use drm_kms_helper_connector_hotplug_event
   - get all fences at once
   - use generic drm fb helpers
   - PSR/DPCD/LTTPR/DSC/PM/RAS/OLED/SRIOV fixes
   - add smart trace buffer (STB) for supported GPUs
   - display debugfs entries
   - new SMU debug option
   - Documentation update

  amdkfd:
   - IP discovery enumeration refactor
   - interface between driver fixes
   - SVM fixes
   - kfd uapi header to define some sysfs bitfields.

  i915:
   - support VESA panel backlights
   - enable ADL-P by default
   - add eDP privacy screen support
   - add Raptor Lake S (RPL-S) support
   - DG2 page table support
   - lots of GuC/HuC fw refactoring
   - refactored i915->gt interfaces
   - CD clock squashing support
   - enable 10-bit gamma support
   - update ADL-P DMC fw to v2.14
   - enable runtime PM autosuspend by default
   - ADL-P DSI support
   - per-lane DP drive settings for ICL+
   - add support for pipe C/D DMC firmware
   - Atomic gamma LUT updates
   - remove CCS FB stride restrictions on ADL-P
   - VRR platform support for display 11
   - add support for display audio codec keepalive
   - lots of display refactoring
   - fix runtime PM handling during PXP suspend
   - improved eviction performance with async TTM moves
   - async VMA unbinding improvements
   - VMA locking refactoring
   - improved error capture robustness
   - use per device iommu checks
   - drop bits stealing from i915_sw_fence function ptr
   - remove dma_resv_prune
   - add IC cache invalidation on DG2

  nouveau:
   - crc fixes
   - validate LUTs in atomic check
   - set HDMI AVI RGB quant to full

  tegra:
   - buffer objects reworks for dma-buf compat
   - NVDEC driver uAPI support
   - power management improvements

  etnaviv:
   - IOMMU enabled system support
   - fix > 4GB command buffer mapping
   - close a DoS vector
   - fix spurious GPU resets

  ast:
   - fix i2c initialization

  rcar-du:
   - DSI output support

  exynos:
   - replace legacy gpio interface
   - implement generic GEM object mmap

  msm:
   - dpu plane state cleanup in prep for multirect
   - dpu debugfs cleanups
   - dp support for sc7280
   - a506 support
   - removal of struct_mutex
   - remove old eDP sub-driver

  anx7625:
   - support MIPI DSI input
   - support HDMI audio
   - fix reading EDID

  lvds:
   - fix bridge DT bindings

  megachips:
   - probe both bridges before registering

  dw-hdmi:
   - allow interlace on bridge

  ps8640:
   - enable runtime PM
   - support aux-bus

  tx358768:
   - enable reference clock
   - add pulse mode support

  ti-sn65dsi86:
   - use regmap bulk write
   - add PWM support

  etnaviv:
   - get all fences at once

  gma500:
   - gem object cleanups

  kmb:
   - enable fb console

  radeon:
   - use dma_resv_wait_timeout

  rockchip:
   - add DSP hold timeout
   - suspend/resume fixes
   - PLL clock fixes
   - implement mmap in GEM object functions
   - use generic fbdev emulation

  sun4i:
   - use CMA helpers without vmap support

  vc4:
   - fix HDMI-CEC hang with display is off
   - power on HDMI controller while disabling
   - support 4K@60Hz modes
   - support 10-bit YUV 4:2:0 output

  vmwgfx:
   - fix leak on probe errors
   - fail probing on broken hosts
   - new placement for MOB page tables
   - hide internal BOs from userspace
   - implement GEM support
   - implement GL 4.3 support

  virtio:
   - overflow fixes

  xen:
   - implement mmap as GEM object function

  omapdrm:
   - fix scatterlist export
   - support virtual planes

  mediatek:
   - MT8192 support
   - CMDQ refinement"

* tag 'drm-next-2022-01-07' of git://anongit.freedesktop.org/drm/drm: (1241 commits)
  drm/amdgpu: no DC support for headless chips
  drm/amd/display: fix dereference before NULL check
  drm/amdgpu: always reset the asic in suspend (v2)
  drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
  drm/amd/display: Fix the uninitialized variable in enable_stream_features()
  drm/amdgpu: fix runpm documentation
  amdgpu/pm: Make sysfs pm attributes as read-only for VFs
  drm/amdgpu: save error count in RAS poison handler
  drm/amdgpu: drop redundant semicolon
  drm/amd/display: get and restore link res map
  drm/amd/display: support dynamic HPO DP link encoder allocation
  drm/amd/display: access hpo dp link encoder only through link resource
  drm/amd/display: populate link res in both detection and validation
  drm/amd/display: define link res and make it accessible to all link interfaces
  drm/amd/display: 3.2.167
  drm/amd/display: [FW Promotion] Release 0.0.98
  drm/amd/display: Undo ODM combine
  drm/amd/display: Add reg defs for DCN303
  drm/amd/display: Changed pipe split policy to allow for multi-display pipe split
  drm/amd/display: Set optimize_pwr_state for DCN31
  ...
2022-01-10 12:58:46 -08:00
Alex Deucher
b95dc06af3 drm/amdgpu: disable runpm if we are the primary adapter
If we are the primary adapter (i.e., the one used by the firwmare
framebuffer), disable runtime pm.  This fixes a regression caused
by commit 55285e21f0 which results in the displays waking up
shortly after they go to sleep due to the device coming out of
runtime suspend and sending a hotplug uevent.

v2: squash in reworked fix from Evan

Fixes: 55285e21f0 ("fbdev/efifb: Release PCI device's runtime PM ref during FB destroy")
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215203
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1840
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-31 08:57:45 -05:00
Dave Airlie
cb6846fbb8 Merge tag 'amd-drm-next-5.17-2021-12-30' of ssh://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.17-2021-12-30:

amdgpu:
- Suspend/resume fixes
- Fence fix
- Misc code cleanups
- IP discovery fixes
- SRIOV fixes
- RAS fixes
- GMC 8 VRAM detection fix
- FRU fixes for Aldebaran
- Display fixes

amdkfd:
- SVM fixes
- IP discovery fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211230141032.613596-1-alexander.deucher@amd.com
2021-12-31 10:59:17 +10:00
Kent Russell
6c92fe5fa5 drm/amdgpu: Increase potential product_name to 64 characters
Having seen at least 1 42-character product_name, bump the number up to
64, and put that definition into amdgpu.h to make future adjustments
simpler.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:43 -05:00
Dave Airlie
b06103b532 Merge tag 'amd-drm-next-5.17-2021-12-16' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amdgpu:
- Add some display debugfs entries
- RAS fixes
- SR-IOV fixes
- W=1 fixes
- Documentation fixes
- IH timestamp fix
- Misc power fixes
- IP discovery fixes
- Large driver documentation updates
- Multi-GPU memory use reductions
- Misc display fixes and cleanups
- Add new SMU debug option

amdkfd:
- SVM fixes

radeon:
- Fix typo in comment

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211216202731.5900-1-alexander.deucher@amd.com
2021-12-23 11:55:28 +10:00
Philip Yang
4a74c38cd6 drm/amdgpu: Detect if amdgpu in IOMMU direct map mode
If host and amdgpu IOMMU is not enabled or IOMMU is pass through mode,
set adev->ram_is_direct_mapped flag which will be used to optimize
memory usage for multi GPU mappings.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13 16:33:16 -05:00
Lang Yu
34f3a4a98b drm/amdgpu: introduce a kind of halt state for amdgpu device
It is useful to maintain error context when debugging
SW/FW issues. Introduce amdgpu_device_halt() for this
purpose. It will bring hardware to a kind of halt state,
so that no one can touch it any more.

Compare to a simple hang, the system will keep stable
at least for SSH access. Then it should be trivial to
inspect the hardware state and see what's going on.

v2:
 - Set adev->no_hw_access earlier to avoid potential crashes.(Christian)

Suggested-by: Christian Koenig <christian.koenig@amd.com>
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Christian Koenig <christian.koenig@amd.co>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13 16:33:16 -05:00
Isabella Basso
e105b64a36 drm/amdgpu: fix location of prototype for amdgpu_kms_compat_ioctl
This fixes the warning below by changing the prototype to a location
that's actually included by the .c files that call
amdgpu_kms_compat_ioctl:

 warning: no previous prototype for ‘amdgpu_kms_compat_ioctl’
 [-Wmissing-prototypes]
 37 | long amdgpu_kms_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
    |      ^~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13 16:32:34 -05:00
Lijo Lazar
fe9c5c9aff drm/amdgpu: Use MAX_HWIP instead of HW_ID_MAX
HW_ID_MAX considers HWID of all IPs, far more than what amdgpu uses.
amdgpu tracks only the IPs defined by amd_hw_ip_block_type whose max
is MAX_HWIP.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01 16:04:10 -05:00
Thomas Zimmermann
a713ca234e Merge drm/drm-next into drm-misc-next
Backmerging from drm/drm-next for v5.16-rc1.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2021-11-18 09:36:39 +01:00
Christian König
fa78e367a2 drm/amdgpu: stop getting excl fence separately
Just grab all fences for the display flip in one go.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211028132630.2330-2-christian.koenig@amd.com
2021-11-17 14:32:27 +01:00
Kent Russell
68daadf3d6 drm/amdgpu: Add kernel parameter support for ignoring bad page threshold
When a GPU hits the bad_page_threshold, it will not be initialized by
the amdgpu driver. This means that the table cannot be cleared, nor can
information gathering be performed (getting serial number, BDF, etc).

If the bad_page_threshold kernel parameter is set to -2,
continue to initialize the GPU, while printing a warning to dmesg that
this action has been done

v2: squash in Luben's fix to restore RAS info reporting

Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Mukul Joshi <Mukul.Joshi@amd.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28 14:26:12 -04:00
Guchun Chen
e17e27f9bd drm/amdgpu: handle the case of pci_channel_io_frozen only in amdgpu_pci_resume
In current code, when a PCI error state pci_channel_io_normal is detectd,
it will report PCI_ERS_RESULT_CAN_RECOVER status to PCI driver, and PCI
driver will continue the execution of PCI resume callback report_resume by
pci_walk_bridge, and the callback will go into amdgpu_pci_resume
finally, where write lock is releasd unconditionally without acquiring
such lock first. In this case, a deadlock will happen when other threads
start to acquire the read lock.

To fix this, add a member in amdgpu_device strucutre to cache
pci_channel_state, and only continue the execution in amdgpu_pci_resume
when it's pci_channel_io_frozen.

Fixes: c9a6b82f45 ("drm/amdgpu: Implement DPC recovery")
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-05 12:23:56 -04:00
Christian König
c8365dbda0 drm/amdgpu: revert "Add autodump debugfs node for gpu reset v8"
This reverts commit 728e7e0cd6.

Further discussion reveals that this feature is severely broken
and needs to be reverted ASAP.

GPU reset can never be delayed by userspace even for debugging or
otherwise we can run into in kernel deadlocks.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-05 12:22:36 -04:00