1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

19 commits

Author SHA1 Message Date
Yang Wang
258ed689bc drm/amdgpu: change bank cache lock type to spinlock
modify the lock type to 'spinlock' to avoid schedule issue
in interrupt context.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:39 -04:00
Yang Wang
5eccab32c1 drm/amdgpu: avoid dump mca bank log muti times during ras ISR
because the ue valid mca count will only be cleared after gpu reset,
so only dump mca log on the first time to get mca bank after receive RAS interrupt.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:58:47 -04:00
Yang Wang
76ad30f51a drm/amdgpu: add MCA smu cache support
v1:
because SMU CE valid mca bank will be cleared after reading,
this patch adds mca cache at the driver level to ensure that the mca bank is not lost.

v2:
refine amdgpu_mca_init/fini/reset() function name.

v3:
add mca_cache.lock support
only add CE bank to mca bank cache.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:58:41 -04:00
Yang Wang
7e0357bef4 drm/amdgpu: remove unused MCA driver codes
- remove unused callback functions.
- make part of mca functions static and refine the function order.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:51:44 -04:00
Yang Wang
9dc57c2adf drm/amdgpu: add ras event id support
add amdgpu ras event id support to better distinguish different
error information sources in dmesg logs.

the following log will be identify by event id:
{event_id} interrupt to inform RAS event
{event_id} ACA logs
{event_id} errors statistic since from current injection/error query
{event_id} errors statistic since from gpu load

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:13 -04:00
YiPeng Chai
afb617f38f drm/amdgpu: add interface to check mca umc status
Add interface to check mca umc status.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22 17:13:25 -05:00
Hawking Zhang
058eb51912 drm/amdgpu: Switch to aca bank for xgmi pcs err cnt
Instead of software managed counters.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 15:28:47 -05:00
Yang Wang
37c57631c1 drm/amd/pm: support new mca smu error code decoding
support new mca smu error code decoding from smu 85.86.0 for smu v13.0.6

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 16:05:32 -05:00
Yang Wang
76d2da18af drm/amdgpu: add smu v13.0.6 pcs xgmi ras error query support
add pcs xgmi ras error query support for smu v13.0.6.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:02:59 -05:00
Hawking Zhang
d406aec8dc drm/amdgpu: correct acclerator check architecutre dump
So driver doesn't touch invalid aca entries.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:02:26 -05:00
Yang Wang
07c1db7036 drm/amdgpu: refine smu v13.0.6 mca dump driver
refine smu mca driver to support query ras error from pmfw path.
- correct gfx smu bank hwid (from mp5 to smu bank)
- retire unused callback function in amdgpu_mca_smu_funcs{}
- add new mca_bank_set{} structure to collect mca bank
- move enum mca_reg_idx into amdgpu_mca.h header
- add mca status register field decode macro

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:01:51 -05:00
Yang Wang
4051844c66 drm/amdgpu: add amdgpu mca debug sysfs support
add amdgpu mca debug sysfs support.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 12:25:19 -04:00
Yang Wang
7ff607e272 drm/amdgpu: add amdgpu smu mca dump feature support
add amdgpu smu mca dump feature support.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 12:25:01 -04:00
Hawking Zhang
7f544c5488 drm/amdgpu: Rework mca ras sw_init
To align with other IP blocks

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15 18:45:26 -04:00
yipechai
30e58102d5 drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in mca ras block
Remove redundant calls of amdgpu_ras_block_late_fini in mca ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
yipechai
20c43547ad drm/amdgpu: Remove redundant calls of ras_late_init in mca ras block
Remove redundant calls of ras_late_init in mca ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-17 15:59:05 -05:00
yipechai
b0e2062dc8 drm/amdgpu: Modify mca block to fit for the unified ras block data and ops
1.Modify mca block to fit for the unified ras block data and ops.
2.Define special .ras_block_match function for mca block to identify itself.
3.Change amdgpu_mca_ras_funcs to amdgpu_mca_ras_block(amdgpu_mca_ras had been used), and the corresponding variable name remove _funcs suffix.
4.Remove the const flag of cma ras variable so that cma ras block can be able to be inserted into amdgpu device ras block link list.
5.Invoke amdgpu_ras_register_ras_block function to register cma ras block into amdgpu device ras block link list.
6.Remove the redundant code about cma in amdgpu_ras.c after using the unified ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14 17:52:00 -05:00
John Clements
640ae42efb drm/amdgpu: Updated RAS infrastructure
Update RAS infrastructure to support RAS query for MCA subblocks

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-09-23 16:34:43 -04:00
John Clements
3907c49218 drm/amdgpu: Add driver infrastructure for MCA RAS
Add MCA specific IP blocks targetting RAS features

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-24 15:36:18 -04:00