1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

21 commits

Author SHA1 Message Date
Rob Clark
08c4aa3ee2 drm/msm/a6xx: Skip crashdumper state if GPU needs_hw_init
I am seeing some crash logs which imply that we are trying to use
crashdumper hw to read back GPU state when the GPU isn't initialized.
This doesn't go well (for example, GPU could be in 32b address mode
and ignoring the upper bits of buffer that it is trying to dump state
to).

I'm not *quite* sure how we get into this state in the first place,
but lets not make a bad situation worse by triggering iova fault
crashes.

While we're at it, also add the information about whether the GPU is
initialized to the devcore dump to make this easier to see in the
logs (which makes the WARN_ON() redundant and even harmful because
it fills up the small bit of dmesg we get with the crash report).

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20211209193118.1163248-1-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-12-13 13:46:18 -08:00
Rob Clark
b859f9b009 drm/msm/gpu: Snapshot GMU debug buffer
It appears to be a GMU fw build option whether it does anything with
debug and log buffers, but if they are all zeros it won't add anything
to the devcore size.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20211124214151.1427022-10-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-11-29 16:19:58 -08:00
Rob Clark
1691e00596 drm/msm/gpu: Also snapshot GMU HFI buffer
This also includes a history of start index of the last 8 messages on
each queue, since parsing backwards to decode recently sent HFI messages
is hard(ish).

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20211124214151.1427022-9-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-11-29 16:19:58 -08:00
Rob Clark
203dcd5e9d drm/msm/gpu: Make a6xx_get_gmu_log() more generic
Turn it into a thing we can use to snapshot other GMU buffers.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20211124214151.1427022-8-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-11-29 16:19:58 -08:00
Akhil P Oommen
518380cb54 drm/msm/a6xx: Capture gmu log in devcoredump
Capture gmu log in coredump to enhance debugging.

Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20211124214151.1427022-2-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-11-28 10:08:15 -08:00
Douglas Anderson
b4d25abf97 drm/msm/a6xx: Allocate enough space for GMU registers
In commit 142639a52a ("drm/msm/a6xx: fix crashstate capture for
A650") we changed a6xx_get_gmu_registers() to read 3 sets of
registers. Unfortunately, we didn't change the memory allocation for
the array. That leads to a KASAN warning (this was on the chromeos-5.4
kernel, which has the problematic commit backported to it):

  BUG: KASAN: slab-out-of-bounds in _a6xx_get_gmu_registers+0x144/0x430
  Write of size 8 at addr ffffff80c89432b0 by task A618-worker/209
  CPU: 5 PID: 209 Comm: A618-worker Tainted: G        W         5.4.156-lockdep #22
  Hardware name: Google Lazor Limozeen without Touchscreen (rev5 - rev8) (DT)
  Call trace:
   dump_backtrace+0x0/0x248
   show_stack+0x20/0x2c
   dump_stack+0x128/0x1ec
   print_address_description+0x88/0x4a0
   __kasan_report+0xfc/0x120
   kasan_report+0x10/0x18
   __asan_report_store8_noabort+0x1c/0x24
   _a6xx_get_gmu_registers+0x144/0x430
   a6xx_gpu_state_get+0x330/0x25d4
   msm_gpu_crashstate_capture+0xa0/0x84c
   recover_worker+0x328/0x838
   kthread_worker_fn+0x32c/0x574
   kthread+0x2dc/0x39c
   ret_from_fork+0x10/0x18

  Allocated by task 209:
   __kasan_kmalloc+0xfc/0x1c4
   kasan_kmalloc+0xc/0x14
   kmem_cache_alloc_trace+0x1f0/0x2a0
   a6xx_gpu_state_get+0x164/0x25d4
   msm_gpu_crashstate_capture+0xa0/0x84c
   recover_worker+0x328/0x838
   kthread_worker_fn+0x32c/0x574
   kthread+0x2dc/0x39c
   ret_from_fork+0x10/0x18

Fixes: 142639a52a ("drm/msm/a6xx: fix crashstate capture for A650")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20211103153049.1.Idfa574ccb529d17b69db3a1852e49b580132035c@changeid
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-11-21 12:39:06 -08:00
Dmitry Baryshkov
1c8e5748fa drm/msm/a6xx: correct cx_debugbus_read arguments
First argument of cx_debugbus_read() should be 'void __iomem *' rather
than 'void * __iomem' to make sparse happy.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20211002183118.748841-1-dmitry.baryshkov@linaro.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-10-15 16:35:40 -07:00
Rob Clark
030af2b05a drm/msm: drop drm_gem_object_put_locked()
No idea why we were still using this.  It certainly hasn't been needed
for some time.  So drop the pointless twin codepaths.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20210728010632.2633470-4-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-07-27 18:09:18 -07:00
Rob Clark
e25e92e08e drm/msm: devcoredump iommu fault support
Wire up support to stall the SMMU on iova fault, and collect a devcore-
dump snapshot for easier debugging of faults.

Currently this is a6xx-only, but mostly only because so far it is the
only one using adreno-smmu-priv.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jordan Crouse <jordan@cosmicpenguin.net>
Link: https://lore.kernel.org/r/20210610214431.539029-6-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-06-23 07:33:55 -07:00
Jonathan Marek
a5fc7aa901 drm/msm: replace MSM_BO_UNCACHED with MSM_BO_WC for internal objects
msm_gem_get_vaddr() currently always maps as writecombine, so use the right
flag instead of relying on broken behavior (things don't actually work if
they are mapped as uncached).

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Acked-by: Jordan Crouse <jordan@cosmicpenguin.net>
Link: https://lore.kernel.org/r/20210423190833.25319-3-jonathan@marek.ca
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-06-23 07:32:14 -07:00
Lee Jones
692bdf972d drm/msm/adreno/a6xx_gpu_state: Make some local functions static
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c:83:7: warning: no previous prototype for ‘state_kcalloc’ [-Wmissing-prototypes]
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c:95:7: warning: no previous prototype for ‘state_kmemdup’ [-Wmissing-prototypes]
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c:947:6: warning: no previous prototype for ‘a6xx_gpu_state_destroy’ [-Wmissing-prototypes]

Cc: Rob Clark <robdclark@gmail.com>
Cc: Sean Paul <sean@poorly.run>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: linux-arm-msm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: freedreno@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-11-29 10:36:53 -08:00
Zhenzhong Duan
08d3ab4b46 drm/msm/a6xx: fix a potential overflow issue
It's allocating an array of a6xx_gpu_state_obj structure rathor than
its pointers.

This patch fix it.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-09-12 10:44:57 -07:00
Rob Clark
6f7cd6e40b drm/msm/a6xx: add module param to enable debugbus snapshot
For production devices, the debugbus sections will typically be fused
off and empty in the gpu device coredump.  But since this may contain
data like cache contents, don't capture it by default.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-08-22 10:49:08 -07:00
Jonathan Marek
142639a52a drm/msm/a6xx: fix crashstate capture for A650
A650 has a separate RSCC region, so dump RSCC registers separately, reading
them from the RSCC base. Without this change a GPU hang will cause a system
reset if CONFIG_DEV_COREDUMP is enabled.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-07-31 06:46:16 -07:00
Sharat Masetty
a5ab31767c drm: msm: a6xx: Dump GBIF registers, debugbus in gpu state
Add the relevant GBIF registers and the debug bus to the a6xx gpu
state. This comes in pretty handy when debugging GPU bus related
issues.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-01-02 16:05:37 -08:00
Sharat Masetty
7f4009c4bb drm: msm: a6xx: fix debug bus register configuration
Fix the cx debugbus related register configuration, to collect accurate
bus data during gpu snapshot. This helps with complete snapshot dump
and also complete proper GPU recovery.

Fixes: 1707add815 ("drm/msm/a6xx: Add a6xx gpu state")
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/339165
2019-11-06 12:45:18 -05:00
Jordan Crouse
e400b9edb0 drm/msm/a6xx: Add a name for the crashdumper buffer
Add a buffer object name for the a6xx crashdumper so it can be
seen with the changes introduced by 7799a98edd
("drm/msm: Add a name field for gem objects").

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-12-11 13:07:09 -05:00
Jordan Crouse
d135c7ebb7 drm/msm/a6xx: Use new kernel API free function for gpu state
dadb36b7ec42 ("drm/msm: Add a common function to free kernel buffer objects")
missed freeing the crashdumper state for a6xx.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-12-11 13:07:09 -05:00
Jordan Crouse
7ad0e8cf63 drm/msm: Count how many times iova memory is pinned
Add a reference count to track how many times a particular
chunk of iova memory is pinned (mapped) in the iomu and
add msm_gem_unpin_iova to give up references.

It is important to note that msm_gem_unpin_iova replaces
msm_gem_put_iova because the new implicit behavior
that an assigned iova in a given vma is now valid for the
life of the buffer and what we are really focusing on is
the use of that iova.

For now the unmappings are lazy; once the reference counts
go to zero they *COULD* be unmapped dynamically but that
will require an outside force such as a shrinker or
mm_notifiers.  For now, we're just focusing on getting
the counting right and setting ourselves up to be ready
for the future.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-12-11 13:05:32 -05:00
Jordan Crouse
d6852b4b2d drm/msm/a6xx: Track and manage a6xx state memory
The a6xx GPU state allocates a LOT of memory. Add a bit of
infrastructure to track the memory allocations in the GPU structure
and delete them when the state is destroyed much the same way
that devm works with the device model as a whole.  This protects
against the developer accidentally forgetting to add a kfree() to
an ever growing list.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-12-11 13:05:30 -05:00
Jordan Crouse
1707add815 drm/msm/a6xx: Add a6xx gpu state
Add support for gathering and dumping the a6xx GPU state including
registers, GMU registers, indexed registers, shader blocks,
context clusters and debugbus.

v2: Fix bugs discovered by Sharat Masetty

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-12-11 13:05:30 -05:00