1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

45 commits

Author SHA1 Message Date
Rob Clark
917e9b7c23 Revert "drm/msm/gpu: Push gpu lock down past runpm"
This reverts commit abe2023b4c.

Changing the locking order means that scheduler/msm_job_run() can race
with the recovery kthread worker, with the result that the GPU gets an
extra runpm get when we are trying to power it off.  Leaving the GPU in
an unrecovered state.

I'll need to come up with a different scheme for appeasing lockdep.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/573835/
2024-02-01 15:24:10 -08:00
Rob Clark
2d7d2c4e84 drm/msm/gem: Split out submit_unpin_objects() helper
Untangle unpinning from unlock/unref loop.  The unpin only happens in
error paths so it is easier to decouple from the normal unlock path.

Since we never have an intermediate state where a subset of buffers
are pinned (ie. we never bail out of the pin or unpin loops) we can
replace the bo state flag bit with a global flag in the submit.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/568335/
2023-12-10 10:23:13 -08:00
Matthew Brost
a6149f0393 drm/sched: Convert drm scheduler to use a work queue rather than kthread
In Xe, the new Intel GPU driver, a choice has made to have a 1 to 1
mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
seems a bit odd but let us explain the reasoning below.

1. In Xe the submission order from multiple drm_sched_entity is not
guaranteed to be the same completion even if targeting the same hardware
engine. This is because in Xe we have a firmware scheduler, the GuC,
which allowed to reorder, timeslice, and preempt submissions. If a using
shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls
apart as the TDR expects submission order == completion order. Using a
dedicated drm_gpu_scheduler per drm_sched_entity solve this problem.

2. In Xe submissions are done via programming a ring buffer (circular
buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the
limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow
control on the ring for free.

A problem with this design is currently a drm_gpu_scheduler uses a
kthread for submission / job cleanup. This doesn't scale if a large
number of drm_gpu_scheduler are used. To work around the scaling issue,
use a worker rather than kthread for submission / job cleanup.

v2:
  - (Rob Clark) Fix msm build
  - Pass in run work queue
v3:
  - (Boris) don't have loop in worker
v4:
  - (Tvrtko) break out submit ready, stop, start helpers into own patch
v5:
  - (Boris) default to ordered work queue
v6:
  - (Luben / checkpatch) fix alignment in msm_ringbuffer.c
  - (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue
  - (Luben) Update comment for drm_sched_wqueue_enqueue
  - (Luben) Positive check for submit_wq in drm_sched_init
  - (Luben) s/alloc_submit_wq/own_submit_wq
v7:
  - (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue
v8:
  - (Luben) Adjust var names / comments

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-3-matthew.brost@intel.com
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
2023-11-01 17:29:21 -04:00
Luben Tuikov
56e449603f drm/sched: Convert the GPU scheduler to variable number of run-queues
The GPU scheduler has now a variable number of run-queues, which are set up at
drm_sched_init() time. This way, each driver announces how many run-queues it
requires (supports) per each GPU scheduler it creates. Note, that run-queues
correspond to scheduler "priorities", thus if the number of run-queues is set
to 1 at drm_sched_init(), then that scheduler supports a single run-queue,
i.e. single "priority". If a driver further sets a single entity per
run-queue, then this creates a 1-to-1 correspondence between a scheduler and
a scheduled entity.

Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Qiang Yu <yuq825@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Danilo Krummrich <dakr@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Emma Anholt <emma@anholt.net>
Cc: etnaviv@lists.freedesktop.org
Cc: lima@lists.freedesktop.org
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20231023032251.164775-1-luben.tuikov@amd.com
2023-10-26 12:03:47 -04:00
Rob Clark
abe2023b4c drm/msm/gpu: Push gpu lock down past runpm
Avoid holding gpu lock when calling runpm, to avoid this lockdep splat:

   ======================================================
   WARNING: possible circular locking dependency detected
   6.4.3-debug+ #14 Not tainted
   ------------------------------------------------------
   ring0/373 is trying to acquire lock:
   ffffffead86efb98 (prepare_lock){+.+.}-{3:3}, at: clk_prepare_lock+0x70/0x98

   but task is already holding lock:
   ffffff809cd19170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x7c/0x128 [msm]

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #4 (&gpu->lock){+.+.}-{3:3}:
          __mutex_lock+0xc8/0x388
          mutex_lock_nested+0x2c/0x38
          msm_job_run+0x7c/0x128 [msm]
          drm_sched_main+0x264/0x354 [gpu_sched]
          kthread+0xf0/0x100
          ret_from_fork+0x10/0x20

   -> #3 (dma_fence_map){++++}-{0:0}:
          __dma_fence_might_wait+0x74/0xc0
          dma_resv_lockdep+0x1f0/0x2e8
          do_one_initcall+0xb4/0x214
          kernel_init_freeable+0x338/0x33c
          kernel_init+0x30/0x134
          ret_from_fork+0x10/0x20

   -> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
          fs_reclaim_acquire+0x7c/0x9c
          slab_pre_alloc_hook.constprop.0+0x40/0x250
          __kmem_cache_alloc_node+0x60/0x18c
          kmalloc_node_trace+0x40/0x84
          alloc_worker+0x2c/0x64
          init_rescuer+0x34/0xe0
          workqueue_init+0x168/0x1fc
          kernel_init_freeable+0x15c/0x33c
          kernel_init+0x30/0x134
          ret_from_fork+0x10/0x20

   -> #1 (fs_reclaim){+.+.}-{0:0}:
          __fs_reclaim_acquire+0x3c/0x48
          fs_reclaim_acquire+0x50/0x9c
          slab_pre_alloc_hook.constprop.0+0x40/0x250
          __kmem_cache_alloc_node+0x60/0x18c
          kmalloc_trace+0x44/0x88
          clk_rcg2_dfs_determine_rate+0x60/0x214
          clk_core_determine_round_nolock+0xb8/0xf0
          clk_core_round_rate_nolock+0x84/0x118
          clk_core_round_rate_nolock+0xd8/0x118
          clk_round_rate+0x6c/0xd0
          geni_se_clk_tbl_get+0x78/0xc0
          geni_se_clk_freq_match+0x44/0xe4
          get_spi_clk_cfg+0x50/0xf4
          geni_spi_set_clock_and_bw+0x54/0x104
          spi_geni_prepare_message+0x130/0x174
          __spi_pump_transfer_message+0x200/0x4d8
          __spi_sync+0x13c/0x23c
          spi_sync_locked+0x18/0x24
          do_cros_ec_pkt_xfer_spi+0x124/0x3f0
          cros_ec_xfer_high_pri_work+0x28/0x3c
          kthread_worker_fn+0x14c/0x27c
          kthread+0xf0/0x100
          ret_from_fork+0x10/0x20

   -> #0 (prepare_lock){+.+.}-{3:3}:
          __lock_acquire+0xdf8/0x109c
          lock_acquire+0x234/0x284
          __mutex_lock+0xc8/0x388
          mutex_lock_nested+0x2c/0x38
          clk_prepare_lock+0x70/0x98
          clk_prepare+0x24/0x50
          clk_bulk_prepare+0x50/0x9c
          a6xx_gmu_resume+0x94/0x800 [msm]
          a6xx_gmu_pm_resume+0x38/0x158 [msm]
          adreno_runtime_resume+0x2c/0x38 [msm]
          pm_generic_runtime_resume+0x30/0x44
          __rpm_callback+0x4c/0x134
          rpm_callback+0x78/0x7c
          rpm_resume+0x3a4/0x46c
          __pm_runtime_resume+0x78/0xbc
          pm_runtime_get_sync.isra.0+0x14/0x20 [msm]
          msm_gpu_submit+0x4c/0x12c [msm]
          msm_job_run+0x88/0x128 [msm]
          drm_sched_main+0x264/0x354 [gpu_sched]
          kthread+0xf0/0x100
          ret_from_fork+0x10/0x20

   other info that might help us debug this:
   Chain exists of:
     prepare_lock --> dma_fence_map --> &gpu->lock
    Possible unsafe locking scenario:
          CPU0                    CPU1
          ----                    ----
     lock(&gpu->lock);
                                  lock(dma_fence_map);
                                  lock(&gpu->lock);
     lock(prepare_lock);

    *** DEADLOCK ***
   2 locks held by ring0/373:
    #0: ffffffead875ae50 (dma_fence_map){++++}-{0:0}, at: drm_sched_main+0x54/0x354 [gpu_sched]
    #1: ffffff809cd19170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x7c/0x128 [msm]

   stack backtrace:
   CPU: 2 PID: 373 Comm: ring0 Not tainted 6.4.3-debug+ #14
   Hardware name: Google Villager (rev1+) with LTE (DT)
   Call trace:
    dump_backtrace+0xb4/0xf0
    show_stack+0x20/0x30
    dump_stack_lvl+0x60/0x84
    dump_stack+0x18/0x24
    print_circular_bug+0x1cc/0x234
    check_noncircular+0x78/0xac
    __lock_acquire+0xdf8/0x109c
    lock_acquire+0x234/0x284
    __mutex_lock+0xc8/0x388
    mutex_lock_nested+0x2c/0x38
    clk_prepare_lock+0x70/0x98
    clk_prepare+0x24/0x50
    clk_bulk_prepare+0x50/0x9c
    a6xx_gmu_resume+0x94/0x800 [msm]
    a6xx_gmu_pm_resume+0x38/0x158 [msm]
    adreno_runtime_resume+0x2c/0x38 [msm]
    pm_generic_runtime_resume+0x30/0x44
    __rpm_callback+0x4c/0x134
    rpm_callback+0x78/0x7c
    rpm_resume+0x3a4/0x46c
    __pm_runtime_resume+0x78/0xbc
    pm_runtime_get_sync.isra.0+0x14/0x20 [msm]
    msm_gpu_submit+0x4c/0x12c [msm]
    msm_job_run+0x88/0x128 [msm]
    drm_sched_main+0x264/0x354 [gpu_sched]
    kthread+0xf0/0x100
    ret_from_fork+0x10/0x20

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/552298/
2023-08-15 10:09:30 -07:00
Rob Clark
7391c282ba drm/msm: Remove vma use tracking
This was not strictly necessary, as page unpinning (ie. shrinker) only
cares about the resv.  It did give us some extra sanity checking for
userspace controlled iova, and was useful to catch issues on kernel and
userspace side when enabling userspace iova.  But if userspace screws
this up, it just corrupts it's own gpu buffers and/or gets iova faults.
So we can just let userspace shoot it's own foot and drop the extra per-
buffer SUBMIT overhead.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Patchwork: https://patchwork.freedesktop.org/patch/551023/
2023-08-10 13:08:03 -07:00
Rob Clark
6ba5daa5d5 drm/msm: Use drm_gem_object in submit bos table
Basically everywhere wants the base ptr type.  So store that instead of
msm_gem_object.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/551021/
2023-08-10 10:44:02 -07:00
Rob Clark
1a8b612ef0 drm/msm: Take lru lock once per job_run
Rather than acquiring it and dropping it for each individual obj.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/551019/
2023-08-10 10:44:01 -07:00
Rob Clark
17b704f1c0 drm/msm/gem: Avoid obj lock in job_run()
Now that everything that controls which LRU an obj lives in *except* the
backing pages is protected by the LRU lock, add a special path to unpin
in the job_run() path, where we are assured that we already have backing
pages and will not be racing against eviction (because the GEM object's
dma_resv contains the fence that will be signaled when the submit/job
completes).

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/527845/
Link: https://lore.kernel.org/r/20230320144356.803762-10-robdclark@gmail.com
2023-03-25 16:31:44 -07:00
Rob Clark
b14b8c5f0e drm/msm: Decouple vma tracking from obj lock
We need to use the inuse count to track that a BO is pinned until
we have the hw_fence.  But we want to remove the obj lock from the
job_run() path as this could deadlock against reclaim/shrinker
(because it is blocking the hw_fence from eventually being signaled).
So split that tracking out into a per-vma lock with narrower scope.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/527839/
Link: https://lore.kernel.org/r/20230320144356.803762-5-robdclark@gmail.com
2023-03-25 16:31:44 -07:00
Rob Clark
fc2f07566a drm/msm/gem: Tidy up VMA API
Stop open coding VMA construction, which will be needed in the next
commit.  And since the VMA already has a ptr to the adress space, stop
passing that around everywhere.  (Also, an aspace always has an mmu so
we can drop a couple pointless NULL checks.)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/527833/
Link: https://lore.kernel.org/r/20230320144356.803762-4-robdclark@gmail.com
2023-03-25 16:31:44 -07:00
Rob Clark
769fec1e4f drm/msm: Move submit bo flags update from obj lock
The flags are only accessed (1) when submit is constructed, before
enqueuing to gpu sched (ie. when still visible to only the task calling
the submit ioctl), (2) here, where we own a reference to the submit and
are serialized on the gpu sched thread, and (3) after the submit is
retired and last reference is dropped, which is serialized on the
submit's reference count.  Hence locking is unneeded here.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/527830/
Link: https://lore.kernel.org/r/20230320144356.803762-3-robdclark@gmail.com
2023-03-25 16:31:43 -07:00
Rob Clark
f94e6a51e1 drm/msm: Pre-allocate hw_fence
Avoid allocating memory in job_run() by pre-allocating the hw_fence.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/527832/
Link: https://lore.kernel.org/r/20230320144356.803762-2-robdclark@gmail.com
2023-03-25 16:31:43 -07:00
Rob Clark
084b9e1732 drm/msm/gem: Unpin objects slightly later
The introduction of "drm/msm/gem: Evict active GEM objects when necessary"
exposes a problem with "drm/msm/gem: Unpin buffers earlier", in that we
need to keep the object pinned in the time the submit is queued up in the
gpu scheduler.  Otherwise the shrinker will see it as a thing that can be
evicted if we wait for it to be signaled.  But if the shrinker path is
waiting on it with the obj lock held, the job cannot be scheduled, as that
also requires briefly grabbing the obj lock, leading to deadlock.  (Not to
mention, we don't want the shrinker to evict an obj queued up in gpu
scheduler.)

Fixes: f371bcc0c2 ("drm/msm/gem: Unpin buffers earlier")
Fixes: 025d27239a ("drm/msm/gem: Evict active GEM objects when necessary")
Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/19
Signed-off-by: Rob Clark <robdclark@chromium.org>
Tested-by: Chia-I Wu <olvaffe@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/504528/
Link: https://lore.kernel.org/r/20220923224043.2449152-1-robdclark@gmail.com
2022-09-30 09:01:33 -07:00
Akhil P Oommen
125e03b2b2 drm/msm: Remove unnecessary pm_runtime_get/put
We already enable gpu power from msm_gpu_submit(), so avoid a duplicate
pm_runtime_get/put from msm_job_run().

Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/498390/
Link: https://lore.kernel.org/r/20220819015030.v5.1.Icf1e8f0c9b3e7e9933c3b48c70477d0582f3243f@changeid
Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-08-28 09:29:27 -07:00
Rob Clark
311e03c29c drm/msm/gem: Separate object and vma unpin
Previously the BO_PINNED state in the submit was tracking two related
but different things: (1) that the buffer object was pinned, and (2)
that the vma (mapping within a set of pagetables) was pinned.  But with
fenced vma unpin (needed so that userspace couldn't race with retire
path for releasing a vma) these two were decoupled.  The fact that the
BO_PINNED flag was already cleared meant that we leaked the bo pin count
which should have been dropped when the submit was retired.

So split this state into BO_OBJ_PINNED and BO_VMA_PINNED, so they can be
dropped independently.

Fixes: 95d1deb02a ("drm/msm/gem: Add fenced vma unpin")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/487559/
Link: https://lore.kernel.org/r/20220527172341.2151005-1-robdclark@gmail.com
2022-06-15 13:06:54 -07:00
Tom Rix
500ca2a10f drm/msm: change msm_sched_ops from global to static
Smatch reports this issue
msm_ringbuffer.c:43:36: warning: symbol 'msm_sched_ops' was not declared. Should it be static?

msm_sched_ops is only used in msm_ringbuffer.c so change its
storage-class specifier to static.

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/482883/
Link: https://lore.kernel.org/r/20220421131507.1557667-1-trix@redhat.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
2022-04-27 10:50:22 +03:00
Rob Clark
95d1deb02a drm/msm/gem: Add fenced vma unpin
With userspace allocated iova (next patch), we can have a race condition
where userspace observes the fence completion and deletes the vma before
retire_submit() gets around to unpinning the vma.  To handle this, add a
fenced unpin which drops the refcount but tracks the fence, and update
msm_gem_vma_inuse() to check any previously unsignaled fences.

v2: Fix inuse underflow (duplicate unpin)
v3: Fix msm_job_run() vs submit_cleanup() race condition

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20220411215849.297838-10-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21 15:03:12 -07:00
Jiawei Gu
8ab62eda17 drm/sched: Add device pointer to drm_gpu_scheduler
Add device pointer so scheduler's printing can use
DRM_DEV_ERROR() instead, which makes life easier under multiple GPU
scenario.

v2: amend all calls of drm_sched_init()
v3: fill dev pointer for all drm_sched_init() calls

Signed-off-by: Jiawei Gu <Jiawei.Gu@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220221095705.5290-1-Jiawei.Gu@amd.com
2022-02-23 10:04:14 +01:00
Rob Clark
c28e2f2b41 drm/msm: Remove struct_mutex usage
The remaining struct_mutex usage is just to serialize various gpu
related things (submit/retire/recover/fault/etc), so replace
struct_mutex with gpu->lock.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20211109181117.591148-4-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-11-28 09:50:33 -08:00
Daniel Vetter
80bcfbd376 drm/msm: Use scheduler dependency handling
drm_sched_job_init is already at the right place, so this boils down
to deleting code.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Cc: Sean Paul <sean@poorly.run>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-13-daniel.vetter@ffwll.ch
2021-08-30 10:59:31 +02:00
Dave Airlie
f1b7996551 Merge tag 'drm-msm-next-2021-07-28' of https://gitlab.freedesktop.org/drm/msm into drm-next
An early pull for v5.15 (there'll be more coming in a week or two),
consisting of the drm/scheduler conversion and a couple other small
series that one was based one.  Mostly sending this now because IIUC
danvet wanted it in drm-next so he could rebase on it.  (Daniel, if
you disagree then speak up, and I'll instead include this in the main
pull request once that is ready.)

This also has a core patch to drop drm_gem_object_put_locked() now
that the last use of it is removed.

[airlied: add NULL to drm_sched_init]

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGumRk7H88bqV=H9Fb1SM0zPBo5B7NsCU3jFFKBYxf5k+Q@mail.gmail.com
2021-07-30 16:24:01 +10:00
Rob Clark
1d8a5ca436 drm/msm: Conversion to drm scheduler
For existing adrenos, there is one or more ringbuffer, depending on
whether preemption is supported.  When preemption is supported, each
ringbuffer has it's own priority.  A submitqueue (which maps to a
gl context or vk queue in userspace) is mapped to a specific ring-
buffer at creation time, based on the submitqueue's priority.

Each ringbuffer has it's own drm_gpu_scheduler.  Each submitqueue
maps to a drm_sched_entity.  And each submit maps to a drm_sched_job.

Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/4
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20210728010632.2633470-10-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-07-28 09:19:00 -07:00
Rob Clark
030af2b05a drm/msm: drop drm_gem_object_put_locked()
No idea why we were still using this.  It certainly hasn't been needed
for some time.  So drop the pointless twin codepaths.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20210728010632.2633470-4-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-07-27 18:09:18 -07:00
Rob Clark
375f9a63a6 drm/msm: Docs and misc cleanup
Fix a couple incorrect or misspelt comments, and add submitqueue doc
comment.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20210728010632.2633470-2-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-07-27 18:09:17 -07:00
Rob Clark
da3d378dec drm/msm: Let fences read directly from memptrs
Let dma_fence::signaled, etc, read directly from the address that the hw
is writing with updated completed fence seqno, so we can potentially
notice that the fence is signaled sooner.

Plus add some docs.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20210726144359.2179302-2-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-07-27 17:53:51 -07:00
Rob Clark
77d205290a drm/msm: Protect ring->submits with it's own lock
One less place to rely on dev->struct_mutex.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-11-04 16:00:56 -08:00
Rob Clark
77c406038e drm/msm: Document and rename preempt_lock
Before adding another lock, give ring->lock a more descriptive name.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-11-04 16:00:56 -08:00
Jordan Crouse
604234f336 drm/msm: Enable expanded apriv support for a650
a650 supports expanded apriv support that allows us to map critical buffers
(ringbuffer and memstore) as as privileged to protect them from corruption.

Cc: stable@vger.kernel.org
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-09-04 12:14:07 -07:00
Rob Clark
352c83fb39 drm/msm/gpu: make ringbuffer readonly
The GPU has no business writing into the ringbuffer, let's make it
readonly to the GPU.

Fixes: 7198e6b031 ("drm/msm: add a3xx gpu support")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-08-17 12:25:22 -07:00
Thomas Gleixner
caab277b1d treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation this program is
  distributed in the hope that it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details you should have received a copy of the gnu general
  public license along with this program if not see http www gnu org
  licenses

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 503 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 17:09:07 +02:00
Jordan Crouse
84c6127580 drm/msm/gpu: Map the ringbuffer in the iova at create time
For reasons that I'm sure made perfect sense at the time we were
opting to defer the iova alloc / pin on the ringbuffer until HW
init time so when we moved to iova reference counting we ended
up adding a reference count every time the hardware started.
Not that it mattered (because the ring is always around) but
it did make the debug output look odd. Allocate and pin the iova
at create time instead.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-12-11 13:07:03 -05:00
Jordan Crouse
0815d7749a drm/msm: Add a name field for gem objects
For debugging purposes it is useful to assign descriptions
to buffers so that we know what they are used for. Add
a field to the buffer object and use that to name the various
kernel side allocations which ends up looking like like this
in /d/dri/X/gem:

   flags       id ref  offset   kaddr            size     madv      name
   00040000: I  0 ( 1) 00000000 0000000070b79eca 00004096           memptrs
      vmas: [gpu: 01000000,mapped,inuse=1]
   00020000: I  0 ( 1) 00000000 0000000031ed4074 00032768           ring0

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-12-11 13:06:59 -05:00
Jordan Crouse
1e29dff004 drm/msm: Add a common function to free kernel buffer objects
Buffer objects allocated with msm_gem_kernel_new() are mostly
freed the same way so we can save a few lines of code with a
common function.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-12-11 13:05:30 -05:00
Steve Kowalik
dc9a9b3205 drm/msm: Replace gem_object deprecated functions
drm_gem_object_{reference,unreference,unreference_unlocked} are
deprecated functions, and merely alias to the get/put functions.
Switch to the new names.

Signed-off-by: Steve Kowalik <steven@wedontsleep.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-20 10:41:21 -05:00
Jordan Crouse
b1fc2839d2 drm/msm: Implement preemption for A5XX targets
Implement preemption for A5XX targets - this allows multiple
ringbuffers for different priorities with automatic preemption
of a lower priority ringbuffer if a higher one is ready.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-10-28 11:01:38 -04:00
Jordan Crouse
4c7085a5d5 drm/msm: Shadow current pointer in the ring until command is complete
Add a shadow pointer to track the current command being written into
the ring. Don't commit it as 'cur' until the command is submitted.
Because 'cur' is used to construct the software copy of the wptr this
ensures that somebody peeking in on the ring doesn't assume that a
command is inflight while it is being written. This isn't a huge deal
with a single ring (though technically the hangcheck could assume
the system is prematurely busy when it isn't) but it will be rather
important for preemption where the decision to preempt is based
on a non-empty ringbuffer. Without a shadow an aggressive preemption
scheme could assume that the ringbuffer is non empty and switch to it
before the CPU is done writing the command and boom.

Even though preemption won't be supported for all targets because of
the way the code is organized it is simpler to make this generic for
all targets. The extra load for non-preemption targets should be
minimal.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-10-28 11:01:37 -04:00
Jordan Crouse
f97decac5f drm/msm: Support multiple ringbuffers
Add the infrastructure to support the idea of multiple ringbuffers.
Assign each ringbuffer an id and use that as an index for the various
ring specific operations.

The biggest delta is to support legacy fences. Each fence gets its own
sequence number but the legacy functions expect to use a unique integer.
To handle this we return a unique identifier for each submission but
map it to a specific ring/sequence under the covers. Newer users use
a dma_fence pointer anyway so they don't care about the actual sequence
ID or ring.

The actual mechanics for multiple ringbuffers are very target specific
so this code just allows for the possibility but still only defines
one ringbuffer for each target family.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-10-28 11:01:36 -04:00
Jordan Crouse
8223286d62 drm/msm: Add a helper function for in-kernel buffer allocations
Nearly all of the buffer allocations for kernel allocate an buffer object,
virtual address and GPU iova at the same time. Make a helper function to
handle the details.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
[dropped msm_fbdev conversion to new helper, since it interferes with
display-handover work, where we want to separate allocation and mapping]
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-08-22 13:19:17 -04:00
Sushmita Susheelendra
0e08270a1f drm/msm: Separate locking of buffer resources from struct_mutex
Buffer object specific resources like pages, domains, sg list
need not be protected with struct_mutex. They can be protected
with a buffer object level lock. This simplifies locking and
makes it easier to avoid potential recursive locking scenarios
for SVM involving mmap_sem and struct_mutex. This also removes
unnecessary serialization when creating buffer objects, and also
between buffer object creation and GPU command submission.

Signed-off-by: Sushmita Susheelendra <ssusheel@codeaurora.org>
[robclark: squash in handling new locking for shrinker]
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-17 08:03:07 -04:00
Jordan Crouse
88b333b0ed drm/msm: Ensure that the hardware write pointer is valid
Currently the value written to CP_RB_WPTR is calculated on the fly as
(rb->next - rb->start). But as the code is designed rb->next is wrapped
before writing the commands so if a series of commands happened to
fit perfectly in the ringbuffer, rb->next would end up being equal to
rb->size / 4 and thus result in an out of bounds address to CP_RB_WPTR.

The easiest way to fix this is to mask WPTR when writing it to the
hardware; it makes the hardware happy and the rest of the ringbuffer
math appears to work and there isn't any point in upsetting anything.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
[squash in is_power_of_2() check]
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-29 15:02:58 -05:00
Rob Clark
18f23049f6 drm/msm: change gem->vmap() to get/put
Before we can add vmap shrinking, we really need to know which vmap'ings
are currently being used.  So switch to get/put interface.  Stubbed put
fxns for now.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-16 10:09:07 -04:00
Rob Clark
69a834c28f drm/msm: deal with exhausted vmap space better
Some, but not all, callers of obj->vmap() would check if return
IS_ERR().  So let's actually return an error if vmap() fails.  And fixup
the call-sites that were not handling this properly.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-06-04 14:45:48 -04:00
Rob Clark
774449ebcb drm/msm: fix locking inconsistencies in gpu->destroy()
In error paths, this was being called without struct_mutex held.
Leading to panics like:

  msm 1a00000.qcom,mdss_mdp: No memory protection without IOMMU
  Kernel panic - not syncing: BUG!
  CPU: 0 PID: 1409 Comm: cat Not tainted 4.0.0-dirty #4
  Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
  Call trace:
  [<ffffffc000089c78>] dump_backtrace+0x0/0x118
  [<ffffffc000089da0>] show_stack+0x10/0x20
  [<ffffffc0006686d4>] dump_stack+0x84/0xc4
  [<ffffffc0006678b4>] panic+0xd0/0x210
  [<ffffffc0003e1ce4>] drm_gem_object_free+0x5c/0x60
  [<ffffffc000402870>] adreno_gpu_cleanup+0x60/0x80
  [<ffffffc0004035a0>] a3xx_destroy+0x20/0x70
  [<ffffffc0004036f4>] a3xx_gpu_init+0x84/0x108
  [<ffffffc0004018b8>] adreno_load_gpu+0x58/0x190
  [<ffffffc000419dac>] msm_open+0x74/0x88
  [<ffffffc0003e0a48>] drm_open+0x168/0x400
  [<ffffffc0003e7210>] drm_stub_open+0xa8/0x118
  [<ffffffc0001a0e84>] chrdev_open+0x94/0x198
  [<ffffffc000199f88>] do_dentry_open+0x208/0x310
  [<ffffffc00019a4c4>] vfs_open+0x44/0x50
  [<ffffffc0001aa26c>] do_last.isra.14+0x2c4/0xc10
  [<ffffffc0001aac38>] path_openat+0x80/0x5e8
  [<ffffffc0001ac354>] do_filp_open+0x2c/0x98
  [<ffffffc00019b60c>] do_sys_open+0x13c/0x228
  [<ffffffc00019b72c>] SyS_openat+0xc/0x18
  CPU1: stopping

But there isn't any particularly good reason to hold struct_mutex for
teardown, so just standardize on calling it without the mutex held and
use the _unlocked() versions for GEM obj unref'ing

Signed-off-by: Rob Clark <robdclark@gmail.com>
2015-05-15 09:28:27 -04:00
Rob Clark
7198e6b031 drm/msm: add a3xx gpu support
Add initial support for a3xx 3d core.

So far, with hardware that I've seen to date, we can have:
 + zero, one, or two z180 2d cores
 + a3xx or a2xx 3d core, which share a common CP (the firmware
   for the CP seems to implement some different PM4 packet types
   but the basics of cmdstream submission are the same)

Which means that the eventual complete "class" hierarchy, once
support for all past and present hw is in place, becomes:
 + msm_gpu
   + adreno_gpu
     + a3xx_gpu
     + a2xx_gpu
   + z180_gpu

This commit splits out the parts that will eventually be common
between a2xx/a3xx into adreno_gpu, and the parts that are even
common to z180 into msm_gpu.

Note that there is no cmdstream validation required.  All memory access
from the GPU is via IOMMU/MMU.  So as long as you don't map silly things
to the GPU, there isn't much damage that the GPU can do.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2013-08-24 14:57:18 -04:00