1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

75 commits

Author SHA1 Message Date
Andi Shyti
be5bcc4be9 drm/i915/guc: Create the guc_to_i915() wrapper
Given a reference to "guc", the guc_to_i915() returns the
pointer to "i915" private data.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231206184322.57111-1-andi.shyti@linux.intel.com
2023-12-08 12:31:01 +01:00
John Harrison
706785c19f drm/i915/guc: Add a selftest for FAST_REQUEST errors
There is a mechanism for reporting errors from fire and forget H2G
messages. This is the only way to find out about almost any error in
the GuC backend submission path. So it would be useful to know that it
is working.

v2: Fix some dumb over-complications and a couple of typos - review
feedback from Daniele.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231114010016.234570-3-John.C.Harrison@Intel.com
2023-11-30 13:50:50 -08:00
Prathap Kumar Valsan
af58ee2276 drm/i915: Define and use GuC and CTB TLB invalidation routines
The GuC firmware had defined the interface for Translation Look-Aside
Buffer (TLB) invalidation.  We should use this interface when
invalidating the engine and GuC TLBs.
Add additional functionality to intel_gt_invalidate_tlb, invalidating
the GuC TLBs and falling back to GT invalidation when the GuC is
disabled.
The invalidation is done by sending a request directly to the GuC
tlb_lookup that invalidates the table.  The invalidation is submitted as
a wait request and is performed in the CT event handler.  This means we
cannot perform this TLB invalidation path if the CT is not enabled.
If the request isn't fulfilled in two seconds, this would constitute
an error in the invalidation as that would constitute either a lost
request or a severe GuC overload.

With this new invalidation routine, we can perform GuC-based GGTT
invalidations.  GuC-based GGTT invalidation is incompatible with
MMIO invalidation so we should not perform MMIO invalidation when
GuC-based GGTT invalidation is expected.

The additional complexity incurred in this patch will be necessary for
range-based tlb invalidations, which will be platformed in the future.

Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
CC: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231017180806.3054290-4-jonathan.cavitt@intel.com
2023-10-18 06:01:09 +02:00
Jonathan Cavitt
ff0dac080a drm/i915/guc: Add CT size delay helper
As of now, there is no mechanism for tracking a given request's
progress through the queue.  Instead, add a helper that returns
an estimated maximum time the queue should take to drain if
completely full.

Suggested-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231017180806.3054290-3-jonathan.cavitt@intel.com
2023-10-18 06:01:09 +02:00
Kees Cook
3e78f77121 drm/i915/guc: Annotate struct ct_incoming_msg with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for
array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct ct_incoming_msg.

Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-hardening@vger.kernel.org
Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231006201744.work.135-kees@kernel.org
2023-10-09 13:59:18 +02:00
John Harrison
b2edc4148a drm/i915/guc: Force a reset on internal GuC error
If GuC hits an internal error (and survives long enough to report it
to the KMD), it is basically toast and will stop until a GT reset and
subsequent GuC reload is performed. Previously, the KMD just printed
an error message and then waited for the heartbeat to eventually kick
in and trigger a reset (assuming the heartbeat had not been disabled).
Instead, force the reset immediately to guarantee that it happens and
to eliminate the very long heartbeat delay. The captured error state
is also more likely to be useful if captured at the time of the error
rather than many seconds later.

Note that it is not possible to trigger a reset from with the G2H
handler itself. The reset prepare process involves flushing
outstanding G2H contents. So a deadlock could result. Instead, the G2H
handler queues a worker thread to do the reset asynchronously.

v2: Flush the worker on suspend and shutdown. Add rate limiting to
prevent spam from a totally dead system (review feedback from Daniele).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230816003957.3572654-1-John.C.Harrison@Intel.com
2023-08-22 11:38:47 -07:00
Jonathan Cavitt
f1530f912e drm/i915/gt: Apply workaround 22016122933 correctly
WA_22016122933 was recently applied to all MeteorLake engines, which is
simultaneously too broad (should only apply to Media engines) and too
specific (should apply to all platforms that use the same media engine
as MeteorLake).  Correct this in cases where coherency settings are
modified.

There were also two additional places where the workaround was applied
unconditionally.  The change was confirmed as necessary for all
platforms, so the workaround label was removed.

Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Acked-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230801153242.2445478-4-jonathan.cavitt@intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20230807121957.598420-4-andi.shyti@linux.intel.com
2023-08-10 14:14:13 +02:00
Michal Wajdeczko
84596e1ab0 drm/i915/guc: Drop legacy CTB definitions
We've already switched to new HXG definitions some time ago,
drop legacy CTB definitions to avoid mistakes.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230509201103.538-1-michal.wajdeczko@intel.com
2023-05-31 12:07:07 -07:00
Michal Wajdeczko
a5606b94cd drm/i915/guc: Track all sent actions to GuC
For easier debug of any unexpected error responses from GuC that
might be related to non-blocking fast requests, track action code (and
stack if under DEBUG_GUC config) for every H2G request.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230526235538.2230780-4-John.C.Harrison@Intel.com
2023-05-30 15:18:21 -07:00
Michal Wajdeczko
d9911020ca drm/i915/guc: Update log for unsolicited CTB response
Instead of printing message fence twice, include HXG header of the
unexpected message and its len.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230526235538.2230780-3-John.C.Harrison@Intel.com
2023-05-30 15:18:20 -07:00
Michal Wajdeczko
edfd93e60b drm/i915/guc: Use FAST_REQUEST for non-blocking H2G calls
In addition to the already defined REQUEST HXG message format,
which is used when sender expects some confirmation or data,
HXG protocol includes definition of the FAST REQUEST message,
that may be used when sender does not expect any useful data
to be returned.

Using this instead of GUC_HXG_TYPE_EVENT for non-blocking CTB requests
will allow GuC to send back GUC_HXG_TYPE_RESPONSE_FAILURE in case of
errors.

Note that it is not possible to return such errors to the caller,
since this is for non-blocking calls and the related fence is not
stored. Instead such messages are treated as unexpected, which will
give an indication of potential GuC misprogramming that warrants extra
debugging effort.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230526235538.2230780-2-John.C.Harrison@Intel.com
2023-05-30 15:18:19 -07:00
John Harrison
f6eeea8d70 drm/i915/guc: Dump error capture to dmesg on CTB error
In the past, There have been sporadic CTB failures which proved hard
to reproduce manually. The most effective solution was to dump the GuC
log at the point of failure and let the CI system do the repro. It is
preferable not to dump the GuC log via dmesg for all issues as it is
not always necessary and is not helpful for end users. But rather than
trying to re-invent the code to do this each time it is wanted, commit
the code but for DEBUG_GUC builds only.

v2: Use IS_ENABLED for testing config options.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230418181744.3251240-3-John.C.Harrison@Intel.com
2023-05-16 12:26:48 -07:00
Fei Yang
a161b6dba6 drm/i915/mtl: workaround coherency issue for Media
This patch implements Wa_22016122933.

In MTL, memory writes initiated by the Media tile update the whole
cache line, even for partial writes. This creates a coherency
problem for cacheable memory if both CPU and GPU are writing data
to different locations within a single cache line.
This patch circumvents the issue by making CPU/GPU shared memory
uncacheable (WC on CPU side, and PAT index 2 for GPU).  Additionally,
it ensures that CPU writes are visible to the GPU with an
intel_guc_write_barrier().

While fixing the CTB issue, we noticed some random GSC firmware
loading failure because the share buffers are cacheable (WB) on CPU
side but uncached on GPU side. To fix these issues we need to map
such shared buffers as WC on CPU side. Since such allocations are
not all done through GuC allocator, to avoid too many code changes,
the i915_coherent_map_type() is now hard coded to return WC for MTL.

v2: Simplify the commit message(Matt).

BSpec: 45101

Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230424182902.3663500-3-fei.yang@intel.com
2023-04-25 09:23:31 +02:00
Michal Wajdeczko
7388acb253 drm/i915/guc: Update GuC messages in intel_guc_ct.c
Use new macros to have common prefix that also include GT#.

v2: drop unused helpers

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230128195907.1837-5-michal.wajdeczko@intel.com
2023-01-31 12:20:05 +00:00
John Harrison
dd9d3cbe9e drm/i915/guc: Don't abort on CTB_UNUSED status
When the KMD sends a CLIENT_RESET request to GuC (as part of the
suspend sequence), GuC will mark the CTB buffer as 'UNUSED'. If the
KMD then checked the CTB queue, it would see a non-zero status value
and report the buffer as corrupted.

Technically, no G2H messages should be received once the CLIENT_RESET
has been sent. However, if a context was outstanding on an engine then
it would get reset and a reset notification would be sent. So, don't
actually treat UNUSED as a catastrophic error. Just flag it up as
unexpected and keep going.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728024225.2363663-7-John.C.Harrison@Intel.com
2022-07-29 10:35:59 -07:00
Zhanjun Dong
22645976ae drm/i915/guc: Check for ct enabled while waiting for response
We are seeing error message of "No response for request". Some cases
happened while waiting for response and reset/suspend action was triggered.
In this case, no response is not an error, active requests will be
cancelled.

This patch will handle this condition and change the error message into
debug message.

Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220715211313.143645-1-zhanjun.dong@intel.com
2022-07-28 11:57:34 -07:00
Lucas De Marchi
ff9fbe7ce1 drm/i915: Use str_enabled_disabled()
Remove the local enableddisabled() implementation and adopt the
str_enabled_disabled() from linux/string_helpers.h.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225234631.3725943-3-lucas.demarchi@intel.com
2022-03-02 08:48:20 -08:00
Lucas De Marchi
707c3a7d99 drm/i915: Use str_enable_disable()
Remove the local enabledisable() implementation and adopt the
str_enable_disable() from linux/string_helpers.h.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225234631.3725943-2-lucas.demarchi@intel.com
2022-03-02 08:48:18 -08:00
Gustavo A. R. Silva
cec49bce6e drm/i915/guc: Use struct_size() helper in kmalloc()
Make use of the struct_size() helper instead of an open-coded version,
in order to avoid any potential type mistakes or integer overflows that,
in the worst scenario, could lead to heap overflows.

Also, address the following sparse warnings:
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c:792:23: warning: using sizeof on a flexible structure

Link: https://github.com/KSPP/linux/issues/174
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220125180726.GA68646@embeddedor
2022-01-26 10:19:08 +00:00
John Harrison
77b6f79df6 drm/i915/guc: Update to GuC version 69.0.3
Update to the latest GuC release.

The latest GuC firmware introduces a number of interface changes:

GuC may return NO_RESPONSE_RETRY message for requests sent over CTB.
Add support for this reply and try resending the request again as a
new CTB message.

A KLV (key-length-value) mechanism is now used for passing
configuration data such as CTB management.

With the new KLV scheme, the old CTB management actions are no longer
used and are removed.

Register capture on hang is now supported by GuC. Full i915 support
for this will be added by a later patch. A minimum support of
providing capture memory and register lists is required though, so add
that in.

The device id of the current platform needs to be provided at init time.

The 'poll CS' w/a (Wa_22012773006) was blanket enabled by previous
versions of GuC. It must now be explicitly requested by the KMD. So,
add in the code to turn it on when relevant.

The GuC log entry format has changed. This requires adding a new field
to the log header structure to mark the wrap point at the end of the
buffer (as the buffer size is no longer a multiple of the log entry
size).

New CTB notification messages are now sent for some things that were
previously only sent via MMIO notifications.

Of these, the crash dump notification was not really being handled by
i915. It called the log flush code but that only flushed the regular
debug log and then only if relay logging was enabled. So just report
an error message instead.

The 'exception' notification was just being ignored completely. So add
an error message for that as well.

Note that in either the crash dump or the exception case, the GuC is
basically dead. The KMD will detect this via the heartbeat and trigger
both an error log (which will include the crash dump as part of the
GuC log) and a GT reset. So no other processing is really required.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220107000622.292081-3-John.C.Harrison@Intel.com
2022-01-11 11:47:36 -08:00
Matthew Brost
2aa9f833dd drm/i915/guc: Kick G2H tasklet if no credits
Let's be paranoid and kick the G2H tasklet, which dequeues messages, if
G2H credits are exhausted.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211214170500.28569-7-matthew.brost@intel.com
2021-12-15 19:10:51 -08:00
Matthew Brost
6e94d53962 drm/i915/guc: Add extra debug on CT deadlock
Print CT state (H2G + G2H head / tail pointers, credits) on CT
deadlock.

v2:
 (John Harrison)
  - Add units to debug messages

Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211214170500.28569-6-matthew.brost@intel.com
2021-12-15 19:10:51 -08:00
Matthew Brost
6b540bf6f1 drm/i915/guc: Implement multi-lrc submission
Implement multi-lrc submission via a single workqueue entry and single
H2G. The workqueue entry contains an updated tail value for each
request, of all the contexts in the multi-lrc submission, and updates
these values simultaneously. As such, the tasklet and bypass path have
been updated to coalesce requests into a single submission.

v2:
 (John Harrison)
  - s/wqe/wqi
  - Use FIELD_PREP macros
  - Add GEM_BUG_ONs ensures length fits within field
  - Add comment / white space to intel_guc_write_barrier
 (Kernel test robot)
  - Make need_tasklet a static function
v3:
 (Docs)
  - A comment for submission_stall_reason
v4:
 (Kernel test robot)
  - Initialize return value in bypass tasklt submit function
 (John Harrison)
  - Add comment near work queue defs
  - Add BUILD_BUG_ON to ensure WQ_SIZE is a power of 2
  - Update write_barrier comment to talk about work queue
v5:
 (John Harrison)
  - Fix typo in work queue comment

Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-13-matthew.brost@intel.com
2021-10-15 10:37:40 -07:00
Michal Wajdeczko
fb2d2de353 drm/i915/guc: Move and improve error message for missed CTB reply
If we timeout waiting for a CT reply we print very simple error
message. Improve that and by moving error reporting to the caller
we can use CT_ERROR instead of DRM_ERROR and report just fence
as error code will be reported later anyway.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210926184545.1407-5-michal.wajdeczko@intel.com
2021-10-01 12:04:24 -07:00
Michal Wajdeczko
0e9deac513 drm/i915/guc: Print error name on CTB send failure
Instead of plain error value (%d) print more user friendly error
name (%pe).

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210926184545.1407-4-michal.wajdeczko@intel.com
2021-10-01 12:04:24 -07:00
Michal Wajdeczko
0de9765da5 drm/i915/guc: Print error name on CTB (de)registration failure
Instead of plain error value (%d) print more user friendly error
name (%pe).

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210926184545.1407-3-michal.wajdeczko@intel.com
2021-10-01 12:04:23 -07:00
Michal Wajdeczko
217ecd310d drm/i915/guc: Verify result from CTB (de)register action
In commit b839a869df ("drm/i915/guc: Add support for data
reporting in GuC responses") we missed the hypothetical case
that GuC might return positive non-zero value as success data.

While that would be lucky treated as error case, and at the
end will result in reporting valid -EIO, in the meantime this
value will be passed to ERR_PTR that could be misleading.

v2: rebased

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210926184545.1407-2-michal.wajdeczko@intel.com
2021-10-01 12:04:23 -07:00
Matthew Brost
d67e3d5a5d drm/i915/guc: Process all G2H message at once in work queue
Rather than processing 1 G2H at a time and re-queuing the work queue if
more messages exist, process all the G2H in a single pass of the work
queue.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-6-matthew.brost@intel.com
2021-09-13 11:30:29 -07:00
John Harrison
d75dc57fee drm/i915/guc: Don't complain about reset races
It is impossible to seal all race conditions of resets occurring
concurrent to other operations. At least, not without introducing
excesive mutex locking. Instead, don't complain if it occurs. In
particular, don't complain if trying to send a H2G during a reset.
Whatever the H2G was about should get redone once the reset is over.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-17-matthew.brost@intel.com
2021-07-27 17:31:57 -07:00
Matthew Brost
f7957e603c drm/i915/guc: Handle engine reset failure notification
GuC will notify the driver, via G2H, if it fails to
reset an engine. We recover by resorting to a full GPU
reset.

v2:
 (John Harrison):
  - s/drm_dbg/drm_err

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-14-matthew.brost@intel.com
2021-07-27 17:31:52 -07:00
Matthew Brost
1e0fd2b5da drm/i915/guc: Handle context reset notification
GuC will issue a reset on detecting an engine hang and will notify
the driver via a G2H message. The driver will service the notification
by resetting the guilty context to a simple state or banning it
completely.

v2:
 (John Harrison)
  - Move msg[0] lookup after length check
v3:
 (John Harrison)
  - s/drm_dbg/drm_err

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-13-matthew.brost@intel.com
2021-07-27 17:31:50 -07:00
Matthew Brost
28ff6520a3 drm/i915/guc: Update GuC debugfs to support new GuC
Update GuC debugfs to support the new GuC structures.

v2:
 (John Harrison)
  - Remove intel_lrc_reg.h include from i915_debugfs.c
 (Michal)
  - Rename GuC debugfs functions

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-17-matthew.brost@intel.com
2021-07-22 10:07:27 -07:00
Matthew Brost
b97060a99b drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
When running the GuC the GPU can't be considered idle if the GuC still
has contexts pinned. As such, a call has been added in
intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for
the number of unpinned contexts to go to zero.

v2: rtimeout -> remaining_timeout
v3: Drop unnecessary includes, guc_submission_busy_loop ->
guc_submission_send_busy_loop, drop negatie timeout trick, move a
refactor of guc_context_unpin to earlier path (John H)
v4: Add stddef.h back into intel_gt_requests.h, sort circuit idle
function if not in GuC submission mode

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-16-matthew.brost@intel.com
2021-07-22 10:07:23 -07:00
Matthew Brost
f4eb1f3fe9 drm/i915/guc: Ensure G2H response has space in buffer
Ensure G2H response has space in the buffer before sending H2G CTB as
the GuC can't handle any backpressure on the G2H interface.

v2:
 (Matthew)
  - s/INTEL_GUC_SEND/INTEL_GUC_CT_SEND
v3:
 (Matthew)
  - Add G2H credit accounting to blocking path, add g2h_release_space
    helper
 (John H)
  - CTB_G2H_BUFFER_SIZE / 4 == G2H_ROOM_BUFFER_SIZE

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-15-matthew.brost@intel.com
2021-07-22 10:07:21 -07:00
Matthew Brost
e0717063cc drm/i915/guc: Defer context unpin until scheduling is disabled
With GuC scheduling, it isn't safe to unpin a context while scheduling
is enabled for that context as the GuC may touch some of the pinned
state (e.g. LRC). To ensure scheduling isn't enabled when an unpin is
done, a call back is added to intel_context_unpin when pin count == 1
to disable scheduling for that context. When the response CTB is
received it is safe to do the final unpin.

Future patches may add a heuristic / delay to schedule the disable
call back to avoid thrashing on schedule enable / disable.

v2:
 (John H)
  - s/drm_dbg/drm_err
 (Daneiel)
  - Clean up sched state function

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-9-matthew.brost@intel.com
2021-07-22 10:07:13 -07:00
Matthew Brost
3a4cdf1982 drm/i915/guc: Implement GuC context operations for new inteface
Implement GuC context operations which includes GuC specific operations
alloc, pin, unpin, and destroy.

v2:
 (Daniel Vetter)
  - Use msleep_interruptible rather than cond_resched in busy loop
 (Michal)
  - Remove C++ style comment
v3:
 (Matthew Brost)
  - Drop GUC_ID_START
 (John Harrison)
  - Fix a bunch of typos
  - Use drm_err rather than drm_dbg for G2H errors
 (Daniele)
  - Fix ;; typo
  - Clean up sched state functions
  - Add lockdep for guc_id functions
  - Don't call __release_guc_id when guc_id is invalid
  - Use MISSING_CASE
  - Add comment in guc_context_pin
  - Use shorter path to rpm
 (Daniele / CI)
  - Don't call release_guc_id on an invalid guc_id in destroy
v4:
 (Daniel Vetter)
  - Add FIXME comment

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-7-matthew.brost@intel.com
2021-07-22 10:07:08 -07:00
John Harrison
3101e9952b drm/i915/guc: Module load failure test for CT buffer creation
Add several module failure load inject points in the CT buffer creation
code path.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708162055.129996-8-matthew.brost@intel.com
2021-07-13 13:50:06 -07:00
Matthew Brost
75452167a2 drm/i915/guc: Optimize CTB writes and reads
CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail) which could result in accesses across the PCIe bus,
store shadow local copies and only read/write the descriptor values when
absolutely necessary. Also store the current space in the each channel
locally.

v2:
 (Michal)
  - Add additional sanity checks for head / tail pointers
  - Use GUC_CTB_HDR_LEN rather than magic 1
v3:
 (Michal / John H)
  - Drop redundant check of head value
v4:
 (John H)
  - Drop redundant checks of tail / head values
v5:
 (Michal)
  - Address more nits
v6:
 (Michal)
  - Add GEM_BUG_ON sanity check on ctb->space

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708162055.129996-7-matthew.brost@intel.com
2021-07-13 13:50:05 -07:00
Matthew Brost
b43b995048 drm/i915/guc: Add stall timer to non blocking CTB send function
Implement a stall timer which fails H2G CTBs once a period of time
with no forward progress is reached to prevent deadlock.

v2:
 (Michal)
  - Improve error message in ct_deadlock()
  - Set broken when ct_deadlock() returns true
  - Return -EPIPE on ct_deadlock()
v3:
 (Michal)
  - Add ms to stall timer comment
 (Matthew)
  - Move broken check to intel_guc_ct_send()

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708162055.129996-6-matthew.brost@intel.com
2021-07-13 13:50:03 -07:00
Matthew Brost
1681924d8b drm/i915/guc: Add non blocking CTB send function
Add non blocking CTB send function, intel_guc_send_nb. GuC submission
will send CTBs in the critical path and does not need to wait for these
CTBs to complete before moving on, hence the need for this new function.

The non-blocking CTB now must have a flow control mechanism to ensure
the buffer isn't overrun. A lazy spin wait is used as we believe the
flow control condition should be rare with a properly sized buffer.

The function, intel_guc_send_nb, is exported in this patch but unused.
Several patches later in the series make use of this function.

v2:
 (Michal)
  - Use define for H2G room calculations
  - Move INTEL_GUC_SEND_NB define
 (Daniel Vetter)
  - Use msleep_interruptible rather than cond_resched
v3:
 (Michal)
  - Move includes to following patch
  - s/INTEL_GUC_SEND_NB/INTEL_GUC_CT_SEND_NB/g
v4:
 (John H)
  - Update comment, add type local variable

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708162055.129996-5-matthew.brost@intel.com
2021-07-13 13:50:02 -07:00
Matthew Brost
c26e289f1d drm/i915/guc: Increase size of CTB buffers
With the introduction of non-blocking CTBs more than one CTB can be in
flight at a time. Increasing the size of the CTBs should reduce how
often software hits the case where no space is available in the CTB
buffer.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708162055.129996-4-matthew.brost@intel.com
2021-07-13 13:50:01 -07:00
Matthew Brost
dd9c0f3cbb drm/i915/guc: Improve error message for unsolicited CT response
Improve the error message when a unsolicited CT response is received by
printing fence that couldn't be found, the last fence, and all requests
with a response outstanding.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708162055.129996-3-matthew.brost@intel.com
2021-07-13 13:50:00 -07:00
Matthew Brost
1ccf7294b7 drm/i915/guc: Relax CTB response timeout
In upcoming patch we will allow more CTB requests to be sent in
parallel to the GuC for processing, so we shouldn't assume any more
that GuC will always reply without 10ms.

Use bigger value hardcoded value of 1s instead.

v2: Add CONFIG_DRM_I915_GUC_CTB_TIMEOUT config option
v3:
 (Daniel Vetter)
  - Use hardcoded value of 1s rather than config option
v4:
 (Michal)
  - Use defines for timeout values

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708162055.129996-2-matthew.brost@intel.com
2021-07-13 13:49:58 -07:00
Michal Wajdeczko
572f2a5cd9 drm/i915/guc: Update firmware to v62.0.0
Most of the changes to the 62.0.0 firmware revolved around CTB
communication channel. Conform to the new (stable) CTB protocol.

v2:
 (Michal)
  Add values back to kernel DOC for actions
 (Docs)
  Add 'CT buffer' back in to fix warning

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
[mattrope: Tweaked kerneldoc while pushing as suggested by Daniele/Michal]
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210616001302.84233-3-matthew.brost@intel.com
2021-06-18 15:31:01 -07:00
Michal Wajdeczko
8d99e09c5d drm/i915/guc: Always copy CT message to new allocation
Since most of future CT traffic will be based on G2H requests,
instead of copying incoming CT message to static buffer and then
create new allocation for such request, always copy incoming CT
message to new allocation. Also by doing it while reading CT
header, we can safely fallback if that atomic allocation fails.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210603051630.2635-19-matthew.brost@intel.com
2021-06-04 10:42:15 +02:00
Michal Wajdeczko
65dd4ed0f4 drm/i915/guc: Don't receive all G2H messages in irq handler
In irq handler try to receive just single G2H message, let other
messages to be received from tasklet.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210603051630.2635-18-matthew.brost@intel.com
2021-06-04 10:41:51 +02:00
Michal Wajdeczko
2e496ac200 drm/i915/guc: Stop using mutex while sending CTB messages
We are no longer using descriptor to hold G2H replies and we are
protecting access to the descriptor and command buffer by the
separate spinlock, so we can stop using mutex.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210603051630.2635-17-matthew.brost@intel.com
2021-06-04 10:40:50 +02:00
Matthew Brost
d35ca60087 drm/i915/guc: Ensure H2G buffer updates visible before tail update
Ensure H2G buffer updates are visible before descriptor tail updates by
inserting a barrier between the H2G buffer update and the tail. The
barrier is simple wmb() for SMEM and is register write for LMEM. This is
needed if more than 1 H2G can be inflight at once.

If this barrier is not inserted it is possible the descriptor tail
update is scene by the GuC before H2G buffer update which results in the
GuC reading a corrupt H2G value. This can bring down the H2G channel
among other bad things.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210603051630.2635-16-matthew.brost@intel.com
2021-06-04 10:39:04 +02:00
Michal Wajdeczko
7c567bbf6f drm/i915/guc: Start protecting access to CTB descriptors
We want to stop using guc.send_mutex while sending CTB messages
so we have to start protecting access to CTB send descriptor.

For completeness protect also CTB receive descriptor.

Add spinlock to struct intel_guc_ct_buffer and start using it.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210603051630.2635-15-matthew.brost@intel.com
2021-06-04 10:35:44 +02:00
Michal Wajdeczko
df12d1c301 drm/i915/guc: Update sizes of CTB buffers
Future GuC will require CTB buffers sizes to be multiple of 4K.
Make these changes now as this shouldn't impact us too much.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210603230408.54856-2-matthew.brost@intel.com
2021-06-04 10:20:13 +02:00