1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

788 commits

Author SHA1 Message Date
Dave Airlie
68b89e23c2 Merge tag 'drm-intel-gt-next-2024-04-26' of https://anongit.freedesktop.org/git/drm/drm-intel into drm-next
UAPI Changes:

- drm/i915/guc: Use context hints for GT frequency

    Allow user to provide a low latency context hint. When set, KMD
    sends a hint to GuC which results in special handling for this
    context. SLPC will ramp the GT frequency aggressively every time
    it switches to this context. The down freq threshold will also be
    lower so GuC will ramp down the GT freq for this context more slowly.
    We also disable waitboost for this context as that will interfere with
    the strategy.

    We need to enable the use of SLPC Compute strategy during init, but
    it will apply only to contexts that set this bit during context
    creation.

    Userland can check whether this feature is supported using a new param-
    I915_PARAM_HAS_CONTEXT_FREQ_HINT. This flag is true for all guc submission
    enabled platforms as they use SLPC for frequency management.

    The Mesa usage model for this flag is here -
    https://gitlab.freedesktop.org/sushmave/mesa/-/commits/compute_hint

- drm/i915/gt: Enable only one CCS for compute workload

    Enable only one CCS engine by default with all the compute sices
    allocated to it.

    While generating the list of UABI engines to be exposed to the
    user, exclude any additional CCS engines beyond the first
    instance

    ***

    NOTE: This W/A will make all DG2 SKUs appear like single CCS SKUs by
    default to mitigate a hardware bug. All the EUs will still remain
    usable, and all the userspace drivers have been confirmed to be able
    to dynamically detect the change in number of CCS engines and adjust.

    For the smaller percent of applications that get perf benefit from
    letting the userspace driver dispatch across all 4 CCS engines we will
    be introducing a sysfs control as a later patch to choose 4 CCS each
    with 25% EUs (or 50% if 2 CCS).

    NOTE: A regression has been reported at

    https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10895

    However Andi has been triaging the issue and we're closing in a fix
    to the gap in the W/A implementation:

    https://lists.freedesktop.org/archives/intel-gfx/2024-April/348747.html

Driver Changes:

- Add new and fix to existing workarounds: Wa_14018575942 (MTL),
  Wa_16019325821 (Gen12.70), Wa_14019159160 (MTL), Wa_16015675438,
  Wa_14020495402 (Gen12.70) (Tejas, John, Lucas)
- Fix UAF on destroy against retire race and remove two earlier
  partial fixes (Janusz)
- Limit the reserved VM space to only the platforms that need it (Andi)
- Reset queue_priority_hint on parking for execlist platforms (Chris)
- Fix gt reset with GuC submission is disabled (Nirmoy)
- Correct capture of EIR register on hang (John)

- Remove usage of the deprecated ida_simple_xx() API
- Refactor confusing __intel_gt_reset() (Nirmoy)
- Fix the fix for GuC reset lock confusion (John)
- Simplify/extend platform check for Wa_14018913170 (John)
- Replace dev_priv with i915 (Andi)
- Add and use gt_to_guc() wrapper (Andi)
- Remove bogus null check (Rodrigo, Dan)

. Selftest improvements (Janusz, Nirmoy, Daniele)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZitVBTvZmityDi7D@jlahtine-mobl.ger.corp.intel.com
2024-04-30 14:40:43 +10:00
Dave Airlie
0208ca55aa Linux 6.9-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmYlapoeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGJLgH/RFrsXefpxAWACJv
 wRmIx/YFjI3Z4nypBZPXysX/JOuf/cf4bR22jMHC1vhQDtxVgB7eUpEYx0C0Y5RD
 ll7cunAZCpP+PB725/bOWbaF94eigl8xC4RrKPon4tG1CS9NAvCJFDyNgWpqh48s
 sSKKae68a16zMtIX5Za8nvmsBQOeD67ZFBgI4m7wIDDhTXY3XlG84r8Rz/ZwcdE/
 rExFkDMUqjpoJVAMBtETQNk4pvKZVMrW2B0f/xZSR+IKXw5l2FO0raXkIG/6TERs
 CxJvCuUjKIosqFLIK2O/cU9G+5V2axZ/WD8YRwDr2vlnnOPht+dFQzyPcuSqezbG
 ShIhQo8=
 =GI5F
 -----END PGP SIGNATURE-----

Backmerge tag 'v6.9-rc5' into drm-next

Linux 6.9-rc5

I've had a persistent msm failure on clang, and the fix is in fixes
so just pull it back to fix that.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-04-22 14:35:52 +10:00
Christophe JAILLET
2af231e1b8
drm/i915/guc: Remove usage of the deprecated ida_simple_xx() API
ida_alloc() and ida_free() should be preferred to the deprecated
ida_simple_get() and ida_simple_remove().

Note that the upper limit of ida_simple_get() is exclusive, but the one of
ida_alloc_range() is inclusive. So a -1 has been added when needed.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/7108c1871c6cb08d403c4fa6534bc7e6de4cb23d.1705245316.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-04-14 15:09:18 -04:00
John Harrison
152191e5e9
drm/i915/guc: Fix the fix for reset lock confusion
The previous fix for the circlular lock splat about the busyness
worker wasn't quite complete. Even though the reset-in-progress flag
is cleared at the start of intel_uc_reset_finish, the entire function
is still inside the reset mutex lock. Not sure why the patch appeared
to fix the issue both locally and in CI. However, it is now back
again.

There is a further complication that the wedge code path within
intel_gt_reset() jumps around so much that it results in nested
reset_prepare/_finish calls. That is, the call sequence is:
  intel_gt_reset
  | reset_prepare
  | __intel_gt_set_wedged
  | | reset_prepare
  | | reset_finish
  | reset_finish

The nested finish means that even if the clear of the in-progress flag
was moved to the end of _finish, it would still be clear for the
entire second call. Surprisingly, this does not seem to be causing any
other problems at present.

As an aside, a wedge on fini does not call the finish functions at
all. The reset_in_progress flag is left set (twice).

So instead of trying to cancel the worker anywhere at all in the reset
path, just add a cancel to intel_guc_submission_fini instead. Note
that it is not a problem if the worker is still active during a reset.
Either it will run before the reset path starts locking things and
will simply block the reset code for a tiny amount of time. Or it will
run after the locks have been acquired and will early exit due to the
try-lock.

Also, do not use the reset-in-progress flag to decide whether a
synchronous cancel is safe (from a lockdep perspective) or not.
Instead, use the actual reset mutex state (both the genuine one and
the custom rolled BACKOFF one).

Fixes: 0e00a8814e ("drm/i915/guc: Avoid circular locking issue on busyness flush")
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Madhumitha Tolakanahalli Pradeep <madhumitha.tolakanahalli.pradeep@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240329235306.1559639-1-John.C.Harrison@Intel.com
(cherry picked from commit 3563d85531)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-04-08 13:09:37 -04:00
John Harrison
3563d85531 drm/i915/guc: Fix the fix for reset lock confusion
The previous fix for the circlular lock splat about the busyness
worker wasn't quite complete. Even though the reset-in-progress flag
is cleared at the start of intel_uc_reset_finish, the entire function
is still inside the reset mutex lock. Not sure why the patch appeared
to fix the issue both locally and in CI. However, it is now back
again.

There is a further complication that the wedge code path within
intel_gt_reset() jumps around so much that it results in nested
reset_prepare/_finish calls. That is, the call sequence is:
  intel_gt_reset
  | reset_prepare
  | __intel_gt_set_wedged
  | | reset_prepare
  | | reset_finish
  | reset_finish

The nested finish means that even if the clear of the in-progress flag
was moved to the end of _finish, it would still be clear for the
entire second call. Surprisingly, this does not seem to be causing any
other problems at present.

As an aside, a wedge on fini does not call the finish functions at
all. The reset_in_progress flag is left set (twice).

So instead of trying to cancel the worker anywhere at all in the reset
path, just add a cancel to intel_guc_submission_fini instead. Note
that it is not a problem if the worker is still active during a reset.
Either it will run before the reset path starts locking things and
will simply block the reset code for a tiny amount of time. Or it will
run after the locks have been acquired and will early exit due to the
try-lock.

Also, do not use the reset-in-progress flag to decide whether a
synchronous cancel is safe (from a lockdep perspective) or not.
Instead, use the actual reset mutex state (both the genuine one and
the custom rolled BACKOFF one).

Fixes: 0e00a8814e ("drm/i915/guc: Avoid circular locking issue on busyness flush")
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Madhumitha Tolakanahalli Pradeep <madhumitha.tolakanahalli.pradeep@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240329235306.1559639-1-John.C.Harrison@Intel.com
2024-04-05 09:31:37 -07:00
Rodrigo Vivi
7406538860
drm/i915/guc: Remove bogus null check
This null check is bogus because we are already using 'ce' stuff
in many places before this function is called.

Having this here is useless and confuses static analyzer tools
that can see:

struct intel_engine_cs *engine = ce->engine;

before this check, in the same function.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202403101225.7AheJhZJ-lkp@intel.com/
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240328213107.90632-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-04-03 14:44:22 -04:00
Lucas De Marchi
1bfc03b137 drm/i915: Remove special handling for !RCS_MASK()
With both XEHPSDV and PVC removed (as platforms, most of their code
remain used by others), there's no need to handle !RCS_MASK() as
other platforms don't ever have fused-off render. Remove those code
paths and the special WA flag when initializing GuC.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20240320060543.4034215-7-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-03-22 14:15:00 -07:00
Lucas De Marchi
326e30e462 drm/i915: Drop dead code for pvc
PCI IDs for PVC were never added and platform always marked with
force_probe. Drop what's not used and rename some places as needed.

The registers not used anymore are also removed.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20240320060543.4034215-6-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-03-22 14:14:56 -07:00
Lucas De Marchi
48ba4a6dc3 drm/i915: Update IP_VER(12, 50)
With no platform using graphics/media IP_VER(12, 50), replace the
checks throughout the code with IP_VER(12, 55) so the code makes sense
by itself with no additional explanation of previous baggage.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20240320060543.4034215-5-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-03-22 14:14:52 -07:00
Lucas De Marchi
cb4046d289 drm/i915: Drop dead code for xehpsdv
PCI IDs for XEHPSDV were never added and platform always marked with
force_probe. Drop what's not used and rename some places to either be
xehp or dg2, depending on the platform/IP checks.

The registers not used anymore are also removed.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20240320060543.4034215-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-03-22 14:14:39 -07:00
Lucas De Marchi
af7c4a648e drm/i915: Drop WA 16015675438
With dynamic load-balancing disabled on the compute side, there's no
reason left to enable WA 16015675438. Drop it from both PVC and DG2.
Note that this can be done because now the driver always set a fixed
partition of EUs during initialization via the ccs_mode configuration.

The flag to GuC is still needed because of 18020744125, so update
the comment accordingly.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Acked-by: Michal Mrozek <michal.mrozek@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240306144723.1826977-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-03-11 08:32:35 -07:00
John Harrison
7ad6a8fae5 drm/i915/guc: Enable Wa_14019159160
Use the new w/a KLV support to enable a MTL w/a. Note, this w/a is a
super-set of Wa_16019325821, so requires turning that one as well as
setting the new flag for Wa_14019159160 itself.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240223205632.1621019-4-John.C.Harrison@Intel.com
2024-03-07 15:26:47 -08:00
John Harrison
6cc7a5c7dc drm/i915/guc: Add support for w/a KLVs
To prevent running out of bits, new w/a enable flags are being added
via a KLV system instead of a 32 bit flags word.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240223205632.1621019-3-John.C.Harrison@Intel.com
2024-03-07 15:26:45 -08:00
John Harrison
f673d59e31 drm/i915: Enable Wa_16019325821
Some platforms require holding RCS context switches until CCS is idle
(the reverse w/a of Wa_14014475959). Some platforms require both
versions.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240223205632.1621019-2-John.C.Harrison@Intel.com
2024-03-07 15:26:42 -08:00
Vinay Belgaumkar
cec82816d0 drm/i915/guc: Use context hints for GT frequency
Allow user to provide a low latency context hint. When set, KMD
sends a hint to GuC which results in special handling for this
context. SLPC will ramp the GT frequency aggressively every time
it switches to this context. The down freq threshold will also be
lower so GuC will ramp down the GT freq for this context more slowly.
We also disable waitboost for this context as that will interfere with
the strategy.

We need to enable the use of SLPC Compute strategy during init, but
it will apply only to contexts that set this bit during context
creation.

Userland can check whether this feature is supported using a new param-
I915_PARAM_HAS_CONTEXT_FREQ_HINT. This flag is true for all guc submission
enabled platforms as they use SLPC for frequency management.

The Mesa usage model for this flag is here -
https://gitlab.freedesktop.org/sushmave/mesa/-/commits/compute_hint

v2: Rename flags as per review suggestions (Rodrigo, Tvrtko).
Also, use flag bits in intel_context as it allows finer control for
toggling per engine if needed (Tvrtko).

v3: Minor review comments (Tvrtko)

v4: Update comment (Sushma)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Ivan Briano <ivan.briano@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240306012759.204938-1-vinay.belgaumkar@intel.com
2024-03-07 10:25:06 -08:00
John Harrison
4ae86a7f8d drm/i915/guc: Simplify/extend platform check for Wa_14018913170
The above w/a is required for every platform that the i915 driver
supports. It is fixed on the latest platforms but they are only
supported by Xe instead of i915. So just remove the platform check
completely and keep the code simple.

v2: Add extra comment (review feedback from Rodrigo).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240223202846.1532176-1-John.C.Harrison@Intel.com
2024-03-05 17:19:50 -08:00
John Harrison
e45afbeb59 drm/i915/guc: Correct capture of EIR register on hang
The EIR register (0x20B0) was being included in the engine class list
for render and compute as the absolute register address. However, it
is actually a ring register available on all engines at an offset of
(base) + 0xB0. As it was included as an RCS engine but with the
absolute address, GuC was adding on another 0x2000 and coming out at
an invalid location. Thus it would reject the register and complain
about only managing a partial capture.

So update the list to use the RING_EIR version of the register and
include it for all engines.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240223203204.1533410-1-John.C.Harrison@Intel.com
2024-03-04 15:35:22 -08:00
Andi Shyti
3f2f20da79 drm/i915/guc: Use the new gt_to_guc() wrapper
Get the guc reference from the gt using the gt_to_guc() helper.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231229102734.674362-3-andi.shyti@linux.intel.com
2024-03-01 13:19:43 +01:00
Dave Airlie
ca7a1d0d18 Merge tag 'drm-intel-next-2024-02-27-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
drm/i915 feature pull #2 for v6.9:

Features and functionality:
- DP tunneling and bandwidth allocation support (Imre)
- Add more ADL-N PCI IDs (Gustavo)
- Enable fastboot also on older platforms (Ville)
- Bigjoiner force enable debugfs option for testing (Stan)

Refactoring and cleanups:
- Remove unused structs and struct members (Jiri Slaby)
- Use per-device debug logging (Ville)
- State check improvements (Ville)
- Hardcoded cd2x divider cleanups (Ville)
- CDCLK documentation updates (Ville, Rodrigo)

Fixes:
- HDCP MST Type1 fixes (Suraj)
- Fix MTL C20 PHY PLL values (Ravi)
- More hardware access prevention during init (Imre)
- Always enable decompression with tile4 on Xe2 (Juha-Pekka)
- Improve LNL package C residency (Suraj)

drm core changes:
- DP tunneling and bandwidth allocation helpers (Imre)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87sf1devbj.fsf@intel.com
2024-02-28 11:02:55 +10:00
Jiri Slaby (SUSE)
394a1376d8 drm/i915: remove intel_guc::ads_engine_usage_size
intel_guc::ads_engine_usage_size was never used since its addition in
commit 77cdd054dd (drm/i915/pmu: Connect engine busyness stats from
GuC to pmu). Drop it.

Found by https://github.com/jirislaby/clang-struct.

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240216065326.6910-9-jirislaby@kernel.org
2024-02-19 15:35:17 -05:00
Randy Dunlap
af3cfcad49 drm/i915/guc: reconcile Excess struct member kernel-doc warnings
Document nested struct members with full names as described in
Documentation/doc-guide/kernel-doc.rst.

intel_guc.h:305: warning: Excess struct member 'lock' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'guc_ids' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'num_guc_ids' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'guc_ids_bitmap' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'guc_id_list' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'guc_ids_in_use' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'destroyed_contexts' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'destroyed_worker' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'reset_fail_worker' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'reset_fail_mask' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'sched_disable_delay_ms' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'sched_disable_gucid_threshold' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'lock' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'gt_stamp' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'ping_delay' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'work' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'shift' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'last_stat_jiffies' description in 'intel_guc'
18 warnings as Errors

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231226195432.10891-3-rdunlap@infradead.org
(cherry picked from commit e4cf1a70fa)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2024-01-10 11:56:38 +02:00
John Harrison
0e00a8814e drm/i915/guc: Avoid circular locking issue on busyness flush
Avoid the following lockdep complaint:
<4> [298.856498] ======================================================
<4> [298.856500] WARNING: possible circular locking dependency detected
<4> [298.856503] 6.7.0-rc5-CI_DRM_14017-g58ac4ffc75b6+ #1 Tainted: G
    N
<4> [298.856505] ------------------------------------------------------
<4> [298.856507] kworker/4:1H/190 is trying to acquire lock:
<4> [298.856509] ffff8881103e9978 (&gt->reset.backoff_srcu){++++}-{0:0}, at:
_intel_gt_reset_lock+0x35/0x380 [i915]
<4> [298.856661]
but task is already holding lock:
<4> [298.856663] ffffc900013f7e58
((work_completion)(&(&guc->timestamp.work)->work)){+.+.}-{0:0}, at:
process_scheduled_works+0x264/0x530
<4> [298.856671]
which lock already depends on the new lock.

The complaint is not actually valid. The busyness worker thread does
indeed hold the worker lock and then attempt to acquire the reset lock
(which may have happened in reverse order elsewhere). However, it does
so with a trylock that exits if the reset lock is not available
(specifically to prevent this and other similar deadlocks).
Unfortunately, lockdep does not understand the trylock semantics (the
lock is an i915 specific custom implementation for resets).

Not doing a synchronous flush of the worker thread when a reset is in
progress resolves the lockdep splat by never even attempting to grab
the lock in this particular scenario.

There are situatons where a synchronous cancel is required, however.
So, always do the synchronous cancel if not in reset. And add an extra
synchronous cancel to the end of the reset flow to account for when a
reset is occurring at driver shutdown and the cancel is no longer
synchronous but could lead to unallocated memory accesses if the
worker is not stopped.

Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20231219195957.212600-1-John.C.Harrison@Intel.com
2024-01-09 10:14:09 -08:00
Alan Previn
2f2cc53b5f drm/i915/guc: Close deregister-context race against CT-loss
If we are at the end of suspend or very early in resume
its possible an async fence signal (via rcu_call) is triggered
to free_engines which could lead us to the execution of
the context destruction worker (after a prior worker flush).

Thus, when suspending, insert rcu_barriers at the start
of i915_gem_suspend (part of driver's suspend prepare) and
again in i915_gem_suspend_late so that all such cases have
completed and context destruction list isn't missing anything.

In destroyed_worker_func, close the race against CT-loss
by checking that CT is enabled before calling into
deregister_destroyed_contexts.

Based on testing, guc_lrc_desc_unpin may still race and fail
as we traverse the GuC's context-destroy list because the
CT could be disabled right before calling GuC's CT send function.

We've witnessed this race condition once every ~6000-8000
suspend-resume cycles while ensuring workloads that render
something onscreen is continuously started just before
we suspend (and the workload is small enough to complete
and trigger the queued engine/context free-up either very
late in suspend or very early in resume).

In such a case, we need to unroll the entire process because
guc-lrc-unpin takes a gt wakeref which only gets released in
the G2H IRQ reply that never comes through in this corner
case. Without the unroll, the taken wakeref is leaked and will
cascade into a kernel hang later at the tail end of suspend in
this function:

   intel_wakeref_wait_for_idle(&gt->wakeref)
   (called by) - intel_gt_pm_wait_for_idle
   (called by) - wait_for_suspend

Thus, do an unroll in guc_lrc_desc_unpin and deregister_destroyed_-
contexts if guc_lrc_desc_unpin fails due to CT send falure.
When unrolling, keep the context in the GuC's destroy-list so
it can get picked up on the next destroy worker invocation
(if suspend aborted) or get fully purged as part of a GuC
sanitization (end of suspend) or a reset flow.

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Tested-by: Mousumi Jana <mousumi.jana@intel.com>
Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231229215143.581619-1-alan.previn.teres.alexis@intel.com
2024-01-09 09:33:08 -08:00
Alan Previn
5e83c060e9 drm/i915/guc: Flush context destruction worker at suspend
When suspending, flush the context-guc-id
deregistration worker at the final stages of
intel_gt_suspend_late when we finally call gt_sanitize
that eventually leads down to __uc_sanitize so that
the deregistration worker doesn't fire off later as
we reset the GuC microcontroller.

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Tested-by: Mousumi Jana <mousumi.jana@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231228045558.536585-2-alan.previn.teres.alexis@intel.com
2024-01-09 09:33:07 -08:00
John Harrison
a797099562 drm/i915/huc: Allow for very slow HuC loading
A failure to load the HuC is occasionally observed where the cause is
believed to be a low GT frequency leading to very long load times.

So a) increase the timeout so that the user still gets a working
system even in the case of slow load. And b) report the frequency
during the load to see if that is the cause of the slow down.

Also update the similar code on the GuC load to not use uncore->gt
when there is a local gt available. The two should match, but no need
for unnecessary de-referencing.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240102222202.310495-1-John.C.Harrison@Intel.com
2024-01-05 16:18:21 -08:00
Shuicheng Lin
835e4d9bb3 drm/i915/guc: Change wa and EU_PERF_CNTL registers to MCR type
Some of the wa registers are MCR register, and EU_PERF_CNTL registers
are MCR register.
MCR register needs extra process for read/write.
As normal MMIO register also could work with the MCR register process,
change all wa registers to MCR type for code simplicity.

Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240102010231.843778-1-shuicheng.lin@intel.com
2024-01-02 10:09:26 -08:00
Randy Dunlap
e4cf1a70fa drm/i915/guc: reconcile Excess struct member kernel-doc warnings
Document nested struct members with full names as described in
Documentation/doc-guide/kernel-doc.rst.

intel_guc.h:305: warning: Excess struct member 'lock' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'guc_ids' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'num_guc_ids' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'guc_ids_bitmap' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'guc_id_list' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'guc_ids_in_use' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'destroyed_contexts' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'destroyed_worker' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'reset_fail_worker' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'reset_fail_mask' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'sched_disable_delay_ms' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'sched_disable_gucid_threshold' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'lock' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'gt_stamp' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'ping_delay' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'work' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'shift' description in 'intel_guc'
intel_guc.h:305: warning: Excess struct member 'last_stat_jiffies' description in 'intel_guc'
18 warnings as Errors

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231226195432.10891-3-rdunlap@infradead.org
2023-12-29 12:24:50 +01:00
Zhao Liu
55a6e46180 drm/i915: Use memcpy_from_page() in gt/uc/intel_uc_fw.c
The use of kmap_atomic() is being deprecated in favor of
kmap_local_page()[1], and this patch converts the call from
kmap_atomic() to kmap_local_page().

The main difference between atomic and local mappings is that local
mappings doesn't disable page faults or preemption  (the preemption is
disabled for !PREEMPT_RT case, otherwise it only disables migration).

With kmap_local_page(), we can avoid the often unwanted side effect of
unnecessary page faults or preemption disables.

In drm/i915/gt/uc/intel_us_fw.c, the function intel_uc_fw_copy_rsa()
just use the mapping to do memory copy so it doesn't need to disable
pagefaults and preemption for mapping. Thus the local mapping without
atomic context (not disable pagefaults / preemption) is enough.

Therefore, intel_uc_fw_copy_rsa() is a function where the use of
memcpy_from_page() with kmap_local_page() in place of memcpy() with
kmap_atomic() is correctly suited.

Convert the calls of memcpy() with kmap_atomic() / kunmap_atomic() to
memcpy_from_page() which uses local mapping to copy.

[1]: https://lore.kernel.org/all/20220813220034.806698-1-ira.weiny@intel.com/T/#u

Suggested-by: Ira Weiny <ira.weiny@intel.com>
Suggested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231203132947.2328805-8-zhao1.liu@linux.intel.com
2023-12-15 09:34:30 +00:00
Andi Shyti
be5bcc4be9 drm/i915/guc: Create the guc_to_i915() wrapper
Given a reference to "guc", the guc_to_i915() returns the
pointer to "i915" private data.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231206184322.57111-1-andi.shyti@linux.intel.com
2023-12-08 12:31:01 +01:00
Tvrtko Ursulin
cf9cb028ac drm/i915: Use internal class when counting engine resets
Commit 503579448d ("drm/i915/gsc: Mark internal GSC engine with reserved uabi class")
made the GSC0 engine not have a valid uabi class and so broke the engine
reset counting, which in turn was made class based in cb823ed991 ("drm/i915/gt: Use intel_gt as the primary object for handling resets").

Despite the title and commit text of the latter is not mentioning it (and
has left the storage array incorrectly sized), tracking by class, despite
it adding aliasing in hypthotetical multi-tile systems, is handy for
virtual engines which for instance do not have a valid engine->id.

Therefore we keep that but just change it to use the internal class which
is always valid. We also add a helper to increment the count, which
aligns with the existing getter.

What was broken without this fix were out of bounds reads every time a
reset would happen on the GSC0 engine, or during selftests when storing
and cross-checking the counts in igt_live_test_begin and
igt_live_test_end.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: dfed6b58d5 ("drm/i915/gsc: Mark internal GSC engine with reserved uabi class")
[tursulin: fixed Fixes tag]
Reported-by: Alan Previn Teres Alexis <alan.previn.teres.alexis@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231201122109.729006-2-tvrtko.ursulin@linux.intel.com
2023-12-07 11:40:58 +00:00
John Harrison
706785c19f drm/i915/guc: Add a selftest for FAST_REQUEST errors
There is a mechanism for reporting errors from fire and forget H2G
messages. This is the only way to find out about almost any error in
the GuC backend submission path. So it would be useful to know that it
is working.

v2: Fix some dumb over-complications and a couple of typos - review
feedback from Daniele.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231114010016.234570-3-John.C.Harrison@Intel.com
2023-11-30 13:50:50 -08:00
John Harrison
b7d2a4da38 drm/i915/guc: Fix for potential false positives in GuC hang selftest
Noticed that the hangcheck selftest is submitting a non-preemptoble
spinner. That means that even if the GuC does not die, the heartbeat
will still kick in and trigger a reset. Which is rather defeating the
purpose of the test - to verify that the heartbeat will kick in if the
GuC itself has died. The test is deliberately killing the GuC, so it
should never hit the case of a non-dead GuC. But it is not impossible
that the kill might fail at some future point due to other driver
re-work.

So, make the spinner pre-emptible. That way the heartbeat can get
through if the GuC is alive and context switching. Thus a reset only
happens if the GuC dies. Thus, if the kill should stop working the
test will now fail rather than claim to pass.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231114010016.234570-2-John.C.Harrison@Intel.com
2023-11-30 13:50:49 -08:00
Alan Previn
0eec708ec3 drm/i915/pxp: Add drm_dbgs for critical PXP events.
Debugging PXP issues can't even begin without understanding precedding
sequence of important events. Add drm_dbg into the most important PXP
events.

 v5 : - rebase.
 v4 : - rebase.
 v3 : - move gt_dbg to after mutex block in function
        i915_gsc_proxy_component_bind. (Vivaik)
 v2 : - remove __func__ since drm_dbg covers that (Jani).
      - add timeout dbg of the restart from front-end (Alan).

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Vivaik Balasubrawmanian <vivaik.balasubrawmanian@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231122191523.58379-1-alan.previn.teres.alexis@intel.com
2023-11-29 15:03:55 -08:00
Andrzej Hajda
5e4e06e408
drm/i915: Track gt pm wakerefs
Track every intel_gt_pm_get() until its corresponding release in
intel_gt_pm_put() by returning a cookie to the caller for acquire that
must be passed by on released. When there is an imbalance, we can see who
either tried to free a stale wakeref, or who forgot to free theirs.

v2: track recently added calls in gen8_ggtt_bind_get_ce and
    destroyed_worker_func

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231030-ref_tracker_i915-v1-2-006fe6b96421@intel.com
2023-11-20 12:36:56 +01:00
Daniele Ceraolo Spurio
d3715a6471 drm/i915/huc: Stop printing about unsupported HuC on MTL
On MTL, the HuC is only supported on the media GT, so our validation
check on the module parameter detects an inconsistency on the root GT
(the modparams asks to enable HuC, but the support is not there) and
prints the following info message:

[drm] GT0: Incompatible option enable_guc=3 - HuC is not supported!

This can be confusing to the user and make them think that something is
wrong when it isn't, so we need to silence it.
Given that any platform that supports HuC also supports GuC, if a user
tries to enable HuC on a platform that really doesn't support it they'll
already see a message about GuC not being supported, so instead of just
silencing the HuC message on newer platforms we can just get rid of it
entirely.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231109235436.2349963-1-daniele.ceraolospurio@intel.com
2023-11-16 13:48:46 -08:00
Dave Airlie
55b728555d Merge tag 'drm-intel-gt-next-2023-10-19' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
Driver Changes:

Fixes/improvements/new stuff:

- Retry gtt fault when out of fence registers (Ville Syrjälä)
- Determine context valid in OA reports [perf] (Umesh Nerlige Ramappa)

Future platform enablement:

- GuC based TLB invalidation for Meteorlake (Jonathan Cavitt, Prathap Kumar Valsan)
- Don't set PIPE_CONTROL_FLUSH_L3 [mtl] (Vinay Belgaumkar)

Miscellaneous:

- Clean up zero initializers [guc,pxp] (Ville Syrjälä)
- Prevent potential null-ptr-deref in engine_init_common (Nirmoy Das)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZTFDFSbd/U7YP+hI@tursulin-desk
2023-10-20 16:15:25 +10:00
Jonathan Cavitt
55ac6ea7ff drm/i915: No TLB invalidation on wedged GT
It is not an error for GuC TLB invalidations to fail when the GT is
wedged or disabled, so do not process a wait failure as one in
guc_send_invalidate_tlb.

Signed-off-by: Fei Yang <fei.yang@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
CC: John Harrison <john.c.harrison@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231017180806.3054290-6-jonathan.cavitt@intel.com
2023-10-18 06:01:12 +02:00
Jonathan Cavitt
2202eca003 drm/i915: No TLB invalidation on suspended GT
In case of GT is suspended, don't allow submission of new TLB invalidation
request and cancel all pending requests. The TLB entries will be
invalidated either during GuC reload or on system resume.

Signed-off-by: Fei Yang <fei.yang@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
CC: John Harrison <john.c.harrison@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231017180806.3054290-5-jonathan.cavitt@intel.com
2023-10-18 06:01:11 +02:00
Prathap Kumar Valsan
af58ee2276 drm/i915: Define and use GuC and CTB TLB invalidation routines
The GuC firmware had defined the interface for Translation Look-Aside
Buffer (TLB) invalidation.  We should use this interface when
invalidating the engine and GuC TLBs.
Add additional functionality to intel_gt_invalidate_tlb, invalidating
the GuC TLBs and falling back to GT invalidation when the GuC is
disabled.
The invalidation is done by sending a request directly to the GuC
tlb_lookup that invalidates the table.  The invalidation is submitted as
a wait request and is performed in the CT event handler.  This means we
cannot perform this TLB invalidation path if the CT is not enabled.
If the request isn't fulfilled in two seconds, this would constitute
an error in the invalidation as that would constitute either a lost
request or a severe GuC overload.

With this new invalidation routine, we can perform GuC-based GGTT
invalidations.  GuC-based GGTT invalidation is incompatible with
MMIO invalidation so we should not perform MMIO invalidation when
GuC-based GGTT invalidation is expected.

The additional complexity incurred in this patch will be necessary for
range-based tlb invalidations, which will be platformed in the future.

Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
CC: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231017180806.3054290-4-jonathan.cavitt@intel.com
2023-10-18 06:01:09 +02:00
Jonathan Cavitt
ff0dac080a drm/i915/guc: Add CT size delay helper
As of now, there is no mechanism for tracking a given request's
progress through the queue.  Instead, add a helper that returns
an estimated maximum time the queue should take to drain if
completely full.

Suggested-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231017180806.3054290-3-jonathan.cavitt@intel.com
2023-10-18 06:01:09 +02:00
Ville Syrjälä
d3110f0758 drm/i915/guc: Clean up zero initializers
Just use a simple {} to zero initialize arrays/structs instead
of the hodgepodge of stuff we are using currently.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231012122442.15718-4-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
2023-10-17 19:42:21 +03:00
Dave Airlie
614351f41e Merge tag 'drm-intel-gt-next-2023-10-12' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
Driver Changes:

Fixes/improvements/new stuff:

- Register engines early to avoid type confusion (Mathias Krause)
- Suppress 'ignoring reset notification' message [guc] (John Harrison)
- Update 'recommended' version to 70.12.1 for DG2/ADL-S/ADL-P/MTL [guc] (John Harrison)
- Enable WA 14018913170 [guc, dg2] (Daniele Ceraolo Spurio)

Future platform enablement:

- Clean steer semaphore on resume (Nirmoy Das)
- Skip MCR ops for ring fault register [mtl] (Nirmoy Das)
- Make i915_gem_shrinker multi-gt aware [gem] (Jonathan Cavitt)
- Enable GGTT updates with binder in MTL (Nirmoy Das, Chris Wilson)
- Invalidate the TLBs on each GT (Chris Wilson)

Miscellaneous:

- Clarify type evolution of uabi_node/uabi_engines (Mathias Krause)
- Annotate struct ct_incoming_msg with __counted_by [guc] (Kees Cook)
- More use of GT specific print helpers [gt] (John Harrison)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZSfKotZVdypU6NaX@tursulin-desk
2023-10-17 13:46:21 +10:00
John Harrison
039adf3947 drm/i915: More use of GT specific print helpers
Update a bunch of GT related print messages in non-GT files to use the
GT specific helpers.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231009183802.673882-3-John.C.Harrison@Intel.com
2023-10-10 15:40:26 -07:00
Daniele Ceraolo Spurio
ca1e2a8339 drm/i915/guc: Enable WA 14018913170
The GuC handles the WA, the KMD just needs to set the flag to enable
it on the appropriate platforms.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231006013553.1339418-1-John.C.Harrison@Intel.com
2023-10-09 16:35:04 -07:00
Kees Cook
3e78f77121 drm/i915/guc: Annotate struct ct_incoming_msg with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for
array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct ct_incoming_msg.

Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-hardening@vger.kernel.org
Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231006201744.work.135-kees@kernel.org
2023-10-09 13:59:18 +02:00
John Harrison
1621a8edc2 drm/i915/guc: Update 'recommended' version to 70.12.1 for DG2/ADL-S/ADL-P/MTL
The latest GuC has new features and new workarounds that we wish to
enable. So let the universe know that it is useful to update their
firmware.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231006145801.161868-1-John.C.Harrison@Intel.com
2023-10-07 23:06:48 +02:00
John Harrison
6a3ecfd4a0 drm/i915/guc: Suppress 'ignoring reset notification' message
If an active context has been banned (e.g. Ctrl+C killed) then it is
likely to be reset as part of evicting it from the hardware. That
results in a 'ignoring context reset notification: banned = 1'
message at info level. This confuses/concerns people and makes them
think something has gone wrong when it hasn't.

There is already a debug level message with essentially the same
information. So drop the 'ignore' info level one and just add the
'ignore' flag to the debug level one instead (which will therefore not
appear by default but will still show up in CI runs).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230921182033.135448-1-John.C.Harrison@Intel.com
2023-10-03 18:48:07 -07:00
Dave Airlie
caacbdc28f Merge tag 'drm-intel-gt-next-2023-09-28' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
Driver Changes:

Fixes/improvements/new stuff:

- Fix TLB-Invalidation seqno store [mtl] (Alan Previn)
- Force a reset on internal GuC error [guc] (John Harrison)
- Define GSC fw [gsc] (Daniele Ceraolo Spurio)
- Update workaround 14016712196 [dg2/mtl] (Tejas Upadhyay)
- Mark requests for GuC virtual engines to avoid use-after-free (Andrzej Hajda)
- Add Wa_14015150844 [dg2/mtl] (Shekhar Chauhan)
- Prevent error pointer dereference (Dan Carpenter)
- Add Wa_18022495364 [tgl,adl,rpl] (Dnyaneshwar Bhadane)
- Fix GuC PMU by moving execlist stats initialization to execlist specific setup (Umesh Nerlige Ramappa)
- Fix PXP firmware load [pxp/mtl] (Alan Previn)
- Fix execution/context state of PXP contexts (Alan Previn)
- Limit the length of an sg list to the requested length (Matthew Wilcox)
- Fix reservation address in ggtt_reserve_guc_top [guc] (Javier Pello)
- Add Wa_18028616096 [dg2] (Shekhar Chauhan)
- Get runtime pm in busyness worker only if already active [guc/pmu] (Umesh Nerlige Ramappa)
- Don't set PIPE_CONTROL_FLUSH_L3 for aux inval (Nirmoy Das)

Future platform enablement:

- Fix and consolidate some workaround checks, make others IP version based [mtl] (Matt Roper)
- Replace Meteorlake subplatforms with IP version checks (Matt Roper)
- Adding DeviceID for Arrowlake-S under MTL [mtl] (Nemesa Garg)
- Run relevant bits of debugfs drop_caches per GT (Tvrtko Ursulin)

Miscellaneous:

- Remove Wa_15010599737 [dg2] (Shekhar Chauhan)
- Align igt_spinner_create_request with hangcheck [selftests] (Jonathan Cavitt)
- Remove pre-production workarounds [dg2] (Matt Roper)
- Tidy some workaround definitions (Matt Roper)
- Wait longer for tasks in migrate selftest [gt] (Jonathan Cavitt)
- Skip WA verification for GEN7_MISCCPCTL on DG2 [gt] (Andrzej Hajda)
- Silence injected failure in the load via GSC path [huc] (Daniele Ceraolo Spurio)
- Refactor deprecated strncpy (Justin Stitt)
- Update RC6 mask for mtl_drpc [debugfs/mtl] (Badal Nilawar)
- Remove a static inline that requires including i915_drv.h [gt] (Jani Nikula)
- Remove inlines from i915_gem_execbuffer.c [gem] (Jani Nikula)
- Remove gtt_offset from stream->oa_buffer.head/.tail [perf] (Ashutosh Dixit)
- Do not disable preemption for resets (Tvrtko Ursulin)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZRVzL02VFuwIkcGl@tursulin-desk
2023-10-03 06:02:30 +10:00
Umesh Nerlige Ramappa
e2f99b79d4 i915/guc: Get runtime pm in busyness worker only if already active
Ideally the busyness worker should take a gt pm wakeref because the
worker only needs to be active while gt is awake. However, the gt_park
path cancels the worker synchronously and this complicates the flow if
the worker is also running at the same time. The cancel waits for the
worker and when the worker releases the wakeref, that would call gt_park
and would lead to a deadlock.

The resolution is to take the global pm wakeref if runtime pm is already
active. If not, we don't need to update the busyness stats as the stats
would already be updated when the gt was parked.

Note:
- We do not requeue the worker if we cannot take a reference to runtime
  pm since intel_guc_busyness_unpark would requeue the worker in the
  resume path.

- If the gt was parked longer than time taken for GT timestamp to roll
  over, we ignore those rollovers since we don't care about tracking the
  exact GT time. We only care about roll overs when the gt is active and
  running workloads.

- There is a window of time between gt_park and runtime suspend, where
  the worker may run. This is acceptable since the worker will not find
  any new data to update busyness.

v2: (Daniele)
- Edit commit message and code comment
- Use runtime pm in the worker
- Put runtime pm after enabling the worker
- Use Link tag and add Fixes tag

v3: (Daniele)
- Reword commit and comments and add details

Link: https://gitlab.freedesktop.org/drm/intel/-/issues/7077
Fixes: 77cdd054dd ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230925192117.2497058-1-umesh.nerlige.ramappa@intel.com
2023-09-26 08:42:57 -07:00
Alan Previn
8ae2723481 drm/i915/pxp/mtl: Update pxp-firmware response timeout
Update the max GSC-fw response time to match updated internal
fw specs. Because this response time is an SLA on the firmware,
not inclusive of i915->GuC->HW handoff latency, when submitting
requests to the GSC fw via intel_gsc_uc_heci_cmd_submit helpers,
start the count after the request hits the GSC command streamer.
Also, move GSC_REPLY_LATENCY_MS definition from pxp header to
intel_gsc_uc_heci_cmd_submit.h since its for any GSC HECI packet.

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Vivaik Balasubrawmanian <vivaik.balasubrawmanian@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230917211933.1407559-2-alan.previn.teres.alexis@intel.com
2023-09-19 12:11:18 -07:00