- Remove unused flags (Francois Dugast)
- Extend uAPI to query HuC micro-controler firmware version (Francois Dugast)
- drm/xe/uapi: Define topology types as indexes rather than masks
(Francois Dugast)
- drm/xe/uapi: Restore flags VM_BIND_FLAG_READONLY and VM_BIND_FLAG_IMMEDIATE
(Francois Dugast)
- devcoredump updates. Some touching the output format.
(José Roberto de Souza, Matthew Brost)
- drm/xe/hwmon: Add infra to support card power and energy attributes
- Improve LRC, HWSP and HWCTX error capture. (Maarten Lankhorst)
- drm/xe/uapi: Add IP version and stepping to GT list query (Matt roper)
- Invalidate userptr VMA on page pin fault (Matthew Brost)
- Improve xe_bo_move tracepoint (Priyanka Danamudi)
- Align fence output format in ftrace log
Cross-driver Changes:
- drm/i915/hwmon: Get rid of devm (Ashutosh Dixit)
(Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>)
- drm/i915/display: convert inner wakeref get towards get_if_in_use
(SOB Rodrigo Vivi)
- drm/i915: Convert intel_runtime_pm_get_noresume towards raw wakeref
(Committer, SOB Jani Nikula)
Driver Changes:
- Fix for unneeded CCS metadata allocation (Akshata Jahagirdar)
- Fix for fix multicast support for Xe_LP platforms (Andrzej Hajda)
- A couple of build fixes (Arnd Bergmann)
- Fix register definition (Ashutosh Dixit)
- Add BMG mocs table (Balasubramani Vivekanandan)
- Replace sprintf() across driver (Bommu Krishnaiah)
- Add an xe2 workaround (Bommu Krishnaiah)
- Makefile fix (Dafna Hirschfeld)
- force_wake_get error value check (Daniele Ceraolo Spurio)
- Handle GSCCS ER interrupt (Daniele Ceraolo Spurio)
- GSC Workaround (Daniele Ceraolo Spurio)
- Build error fix (Dawei Li)
- drm/xe/gt: Add L3 bank mask to GT topology (Francois Dugast)
- Implement xe2- and GuC workarounds (Gustavo Sousa, Haridhar Kalvala,
Himal rasad Ghimiray, John Harrison, Matt Roper, Radhakrishna Sripada,
Vinay Belgaumkar, Badal Nilawar)
- xe2hpg compression (Himal Ghimiray Prasad)
- Error code cleanups and fixes (Himal Prasad Ghimiray)
- struct xe_device cleanup (Jani Nikula)
- Avoid validating bos when only requesting an exec dma-fence
(José Roberto de Souza)
- Remove debug message from migrate_clear (José Roberto de Souza)
- Nuke EXEC_QUEUE_FLAG_PERSISTENT leftover internal flag (José Roberto de Souza)
- Mark dpt and related vma as uncached (Juha-Pekka Heikkila)
- Hwmon updates (Karthik Poosa)
- KConfig fix when ACPI_WMI selcted (Lu Yao)
- Update intel_uncore_read*() return types (Luca Coelho)
- Mocs updates (Lucas De Marchi, Matt Roper)
- Drop dynamic load-balancing workaround (Lucas De Marchi)
- Fix a PVC workaround (Lucas De Marchi)
- Group live kunit tests into a single module (Lucas De Marchi)
- Various code cleanups (Lucas De Marchi)
- Fix a ggtt init error patch and move ggtt invalidate out of ggtt lock
(Maarten Lankhorst)
- Fix a bo leak (Marten Lankhorst)
- Add LRC parsing for more GPU instructions (Matt Roper)
- Add various definitions for hardware and IP (Matt Roper)
- Define all possible engines in media IP descriptors (Matt Roper)
- Various cleanups, asserts and code fixes (Matthew Auld)
- Various cleanups and code fixes (Matthew Brost)
- Increase VM_BIND number of per-ioctl Ops (Matthew Brost, Paulo Zanoni)
- Don't support execlists in xe_gt_tlb_invalidation layer (Matthew Brost)
- Handle timing out of already signaled jobs gracefully (Matthew Brost)
- Pipeline evict / restore of pinned BOs during suspend / resume (Matthew Brost)
- Do not grab forcewakes when issuing GGTT TLB invalidation via GuC
(Matthew Brost)
- Drop ggtt invalidate from display code (Matthew Brost)
- drm/xe: Add XE_BO_GGTT_INVALIDATE flag (Matthew Brost)
- Add debug messages for MMU notifier and VMA invalidate (Matthew Brost)
- Use ordered wq for preempt fence waiting (Matthew Brost)
- Initial development for SR-IOV support including some refactoring
(Michal Wajdeczko)
- Various GuC- and GT- related cleanups and fixes (Michal Wajdeczko)
- Move userptr over to start using hmm_range_fault (Oak Zeng)
- Add new PCI IDs to DG2 platform (Ravi Kumar Vodapalli)
- Pcode - and VRAM initialization check update (Riana Tauro)
- Large PM update including i915 display patches, and a fix for one of those.
(Rodrigo Vivi)
- Introduce performance tuning changes for Xe2_HPG (Shekhar Chauhan)
- GSC / HDCP updates (Suraj Kandpal)
- Minor code cleanup (Tejas Upadhyay)
- Rework / fix rebind TLB flushing and move rebind into the drm_exec locking loop
(Thomas Hellström)
- Backmerge (Thomas Hellström)
- GuC updates and fixes (Vinay Belgaumkar, Zhanjun Dong)
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRskUM7w1oG5rx2IZO4FpNVCsYGvwUCZiestQAKCRC4FpNVCsYG
v8dLAQCDFUR7R5rwSdfqzNy+Djg+9ZgmtzVEfHZ+rI2lTReaCwEAhWeK7UooIMV0
vGsSdsqGsJQm4VLRzE6H1yemCCQOBgM=
=HouD
-----END PGP SIGNATURE-----
Merge tag 'drm-xe-next-2024-04-23' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
UAPI Changes:
- Remove unused flags (Francois Dugast)
- Extend uAPI to query HuC micro-controler firmware version (Francois Dugast)
- drm/xe/uapi: Define topology types as indexes rather than masks
(Francois Dugast)
- drm/xe/uapi: Restore flags VM_BIND_FLAG_READONLY and VM_BIND_FLAG_IMMEDIATE
(Francois Dugast)
- devcoredump updates. Some touching the output format.
(José Roberto de Souza, Matthew Brost)
- drm/xe/hwmon: Add infra to support card power and energy attributes
- Improve LRC, HWSP and HWCTX error capture. (Maarten Lankhorst)
- drm/xe/uapi: Add IP version and stepping to GT list query (Matt roper)
- Invalidate userptr VMA on page pin fault (Matthew Brost)
- Improve xe_bo_move tracepoint (Priyanka Danamudi)
- Align fence output format in ftrace log
Cross-driver Changes:
- drm/i915/hwmon: Get rid of devm (Ashutosh Dixit)
(Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>)
- drm/i915/display: convert inner wakeref get towards get_if_in_use
(SOB Rodrigo Vivi)
- drm/i915: Convert intel_runtime_pm_get_noresume towards raw wakeref
(Committer, SOB Jani Nikula)
Driver Changes:
- Fix for unneeded CCS metadata allocation (Akshata Jahagirdar)
- Fix for fix multicast support for Xe_LP platforms (Andrzej Hajda)
- A couple of build fixes (Arnd Bergmann)
- Fix register definition (Ashutosh Dixit)
- Add BMG mocs table (Balasubramani Vivekanandan)
- Replace sprintf() across driver (Bommu Krishnaiah)
- Add an xe2 workaround (Bommu Krishnaiah)
- Makefile fix (Dafna Hirschfeld)
- force_wake_get error value check (Daniele Ceraolo Spurio)
- Handle GSCCS ER interrupt (Daniele Ceraolo Spurio)
- GSC Workaround (Daniele Ceraolo Spurio)
- Build error fix (Dawei Li)
- drm/xe/gt: Add L3 bank mask to GT topology (Francois Dugast)
- Implement xe2- and GuC workarounds (Gustavo Sousa, Haridhar Kalvala,
Himal rasad Ghimiray, John Harrison, Matt Roper, Radhakrishna Sripada,
Vinay Belgaumkar, Badal Nilawar)
- xe2hpg compression (Himal Ghimiray Prasad)
- Error code cleanups and fixes (Himal Prasad Ghimiray)
- struct xe_device cleanup (Jani Nikula)
- Avoid validating bos when only requesting an exec dma-fence
(José Roberto de Souza)
- Remove debug message from migrate_clear (José Roberto de Souza)
- Nuke EXEC_QUEUE_FLAG_PERSISTENT leftover internal flag (José Roberto de Souza)
- Mark dpt and related vma as uncached (Juha-Pekka Heikkila)
- Hwmon updates (Karthik Poosa)
- KConfig fix when ACPI_WMI selcted (Lu Yao)
- Update intel_uncore_read*() return types (Luca Coelho)
- Mocs updates (Lucas De Marchi, Matt Roper)
- Drop dynamic load-balancing workaround (Lucas De Marchi)
- Fix a PVC workaround (Lucas De Marchi)
- Group live kunit tests into a single module (Lucas De Marchi)
- Various code cleanups (Lucas De Marchi)
- Fix a ggtt init error patch and move ggtt invalidate out of ggtt lock
(Maarten Lankhorst)
- Fix a bo leak (Marten Lankhorst)
- Add LRC parsing for more GPU instructions (Matt Roper)
- Add various definitions for hardware and IP (Matt Roper)
- Define all possible engines in media IP descriptors (Matt Roper)
- Various cleanups, asserts and code fixes (Matthew Auld)
- Various cleanups and code fixes (Matthew Brost)
- Increase VM_BIND number of per-ioctl Ops (Matthew Brost, Paulo Zanoni)
- Don't support execlists in xe_gt_tlb_invalidation layer (Matthew Brost)
- Handle timing out of already signaled jobs gracefully (Matthew Brost)
- Pipeline evict / restore of pinned BOs during suspend / resume (Matthew Brost)
- Do not grab forcewakes when issuing GGTT TLB invalidation via GuC
(Matthew Brost)
- Drop ggtt invalidate from display code (Matthew Brost)
- drm/xe: Add XE_BO_GGTT_INVALIDATE flag (Matthew Brost)
- Add debug messages for MMU notifier and VMA invalidate (Matthew Brost)
- Use ordered wq for preempt fence waiting (Matthew Brost)
- Initial development for SR-IOV support including some refactoring
(Michal Wajdeczko)
- Various GuC- and GT- related cleanups and fixes (Michal Wajdeczko)
- Move userptr over to start using hmm_range_fault (Oak Zeng)
- Add new PCI IDs to DG2 platform (Ravi Kumar Vodapalli)
- Pcode - and VRAM initialization check update (Riana Tauro)
- Large PM update including i915 display patches, and a fix for one of those.
(Rodrigo Vivi)
- Introduce performance tuning changes for Xe2_HPG (Shekhar Chauhan)
- GSC / HDCP updates (Suraj Kandpal)
- Minor code cleanup (Tejas Upadhyay)
- Rework / fix rebind TLB flushing and move rebind into the drm_exec locking loop
(Thomas Hellström)
- Backmerge (Thomas Hellström)
- GuC updates and fixes (Vinay Belgaumkar, Zhanjun Dong)
Signed-off-by: Dave Airlie <airlied@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iHUEABYKAB0WIQRskUM7w1oG5rx2IZO4FpNVCsYGvwUCZiestQAKCRC4FpNVCsYG
# v8dLAQCDFUR7R5rwSdfqzNy+Djg+9ZgmtzVEfHZ+rI2lTReaCwEAhWeK7UooIMV0
# vGsSdsqGsJQm4VLRzE6H1yemCCQOBgM=
# =HouD
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 23 Apr 2024 22:42:29 AEST
# gpg: using EDDSA key 6C91433BC35A06E6BC762193B81693550AC606BF
# gpg: Can't check signature: No public key
# Conflicts:
# drivers/gpu/drm/xe/xe_device_types.h
# drivers/gpu/drm/xe/xe_vm.c
# drivers/gpu/drm/xe/xe_vm_types.h
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Zievlb1wvqDg1ovi@fedora
When both hwmon and hwmon drvdata (on which hwmon depends) are device
managed resources, the expectation, on device unbind, is that hwmon will be
released before drvdata. However, in i915 there are two separate code
paths, which both release either drvdata or hwmon and either can be
released before the other. These code paths (for device unbind) are as
follows (see also the bug referenced below):
Call Trace:
release_nodes+0x11/0x70
devres_release_group+0xb2/0x110
component_unbind_all+0x8d/0xa0
component_del+0xa5/0x140
intel_pxp_tee_component_fini+0x29/0x40 [i915]
intel_pxp_fini+0x33/0x80 [i915]
i915_driver_remove+0x4c/0x120 [i915]
i915_pci_remove+0x19/0x30 [i915]
pci_device_remove+0x32/0xa0
device_release_driver_internal+0x19c/0x200
unbind_store+0x9c/0xb0
and
Call Trace:
release_nodes+0x11/0x70
devres_release_all+0x8a/0xc0
device_unbind_cleanup+0x9/0x70
device_release_driver_internal+0x1c1/0x200
unbind_store+0x9c/0xb0
This means that in i915, if use devm, we cannot gurantee that hwmon will
always be released before drvdata. Which means that we have a uaf if hwmon
sysfs is accessed when drvdata has been released but hwmon hasn't.
The only way out of this seems to be do get rid of devm_ and release/free
everything explicitly during device unbind.
v2: Change commit message and other minor code changes
v3: Cleanup from i915_hwmon_register on error (Armin Wolf)
v4: Eliminate potential static analyzer warning (Rodrigo)
Eliminate fetch_and_zero (Jani)
v5: Restore previous logic for ddat_gt->hwmon_dev error return (Andi)
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10366
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417145646.793223-1-ashutosh.dixit@intel.com
In i915 hwmon sysfs getter path we now take a hwmon_lock, then acquire an
rpm wakeref. That results in lock inversion:
<4> [197.079335] ======================================================
<4> [197.085473] WARNING: possible circular locking dependency detected
<4> [197.091611] 6.8.0-rc7-Patchwork_129026v7-gc4dc92fb1152+ #1 Not tainted
<4> [197.098096] ------------------------------------------------------
<4> [197.104231] prometheus-node/839 is trying to acquire lock:
<4> [197.109680] ffffffff82764d80 (fs_reclaim){+.+.}-{0:0}, at: __kmalloc+0x9a/0x350
<4> [197.116939]
but task is already holding lock:
<4> [197.122730] ffff88811b772a40 (&hwmon->hwmon_lock){+.+.}-{3:3}, at: hwm_energy+0x4b/0x100 [i915]
<4> [197.131543]
which lock already depends on the new lock.
...
<4> [197.507922] Chain exists of:
fs_reclaim --> >->reset.mutex --> &hwmon->hwmon_lock
<4> [197.518528] Possible unsafe locking scenario:
<4> [197.524411] CPU0 CPU1
<4> [197.528916] ---- ----
<4> [197.533418] lock(&hwmon->hwmon_lock);
<4> [197.537237] lock(>->reset.mutex);
<4> [197.543376] lock(&hwmon->hwmon_lock);
<4> [197.549682] lock(fs_reclaim);
...
<4> [197.632548] Call Trace:
<4> [197.634990] <TASK>
<4> [197.637088] dump_stack_lvl+0x64/0xb0
<4> [197.640738] check_noncircular+0x15e/0x180
<4> [197.652968] check_prev_add+0xe9/0xce0
<4> [197.656705] __lock_acquire+0x179f/0x2300
<4> [197.660694] lock_acquire+0xd8/0x2d0
<4> [197.673009] fs_reclaim_acquire+0xa1/0xd0
<4> [197.680478] __kmalloc+0x9a/0x350
<4> [197.689063] acpi_ns_internalize_name.part.0+0x4a/0xb0
<4> [197.694170] acpi_ns_get_node_unlocked+0x60/0xf0
<4> [197.720608] acpi_ns_get_node+0x3b/0x60
<4> [197.724428] acpi_get_handle+0x57/0xb0
<4> [197.728164] acpi_has_method+0x20/0x50
<4> [197.731896] acpi_pci_set_power_state+0x43/0x120
<4> [197.736485] pci_power_up+0x24/0x1c0
<4> [197.740047] pci_pm_default_resume_early+0x9/0x30
<4> [197.744725] pci_pm_runtime_resume+0x2d/0x90
<4> [197.753911] __rpm_callback+0x3c/0x110
<4> [197.762586] rpm_callback+0x58/0x70
<4> [197.766064] rpm_resume+0x51e/0x730
<4> [197.769542] rpm_resume+0x267/0x730
<4> [197.773020] rpm_resume+0x267/0x730
<4> [197.776498] rpm_resume+0x267/0x730
<4> [197.779974] __pm_runtime_resume+0x49/0x90
<4> [197.784055] __intel_runtime_pm_get+0x19/0xa0 [i915]
<4> [197.789070] hwm_energy+0x55/0x100 [i915]
<4> [197.793183] hwm_read+0x9a/0x310 [i915]
<4> [197.797124] hwmon_attr_show+0x36/0x120
<4> [197.800946] dev_attr_show+0x15/0x60
<4> [197.804509] sysfs_kf_seq_show+0xb5/0x100
Acquire the wakeref before the lock and hold it as long as the lock is
also held. Follow that pattern across the whole source file where similar
lock inversion can happen.
v2: Keep hardware read under the lock so the whole operation of updating
energy from hardware is still atomic (Guenter),
- instead, acquire the rpm wakeref before the lock and hold it as long
as the lock is held,
- use the same aproach for other similar places across the i915_hwmon.c
source file (Rodrigo).
Fixes: 1b44019a93 ("drm/i915/guc: Disable PL1 power limit when loading GuC firmware")
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: <stable@vger.kernel.org> # v6.5+
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240311203500.518675-2-janusz.krzysztofik@linux.intel.com
(cherry picked from commit 71b2187714)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
PCI IDs for XEHPSDV were never added and platform always marked with
force_probe. Drop what's not used and rename some places to either be
xehp or dg2, depending on the platform/IP checks.
The registers not used anymore are also removed.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20240320060543.4034215-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
In i915 hwmon sysfs getter path we now take a hwmon_lock, then acquire an
rpm wakeref. That results in lock inversion:
<4> [197.079335] ======================================================
<4> [197.085473] WARNING: possible circular locking dependency detected
<4> [197.091611] 6.8.0-rc7-Patchwork_129026v7-gc4dc92fb1152+ #1 Not tainted
<4> [197.098096] ------------------------------------------------------
<4> [197.104231] prometheus-node/839 is trying to acquire lock:
<4> [197.109680] ffffffff82764d80 (fs_reclaim){+.+.}-{0:0}, at: __kmalloc+0x9a/0x350
<4> [197.116939]
but task is already holding lock:
<4> [197.122730] ffff88811b772a40 (&hwmon->hwmon_lock){+.+.}-{3:3}, at: hwm_energy+0x4b/0x100 [i915]
<4> [197.131543]
which lock already depends on the new lock.
...
<4> [197.507922] Chain exists of:
fs_reclaim --> >->reset.mutex --> &hwmon->hwmon_lock
<4> [197.518528] Possible unsafe locking scenario:
<4> [197.524411] CPU0 CPU1
<4> [197.528916] ---- ----
<4> [197.533418] lock(&hwmon->hwmon_lock);
<4> [197.537237] lock(>->reset.mutex);
<4> [197.543376] lock(&hwmon->hwmon_lock);
<4> [197.549682] lock(fs_reclaim);
...
<4> [197.632548] Call Trace:
<4> [197.634990] <TASK>
<4> [197.637088] dump_stack_lvl+0x64/0xb0
<4> [197.640738] check_noncircular+0x15e/0x180
<4> [197.652968] check_prev_add+0xe9/0xce0
<4> [197.656705] __lock_acquire+0x179f/0x2300
<4> [197.660694] lock_acquire+0xd8/0x2d0
<4> [197.673009] fs_reclaim_acquire+0xa1/0xd0
<4> [197.680478] __kmalloc+0x9a/0x350
<4> [197.689063] acpi_ns_internalize_name.part.0+0x4a/0xb0
<4> [197.694170] acpi_ns_get_node_unlocked+0x60/0xf0
<4> [197.720608] acpi_ns_get_node+0x3b/0x60
<4> [197.724428] acpi_get_handle+0x57/0xb0
<4> [197.728164] acpi_has_method+0x20/0x50
<4> [197.731896] acpi_pci_set_power_state+0x43/0x120
<4> [197.736485] pci_power_up+0x24/0x1c0
<4> [197.740047] pci_pm_default_resume_early+0x9/0x30
<4> [197.744725] pci_pm_runtime_resume+0x2d/0x90
<4> [197.753911] __rpm_callback+0x3c/0x110
<4> [197.762586] rpm_callback+0x58/0x70
<4> [197.766064] rpm_resume+0x51e/0x730
<4> [197.769542] rpm_resume+0x267/0x730
<4> [197.773020] rpm_resume+0x267/0x730
<4> [197.776498] rpm_resume+0x267/0x730
<4> [197.779974] __pm_runtime_resume+0x49/0x90
<4> [197.784055] __intel_runtime_pm_get+0x19/0xa0 [i915]
<4> [197.789070] hwm_energy+0x55/0x100 [i915]
<4> [197.793183] hwm_read+0x9a/0x310 [i915]
<4> [197.797124] hwmon_attr_show+0x36/0x120
<4> [197.800946] dev_attr_show+0x15/0x60
<4> [197.804509] sysfs_kf_seq_show+0xb5/0x100
Acquire the wakeref before the lock and hold it as long as the lock is
also held. Follow that pattern across the whole source file where similar
lock inversion can happen.
v2: Keep hardware read under the lock so the whole operation of updating
energy from hardware is still atomic (Guenter),
- instead, acquire the rpm wakeref before the lock and hold it as long
as the lock is held,
- use the same aproach for other similar places across the i915_hwmon.c
source file (Rodrigo).
Fixes: 1b44019a93 ("drm/i915/guc: Disable PL1 power limit when loading GuC firmware")
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: <stable@vger.kernel.org> # v6.5+
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240311203500.518675-2-janusz.krzysztofik@linux.intel.com
Statically allocated array of pointers to hwmon_channel_info can be made
const for safety.
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230511175446.282041-1-krzysztof.kozlowski@linaro.org
Instead of erroring out when GuC reset is in progress, block waiting for
GuC reset to complete which is a more reasonable uapi behavior.
v2: Avoid race between wake_up_all and waiting for wakeup (Rodrigo)
v3: Remove timeout when blocked (Tvrtko)
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230420164041.1428455-4-ashutosh.dixit@intel.com
On dGfx, the PL1 power limit being enabled and set to a low value results
in a low GPU operating freq. It also negates the freq raise operation which
is done before GuC firmware load. As a result GuC firmware load can time
out. Such timeouts were seen in the GL #8062 bug below (where the PL1 power
limit was enabled and set to a low value). Therefore disable the PL1 power
limit when allowed by HW when loading GuC firmware.
v2:
- Take mutex (to disallow writes to power1_max) across GuC reset/fw load
- Add hwm_power_max_restore to error return code path
v3 (Jani N):
- Add/remove explanatory comments
- Function renames
- Type corrections
- Locking annotation
v4:
- Don't hold the lock across GuC reset (Rodrigo)
- New locking scheme (suggested by Rodrigo)
- Eliminate rpm_get in power_max_disable/restore, not needed (Tvrtko)
v5:
- Fix uninitialized pl1en variable compile warning reported by kernel
build robot by creating new err_rps label
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230420164041.1428455-3-ashutosh.dixit@intel.com
In preparation for follow-on patches, refactor hwm_power_max_write to take
hwmon_lock and runtime pm wakeref at start of the function and release them
at the end, therefore acquiring these just once each.
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230420164041.1428455-2-ashutosh.dixit@intel.com
On ATSM the PL1 limit is disabled at power up. The previous uapi assumed
that the PL1 limit is always enabled and therefore did not have a notion of
a disabled PL1 limit. This results in erroneous PL1 limit values when the
PL1 limit is disabled. For example at power up, the disabled ATSM PL1 limit
was previously shown as 0 which means a low PL1 limit whereas the limit
being disabled actually implies a high effective PL1 limit value.
To get round this problem, the PL1 limit uapi is expanded to include a
special value 0 to designate a disabled PL1 limit. A read value of 0 means
that the PL1 power limit is disabled, writing 0 disables the limit.
The link between this patch and the bugs mentioned below is as follows:
* Because on ATSM the PL1 power limit is disabled on power up and there
were no means to enable it, we previously implemented the means to
enable the limit when the PL1 hwmon entry (power1_max) was written to.
* Now there is a IGT igt@i915_hwmon@hwmon_write which (a) reads orig value
from all hwmon sysfs (b) does a bunch of random writes and finally (c)
restores the orig value read. On ATSM since the orig value is 0, when
the IGT restores the 0 value, the PL1 limit is now enabled with a value
of 0.
* PL1 limit of 0 implies a low PL1 limit which causes GPU freq to fall to
100 MHz. This causes GuC FW load and several IGT's to start timing out
and gives rise to these Intel CI bugs. After this patch, writing 0 would
disable the PL1 limit instead of enabling it, avoiding the freq drop
issue.
v2: Add explanation for bugs mentioned below (Rodrigo)
v3: Eliminate race during PL1 disable and verify (Tvrtko)
Change return to -ENODEV if verify fails (Tvrtko)
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8060
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230401024146.1826092-1-ashutosh.dixit@intel.com
The value shown by power1_max_interval in millisec is essentially:
((1.x * power(2,y)) * 1000) >> 10
Where x and y are read from a HW register. On ATSM, x and y are 0 on
power-up so the value shown is 0.
Writes of 0 to power1_max_interval had previously been disallowed to avoid
computing ilog2(0) but this resulted in the corner-case bug
below. Therefore allow writes of 0 now but special case that write to
x = y = 0.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7754
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230228044334.3630391-1-ashutosh.dixit@intel.com
Previous documentation suggested that the PL1 power limit is always enabled
in HW. However we now find this not to be the case on some platforms (such
as ATSM). Therefore enable the PL1 power limit (by setting the enable bit)
when writing the PL1 limit value to HW.
Bspec: 51864
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230216164944.2366150-3-ashutosh.dixit@intel.com
hwm_field_scale_and_write has a single caller hwm_power_write and is
specific to hwm_power_write but makes it appear that it is a general
function which can have multiple callers. Replace the function with
hwm_power_max_write which is specific to hwm_power_write and use that in
future patches where the function needs to be extended.
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230216164944.2366150-2-ashutosh.dixit@intel.com
This reverts commit 0349c41b05.
0349c41b05 ("drm/i915/hwmon: Enable PL1 power limit") is incorrect and
caused a major regression on ATSM. The change enabled the PL1 power limit
but FW sets the default value of the PL1 limit to 0 which implies HW now
works at minimum power and therefore the lowest effective frequency. This
means all workloads now run slower resulting in even GuC FW load operations
timing out, rendering ATSM unusable.
A different solution to the original issue of the PL1 limit being disabled
on ATSM is needed but till that is developed, revert 0349c41b05.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230208190312.1611335-1-ashutosh.dixit@intel.com
Previous documentation suggested that PL1 power limit is always
enabled. However we now find this not to be the case on some
platforms (such as ATSM). Therefore enable PL1 power limit during hwmon
initialization.
Bspec: 51864
v2: Add Bspec reference (Gwan-gyeong)
v3: Add Fixes tag
Fixes: 99f55efb79 ("drm/i915/hwmon: Power PL1 limit and TDP setting")
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230203155309.1042297-1-ashutosh.dixit@intel.com
HW allows arbitrary PL1 limits to be set but silently clamps these values
to "typical but not guaranteed" min/max values in pkg_power_sku
register. Follow the same pattern for sysfs, allow arbitrary PL1 limits to
be set but display clamped values when read, so that users see PL1 limits
HW is likely using. Otherwise users think HW is using arbitrarily high/low
PL1 limits they might have set. The previous write/read I1 power1_crit
limit also follows the same clamping pattern.
v2: Explain "why" in commit message and include bug link (Jani Nikula)
Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/7704
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221215191727.2468770-1-ashutosh.dixit@intel.com
hwm_pcode_read_i1 is called during i915 load. This results in the following
warning from snb_pcode_read because POWER_SETUP_SUBCOMMAND_READ_I1 is
unsupported on DG1/DG2.
[drm:snb_pcode_read [i915]] warning: pcode (read from mbox 47c) \
mailbox access failed for snb_pcode_read_p [i915]: -6
The code handles the unsupported command but the warning in dmesg is
a red herring which has resulted in a couple of bugs being filed.
Therefore silence the warning by avoiding calling snb_pcode_read_p
for DG1/DG2.
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221203031454.1280538-1-ashutosh.dixit@intel.com
Use REG_FIELD_PREP() and a constant value for hwm_field_scale_and_write()
If the first argument of FIELD_PREP() is not a compile-time constant value
or unsigned long long type, this routine of the __BF_FIELD_CHECK() macro
used internally by the FIELD_PREP() macro always returns false.
BUILD_BUG_ON_MSG(__bf_cast_unsigned(_mask, _mask) > \
__bf_cast_unsigned(_reg, ~0ull), \
_pfx "type of reg too small for mask"); \
And it returns a build error by the option among the clang
compilation options. [-Werror,-Wtautological-constant-out-of-range-compare]
Reported build error while using clang compiler:
drivers/gpu/drm/i915/i915_hwmon.c:115:16: error: result of comparison of
constant 18446744073709551615 with expression of type 'typeof (_Generic((field_msk),
char: (unsigned char)0, unsigned char: (unsigned char)0, signed char: (unsigned char)0,
unsigned short: (unsigned short)0, short: (unsigned short)0, unsigned int:
(unsigned int)0, int: (unsigned int)0, unsigned long: (unsigned long)0, long:
(unsigned long)0, unsigned long long: (unsigned long long)0, long long:
(unsigned long long)0, default: (field_msk)))' (aka 'unsigned int') is always false
[-Werror,-Wtautological-constant-out-of-range-compare]
bits_to_set = FIELD_PREP(field_msk, nval);
^~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/linux/bitfield.h:114:3: note: expanded from macro 'FIELD_PREP'
__BF_FIELD_CHECK(_mask, 0ULL, _val, "FIELD_PREP: "); \
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/linux/bitfield.h:71:53: note: expanded from macro '__BF_FIELD_CHECK'
BUILD_BUG_ON_MSG(__bf_cast_unsigned(_mask, _mask) > \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
./include/linux/build_bug.h:39:58: note: expanded from macro 'BUILD_BUG_ON_MSG'
~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
./include/linux/compiler_types.h:357:22: note: expanded from macro 'compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/linux/compiler_types.h:345:23: note: expanded from macro '_compiletime_assert'
__compiletime_assert(condition, msg, prefix, suffix)
~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/linux/compiler_types.h:337:9: note: expanded from macro '__compiletime_assert'
if (!(condition)) \
v2: Use REG_FIELD_PREP() macro instead of FIELD_PREP() (Jani)
Fixes: 99f55efb79 ("drm/i915/hwmon: Power PL1 limit and TDP setting")
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
[Joonas: Wrapped commit message error line length to be more reasonable]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221029044230.32128-1-gwan-gyeong.mun@intel.com
Expose power1_max_interval, that is the tau corresponding to PL1, as a
custom hwmon attribute. Some bit manipulation is needed because of the
format of PKG_PWR_LIM_1_TIME in
GT0_PACKAGE_RAPL_LIMIT register (1.x * power(2,y)).
v2: Update date and kernel version in Documentation (Badal)
v3: Cleaned up hwm_power1_max_interval_store() (Badal)
v4:
- Fixed review comments (Anshuman)
- In hwm_power1_max_interval_store() get PKG_MAX_WIN from
pkg_power_sku when it is valid (Ashutosh)
- KernelVersion: 6.2, Date: February 2023 in doc (Tvrtko)
v5: On some of the DGFX setups it is seen that although pkg_power_sku
is valid the field PKG_WIN_MAX is not populated. So it is
decided to stick to default value of PKG_WIN_MAX (Ashutosh)
v6: Change contact to intel-gfx (Rodrigo)
Fixed variable types in hwm_power1_max_interval_store (Andi)
Documented PKG_MAX_WIN_DEFAULT (Andi)
Removed else in hwm_attributes_visible (Andi)
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-7-ashutosh.dixit@intel.com
Use i915 HWMON to display device level energy input.
v2: Updated the date and kernel version in feature description
v3:
- Cleaned up hwm_energy function and removed unused function
i915_hwmon_energy_status_get (Ashutosh)
v4: KernelVersion: 6.2, Date: February 2023 in doc (Tvrtko)
v5: Change contact to intel-gfx (Rodrigo)
Change return type of hwm_energy to void (Andi)
Signed-off-by: Dale B Stimson <dale.b.stimson@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-5-ashutosh.dixit@intel.com
Use i915 HWMON to display/modify dGfx power PL1 limit and TDP setting.
v2:
- Fix review comments (Ashutosh)
- Do not restore power1_max upon module unload/load sequence
because on production systems modules are always loaded
and not unloaded/reloaded (Ashutosh)
- Fix review comments (Jani)
- Remove endianness conversion (Ashutosh)
v3: Add power1_rated_max (Ashutosh)
v4:
- Use macro HWMON_CHANNEL_INFO to define power channel (Guenter)
- Update the date and kernel version in Documentation (Badal)
v5: Use hwm_ prefix for static functions (Ashutosh)
v6: Fix review comments (Ashutosh)
v7:
- Define PCU_PACKAGE_POWER_SKU for DG1,DG2 and move
PKG_PKG_TDP to intel_mchbar_regs.h (Anshuman)
- KernelVersion: 6.2, Date: February 2023 in doc (Tvrtko)
v8: Change contact to intel-gfx (Rodrigo)
Minor change to val_sku_unit init (Andi)
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Dale B Stimson <dale.b.stimson@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-4-ashutosh.dixit@intel.com
Use i915 HWMON subsystem to display current input voltage.
v2:
- Updated date and kernel version in feature description
- Fixed review comments (Ashutosh)
v3: Use macro HWMON_CHANNEL_INFO to define hwmon channel (Guenter)
v4:
- Fixed review comments (Ashutosh)
- Use hwm_ prefix for static functions (Ashutosh)
v5: Added unit of voltage as millivolts (Ashutosh)
v6: KernelVersion: 6.2, Date: February 2023 in doc (Tvrtko)
v7: Change contact to intel-gfx (Rodrigo)
GEN12_RPSTAT1 is available for all Gen12+ (Andi)
Added Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
to MAINTAINERS
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-3-ashutosh.dixit@intel.com
The i915 HWMON module will be used to expose voltage, power and energy
values for dGfx. Here we set up i915 hwmon infrastructure including i915
hwmon registration, basic data structures and functions.
v2:
- Create HWMON infra patch (Ashutosh)
- Fixed review comments (Jani)
- Remove "select HWMON" from i915/Kconfig (Jani)
v3: Use hwm_ prefix for static functions (Ashutosh)
v4: s/#ifdef CONFIG_HWMON/#if IS_REACHABLE(CONFIG_HWMON)/ since the former
doesn't work if hwmon is compiled as a module (Guenter)
v5: Fixed review comments (Jani)
v6: s/kzalloc/devm_kzalloc/ (Andi)
v7: s/hwmon_device_register_with_info/
devm_hwmon_device_register_with_info/ (Ashutosh)
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Dale B Stimson <dale.b.stimson@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-2-ashutosh.dixit@intel.com