As the smu_context will be invisible from outside(of power). Also,
the smu_debug_mask can be shared around all power code instead of
some specific framework(swSMU) only.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmG2fU0eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGC7EH/3R7Rt+OD8Wn8Ss3
w8V+dBxVwa2u2oMTyUHPxaeOXZ7bi38XlUdLFPOK/76bGwO0a5TmYZqsWdRbGyT0
HfcYjHsQ0lbJXk/nh2oM47oJxJXVpThIHXJEk0FZ0Y5t+DYjIYlNHzqZymUyhLem
St74zgWcyT+MXuqY34vB827FJDUnOxhhhi85tObeunaSPAomy9aiYidSC1ARREnz
iz2VUntP/QnRnKVvL2nUZNzcz1xL5vfCRSKsRGRSv3qW1Y/1M71ylt6JVmSftWq+
VmMdFxFhdrb1OK/1ct/930Un/UP2NG9EJsWxote2XYlnVSZHzDqH7lUhbqgdCcLz
1m2tVNY=
=7wRd
-----END PGP SIGNATURE-----
Merge v5.16-rc5 into drm-next
Thomas Zimmermann requested a fixes backmerge, specifically also for
96c5f82ef0 ("drm/vc4: fix error code in vc4_create_object()")
Just a bunch of adjacent changes conflicts, even the big pile of them
in vc4.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Update smu_v13 to match smu_v12 and smu_v11 behavior where this is
fetched from debugfs rather than in kernel logs on every boot.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This value does not get cached into adev->pm.fw_version during
startup for smu13 like it does for other SMU like smu12.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
SMU firmware expects the driver maintains error context
and doesn't interact with SMU any more when SMU errors
occurred. That will aid in debugging SMU firmware issues.
Add SMU debug option support for this request, it can be
enabled or disabled via amdgpu_smu_debug debugfs file.
Use a 32-bit mask to indicate corresponding debug modes.
Currently, only one mode(HALT_ON_ERROR) is supported.
When enabled, it brings hardware to a kind of halt state
so that no one can touch it any more in the envent of SMU
errors.
The dirver interacts with SMU via sending messages. And
threre are three ways to sending messages to SMU in current
implementation. Handle them respectively as following:
1, smu_cmn_send_smc_msg_with_param() for normal timeout cases
Halt on any error.
2, smu_cmn_send_msg_without_waiting()/smu_cmn_wait_for_response()
for longer timeout cases
Halt on errors apart from ETIME. Otherwise this way won't work.
Let the user handle ETIME error in such a case.
3, smu_cmn_send_msg_without_waiting() for no waiting cases
Halt on errors apart from ETIME. Otherwise second way won't work.
== Command Guide ==
1, enable HALT_ON_ERROR mode
# echo 0x1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug
2, disable HALT_ON_ERROR mode
# echo 0x0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug
v5:
- Use bit mask to allow more debug features.(Evan)
- Use WRAN() instead of BUG().(Evan)
v4:
- Set to halt state instead of a simple hang.(Christian)
v3:
- Use debugfs_create_bool().(Christian)
- Put variable into smu_context struct.
- Don't resend command when timeout.
v2:
- Resend command when timeout.(Lijo)
- Use debugfs file instead of module parameter.
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently, we don't find some neccesities to power on/off
SDMA in SMU hw_init/fini(). It makes more sense in SDMA
hw_init/fini().
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Kevin Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This fixes various warnings relating to erroneous docstring syntax, of
which some are listed below:
warning: Function parameter or member 'adev' not described in
'amdgpu_atomfirmware_ras_rom_addr'
...
warning: expecting prototype for amdgpu_atpx_validate_functions().
Prototype was for amdgpu_atpx_validate() instead
...
warning: Excess function parameter 'mem' description in 'amdgpu_preempt_mgr_new'
...
warning: Cannot understand * @kfd_get_cu_occupancy - Collect number of
waves in-flight on this device
...
warning: This comment starts with '/**', but isn't a kernel-doc
comment. Refer Documentation/doc-guide/kernel-doc.rst
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amd-drm-next-5.17-2021-12-02:
amdgpu:
- Use generic drm fb helpers
- PSR fixes
- Rework DCN3.1 clkmgr
- DPCD 1.3 fixes
- Misc display fixes can cleanups
- Clock query fixes for APUs
- LTTPR fixes
- DSC fixes
- Misc PM fixes
- RAS fixes
- OLED backlight fix
- SRIOV fixes
- Add STB (Smart Trace Buffer) for supported dGPUs
- IH rework
- Enable seamless boot for DCN3.01
amdkfd:
- Rework more stuff around IP discovery enumeration
- Further clean up of interfaces with amdgpu
- SVM fixes
radeon:
- Indentation fixes
UAPI:
- Add a new KFD header that defines some of the sysfs bitfields and enums that userspace has been using for a while
The corresponding bit-fields and enums in user mode are defined in
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/master/include/hsakmttypes.h
Signed-off-by: Dave Airlie <airlied@redhat.com>
# Conflicts:
# drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211202191643.5970-1-alexander.deucher@amd.com
== Description ==
All the power profile modes use the same strings (or a subset of)
Creating a public array of the strings will allow sharing rather than
duplicating for each chip
First patch only implements change for navi10, followup with other chips
== Changes ==
Create a declaration of the public array in kgd_pp_interface.h
Define the public array in amdgpu_pm.c
Modify the implementaiton of navi10_get_power_profile_mode to use new array
== Test ==
LOGFILE=pp_profile_strings.test.log
AMDGPU_PCI_ADDR=`lspci -nn | grep "VGA\|Display" | cut -d " " -f 1`
AMDGPU_HWMON=`ls -la /sys/class/hwmon | grep $AMDGPU_PCI_ADDR | awk '{print $9}'`
HWMON_DIR=/sys/class/hwmon/${AMDGPU_HWMON}
lspci -nn | grep "VGA\|Display" > $LOGFILE
FILES="pp_power_profile_mode "
for f in $FILES
do
echo === $f === >> $LOGFILE
cat $HWMON_DIR/device/$f >> $LOGFILE
done
cat $LOGFILE
Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v1: Ideally power gate/ungate requests shouldn't come when smu block is
uninitialized. Add a WARN message to check the origins if such a thing
ever happens.
v2: Use dev_WARN to log device info (Felix/Guchun).
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Kevin Yang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fill voltage fields in metrics table.
Signed-off-by: Surbhi Kakarya <Surbhi.Kakarya@amd.com>
Acked-by: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Print the error on command submission immediately after submitting to
the SMU. This is rate-limited. It helps to immediately know there was an
error on command submission, rather than leave it up to clients to report
the error, as sometimes they do not.
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Alex Deucher <Alexander.Deucher@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a print in sienna_cichlid_run_btc() to help debug and to mirror other
platforms, as no print is present in the caller, smu_smc_hw_setup().
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add prints where there are none and none are printed in the callee.
Remove the word "previous" from comment and print to make it shorter and
avoid confusion in various prints.
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
support ECC TABLE message, this table include umc ras error count
and error address
v2:
add smu version check to query whether support ecctable
call smu_cmn_update_table to get ecctable directly
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If gpu reset is triggered by ras fatal error, tell it to smu in mode-1
reset message.
v2: move mode-1 reset function to aldebaran_ppt.c since it's aldebaran
specific currently.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Print Navi1x fine grained clocks in a consistent manner with other SOCs.
Don't show aritificial DPM level when the current clock equals min or max.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
add support that allow the userspace tool like RGP to get the GFX clock
value at runtime, the fix follow the old way to show the min/current/max
clocks level for compatible consideration.
=== Test ===
$ cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 200Mhz *
1: 1100Mhz
2: 1600Mhz
then run stress test on one APU system.
$ cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 200Mhz
1: 1040Mhz *
2: 1600Mhz
The current GFXCLK value is updated at runtime.
BugLink: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5260
Reviewed-by: Huang Ray <Ray.Huang@amd.com>
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Print Navi1x fine grained clocks in a consistent manner with other SOCs.
Don't show aritificial DPM level when the current clock equals min or max.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Also print the message index and parameter of the stuck command.
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
add support that allow the userspace tool like RGP to get the GFX clock
value at runtime, the fix follow the old way to show the min/current/max
clocks level for compatible consideration.
=== Test ===
$ cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 200Mhz *
1: 1100Mhz
2: 1600Mhz
then run stress test on one APU system.
$ cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 200Mhz
1: 1040Mhz *
2: 1600Mhz
The current GFXCLK value is updated at runtime.
BugLink: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5260
Reviewed-by: Huang Ray <Ray.Huang@amd.com>
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Also print the message index and parameter of the stuck command.
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix the following coccicheck review:
./drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c:1174:14-18
:Unneeded variable
Remove unneeded variable used to store return value.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: ran jianping <ran.jianping@zte.com.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Previously there was a check based on chip # for chips that aligned to
>=CHIP_NAVI10 to have RLC stopped as part of DPMS check. This was because
of gfxclk being controlled by RLC in the newer designs.
As part of IP version checking though, this got changed to match IP
version for SMU. Because Renoir designs also include smu11 that meant
that even GFX9 started to stop RLC earlier.
Adjust to match GFX IP version instead of SMU IP version to restore the
previous behavior.
Fixes: a8967967f6 ("drm/amdgpu/amdgpu_smu: convert to IP version checking")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For ASICs not supporting power profile mode, don't show the attribute.
Verify that the function has been implemented by the subsystem.
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This was added by commit bd8dcea93a ("drm/amd/pm: add callbacks to
read/write sysfs file pp_power_profile_mode") but the feature was
deprecated from PMFW. Remove it from the driver.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On arcturus, not all platforms use PMFW based fan control. On such
ASICs fan control by PMFW will be disabled in PPTable. Disable hwmon
knobs for fan control also as it is not possible to report or control
fan speed on such platforms through driver.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add missing check in smu_v11_0_init_display_count(),
Fixes: af3b89d3a6 ("drm/amdgpu/smu11.0: convert to IP version checking")
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Check if VCN instances are harvested when controlling
VCN power gating and setting up VCN clocks.
Fixes: 1b592d00b4 ("drm/amdgpu/vcn: remove manual instance setting")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1743
Reviewed-and-tested-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When selecting between levels in the force performance levels interface
sclk (gfxclk) was not set correctly for all levels. Select the proper
sclk settings for all levels.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1726
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
VEGA20 is 11.0.2, but it's handled by powerplay, not
swsmu.
Fixes: a8967967f6 ("drm/amdgpu/amdgpu_smu: convert to IP version checking")
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Was missed in the conversion to IP version checking.
Fixes: af3b89d3a6 ("drm/amdgpu/smu11.0: convert to IP version checking")
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
when smu->adev->pm.ac_power == 0, message parameter with bit 16 set is saved
to smu->current_power_limit.
Fixes: 0cb4c62125 ("drm/amd/pm: correct power limit setting for SMU V11)"
Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Allow us to query instances versions more cleanly.
Instancing support is not consistent unfortunately. SDMA is a
good example. Sienna cichlid has 4 total SDMA instances, each
enumerated separately (HWIDs 42, 43, 68, 69). Arcturus has 8
total SDMA instances, but they are enumerated as multiple
instances of the same HWIDs (4x HWID 42, 4x HWID 43). UMC
is another example. On most chips there are multiple
instances with the same HWID. This allows us to support both
forms.
v2: rebase
v3: clarify instancing support
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use IP versions rather than asic_type to differentiate
IP version specific features.
v2: switch if statement to a switch statement
Acked-by: Christian König <christian.koenig@amd.com> (v1)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use IP versions rather than asic_type to differentiate
IP version specific features.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use IP versions rather than asic_type to differentiate
IP version specific features.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use IP versions rather than asic_type to differentiate
IP version specific features.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use IP versions rather than asic_type to differentiate
IP version specific features.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use IP versions rather than asic_type to differentiate
IP version specific features.
v2: rebase
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use IP versions rather than asic_type to differentiate
IP version specific features.
v2: rebase
v3: switch some if statements to switch statements
v4: add yellow carp fix (Yifan)
v5: squash in fixes for YC and GS (Alex)
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Current RUNPM mechanism relies on PMFW to master the timing for BACO
in/exit. And that needs cooperation from sound driver for dstate
change notification for function 1(audio). Otherwise(on sound driver
missing), BACO cannot be kicked in correctly and hang will be observed
on RUNPM exit.
By switching back to legacy message way on sound driver missing,
we are able to fix the runpm hang observed for the scenario below:
amdgpu driver loaded -> runpm suspend kicked -> sound driver loaded
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reported-and-tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>