The whole approach wasn't thought through till the end.
We already had a reset lock like this in the past and it caused the same problems like this one.
Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary.
This reverts commit df9c8d1aa2.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If multiple process share system memory through /dev/shm, KFD allocate
memory should not fail if it reaches the system memory limit because
one copy of physical system memory are shared by multiple process.
Add module parameter no_system_mem_limit to provide user option to
disable system memory limit check at runtime using sysfs or during
driver module init using kernel boot argument. By default the system
memory limit is on.
Print out debug message to warn user if KFD allocate memory failed
because system memory reaches limit.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's related to the memory manager so move it there.
v2: inline the structure
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's related to the memory manager so move it there.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since that is where we store the other data related to
the stolen vga memory.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rather than open coding it everywhere.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
what:
the MQD's save and restore of KCQ (kernel compute queue)
cost lots of clocks during world switch which impacts a lot
to multi-VF performance
how:
introduce a paramter to control the number of KCQ to avoid
performance drop if there is no kernel compute queue needed
notes:
this paramter only affects gfx 8/9/10
v2:
refine namings
v3:
choose queues for each ring to that try best to cross pipes evenly.
v4:
fix indentation
some cleanupsin the gfx_compute_queue_acquire()
v5:
further fix on indentations
more cleanupsin gfx_compute_queue_acquire()
TODO:
in the future we will let hypervisor driver to set this paramter
automatically thus no need for user to configure it through
modprobe in virtual machine
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
bad_page_threshold could be configured to enable/disable the
associated bad page retirement feature in RAS.
When it's -1, ras will use typical bad page failure value to
handle bad page retirement.
When it's 0, disable bad page retirement, and no bad page
will be recorded and saved.
For other valid value, driver will use this manual value
as the threshold value of totoal bad pages.
v2: correct documentation of this parameter.
v3: remove confused statement in documentation.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover,
the atomic adev->in_gpu_reset and hive->in_reset are used to avoid
re-entering GPU recovery.
During GPU reset and resume, it is unsafe that other threads access GPU,
which maybe cause GPU reset failed. Therefore the new rw_semaphore
adev->reset_sem is introduced, which protect GPU from being accessed by
external threads during recovery.
v2:
1. add rwlock for some ioctls, debugfs and file-close function.
2. change to use dqm->is_resetting and dqm_lock for protection in kfd
driver.
3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid
re-enter GPU recovery for the same GPU hang.
v3:
1. change back to use adev->reset_sem to protect kfd callback
functions, because dqm_lock couldn't protect all codes, for example:
free_mqd must be called outside of dqm_lock;
[ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[ 1230.177221] Call Trace:
[ 1230.178249] dump_stack+0x98/0xd5
[ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
[ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
[ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
[ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
[ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm]
[ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm]
[ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm]
[ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm]
[ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
[ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm]
[ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu]
[ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
[ 1230.193833] free_mqd+0x25/0x40 [amdgpu]
[ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
[ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu]
[ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
[ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu]
[ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
[ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20
[ 1230.202831] ksys_ioctl+0x98/0xb0
[ 1230.204004] __x64_sys_ioctl+0x1a/0x20
[ 1230.205174] do_syscall_64+0x5f/0x250
[ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe
2. remove try_lock and introduce atomic hive->in_reset, to avoid
re-enter GPU recovery.
v4:
1. remove an unnecessary whitespace change in kfd_chardev.c
2. remove comment codes in amdgpu_device.c
3. add more detailed comment in commit message
4. define a wrap function amdgpu_in_reset
v5:
1. Fix some style issues.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Suggested-by: Luben Tukov <luben.tuikov@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 2eee0229f6.
Fallback to a stable base until we have a correct new one
Signed-off-by:Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Port functionality from the Radeon driver to support
VCE clock control.
Signed-off-by: Alex Jivin <alex.jivin@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use WARN to print messages with backtrace when evictions are triggered.
This can help determine the root cause of evictions and help spot driver
bugs triggering evictions unintentionally, or help with performance tuning
by avoiding conditions that cause evictions in a specific workload.
The messages are controlled by a new module parameter that can be changed
at runtime:
echo Y > /sys/module/amdgpu/parameters/debug_evictions
echo N > /sys/module/amdgpu/parameters/debug_evictions
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The comments say that the serial number is a 16-digit HEX string so the
buffer needs to be at least 17 characters to hold the NUL terminator.
The other issue is that "size" returned from sprintf() is the number of
characters before the NUL terminator so the memcpy() wasn't copying the
terminator. The serial number needs to be NUL terminated so that it
doesn't lead to a read overflow in amdgpu_device_get_serial_number().
Also it's just cleaner and faster to sprintf() directly to adev->serial[]
instead of using a temporary buffer.
Fixes: 81a1624111 ("drm/amdgpu: Add unique_id and serial_number for Arcturus v3")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
a.Check whether mem train support when try to reserve related memory.
b.Remove ASIC check and atom firmware table version check as the check
of firmware capability is enough to achieve that purpose.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
User can check and set the enablement of throttling logging and
the interval between each logging.
V2: simplify the sysfs interface(no string parsing)
V3: add proper lock protection on updating throttling_logging_rs.interval
V4: documentation cosmetic per Luben's suggestion
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add some APU flags to simplify handling of different APU
variants. It's easier to understand the special cases
if we use names flags rather than checking device ids and
silicon revisions.
v2: rebase on latest code
Acked-by: Evan Quan <evan.quan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
At bringup we want to be able to disable various power features.
[How]
These features are already exposed as dc_debug_options and exercised
on other OSes. Create a new dc_debug_mask module parameter and expose
relevant bits, in particular
* DC_DISABLE_PIPE_SPLIT
* DC_DISABLE_STUTTER
* DC_DISABLE_DSC
* DC_DISABLE_CLOCK_GATING
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When GPU got timeout, it would notify an interested part
of an opportunity to dump info before actual GPU reset.
A usermode app would open 'autodump' node under debugfs system
and poll() for readable/writable. When a GPU reset is due,
amdgpu would notify usermode app through wait_queue_head and give
it 10 minutes to dump info.
After usermode app has done its work, this 'autodump' node is closed.
On node closure, amdgpu gets to know the dump is done through
the completion that is triggered in release().
There is no write or read callback because necessary info can be
obtained through dmesg and umr. Messages back and forth between
usermode app and amdgpu are unnecessary.
v2: (1) changed 'registered' to 'app_listening'
(2) add a mutex in open() to prevent race condition
v3 (chk): grab the reset lock to avoid race in autodump_open,
rename debugfs file to amdgpu_autodump,
provide autodump_read as well,
style and code cleanups
v4: add 'bool app_listening' to differentiate situations, so that
the node can be reopened; also, there is no need to wait for
completion when no app is waiting for a dump.
v5: change 'bool app_listening' to 'enum amdgpu_autodump_state'
add 'app_state_mutex' for race conditions:
(1)Only 1 user can open this file node
(2)wait_dump() can only take effect after poll() executed.
(3)eliminated the race condition between release() and
wait_dump()
v6: removed 'enum amdgpu_autodump_state' and 'app_state_mutex'
removed state checking in amdgpu_debugfs_wait_dump
Improve on top of version 3 so that the node can be reopened.
v7: move reinit_completion into open() so that only one user
can open it.
v8: remove complete_all() from amdgpu_debugfs_wait_dump().
Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
BACO is needed to support hibernate on Navi1X.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is to prepare for initializing discovery tmr size per
ASIC type
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix the coding style, move and rename the definitions to
better match what they are supposed to be doing.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Implement an accessor of adev->tmz.enabled. Let not
code around access it as "if (adev->tmz.enabled)"
as the organization may change. Instead...
Recruit "bool amdgpu_is_tmz(adev)" to return
exactly this Boolean value. That is, this function
is now an accessor of an already initialized and
set adev and adev->tmz.
Add "void amdgpu_gmc_tmz_set(adev)" to check and
set adev->gmc.tmz_enabled at initialization
time. After which one uses "bool
amdgpu_is_tmz(adev)" to query whether adev
supports TMZ.
Also, remove circular header file include.
v2: Remove amdgpu_tmz.[ch] as requested.
v3: Move TMZ into GMC.
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch to add amdgpu_tmz structure which stores all tmz related fields.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds tmz parameter to enable/disable
the feature in the amdgpu kernel module. Nomally,
by default, it should be auto (rely on the
hardware capability).
But right now, it need to set "off" to avoid
breaking other developers' work because it's not
totally completed.
Will set "auto" till the feature is stable and
completely verified.
v2: add "auto" option for future use.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
According to the current kiq read register method,
there will be race condition when using KIQ to read
register if multiple clients want to read at same time
just like the expample below:
1. client-A start to read REG-0 throguh KIQ
2. client-A poll the seqno-0
3. client-B start to read REG-1 through KIQ
4. client-B poll the seqno-1
5. the kiq complete these two read operation
6. client-A to read the register at the wb buffer and
get REG-1 value
Therefore, use amdgpu_device_wb_get() to request reg_val_offs
for each kiq read register.
v2: fix the error remove
v3: fix the print typo
v4: remove unused variables
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
clean up unused variable:
1. ring_lru_list
2. ring_lru_list_lock
related-commit:
drm/amdgpu: remove ring lru handling
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Vega20 arbitrates pstate at hive level and not device level. Last peer to
remote buffer unmap could drop P-State while another process is still
remote buffer mapped.
With this fix, P-States still needs to be disabled for now as SMU bug
was discovered on synchronous P2P transfers. This should be fixed in the
next FW update.
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu uses lots of pr_* calls for printing error messages.
With this prefix, errors shall be more obvious to the end
use regarding its origin, and may help debugging.
Prefix format:
[xxx.xxxxx] amdgpu: ...
Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
add indirect access support to registers outside of
mmio bar.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
all the register access through kiq is redirected
to amdgpu_kiq_rreg/amdgpu_kiq_wreg
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
those are not needed anymore
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
the workaround is not needed for soc15 ASICs except
for vega10. it is even not needed with latest vega10
vbios.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Generate HW IP's sched_list in amdgpu_ring_init() instead of
amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(),
ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary.
This patch also stores sched_list for all HW IPs in one big
array in struct amdgpu_device which makes amdgpu_ctx_init_entity()
much more leaner.
v2:
fix a coding style issue
do not use drm hw_ip const to populate amdgpu_ring_type enum
v3:
remove ctx reference and move sched array and num_sched to a struct
use num_scheds to detect uninitialized scheduler list
v4:
use array_index_nospec for user space controlled variables
fix possible checkpatch.pl warnings
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We have three ib pools, they are normal, VM, direct pools.
Any jobs which schedule IBs without dependence on gpu scheduler should
use DIRECT pool.
Any jobs schedule direct VM update IBs should use VM pool.
Any other jobs use NORMAL pool.
v2: squash in coding style fix
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Allow for reading of information like manufacturer, product number
and serial number from the FRU chip. Report the serial number as
the new sysfs file serial_number. Note that this only works on
server cards, as consumer cards do not feature the FRU chip, which
contains this information.
v2: Add documentation to amdgpu.rst, add helper functions,
rename functions for consistency, fix bad starting offset
v3: Remove testing definitions
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
what changed:
1)provide new implementation interface for the rlcg access path
2)put SQ_CMD/SQ_IND_INDEX to GFX9 RLCG path to let debugfs's reg_op
function can access reg that need RLCG path help
now even debugfs's reg_op can used to dump wave.
tested-by: Monk Liu <monk.liu@amd.com>
tested-by: Zhou pengju <pengju.zhou@amd.com>
Signed-off-by: Zhou pengju <pengju.zhou@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
HDP ras error counters are dirty ones after cold reboot
Read operation is needed to reset them to 0
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl5JsVQeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGHZEH+wddtJO4dZk5TZdF
KZB2w2ldsCvrBZAmas1TVvm8ncvUf+ATUZcSIzvZ3YbLmxsuLF2Cz3kD+n+36Mvy
ejwq8Scl7jwnouYps/Gfd6rRj/uCafqST4qp15GMGeiy2ST4A8dJrv5IAgZhD8/N
SN1bSr1AXpZ2JlEzzLDQ/NdVoNMS6IzCOsaINZcc60/XQoQZFRBWamMJFqu+CmXD
SBJOybQNFJhziy45cGZSAl+67sSCcoPftwTs0Stu4CJsvFWRb3MsbNTDS51Hjcc4
3tdgGOhoNXzyzZr96MEAHmiaW4VLQv0PGgUfOajE35viMz48OrwCTru8Kbuae3XM
YHV4qJk=
=yiem
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCXkpecgAKCRDj7w1vZxhR
xU4CAQD/8ZzfYroiiEI7FHEgvnuldGYZqA9kbNAO2Vueeac0OgD+Jojewdxy+pJ9
dxfA/POdtzx3hOdN+U1YgaDC0hbXwww=
=XgKq
-----END PGP SIGNATURE-----
Merge v5.6-rc2 into drm-misc-next
Lyude needs some patches in 5.6-rc2 and we didn't bring drm-misc-next
forward yet, so it looks like a good occasion.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
VBLANK callbacks in struct drm_driver are deprecated in favor of
their equivalents in struct drm_crtc_funcs. Convert amdgpu over.
v2:
* don't wrap existing functions; change signature instead
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200123135943.24140-6-tzimmermann@suse.de
So we know whether we in are in runtime suspend or
system suspend.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reading some registers by mmio will result in hang when GPU is in
"gfxoff" state.This problem can be solved by GPU in "ring command
packages" way.
Signed-off-by: chen gong <curry.gong@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The only data fabric information the adev struct currently
contains is a function pointer table. In the near future,
we will be adding some cached DF information into adev. As
such, this patch creates a new amdgpu_df struct for adev.
Right now, it only containst the old function pointer table,
but new stuff will be added soon.
Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The method of getting fb_loc changed from parsing VBIOS to
taking certain offset from top of VRAM
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In preparation for doing XGMI reset synchronization using task barrier.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm-next-5.6-2019-12-11:
amdgpu:
- Add MST atomic routines
- Add support for DMCUB (new helper microengine for displays)
- Add OEM i2c support in DC
- Use vstartup for vblank events on DCN
- Simplify Kconfig for DC
- Renoir fixes for DC
- Clean up function pointers in DC
- Initial support for HDCP 2.x
- Misc code cleanups
- GFX10 fixes
- Rework JPEG engine handling for VCN
- Add clock and power gating support for JPEG
- BACO support for Arcturus
- Cleanup PSP ring handling
- Add framework for using BACO with runtime pm to save power
- Move core pci state handling out of the driver for pm ops
- Allow guest power control in 1 VF case with SR-IOV
- SR-IOV fixes
- RAS fixes
- Support for power metrics on renoir
- Golden settings updates for gfx10
- Enable gfxoff on supported navi10 skus
- Update MAINTAINERS
amdkfd:
- Clean up generational gfx code
- Fixes for gfx10
- DIQ fixes
- Share more code with amdgpu
radeon:
- PPC DMA fix
- Register checker fixes for r1xx/r2xx
- Misc cleanups
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191211223020.7510-1-alexander.deucher@amd.com
Currently each XGMI node reset wq does not run in parrallel if bound to same
cpu. Make change to bound the xgmi_reset_work item to different cpus.
XGMI requires all nodes enter into baco within very close proximity before
any node exit baco. So schedule the xgmi_reset_work wq twice for enter/exit
baco respectively.
To use baco for XGMI, PMFW supported for baco on XGMI needs to be involved.
The case that PSP reset and baco reset coexist within an XGMI hive never exist
and is not in the consideration.
v2: define use_baco flag to simplify the code for xgmi baco sequence
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>