linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Christian König	77c092e054	drm/amdgpu: workaround for TLB seq race It can happen that we query the sequence value before the callback had a chance to run. Workaround that by grabbing the fence lock and releasing it again. Should be replaced by hw handling soon. Signed-off-by: Christian König <christian.koenig@amd.com> CC: stable@vger.kernel.org # 5.19+ Fixes: `5255e146c9` ("drm/amdgpu: rework TLB flushing") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2113 Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Philip Yang <Philip.Yang@amd.com> Tested-by: Stefan Springer <stefanspr94@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-09 17:58:17 -05:00
Felix Kuehling	d1a372af1c	drm/amdgpu: Set MTYPE in PTE based on BO flags The same BO may need different MTYPEs and SNOOP flags in PTEs depending on its current location relative to the mapping GPU. Setting MTYPEs from clients ahead of time is not practical for coherent memory sharing. Instead determine the correct MTYPE for the desired coherence model and current BO location when updating the page tables. To maintain backwards compatibility with MTYPE-selection in AMDGPU_VA_OP_MAP, the coherence-model-based MTYPE selection is only applied if it chooses an MTYPE other than MTYPE_NC (the default). Add two AMDGPU_GEM_CREATE_... flags to indicate the coherence model. The default if no flag is specified is non-coherent (i.e. coarse-grained coherent at dispatch boundaries). Update amdgpu_amdkfd_gpuvm.c to use this new method to choose the correct MTYPE depending on the current memory location. v2: * check that bo is not NULL (e.g. PRT mappings) * Fix missing ~ bitmask in gmc_v11_0.c v3: * squash in "drm/amdgpu: Inherit coherence flags on dmabuf import" Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-09 17:41:42 -05:00
Harsh Jain	4b31b92b14	drm/amdgpu: complete gfxoff allow signal during suspend without delay change guarantees that gfxoff is allowed before moving further in s2idle sequence to add more reliablity about gfxoff in amdgpu IP's suspend flow Signed-off-by: Harsh Jain <harsh.jain@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-09 17:41:42 -05:00
Luben Tuikov	3b8164f808	drm/amdgpu: Decouple RAS EEPROM addresses from chips Abstract RAS I2C EEPROM addresses from chip names, and set their macro definition names to the address they set, not the chip they attach to. Since most chips either use I2C EEPROM address 0 or 40000h for the RAS table start offset, this leaves us with only two macro definitions as opposed to five, and removes the redundancy of four. Cc: Candice Li <candice.li@amd.com> Cc: Tao Zhou <tao.zhou1@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-09 17:41:42 -05:00
Luben Tuikov	da858deab8	drm/amdgpu: Remove redundant I2C EEPROM address Remove redundant EEPROM_I2C_MADDR_54H address, since we already have it represented (ARCTURUS), and since we don't include the I2C device type identifier in EEPROM memory addresses, i.e. that high up in the device abstraction--we only use EEPROM memory addresses, as memory is continuously represented by EEPROM device(s) on the I2C bus. Add a comment describing what these memory addresses are, how they come about and how they're usually extracted from the device address byte. Cc: Candice Li <candice.li@amd.com> Cc: Tao Zhou <tao.zhou1@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Fixes: `c9bdc6c3cf` ("drm/amdgpu: Add EEPROM I2C address support for ip discovery") Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-09 17:41:42 -05:00
Kenneth Feng	60cfad329a	drm/amd/pm: enable mode1 reset on smu_v13_0_10 enable mode1 reset and prioritize debug port on smu_v13_0_10 as a more reliable message processing v2 - move mode1 reset callback to smu_v13_0_0_ppt.c Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-09 17:41:42 -05:00
Philip Yang	0bc71adc8b	drm/amdgpu: Drop eviction lock when allocating PT BO Re-take the eviction lock immediately again after the allocation is completed, to fix circular locking warning with drm_buddy allocator. Move amdgpu_vm_eviction_lock/unlock/trylock to amdgpu_vm.h as they are called from multiple files. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-09 17:41:41 -05:00
Philip Yang	c70e216696	drm/amdgpu: Unlock bo_list_mutex after error handling Get below kernel WARNING backtrace when pressing ctrl-C to kill kfdtest application. If amdgpu_cs_parser_bos returns error after taking bo_list_mutex, as caller amdgpu_cs_ioctl will not unlock bo_list_mutex, this generates the kernel WARNING. Add unlock bo_list_mutex after amdgpu_cs_parser_bos error handling to cleanup bo_list userptr bo. WARNING: kfdtest/2930 still has locks held! 1 lock held by kfdtest/2930: (&list->bo_list_mutex){+.+.}-{3:3}, at: amdgpu_cs_ioctl+0xce5/0x1f10 [amdgpu] stack backtrace: dump_stack_lvl+0x44/0x57 get_signal+0x79f/0xd00 arch_do_signal_or_restart+0x36/0x7b0 exit_to_user_mode_prepare+0xfd/0x1b0 syscall_exit_to_user_mode+0x19/0x40 do_syscall_64+0x40/0x80 Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-09 17:41:41 -05:00
Rajneesh Bhardwaj	adf65dff5d	drm/amdgpu: Fix the kerneldoc description amdgpu_ttm_tt_set_userptr() is also called by the KFD as part of initializing the user pages for userptr BOs and also while initializing the GPUVM for a KFD process so update the function description. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-09 17:41:41 -05:00
Christian König	f9e6949645	drm/amdgpu: workaround for TLB seq race It can happen that we query the sequence value before the callback had a chance to run. Workaround that by grabbing the fence lock and releasing it again. Should be replaced by hw handling soon. Signed-off-by: Christian König <christian.koenig@amd.com> CC: stable@vger.kernel.org # 5.19+ Fixes: `5255e146c9` ("drm/amdgpu: rework TLB flushing") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2113 Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Philip Yang <Philip.Yang@amd.com> Tested-by: Stefan Springer <stefanspr94@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-09 17:22:48 -05:00
Ma Jun	e0b26b9482	drm/amdgpu: Fix the lpfn checking condition in drm buddy Because the value of man->size is changed during suspend/resume process, use mgr->mm.size instead of man->size here for lpfn checking. Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220914125331.2467162-1-Jun.Ma2@amd.com Signed-off-by: Christian König <christian.koenig@amd.com>	2022-11-08 14:00:18 +01:00
Dave Airlie	49e8e6343d	Merge tag 'amd-drm-next-6.2-2022-11-04' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.2-2022-11-04: amdgpu: - Add TMZ support for GC 11.0.1 - More IP version check conversions - Mode2 reset fixes for sienna cichlid - SMU 13.x fixes - RAS enablement on MP 13.x - Replace kmap with kmap_local_page() - Misc Clang warning fixes - SR-IOV fixes for GC 11.x - PCI AER fix - DCN 3.2.x commit sequence rework - SDMA 4.x doorbell fix - Expose additional new GC 11.x firmware versions - Misc code cleanups - S0i3 fixes - More DC FPU cleanup - Add more DC kerneldoc - Misc spelling and grammer fixes - DCN 3.1.x fixes - Plane modifier fix - MCA RAS enablement - Secure display locking fix - RAS TA rework - RAS EEPROM fixes - Fail suspend if eviction fails - Drop AMD specific DSC workarounds in favor of drm EDID quirks - SR-IOV suspend/resume fixes - Enable DCN support for ARM - Enable secure display on DCN 2.1 amdkfd: - Cache size fixes for GC 10.3.x - kfd_dev struct cleanup - GC11.x CWSR trap handler fix - Userptr fixes - Warning fixes radeon: - Replace kmap with kmap_local_page() UAPI: - Expose additional new GC 11.x firmware versions via the existing INFO query drm: - Add some new EDID DSC quirks Signed-off-by: Dave Airlie <airlied@redhat.com> # Conflicts: # drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221104205827.6008-1-alexander.deucher@amd.com	2022-11-08 16:32:31 +10:00
Thomas Zimmermann	45b64fd9f7	drm/fb-helper: Remove unnecessary include statements Remove include statements for <drm/drm_fb_helper.h> where it is not required (i.e., most of them). In a few places include other header files that are required by the source code. v3: * fix amdgpu include statements * fix rockchip include statements Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221103151446.2638-23-tzimmermann@suse.de	2022-11-05 17:12:04 +01:00
Thomas Zimmermann	8ab59da26b	drm/fb-helper: Move generic fbdev emulation into separate source file Move the generic fbdev implementation into its own source and header file. Adapt drivers. No functional changes, but some of the internal helpers have been renamed to fit into the drm_fbdev_ naming scheme. v3: * rename drm_fbdev.{c,h} to drm_fbdev_generic.{c,h} * rebase onto vmwgfx changes * rebase onto xlnx changes * fix include statements in amdgpu Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221103151446.2638-22-tzimmermann@suse.de	2022-11-05 17:12:04 +01:00
Thomas Zimmermann	0e3172bac3	drm/amdgpu: Don't set struct drm_driver.output_poll_changed Don't set struct drm_driver.output_poll_changed. It's used to restore the fbdev console. But as amdgpu uses generic fbdev emulation, the console is being restored by the DRM client helpers already. See the functions drm_kms_helper_hotplug_event() and drm_kms_helper_connector_hotplug_event() in drm_probe_helper.c. v2: * fix commit description (Christian) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221103151446.2638-5-tzimmermann@suse.de	2022-11-05 17:05:53 +01:00
Thomas Zimmermann	8e4e4c2f53	Merge drm/drm-next into drm-misc-next Backmerging drm/drm-next to get the latest changes in the xlnx driver. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2022-11-05 16:08:36 +01:00
Xiaogang Chen	fcf00f8d29	drm/amdkfd: Remove skiping userptr buffer mapping when mmu notifier marks it as invalid mmu notifier does not always hold mm->sem during call back. That causes a race condition between kfd userprt buffer mapping and mmu notifier which leds to gpu shadder or SDMA access userptr buffer before it has been mapped to gpu VM. Always map userptr buffer to avoid that though it may make some userprt buffers mapped two times. Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-04 16:05:54 -04:00
Victor Zhao	ec4927d463	drm/amdgpu: fix for suspend/resume sequence under sriov - clear kiq ring after suspend/resume under sriov to aviod kiq ring test failure - update irq after resume to fix kiq interrput loss Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-04 16:05:53 -04:00
Kenneth Feng	5cefe31b2a	drm/amd/amdgpu: temporary workaround to skip ras error for gc_v11_0_3 temporary workaround to skip ras error for gc_v11_0_3 until IFWI release later Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-04 16:05:53 -04:00
Hawking Zhang	cfa61b8f9e	drm/amdgpu: switch to select_se_sh wrapper for gfx v9_0 To allow invoking ip specific callbacks Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-04 16:05:53 -04:00
Nathan Chancellor	f0d0f10873	drm/amdgpu: Fix type of second parameter in trans_msg() callback With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG), indirect call targets are validated against the expected function pointer prototype to make sure the call target is valid to help mitigate ROP attacks. If they are not identical, there is a failure at run time, which manifests as either a kernel panic or thread getting killed. A proposed warning in clang aims to catch these at compile time, which reveals: drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c:412:15: error: incompatible function pointer types initializing 'void ()(struct amdgpu_device , u32, u32, u32, u32)' (aka 'void ()(struct amdgpu_device , unsigned int, unsigned int, unsigned int, unsigned int)') with an expression of type 'void (struct amdgpu_device , enum idh_request, u32, u32, u32)' (aka 'void (struct amdgpu_device , enum idh_request, unsigned int, unsigned int, unsigned int)') [-Werror,-Wincompatible-function-pointer-types-strict] .trans_msg = xgpu_ai_mailbox_trans_msg, ^~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c:435:15: error: incompatible function pointer types initializing 'void ()(struct amdgpu_device , u32, u32, u32, u32)' (aka 'void ()(struct amdgpu_device , unsigned int, unsigned int, unsigned int, unsigned int)') with an expression of type 'void (struct amdgpu_device , enum idh_request, u32, u32, u32)' (aka 'void (struct amdgpu_device , enum idh_request, unsigned int, unsigned int, unsigned int)') [-Werror,-Wincompatible-function-pointer-types-strict] .trans_msg = xgpu_nv_mailbox_trans_msg, ^~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. The type of the second parameter in the prototype should be 'enum idh_request' instead of 'u32'. Update it to clear up the warnings. Link: https://github.com/ClangBuiltLinux/linux/issues/1750 Reported-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-04 16:05:53 -04:00
Paulo Miguel Almeida	320e2590e2	drm/amdgpu: Replace one-element array with flexible-array member One-element arrays are deprecated, and we are replacing them with flexible array members instead. So, replace one-element array with flexible-array member in struct _ATOM_FAKE_EDID_PATCH_RECORD and refactor the rest of the code accordingly. Important to mention is that doing a build before/after this patch results in no binary output differences. This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy() and help us make progress towards globally enabling -fstrict-flex-arrays=3 [1]. Link: https://github.com/KSPP/linux/issues/79 Link: https://github.com/KSPP/linux/issues/238 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836 [1] Signed-off-by: Paulo Miguel Almeida <paulo.miguel.almeida.rodenas@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-04 16:05:53 -04:00
Alex Deucher	e053d71f8c	drm/amdgpu/gfx11: set gfx.funcs in early init So the callbacks are set early in case we need them. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-04 16:05:53 -04:00
Alex Deucher	105195af02	drm/amdgpu/gfx10: set gfx.funcs in early init So the callbacks are set early in case we need them. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-04 16:05:53 -04:00
Alex Deucher	33034c5c2e	drm/amdgpu/gfx9: set gfx.funcs in early init So the callbacks are set before we use them. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-04 16:05:53 -04:00
Peng Ju Zhou	f8638ad7fc	drm/amdgpu: Remove unnecessary register program in SRIOV Remove unnecessary register program in SRIOV Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-04 16:05:53 -04:00
Yiqing Yao	8a1fbb4a5e	drm/amdgpu: Disable MCBP from soc21 for SRIOV [why] Start from soc21, CP does not support MCBP, so disable it. [how] Used amgpu_mcbp flag alone instead of checking if is in SRIOV to enable/disable MCBP. Only set flag to enable on asic_type prior to soc21 in SRIOV. Signed-off-by: Yiqing Yao <yiqing.yao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-04 16:05:53 -04:00
Yiqing Yao	0cfce2401e	drm/amdgpu: Clean up soc21 early init for SRIOV Use virt_init_setting instead of per ip version setting. Signed-off-by: Yiqing Yao <yiqing.yao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-04 16:05:53 -04:00
Graham Sider	9a1662f549	drm/amdgpu: extend halt_if_hws_hang to MES Hang on MES timeout if halt_if_hws_hang is set to 1. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-04 16:05:53 -04:00
Dave Airlie	441f0ec0ae	drm-misc-next for 6.2: UAPI Changes: Cross-subsystem Changes: - dma-buf: locking improvements - firmware: New API in the RaspberryPi firmware driver used by vc4 Core Changes: - client: Null pointer dereference fix in drm_client_buffer_delete() - mm/buddy: Add back random seed log - ttm: Convert ttm_resource to use size_t for its size, fix for an undefined behaviour Driver Changes: - bridge: - adv7511: use dev_err_probe - it6505: Fix return value check of pm_runtime_get_sync - panel: - sitronix: Fixes and clean-ups - lcdif: Increase DMA burst size - rockchip: runtime_pm improvements - vc4: Fix for a regression preventing the use of 4k @ 60Hz, and further HDMI rate constraints check. - vmwgfx: Cursor improvements -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCY2N8/gAKCRDj7w1vZxhR xT1TAQDiwdSyL/bzOk/WYggmj5wmFqb2PPFbFRt/DOTLl52ZdwEAmtE9I2WKvY8w sxZYF9gfFnQC3id7YwNs+CDb1kAangc= =6GBd -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2022-11-03' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 6.2: UAPI Changes: Cross-subsystem Changes: - dma-buf: locking improvements - firmware: New API in the RaspberryPi firmware driver used by vc4 Core Changes: - client: Null pointer dereference fix in drm_client_buffer_delete() - mm/buddy: Add back random seed log - ttm: Convert ttm_resource to use size_t for its size, fix for an undefined behaviour Driver Changes: - bridge: - adv7511: use dev_err_probe - it6505: Fix return value check of pm_runtime_get_sync - panel: - sitronix: Fixes and clean-ups - lcdif: Increase DMA burst size - rockchip: runtime_pm improvements - vc4: Fix for a regression preventing the use of 4k @ 60Hz, and further HDMI rate constraints check. - vmwgfx: Cursor improvements Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20221103083437.ksrh3hcdvxaof62l@houat	2022-11-04 12:33:04 +10:00
Christian König	a82f30b04c	drm/scheduler: rename dependency callback into prepare_job This now matches much better what this is doing. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-14-christian.koenig@amd.com	2022-11-03 12:45:20 +01:00
Christian König	1728baa7e4	drm/amdgpu: use scheduler dependencies for CS Entirely remove the sync obj in the job. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-11-christian.koenig@amd.com	2022-11-03 12:45:20 +01:00
Christian König	46e0270c71	drm/amdgpu: use scheduler dependencies for UVD msgs Instead of putting that into the job sync object. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-10-christian.koenig@amd.com	2022-11-03 12:45:20 +01:00
Christian König	aab9cf7b69	drm/amdgpu: use scheduler dependencies for VM updates Instead of putting that into the job sync object. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-9-christian.koenig@amd.com	2022-11-03 12:45:20 +01:00
Christian König	1b2d5eda5a	drm/amdgpu: move explicit sync check into the CS This moves the memory allocation out of the critical code path. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-8-christian.koenig@amd.com	2022-11-03 12:45:20 +01:00
Christian König	f7d66fb2ea	drm/amdgpu: cleanup scheduler job initialization v2 Init the DRM scheduler base class while allocating the job. This makes the whole handling much more cleaner. v2: fix coding style Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-7-christian.koenig@amd.com	2022-11-03 12:45:20 +01:00
Christian König	940ca22b7e	drm/amdgpu: drop amdgpu_sync from amdgpu_vmid_grab v2 Instead return the fence directly. Avoids memory allocation to store the fence. v2: cleanup coding style as well Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-6-christian.koenig@amd.com	2022-11-03 12:45:19 +01:00
Christian König	c5093cddf5	drm/amdgpu: drop the fence argument from amdgpu_vmid_grab This is always the job anyway. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-5-christian.koenig@amd.com	2022-11-03 12:45:19 +01:00
Christian König	4f91790b42	drm/amdgpu: use drm_sched_job_add_resv_dependencies for moves Use the new common scheduler functions to figure out what to wait for. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-4-christian.koenig@amd.com	2022-11-03 12:45:19 +01:00
Gavin Wan	8fe8ce896c	drm/amdgpu: Disable GPU reset on SRIOV before remove pci. The recent change brought a bug on SRIOV envrionment. It caused unloading amdgpu failed on Guest VM. The reason is that the VF FLR was requested while unloading amdgpu driver, but the VF FLR of SRIOV sequence is wrong while removing PCI device. For SRIOV, the guest driver should not trigger the whole XGMI hive to do the reset. Host driver control how the device been reset. Fixes: `f5c7e77970` ("drm/amdgpu: Adjust removal control flow for smu v13_0_2") Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com> Signed-off-by: Gavin Wan <Gavin.Wan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-02 17:16:25 -04:00
Graham Sider	a3e5ce56f3	drm/amdgpu: disable GFXOFF during compute for GFX11 Temporary workaround to fix issues observed in some compute applications when GFXOFF is enabled on GFX11. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x	2022-11-02 17:16:11 -04:00
Mario Limonciello	8d4de331f1	drm/amd: Fail the suspend if resources can't be evicted If a system does not have swap and memory is under 100% usage, amdgpu will fail to evict resources. Currently the suspend carries on proceeding to reset the GPU: ``` [drm] evicting device resources failed [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] ERROR suspend of IP block <vcn_v3_0> failed -12 [drm] free PSP TMR buffer [TTM] Failed allocating page table [drm] evicting device resources failed amdgpu 0000:03:00.0: amdgpu: MODE1 reset amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset ``` At this point if the suspend actually succeeded I think that amdgpu would have recovered because the GPU would have power cut off and restored. However the kernel fails to continue the suspend from the memory pressure and amdgpu fails to run the "resume" from the aborted suspend. ``` ACPI: PM: Preparing to enter system sleep state S3 SLUB: Unable to allocate memory on node -1, gfp=0xdc0(GFP_KERNEL\|__GFP_ZERO) cache: Acpi-State, object size: 80, buffer size: 80, default order: 0, min order: 0 node 0: slabs: 22, objs: 1122, free: 0 ACPI Error: AE_NO_MEMORY, Could not update object reference count (20210730/utdelete-651) [drm:psp_hw_start [amdgpu]] ERROR PSP load kdb failed! [drm:psp_resume [amdgpu]] ERROR PSP resume failed [drm:amdgpu_device_fw_loading [amdgpu]] ERROR resume of IP block <psp> failed -62 amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62). PM: dpm_run_callback(): pci_pm_resume+0x0/0x100 returns -62 amdgpu 0000:03:00.0: PM: failed to resume async: error -62 ``` To avoid this series of unfortunate events, fail amdgpu's suspend when the memory eviction fails. This will let the system gracefully recover and the user can try suspend again when the memory pressure is relieved. Reported-by: post@davidak.de Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2223 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-02 17:00:39 -04:00
Graham Sider	e542ca6e3e	drm/amdgpu: correct MES debugfs versions Use mes.sched_version, mes.kiq_version for debugfs as mes.ucode_fw_version does not contain correct versioning information. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-02 16:57:32 -04:00
Yifan Zhang	89b3554782	drm/amdgpu: set fb_modifiers_not_supported in vkms This patch to fix the gdm3 start failure with virual display: /usr/libexec/gdm-x-session[1711]: (II) AMDGPU(0): Setting screen physical size to 270 x 203 /usr/libexec/gdm-x-session[1711]: (EE) AMDGPU(0): Failed to make import prime FD as pixmap: 22 /usr/libexec/gdm-x-session[1711]: (EE) AMDGPU(0): failed to set mode: Invalid argument /usr/libexec/gdm-x-session[1711]: (WW) AMDGPU(0): Failed to set mode on CRTC 0 /usr/libexec/gdm-x-session[1711]: (EE) AMDGPU(0): Failed to enable any CRTC gnome-shell[1840]: Running GNOME Shell (using mutter 42.2) as a X11 window and compositing manager /usr/libexec/gdm-x-session[1711]: (EE) AMDGPU(0): failed to set mode: Invalid argument vkms doesn't have modifiers support, set fb_modifiers_not_supported to bring the gdm back. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-02 16:56:13 -04:00
Yifan Zha	2c763f37d0	drm/amdgpu: Skip program gfxhub_v3_0_3 system aperture registers under SRIOV [Why] gfxhub_v3_0_3 system aperture registers are removed from RLCG register access range. [How] Skip access gfxhub_v3_0_3 system aperture registers under SRIOV VF. These registers will be programmed on host side. Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-01 11:45:56 -04:00
Yifan Zha	e1a29b28e7	drm/amdgpu: Skip access SDMA0_F32_CNTL in sdma_v6_0_enable under SRIOV [Why] SDMA0_F32_CNTL is a PF_only regitser which will be blocked by L1. RLCG will not program the register as well. [How] Skip to program SDMA0_F32_CNTL under SRIOV VF. Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-01 11:45:50 -04:00
Yifan Zha	47a7470bb2	drm/amdgpu: Skip access GRBM_CNTL under SRIOV on gfx_v11 [Why] GRBM_CNTL is a PF_only register on gfx_v11. RLCG interface will return "out of range" under SRIOV VF. [How] Skip access GRBM_CNTL under gfx_v11 SRIOV VF. Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-01 11:45:44 -04:00
Gavin Wan	2103c42198	drm/amdgpu: Disable GPU reset on SRIOV before remove pci. The recent change brought a bug on SRIOV envrionment. It caused unloading amdgpu failed on Guest VM. The reason is that the VF FLR was requested while unloading amdgpu driver, but the VF FLR of SRIOV sequence is wrong while removing PCI device. For SRIOV, the guest driver should not trigger the whole XGMI hive to do the reset. Host driver control how the device been reset. Fixes: `f5c7e77970` ("drm/amdgpu: Adjust removal control flow for smu v13_0_2") Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com> Signed-off-by: Gavin Wan <Gavin.Wan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-01 11:44:49 -04:00
Candice Li	9d1b073d01	drm/amdgpu: Enable GFX RAS feature for gfx v11_0_3 v1: Support gfx ras feature enablement for gfx v11_0_3. v2: Update function name and error message. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-01 11:44:33 -04:00
Mukul Joshi	d69a3b762d	drm/amdkfd: Cleanup kfd_dev struct Cleanup kfd_dev struct by removing ddev and pdev as both drm_device and pci_dev can be fetched from amdgpu_device. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Tested-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-27 15:12:09 -04:00

... 3 4 5 6 7 ...

11743 commits