1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
linux/drivers/gpu/drm/amd/amdkfd
Mukul Joshi b179fc28d5 drm/amdkfd: Fix circular lock dependency warning
[  168.544078] ======================================================
[  168.550309] WARNING: possible circular locking dependency detected
[  168.556523] 5.16.0-kfd-fkuehlin #148 Tainted: G            E
[  168.562558] ------------------------------------------------------
[  168.568764] kfdtest/3479 is trying to acquire lock:
[  168.573672] ffffffffc0927a70 (&topology_lock){++++}-{3:3}, at:
		kfd_topology_device_by_id+0x16/0x60 [amdgpu] [  168.583663]
                but task is already holding lock:
[  168.589529] ffff97d303dee668 (&mm->mmap_lock#2){++++}-{3:3}, at:
		vm_mmap_pgoff+0xa9/0x180 [  168.597755]
                which lock already depends on the new lock.

[  168.605970]
                the existing dependency chain (in reverse order) is:
[  168.613487]
                -> #3 (&mm->mmap_lock#2){++++}-{3:3}:
[  168.619700]        lock_acquire+0xca/0x2e0
[  168.623814]        down_read+0x3e/0x140
[  168.627676]        do_user_addr_fault+0x40d/0x690
[  168.632399]        exc_page_fault+0x6f/0x270
[  168.636692]        asm_exc_page_fault+0x1e/0x30
[  168.641249]        filldir64+0xc8/0x1e0
[  168.645115]        call_filldir+0x7c/0x110
[  168.649238]        ext4_readdir+0x58e/0x940
[  168.653442]        iterate_dir+0x16a/0x1b0
[  168.657558]        __x64_sys_getdents64+0x83/0x140
[  168.662375]        do_syscall_64+0x35/0x80
[  168.666492]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[  168.672095]
                -> #2 (&type->i_mutex_dir_key#6){++++}-{3:3}:
[  168.679008]        lock_acquire+0xca/0x2e0
[  168.683122]        down_read+0x3e/0x140
[  168.686982]        path_openat+0x5b2/0xa50
[  168.691095]        do_file_open_root+0xfc/0x190
[  168.695652]        file_open_root+0xd8/0x1b0
[  168.702010]        kernel_read_file_from_path_initns+0xc4/0x140
[  168.709542]        _request_firmware+0x2e9/0x5e0
[  168.715741]        request_firmware+0x32/0x50
[  168.721667]        amdgpu_cgs_get_firmware_info+0x370/0xdd0 [amdgpu]
[  168.730060]        smu7_upload_smu_firmware_image+0x53/0x190 [amdgpu]
[  168.738414]        fiji_start_smu+0xcf/0x4e0 [amdgpu]
[  168.745539]        pp_dpm_load_fw+0x21/0x30 [amdgpu]
[  168.752503]        amdgpu_pm_load_smu_firmware+0x4b/0x80 [amdgpu]
[  168.760698]        amdgpu_device_fw_loading+0xb8/0x140 [amdgpu]
[  168.768412]        amdgpu_device_init.cold+0xdf6/0x1716 [amdgpu]
[  168.776285]        amdgpu_driver_load_kms+0x15/0x120 [amdgpu]
[  168.784034]        amdgpu_pci_probe+0x19b/0x3a0 [amdgpu]
[  168.791161]        local_pci_probe+0x40/0x80
[  168.797027]        work_for_cpu_fn+0x10/0x20
[  168.802839]        process_one_work+0x273/0x5b0
[  168.808903]        worker_thread+0x20f/0x3d0
[  168.814700]        kthread+0x176/0x1a0
[  168.819968]        ret_from_fork+0x1f/0x30
[  168.825563]
                -> #1 (&adev->pm.mutex){+.+.}-{3:3}:
[  168.834721]        lock_acquire+0xca/0x2e0
[  168.840364]        __mutex_lock+0xa2/0x930
[  168.846020]        amdgpu_dpm_get_mclk+0x37/0x60 [amdgpu]
[  168.853257]        amdgpu_amdkfd_get_local_mem_info+0xba/0xe0 [amdgpu]
[  168.861547]        kfd_create_vcrat_image_gpu+0x1b1/0xbb0 [amdgpu]
[  168.869478]        kfd_create_crat_image_virtual+0x447/0x510 [amdgpu]
[  168.877884]        kfd_topology_add_device+0x5c8/0x6f0 [amdgpu]
[  168.885556]        kgd2kfd_device_init.cold+0x385/0x4c5 [amdgpu]
[  168.893347]        amdgpu_amdkfd_device_init+0x138/0x180 [amdgpu]
[  168.901177]        amdgpu_device_init.cold+0x141b/0x1716 [amdgpu]
[  168.909025]        amdgpu_driver_load_kms+0x15/0x120 [amdgpu]
[  168.916458]        amdgpu_pci_probe+0x19b/0x3a0 [amdgpu]
[  168.923442]        local_pci_probe+0x40/0x80
[  168.929249]        work_for_cpu_fn+0x10/0x20
[  168.935008]        process_one_work+0x273/0x5b0
[  168.940944]        worker_thread+0x20f/0x3d0
[  168.946623]        kthread+0x176/0x1a0
[  168.951765]        ret_from_fork+0x1f/0x30
[  168.957277]
                -> #0 (&topology_lock){++++}-{3:3}:
[  168.965993]        check_prev_add+0x8f/0xbf0
[  168.971613]        __lock_acquire+0x1299/0x1ca0
[  168.977485]        lock_acquire+0xca/0x2e0
[  168.982877]        down_read+0x3e/0x140
[  168.987975]        kfd_topology_device_by_id+0x16/0x60 [amdgpu]
[  168.995583]        kfd_device_by_id+0xa/0x20 [amdgpu]
[  169.002180]        kfd_mmap+0x95/0x200 [amdgpu]
[  169.008293]        mmap_region+0x337/0x5a0
[  169.013679]        do_mmap+0x3aa/0x540
[  169.018678]        vm_mmap_pgoff+0xdc/0x180
[  169.024095]        ksys_mmap_pgoff+0x186/0x1f0
[  169.029734]        do_syscall_64+0x35/0x80
[  169.035005]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[  169.041754]
                other info that might help us debug this:

[  169.053276] Chain exists of:
                  &topology_lock --> &type->i_mutex_dir_key#6 --> &mm->mmap_lock#2

[  169.068389]  Possible unsafe locking scenario:

[  169.076661]        CPU0                    CPU1
[  169.082383]        ----                    ----
[  169.088087]   lock(&mm->mmap_lock#2);
[  169.092922]                                lock(&type->i_mutex_dir_key#6);
[  169.100975]                                lock(&mm->mmap_lock#2);
[  169.108320]   lock(&topology_lock);
[  169.112957]
                 *** DEADLOCK ***

This commit fixes the deadlock warning by ensuring pm.mutex is not
held while holding the topology lock. For this, kfd_local_mem_info
is moved into the KFD dev struct and filled during device init.
This cached value can then be used instead of querying the value
again and again.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-28 17:45:40 -04:00
..
cik_event_interrupt.c drm/amdkfd: rename kfd_process_vm_fault to kfd_dqm_evict_pasid 2022-02-09 14:14:53 -05:00
cik_int.h drm/amdkfd: Clean up reference of radeon 2018-07-11 22:33:08 -04:00
cik_regs.h drm/amdkfd: Delete a duplicate statement in set_pasid_vmid_mapping() 2018-11-05 14:21:13 -05:00
cwsr_trap_handler.h drm/amdkfd: Fix saving the ACC vgprs for Aldebaran 2021-03-23 22:56:55 -04:00
cwsr_trap_handler_gfx8.asm drm/amdkfd: Add aldebaran trap handler support 2021-03-10 00:02:24 -05:00
cwsr_trap_handler_gfx9.asm drm/amdkfd: Fix saving the ACC vgprs for Aldebaran 2021-03-23 22:56:55 -04:00
cwsr_trap_handler_gfx10.asm drm/amdkfd: Fix spurious debug exception on gfx10 2020-08-10 17:26:51 -04:00
Kconfig drm/amdkfd: Add CONFIG_HSA_AMD_SVM 2021-04-20 21:50:35 -04:00
kfd_chardev.c drm/amdkfd: Fix circular lock dependency warning 2022-04-28 17:45:40 -04:00
kfd_crat.c drm/amdkfd: Fix circular lock dependency warning 2022-04-28 17:45:40 -04:00
kfd_crat.h drm/amdkfd: Fix leftover errors and warnings 2022-02-14 15:08:40 -05:00
kfd_debugfs.c drm/amdkfd: Fix leftover errors and warnings 2022-02-14 15:08:40 -05:00
kfd_device.c drm/amdkfd: Fix circular lock dependency warning 2022-04-28 17:45:40 -04:00
kfd_device_queue_manager.c drm/amdkfd: Fix GWS queue count 2022-04-21 11:30:42 -04:00
kfd_device_queue_manager.h drm/amdkfd: Fix leftover errors and warnings 2022-02-14 15:08:40 -05:00
kfd_device_queue_manager_cik.c drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_device_queue_manager_v9.c drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_device_queue_manager_v10.c drm/amdkfd: Fix leftover errors and warnings 2022-02-14 15:08:40 -05:00
kfd_device_queue_manager_vi.c drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_doorbell.c drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_events.c drm/amdkfd: Ignore bogus signals from MEC efficiently 2022-04-25 17:05:48 -04:00
kfd_events.h drm/amdkfd: Asynchronously free events 2022-04-12 14:20:13 -04:00
kfd_flat_memory.c drm/amdkfd: Fix leftover errors and warnings 2022-02-14 15:08:40 -05:00
kfd_int_process_v9.c drm/amdkfd: Ignore bogus signals from MEC efficiently 2022-04-25 17:05:48 -04:00
kfd_interrupt.c drm/amdkfd: Improve concurrency of event handling 2022-04-07 16:34:24 -04:00
kfd_iommu.c drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_iommu.h drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_kernel_queue.c drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_kernel_queue.h drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_migrate.c drm/amdkfd: use kvcalloc() instead of kvmalloc() in kfd_migrate 2022-04-22 14:49:54 -04:00
kfd_migrate.h drm/amdkfd: fix svm_migrate_fini warning 2021-09-23 16:34:57 -04:00
kfd_module.c drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_mqd_manager.c drm/amdkfd: Fix leftover errors and warnings 2022-02-14 15:08:40 -05:00
kfd_mqd_manager.h drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_mqd_manager_cik.c drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_mqd_manager_v9.c drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_mqd_manager_v10.c drm/amdkfd: Fix leftover errors and warnings 2022-02-14 15:08:40 -05:00
kfd_mqd_manager_vi.c drm/amdkfd: Fix leftover errors and warnings 2022-02-14 15:08:40 -05:00
kfd_packet_manager.c drm/amdkfd: remove unneeded unmap single queue option 2022-02-14 15:08:41 -05:00
kfd_packet_manager_v9.c drm/amdkfd: Use proper enum in pm_unmap_queues_v9() 2022-02-17 15:59:06 -05:00
kfd_packet_manager_vi.c drm/amdkfd: remove unneeded unmap single queue option 2022-02-14 15:08:41 -05:00
kfd_pasid.c drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_pm4_headers.h drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_pm4_headers_ai.h drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_pm4_headers_aldebaran.h drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_pm4_headers_diq.h drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_pm4_headers_vi.h drm/amdkfd: Fix leftover errors and warnings 2022-02-14 15:08:40 -05:00
kfd_pm4_opcodes.h drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_priv.h drm/amdkfd: Fix circular lock dependency warning 2022-04-28 17:45:40 -04:00
kfd_process.c drm/amdkfd: Ignore bogus signals from MEC efficiently 2022-04-25 17:05:48 -04:00
kfd_process_queue_manager.c drm/amdkfd: CRIU add support for GWS queues 2022-04-21 11:32:38 -04:00
kfd_queue.c drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_smi_events.c drm/amdkfd: Create file descriptor after client is added to smi_clients list 2022-03-31 23:05:55 -04:00
kfd_smi_events.h drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
kfd_svm.c drm/amdkfd: Update mapping if range attributes changed 2022-04-26 11:42:59 -04:00
kfd_svm.h drm/amdkfd: Add SVM range mapped_to_gpu flag 2022-04-26 11:42:44 -04:00
kfd_topology.c drm/amdkfd: Fix circular lock dependency warning 2022-04-28 17:45:40 -04:00
kfd_topology.h drm/amdkfd: update SPDX license header 2022-02-14 15:08:40 -05:00
Makefile drm/amdkfd: Remove unused old debugger implementation 2022-02-09 16:57:51 -05:00
soc15_int.h drm/amdkfd: add sdma poison consumption handling 2021-06-07 14:57:24 -04:00