mirror of synced 2025-03-06 20:59:54 +01:00
linux/drivers/infiniband/hw/hns
Yangyang Li b0969f8389 RDMA/hns: Do not destroy QP resources in the hw resetting phase
When hns_roce_v2_destroy_qp() is called, the driver's call sequence is, in
brief:

 ......
 hns_roce_v2_destroy_qp
     hns_roce_v2_qp_modify
         hns_roce_cmd_mbox
     hns_roce_qp_destroy

If hns_roce_cmd_mbox() detects during its execution that the hardware is
being reset, the driver cannot get a return value from the hardware,
because the firmware cannot respond to the driver's mailbox during the
hardware reset phase.

The driver needs to wait for the hardware reset to complete before
continuing to execute hns_roce_qp_destroy(). Otherwise the driver may
release resources that the hardware is still accessing. To fix this
problem, HNS RoCE adds code that waits for the hardware reset to
complete.

The original interface get_hw_reset_stat() reports only the instantaneous
state of the hardware reset, which cannot accurately reflect whether the
hardware reset has completed, so it is replaced with the ae_dev_reset_cnt
interface.

The hardware reset is complete once the value returned by the
ae_dev_reset_cnt interface is greater than the original reset_cnt value
recorded by the driver.
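
The completion check described above can be sketched in user-space C. This is only an illustration under stated assumptions, not the driver's code: struct fake_hw, fake_reset_cnt() and wait_reset_done() are hypothetical names, and fake_reset_cnt() merely stands in for the ae_dev_reset_cnt query by bumping a counter after a fixed number of polls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware side; not the real hnae3 API. */
struct fake_hw {
	uint32_t reset_cnt;   /* number of resets completed so far */
	int polls_until_done; /* polls left before the in-flight reset ends */
};

/* Models the ae_dev_reset_cnt query: returns the completed-reset count. */
static uint32_t fake_reset_cnt(struct fake_hw *hw)
{
	/* The simulated reset finishes when the poll budget reaches zero. */
	if (hw->polls_until_done > 0 && --hw->polls_until_done == 0)
		hw->reset_cnt++;
	return hw->reset_cnt;
}

/*
 * Poll until the reset count exceeds the value the driver recorded,
 * giving up after max_polls attempts. Returns true only if the reset
 * was observed to complete, i.e. it is safe to release resources.
 */
static bool wait_reset_done(struct fake_hw *hw, uint32_t recorded_cnt,
			    int max_polls)
{
	assert(hw != NULL);

	while (max_polls-- > 0) {
		if (fake_reset_cnt(hw) > recorded_cnt)
			return true;
		/* A real driver would sleep between polls here. */
	}
	return false;
}
```

Comparing against a recorded count, rather than sampling an "in reset" flag, means a reset that starts and finishes between two polls is still noticed. In this sketch a caller would record the count before issuing the mailbox command and call wait_reset_done() before freeing the QP resources.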

Fixes: 6a04aed6af ("RDMA/hns: Fix the chip hanging caused by sending mailbox&CMQ during reset")
Link: https://lore.kernel.org/r/20211123142402.26936-1-liangwenpeng@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-25 13:20:24 -04:00
File | Last commit | Date
hns_roce_ah.c | RDMA/hns: Avoid filling sl in high 3 bits of vlan_id | 2020-12-11 15:21:34 -04:00
hns_roce_alloc.c | RDMA/hns: Delete unused hns bitmap interface | 2021-08-24 09:15:17 -03:00
hns_roce_cmd.c | RDMA/hns: Fix the double unlock problem of poll_sem | 2021-08-03 13:42:44 -03:00
hns_roce_cmd.h | RDMA/hns: Fix incorrect symbol types | 2020-12-11 15:21:35 -04:00
hns_roce_common.h | RDMA/hns: Add hr_reg_write_bool() | 2021-06-21 15:03:41 -03:00
hns_roce_cq.c | RDMA/hns: Add the check of the CQE size of the user space | 2021-09-27 14:49:49 -03:00
hns_roce_db.c | RDMA: Use the sg_table directly and remove the opencoded version from umem | 2021-08-24 19:52:40 -03:00
hns_roce_device.h | RDMA/hns: Use the core code to manage the fixed mmap entries | 2021-10-29 14:07:31 -03:00
hns_roce_hem.c | RDMA: Fix kernel-doc warnings about wrong comment | 2021-06-21 20:32:50 -03:00
hns_roce_hem.h | RDMA/hns: Clean the hardware related code for HEM | 2021-05-28 20:13:58 -03:00
hns_roce_hw_v1.c | RDMA: Constify netdev->dev_addr accesses | 2021-10-25 14:33:09 -03:00
hns_roce_hw_v1.h | RDMA/hns: Clean the hardware related code for HEM | 2021-05-28 20:13:58 -03:00
hns_roce_hw_v2.c | RDMA/hns: Do not destroy QP resources in the hw resetting phase | 2021-11-25 13:20:24 -04:00
hns_roce_hw_v2.h | RDMA/hns: Delete unnecessary blank lines. | 2021-08-26 12:12:21 -03:00
hns_roce_hw_v2_dfx.c | RDMA/hns: Dump detailed driver-specific CQ | 2019-04-08 13:05:25 -03:00
hns_roce_main.c | RDMA/hns: Use the core code to manage the fixed mmap entries | 2021-10-29 14:07:31 -03:00
hns_roce_mr.c | RDMA/hns: Fix return in hns_roce_rereg_user_mr() | 2021-08-19 11:12:04 -03:00
hns_roce_pd.c | RDMA/hns: Use IDA interface to manage uar index | 2021-08-24 09:15:16 -03:00
hns_roce_qp.c | RDMA/hns: Delete unnecessary blank lines. | 2021-08-26 12:12:21 -03:00
hns_roce_restrack.c | RDMA: Add a dedicated CQ resource tracker function | 2020-06-23 11:46:27 -03:00
hns_roce_srq.c | RDMA/hns: Use IDA interface to manage srq index | 2021-08-24 09:15:16 -03:00
Kconfig | treewide: replace '---help---' in Kconfig files with 'help' | 2020-06-14 01:57:21 +09:00
Makefile | RDMA/hns: Fix build error again | 2019-10-29 16:16:54 -03:00