linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Bob Pearson	9cc6290991	RDMA/rxe: Get rid of pkt resend on err Currently the rxe_driver detects packet drops by ip_local_out() which occur before the packet is sent on the wire and attempts to resend them. This is redundant with the usual retry mechanism which covers packets that get dropped in transit to or from the remote node. The way this is implemented is not robust since it sets need_req_skb and waits for the number of local skbs outstanding for this qp to drop below a low water mark. This is racy since the skb may be sent to the destructor before the requester can set the need_req_skb flag. This will cause a deadlock in the send path for that qp. This patch removes this mechanism since the normal retry path will correct the error and resend the packet and it makes no difference if the packet is dropped locally or later. Link: https://lore.kernel.org/r/20240329145513.35381-14-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-22 16:54:34 -03:00
Bob Pearson	4891f4fed0	RDMA/rxe: Don't schedule rxe_completer from rxe_requester Now that rxe_completer() is always called serially after rxe_requester() there is no reason to schedule rxe_completer() from rxe_requester(). Link: https://lore.kernel.org/r/20240329145513.35381-9-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-22 16:54:33 -03:00
Bob Pearson	cd8aaddf0d	RDMA/rxe: Remove save/rollback_state in rxe_requester Now that req.task and comp.task are merged it is no longer necessary to call save_state() before calling rxe_xmit_pkt() and rollback_state() if rxe_xmit_pkt() fails. This was done originally to prevent races between rxe_completer() and rxe_requester() which now cannot happen. Link: https://lore.kernel.org/r/20240329145513.35381-8-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-22 16:54:33 -03:00
Bob Pearson	67f57892f9	RDMA/rxe: Merge request and complete tasks Currently the rxe driver has three work queue tasks per qp. These are the req.task, comp.task and resp.task which call rxe_requester(), rxe_completer() and rxe_responder() respectively directly or on work queues. Each of these subroutines checks to see if there is work to be performed on the send queue or on the response packet queue or the request packet queue and will run until there is no work remaining or yield the cpu and reschedule itself until there is no work remaining. This commit combines the req.task and comp.task into a single send.task and renames the resp.task to the recv.task. The combined send.task calls rxe_requester() and rxe_completer() serially and continues until all work on both the send queue and the response packet queue are done. In various benchmarks the performance is either improved or left the same. At high scale there is a significant reduction in the load on the cpu. This is the first step in combining these two tasks. Once they are serialized cross rescheduling of req.task and comp.task can be more efficiently handled by just letting the send.task continue to run. This will be done in the next several patches. Link: https://lore.kernel.org/r/20240329145513.35381-7-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-22 16:54:33 -03:00
Bob Pearson	5d122db2ff	RDMA/rxe: Fix incomplete state save in rxe_requester If a send packet is dropped by the IP layer in rxe_requester() the call to rxe_xmit_packet() can fail with err == -EAGAIN. To recover, the state of the wqe is restored to the state before the packet was sent so it can be resent. However, the routines that save and restore the state miss a significnt part of the variable state in the wqe, the dma struct which is used to process through the sge table. And, the state is not saved before the packet is built which modifies the dma struct. Under heavy stress testing with many QPs on a fast node sending large messages to a slow node dropped packets are observed and the resent packets are corrupted because the dma struct was not restored. This patch fixes this behavior and allows the test cases to succeed. Fixes: `3050b99850` ("IB/rxe: Fix race condition between requester and completer") Link: https://lore.kernel.org/r/20230721200748.4604-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-07-31 15:24:12 -03:00
Jason Gunthorpe	5f004bcaee	Linux 6.4 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmSYzfYeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG/ucH/iOM/1Py/fSg0qSs 7NJ4XXlourT5zrnRMom3cm3d9gYqgTzgvKFL3kjMEexTRVYbhlcO4ZPRsiry8zxF ToGX+V8tDMqb8WSdFHzkljRY+zDRyfEUDMlTzROAD9DunLmQtkJKyrggkeGdjkpP OyfGqKpwlLXZRAXBil/U8Mx9MHdjJubloZwghLZr33VdUZa68+JJ9l6w163Oe/ET K264NM0wxN/kvN57JvePgqMccQwpINylg8IhRI+XelgczjUXeJBsOA8TDv4bDN4Q bjCLhkWbIaZtTYqvOXa/kD0T8wd7KETsMBQN8YzyDh6W0GmAlJjTawyAhA6jA5in x3uz2W8= =L3zp -----END PGP SIGNATURE----- Merge tag 'v6.4' into rdma.git for-next Linux 6.4 Resolve conflicts between rdma rc and next in rxe_cq matching linux-next: drivers/infiniband/sw/rxe/rxe_cq.c: https://lore.kernel.org/r/20230622115246.365d30ad@canb.auug.org.au Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-27 14:06:29 -03:00
Daisuke Matsuda	42b0a5e691	RDMA/rxe: Fix comments about removed tasklets The commit `9b4b7c1f9f` ("RDMA/rxe: Add workqueue support for rxe tasks") removed tasklets and replaced them with a workqueue, but relevant comments are still remaining in the source code. Fixes: `9b4b7c1f9f` ("RDMA/rxe: Add workqueue support for rxe tasks") Link: https://lore.kernel.org/r/20230518070027.942715-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-05-19 12:02:26 -03:00
Guoqing Jiang	b5f3fe27c5	RDMA/rxe: Convert spin_{lock_bh,unlock_bh} to spin_{lock_irqsave,unlock_irqrestore} We need to call spin_lock_irqsave()/spin_unlock_irqrestore() for state_lock in rxe, otherwsie the callchain: ib_post_send_mad -> spin_lock_irqsave -> ib_post_send -> rxe_post_send -> spin_lock_bh -> spin_unlock_bh -> spin_unlock_irqrestore Causes below traces during run block nvmeof-mp/001 test due to mismatched spinlock nesting: WARNING: CPU: 0 PID: 94794 at kernel/softirq.c:376 __local_bh_enable_ip+0xc2/0x140 [ ... ] CPU: 0 PID: 94794 Comm: kworker/u4:1 Tainted: G E 6.4.0-rc1 #9 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014 Workqueue: rdma_cm cma_work_handler [rdma_cm] RIP: 0010:__local_bh_enable_ip+0xc2/0x140 Code: 48 85 c0 74 72 5b 41 5c 5d 31 c0 89 c2 89 c1 89 c6 89 c7 41 89 c0 e9 bd 0e 11 01 65 8b 05 f2 65 72 48 85 c0 0f 85 76 ff ff ff <0f> 0b e9 6f ff ff ff e8 d2 39 1c 00 eb 80 4c 89 e7 e8 68 ad 0a 00 RSP: 0018:ffffb7cf818539f0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000201 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000201 RDI: ffffffffc0f25f79 RBP: ffffb7cf81853a00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc0f25f79 R13: ffff8db1f0fa6000 R14: ffff8db2c63ff000 R15: 00000000000000e8 FS: 0000000000000000(0000) GS:ffff8db33bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000559758db0f20 CR3: 0000000105124000 CR4: 00000000003506f0 Call Trace: <TASK> _raw_spin_unlock_bh+0x31/0x40 rxe_post_send+0x59/0x8b0 [rdma_rxe] ib_send_mad+0x26b/0x470 [ib_core] ib_post_send_mad+0x150/0xb40 [ib_core] ? cm_form_tid+0x5b/0x90 [ib_cm] ib_send_cm_req+0x7c8/0xb70 [ib_cm] rdma_connect_locked+0x433/0x940 [rdma_cm] nvme_rdma_cm_handler+0x5d7/0x9c0 [nvme_rdma] cma_cm_event_handler+0x4f/0x170 [rdma_cm] cma_work_handler+0x6a/0xe0 [rdma_cm] process_one_work+0x2a9/0x580 worker_thread+0x52/0x3f0 ? __pfx_worker_thread+0x10/0x10 kthread+0x109/0x140 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> raw_local_irq_restore() called with IRQs enabled WARNING: CPU: 0 PID: 94794 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x37/0x60 [ ... ] CPU: 0 PID: 94794 Comm: kworker/u4:1 Tainted: G W E 6.4.0-rc1 #9 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014 Workqueue: rdma_cm cma_work_handler [rdma_cm] RIP: 0010:warn_bogus_irq_restore+0x37/0x60 Code: fb 01 77 36 83 e3 01 74 0e 48 8b 5d f8 c9 31 f6 89 f7 e9 ac ea 01 00 48 c7 c7 e0 52 33 b9 c6 05 bb 1c 69 01 01 e8 39 24 f0 fe <0f> 0b 48 8b 5d f8 c9 31 f6 89 f7 e9 89 ea 01 00 0f b6 f3 48 c7 c7 RSP: 0018:ffffb7cf81853a58 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffb7cf81853a60 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8db2cfb1a9e8 R13: ffff8db2cfb1a9d8 R14: ffff8db2c63ff000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8db33bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000559758db0f20 CR3: 0000000105124000 CR4: 00000000003506f0 Call Trace: <TASK> _raw_spin_unlock_irqrestore+0x91/0xa0 ib_send_mad+0x1e3/0x470 [ib_core] ib_post_send_mad+0x150/0xb40 [ib_core] ? cm_form_tid+0x5b/0x90 [ib_cm] ib_send_cm_req+0x7c8/0xb70 [ib_cm] rdma_connect_locked+0x433/0x940 [rdma_cm] nvme_rdma_cm_handler+0x5d7/0x9c0 [nvme_rdma] cma_cm_event_handler+0x4f/0x170 [rdma_cm] cma_work_handler+0x6a/0xe0 [rdma_cm] process_one_work+0x2a9/0x580 worker_thread+0x52/0x3f0 ? __pfx_worker_thread+0x10/0x10 kthread+0x109/0x140 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> Fixes: `f605f26ea1` ("RDMA/rxe: Protect QP state with qp->state_lock") Link: https://lore.kernel.org/r/20230510035056.881196-1-guoqing.jiang@linux.dev Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-05-16 21:07:33 -03:00
Daisuke Matsuda	10af303192	RDMA/rxe: Fix spinlock recursion deadlock on requester The following deadlock is observed: Call Trace: <IRQ> _raw_spin_lock_bh+0x29/0x30 check_type_state.constprop.0+0x4e/0xc0 [rdma_rxe] rxe_rcv+0x173/0x3d0 [rdma_rxe] rxe_udp_encap_recv+0x69/0xd0 [rdma_rxe] ? __pfx_rxe_udp_encap_recv+0x10/0x10 [rdma_rxe] udp_queue_rcv_one_skb+0x258/0x520 udp_unicast_rcv_skb+0x75/0x90 __udp4_lib_rcv+0x364/0x5c0 ip_protocol_deliver_rcu+0xa7/0x160 ip_local_deliver_finish+0x73/0xa0 ip_sublist_rcv_finish+0x80/0x90 ip_sublist_rcv+0x191/0x220 ip_list_rcv+0x132/0x160 __netif_receive_skb_list_core+0x297/0x2c0 netif_receive_skb_list_internal+0x1c5/0x300 napi_complete_done+0x6f/0x1b0 virtnet_poll+0x1f4/0x2d0 [virtio_net] __napi_poll+0x2c/0x1b0 net_rx_action+0x293/0x350 ? __napi_schedule+0x79/0x90 __do_softirq+0xcb/0x2ab __irq_exit_rcu+0xb9/0xf0 common_interrupt+0x80/0xa0 </IRQ> <TASK> asm_common_interrupt+0x22/0x40 RIP: 0010:_raw_spin_lock+0x17/0x30 rxe_requester+0xe4/0x8f0 [rdma_rxe] ? xas_load+0x9/0xa0 ? xa_load+0x70/0xb0 do_task+0x64/0x1f0 [rdma_rxe] rxe_post_send+0x54/0x110 [rdma_rxe] ib_uverbs_post_send+0x5f8/0x680 [ib_uverbs] ? netif_receive_skb_list_internal+0x1e3/0x300 ib_uverbs_write+0x3c8/0x500 [ib_uverbs] vfs_write+0xc5/0x3b0 ksys_write+0xab/0xe0 ? syscall_trace_enter.constprop.0+0x126/0x1a0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc </TASK> The deadlock is easily reproducible with perftest. Fix it by disabling softirq when acquiring the lock in process context. Fixes: `f605f26ea1` ("RDMA/rxe: Protect QP state with qp->state_lock") Link: https://lore.kernel.org/r/20230418090642.1849358-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-04-21 12:33:00 -03:00
Bob Pearson	f605f26ea1	RDMA/rxe: Protect QP state with qp->state_lock Currently the rxe driver makes little effort to make the changes to qp state (which includes qp->attr.qp_state, qp->attr.sq_draining and qp->valid) atomic between different client threads and IO threads. In particular a common template is for an RDMA application to call ib_modify_qp() to move a qp to ERR state and then wait until all the packet and work queues have drained before calling ib_destroy_qp(). None of these state changes are protected by locks to assure that the changes are executed atomically and that memory barriers are included. This has been observed to lead to incorrect behavior around qp cleanup. This patch continues the work of the previous patches in this series and adds locking code around qp state changes and lookups. Link: https://lore.kernel.org/r/20230405042611.6467-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-04-17 16:34:04 -03:00
Bob Pearson	7b560b89a0	RDMA/rxe: Move code to check if drained to subroutine Move two blocks of code in rxe_comp.c and rxe_req.c to subroutines that check if draining is complete in the SQD state and, if so, generate a SQ_DRAINED event. Link: https://lore.kernel.org/r/20230405042611.6467-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-04-17 16:01:44 -03:00
Bob Pearson	98e891b5e4	RDMA/rxe: Remove qp->req.state The rxe driver has four different QP state variables, qp->attr.qp_state, qp->req.state, qp->comp.state, and qp->resp.state. All of these basically carry the same information. This patch replaces uses of qp->req.state by qp->attr.qp_state and enum rxe_qp_state. This is the third of three patches which will remove all but the qp->attr.qp_state variable. This will bring the driver closer to the IBA description. Link: https://lore.kernel.org/r/20230405042611.6467-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-04-17 16:01:44 -03:00
Bob Pearson	f455a1bc97	RDMA/rxe: Make tasks schedule each other Replace rxe_run_task() by rxe_sched_task() when tasks call each other. These are not performance critical and mainly involve error paths but they run the risk of causing deadlocks. Link: https://lore.kernel.org/r/20230304174533.11296-8-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-03-24 11:21:36 -03:00
Bob Pearson	a246aa2e8a	RDMA/rxe: Remove qp reference counting in tasks Currently each of the three tasklets requester, completer and responder in the rxe driver take and release a reference to the qp argument at the beginning and end of the subroutines. The caller passing in the qp argument should be responsible for holding a reference to qp so these are not required. Further doing so breaks the qp cleanup code in rxe_qp_do_cleanup which calls these routines after all the references have been dropped so they cannot drain the packet and work request queues as intended. In fact if these routines are deferred by calling tasklet_schedule there is no guarantee that the calling code does have a qp reference. That is a bug in rxe_task.c which will be fixed later in this series. Link: https://lore.kernel.org/r/20230304174533.11296-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-03-24 11:21:36 -03:00
Bob Pearson	3946fc2a42	RDMA/rxe: Convert tasklet args to queue pairs Originally is was thought that the tasklet machinery in rxe_task.c would be used in other applications but that has not happened for years. This patch replaces the 'void arg' by struct 'rxe_qp qp' in the parameters to the tasklet calls. This change will have no affect on performance but may make the code a little clearer. Link: https://lore.kernel.org/r/20230304174533.11296-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-03-24 11:14:38 -03:00
Li Zhijian	fa1fd682ad	RDMA/rxe: Implement RC RDMA FLUSH service in requester side Implement FLUSH request operation in the requester. Link: https://lore.kernel.org/r/20221206130201.30986-7-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-12-09 19:36:02 -04:00
Xiao Yang	abb633cf28	RDMA/rxe: Make requester support atomic write on RC service Make requester process and send an atomic write request on RC service. Link: https://lore.kernel.org/r/1669905568-62-1-git-send-email-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-12-01 19:51:09 -04:00
Bob Pearson	0edfb15e30	RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_req.c Replace calls to pr_xxx() in rxe_req.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-9-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-10 15:33:04 -04:00
Bob Pearson	dccb23f6c3	RDMA/rxe: Split rxe_run_task() into two subroutines Split rxe_run_task(task, sched) into rxe_run_task(task) and rxe_sched_task(task). Link: https://lore.kernel.org/r/20221021200118.2163-5-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-10-28 13:47:15 -03:00
Bob Pearson	62494ec7fb	RDMA/rxe: Split qp state for requester and completer Currently the requester can continue to process send wqes after an local qp operation error is detected because the setting of the qp state to the error state is deferred until later. This patch splits the qp state for the completer and requester into two separate states and sets qp->req.state = QP_STATE_ERROR as soon as the error is detected before another wqe can be executed. Link: https://lore.kernel.org/r/1658307368-1851-4-git-send-email-lizhijian@fujitsu.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-08-02 13:53:36 -03:00
Li Zhijian	ae720bdb70	RDMA/rxe: Generate error completion for error requester QP state As per IBTA specification, all subsequent WQEs while QP is in error state should be completed with a flush error. Here we check QP_STATE_ERROR after req_next_wqe() so that rxe_completer() has chance to be called where it will set CQ state to FLUSH ERROR and the completion can associate with its WQE. Link: https://lore.kernel.org/r/1658307368-1851-3-git-send-email-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-08-02 13:45:12 -03:00
Li Zhijian	dea4266f7b	RDMA/rxe: Update wqe_index for each wqe error completion Previously, if user space keeps sending abnormal wqe, queue.index will keep increasing while qp->req.wqe_index doesn't. Once qp->req.wqe_index==queue.index in next round, req_next_wqe() will treat queue as empty. In such case, no new completion would be generated. Update wqe_index for each wqe completion so that req_next_wqe() can get next wqe properly. Link: https://lore.kernel.org/r/1658307368-1851-2-git-send-email-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-08-02 13:41:36 -03:00
Bob Pearson	c2ea08ca5e	RDMA/rxe: Replace __rxe_do_task by rxe_run_task In rxe_req.c replace calls to __rxe_do_task() by calls to rxe_run_task(.., 0). Using __rxe_do_task is an error because the completer tasklet is not designed to be re-entrant and __rxe_do_task() should only be called when it is clear that no one else could be calling the completer tasklet as is the case in rxe_qp.c where this call is used in safe environments. Link: https://lore.kernel.org/r/20220630190425.2251-10-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-07-22 17:43:00 -03:00
Bob Pearson	8bb143c534	RDMA/rxe: Make the tasklet exits the same Make changes to the three tasklets so that the exit logic from each is the same. This makes the code easier to understand. Link: https://lore.kernel.org/r/20220630190425.2251-8-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-07-22 17:43:00 -03:00
Bob Pearson	445fd4f4fb	RDMA/rxe: Fix rnr retry behavior Currently the completer tasklet when retransmit timer or the rnr timer fires the same flag (qp->req.need_retry) is set so that if either timer fires it will attempt to perform a retry flow on the send queue. This has the effect of responding to an RNR NAK at the first retransmit timer event which might not allow the requested rnr timeout. This patch adds a new flag (qp->req.wait_for_rnr_timer) which, if set, prevents a retry flow until the rnr nak timer fires. This patch fixes rnr retry errors which can be observed by running the pyverbs test_rdmacm_async_traffic_external_qp multiple times. With this patch applied they do not occur. Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/ Link: https://lore.kernel.org/linux-rdma/2bafda9e-2bb6-186d-12a1-179e8f6a2678@talpey.com/ Fixes: `8700e3e7c4` ("Soft RoCE driver") Link: https://lore.kernel.org/r/20220630190425.2251-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-07-22 17:43:00 -03:00
Bob Pearson	930119a172	RDMA/rxe: Add rxe_is_fenced() subroutine The code thc that decides whether to defer execution of a wqe in rxe_requester.c is isolated into a subroutine rxe_is_fenced() and removed from the call to req_next_wqe(). The condition whether a wqe should be fenced is changed to comply with the IBA. Currently an operation is fenced if the fence bit is set in the wqe flags and the last wqe has not completed. For normal operations the IBA actually only requires that the last read or atomic operation is complete. Link: https://lore.kernel.org/r/20220630190425.2251-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-07-22 17:43:00 -03:00
lizhijian@fujitsu.com	03905ac285	RDMA/rxe: Remove unused mask parameter This parameter had been deprecated since below commit: `1a7085b342` ("RDMA/rxe: Skip adjusting remote addr for write in retry operation") Link: https://lore.kernel.org/r/20220715035340.1900168-1-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-07-18 14:50:35 +03:00
Bob Pearson	7cb33d1bc1	RDMA/rxe: Fix deadlock in rxe_do_local_ops() When a local operation (invalidate mr, reg mr, bind mw) is finished there will be no ack packet coming from a responder to cause the wqe to be completed. This may happen anyway if a subsequent wqe performs IO. Currently if the wqe is signalled the completer tasklet is scheduled immediately but not otherwise. This leads to a deadlock if the next wqe has the fence bit set in send flags and the operation is not signalled. This patch removes the condition that the wqe must be signalled in order to schedule the completer tasklet which is the simplest fix for this deadlock and is fairly low cost. This is the analog for local operations of always setting the ackreq bit in all last or only request packets even if the operation is not signalled. Link: https://lore.kernel.org/r/20220523223251.15350-1-rpearsonhpe@gmail.com Reported-by: Jenny Hack <jhack@hpe.com> Fixes: `c1a411268a` ("RDMA/rxe: Move local ops to subroutine") Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-06-30 17:00:23 -03:00
Bob Pearson	4e05a4b329	RDMA/rxe: Check rxe_get() return value In the tasklets (completer, responder, and requester) check the return value from rxe_get() to detect failures to get a reference. This only occurs if the qp has had its reference count drop to zero which indicates that it no longer should be used. The ref is never 0 today because the tasklets are flushed before the ref is dropped. The next patch changes this so that the ref is dropped then the tasklets are flushed. Link: https://lore.kernel.org/r/20220421014042.26985-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-09 09:03:45 -03:00
Chengguang Xu	1a7085b342	RDMA/rxe: Skip adjusting remote addr for write in retry operation For write request the remote addr will be sent only with first packet so we don't have to adjust wqe->iova in retry operation. Link: https://lore.kernel.org/r/20220502053907.6388-1-cgxu519@mykernel.net Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-06 13:12:56 -03:00
Bob Pearson	e7734156b0	RDMA/rxe: Replace paylen by payload In finish_packet() in rxe_req.c a variable was incorrectly called paylen instead of payload. Elsewhere in the rxe source payload is always used for the RoCE payload length and paylen is always used for the UDP payload length. This will cause unnecessary confusion. Replace paylen by payload in finish_packet(). Link: https://lore.kernel.org/r/20220420172316.5465-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-04 11:21:10 -03:00
Li Zhijian	0f328c7034	RDMA/rxe: Remove useless parameters for update_state() wqe was not used by update_state() so far. Commit `aaaf62e066` ("RDMA/rxe: Remove useless argument for update_state()") just did a partial fixes. Link: https://lore.kernel.org/r/20220412022903.574238-1-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-04-25 15:51:32 -03:00
Xiao Yang	2f917af777	RDMA/rxe: Generate a completion for unsupported/invalid opcode Current rxe_requester() doesn't generate a completion when processing an unsupported/invalid opcode. If rxe driver doesn't support a new opcode (e.g. RDMA Atomic Write) and RDMA library supports it, an application using the new opcode can reproduce this issue. Fix the issue by calling "goto err;". Fixes: `8700e3e7c4` ("Soft RoCE driver") Link: https://lore.kernel.org/r/20220410113513.27537-1-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-04-12 11:16:24 -03:00
Bob Pearson	98c8026331	RDMA/rxe: Remove reliable datagram support The rdma_rxe driver does not actually support the reliable datagram transport but contains two references to RD opcodes in driver code. This commit removes these references to RD transport opcodes which are never used. Link: https://lore.kernel.org/r/cce0f07d-25fc-5880-69e7-001d951750b7@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-04-08 14:38:50 -03:00
Bob Pearson	409baed5d7	RDMA/rxe: Remove support for SMI QPs from rdma_rxe Currently the rdma_rxe driver supports SMI type QPs in a few places which is incorrect. RoCE devices never should support SMI QPs. This commit removes SMI QP support from the driver. Link: https://lore.kernel.org/r/20220407185416.16372-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-04-08 14:38:33 -03:00
Bob Pearson	3197706abd	RDMA/rxe: Use standard names for ref counting Rename rxe_add_ref() to rxe_get() and rxe_drop_ref() to rxe_put(). Significantly improves readability for new readers. Link: https://lore.kernel.org/r/20220304000808.225811-10-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-03-16 10:34:42 -03:00
Bob Pearson	63221acb0c	RDMA/rxe: Fix ref error in rxe_av.c The commit referenced below can take a reference to the AH which is never dropped. This only happens in the UD request path. This patch optionally passes that AH back to the caller so that it can hold the reference while the AV is being accessed and then drop it. Code to do this is added to rxe_req.c. The AV is also passed to rxe_prepare in rxe_net.c as an optimization. Fixes: `e2fe06c908` ("RDMA/rxe: Lookup kernel AH from ah index in UD WQEs") Link: https://lore.kernel.org/r/20220304000808.225811-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-03-15 20:49:56 -03:00
Chengguang Xu	aaaf62e066	RDMA/rxe: Remove useless argument for update_state() The argument 'payload' is not used in update_state(), so just remove it. Link: https://lore.kernel.org/r/20220307145047.3235675-2-cgxu519@mykernel.net Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-03-14 20:36:01 -03:00
Chengguang Xu	7e8e611d6a	RDMA/rxe: Change variable and function argument to proper type The type of wqe length is u32 so in order to avoid overflow and shadow casting change variable and relevant function argument to proper type. Link: https://lore.kernel.org/r/20220307145047.3235675-1-cgxu519@mykernel.net Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-03-14 20:36:01 -03:00
Zhu Yanjun	3fe6d228a0	RDMA/rxe: Remove the unnecessary variable The variable pkey is assigned from a macro. Then this variable is passed to a function bth_init directly, and pkey is not used again. So remove it and use the macro directly. Fixes: `76251e15ea` ("RDMA/rxe: Remove pkey table") Link: https://lore.kernel.org/r/20211207194057.713289-1-yanjun.zhu@linux.dev Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-12-07 13:56:22 -04:00
Bob Pearson	21adfa7a3c	RDMA/rxe: Replace irqsave locks with bh locks Most of the locks in the rxe driver are _irqsave/restore locks but in fact there are no interrupt threads that run rxe code or share data with rxe. There are softirq threads and data sharing so the appropriate lock type is _bh. This patch replaces all irqsave type locks with bh type locks. Link: https://lore.kernel.org/r/20211103050241.61293-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-11-19 13:29:14 -04:00
Bob Pearson	e2fe06c908	RDMA/rxe: Lookup kernel AH from ah index in UD WQEs Add code to rxe_get_av in rxe_av.c to use the AH index in UD send WQEs to lookup the kernel AH. For old user providers continue to use the AV passed in WQEs. Move setting pkt->rxe to before the call to rxe_get_av() to get access to the AH pool. Link: https://lore.kernel.org/r/20211007204051.10086-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-10-12 13:25:26 -03:00
Xiao Yang	45216d6363	RDMA/rxe: Add MASK suffix for RXE_READ_OR_ATOMIC and RXE_WRITE_OR_SEND To reflect the intention, since it is not just a single bit. Link: https://lore.kernel.org/r/20210914080253.1145353-3-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-09-28 11:42:24 -03:00
Bob Pearson	001345339f	RDMA/rxe: Separate HW and SW l/rkeys Separate software and simulated hardware lkeys and rkeys for MRs and MWs. This makes struct ib_mr and struct ib_mw isolated from hardware changes triggered by executing work requests. This change fixes a bug seen in blktest. Link: https://lore.kernel.org/r/20210914164206.19768-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-09-24 10:14:59 -03:00
Bob Pearson	ae6e843fe0	RDMA/rxe: Add memory barriers to kernel queues Earlier patches added memory barriers to protect user space to kernel space communications. The user space queues were previously shown to have occasional memory synchonization errors which were removed by adding smp_load_acquire, smp_store_release barriers. This patch extends that to the case where queues are used between kernel space threads. This patch also extends the queue types to include kernel ULP queues which access the other end of the queues in kernel verbs calls like poll_cq and post_send/recv. Link: https://lore.kernel.org/r/20210914164206.19768-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-09-24 10:14:59 -03:00
Bob Pearson	1117f26ea7	RDMA/rxe: Move ICRC generation to a subroutine Isolate ICRC generation into a single subroutine named rxe_generate_icrc() in rxe_icrc.c. Remove scattered crc generation code from elsewhere. Link: https://lore.kernel.org/r/20210707040040.15434-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-07-16 12:43:34 -03:00
Bob Pearson	3902b429ca	RDMA/rxe: Implement invalidate MW operations Implement invalidate MW and cleaned up invalidate MR operations. Added code to perform remote invalidate for send with invalidate. Added code to perform local invalidation. Deleted some blank lines in rxe_loc.h. Link: https://lore.kernel.org/r/20210608042552.33275-9-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-16 20:51:18 -03:00
Bob Pearson	32a577b4c3	RDMA/rxe: Add support for bind MW work requests Add support for bind MW work requests from user space. Since rdma/core does not support bind mw in ib_send_wr there is no way to support bind mw in kernel space. Added bind_mw local operation in rxe_req.c. Added bind_mw WR operation in rxe_opcode.c. Added bind_mw WC in rxe_comp.c. Added additional fields to rxe_mw in rxe_verbs.h. Added rxe_do_dealloc_mw() subroutine to cleanup an mw when rxe_dealloc_mw is called. Added code to implement bind_mw operation in rxe_mw.c Link: https://lore.kernel.org/r/20210608042552.33275-8-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-16 20:51:18 -03:00
Bob Pearson	c1a411268a	RDMA/rxe: Move local ops to subroutine Simplify rxe_requester() by moving the local operations to a subroutine. Add an error return for illegal send WR opcode. Moved next_index ahead of rxe_run_task which fixed a small bug where work completions were delayed until after the next wqe which was not the intended behavior. Let errors return their own WC status. Previously all errors were reported as protection errors which was incorrect. Changed the return of errors from rxe_do_local_ops() to err: which causes an immediate completion. Without this an error on a last WR may get lost. Changed fill_packet() to finish_packet() which is more accurate. Fixes: 8700e2e7c485 ("The software RoCE driver") Link: https://lore.kernel.org/r/20210608042552.33275-7-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-16 20:51:18 -03:00
Bob Pearson	886441fb2e	RDMA/rxe: Replace WR_REG_MASK by WR_LOCAL_OP_MASK Rxe has two mask bits WR_LOCAL_MASK and WR_REG_MASK with WR_REG_MASK used to indicate any local operation and WR_LOCAL_MASK unused. This patch replaces both of these with one mask bit WR_LOCAL_OP_MASK which is clearer. Link: https://lore.kernel.org/r/20210608042552.33275-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-16 20:51:18 -03:00

1 2

84 commits