Only the requested placement types that also registered in the destination
memory region are acceptable.
Otherwise, responder will also reply NAK "Remote Access Error" if it
found a placement type violation.
We will persist data via arch_wb_cache_pmem(), which could be
architecture specific.
This commit also adds 2 helpers to update qp.resp from the incoming packet.
Link: https://lore.kernel.org/r/20221206130201.30986-8-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The code in rxe_resp.c at check_length() is incorrect as it compares
pkt->opcode an 8 bit value against various mask bits which are all higher
than 256 so nothing is ever reported.
This patch rewrites this to compare against pkt->mask which is
correct. However this now exposes another error. For UD send packets the
value of the pmtu cannot be determined from qp->mtu. All that is required
here is to later check if the payload fits into the posted receive buffer
in that case.
Fixes: 837a55847e ("RDMA/rxe: Implement packet length validation on responder")
Link: https://lore.kernel.org/r/20221208210945.28607-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmONI6weHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG9xgH/jqXGuMoO1ikfmGb
7oY0W/f69G9V/e0DxFLvnIjhFgCUzdnNsmD4jQJA4x6QsxwLWuvpI282Ez+bHV5T
U4RPsxJZIIMsXE2lKM9BRgeLzDdCt0aK4Pj+3x2x7NZC5cWFSQ8PyQJkCwg+0PQo
u8Ly+GO8c4RUMf4/rrAZQq16qZUqGDaGm1EJhtSoa+KiR81LmUUmbDIK9Mr53rmQ
wou+95XhibwMWr17WgXA28bTgYqn9UGr67V3qvTH2LC7GW8BCoKvn+3wh6TVhlWj
dsWplXgcOP0/OHvSC5Sb1Uibk5Gx3DlIzYa6OfNZQuZ5xmQqm9kXjW8lmYpWFHy/
38/5HWc=
=EuoA
-----END PGP SIGNATURE-----
Merge tag 'v6.1-rc8' into rdma.git for-next
For dependencies in following patches
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The function check_length() is supposed to check the length of inbound
packets on responder, but it actually has been a stub since the driver was
born. Let it check the payload length and the DMA length.
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Link: https://lore.kernel.org/r/20221107055338.357184-1-matsuda-daisuke@fujitsu.com
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Split rxe_run_task(task, sched) into rxe_run_task(task) and
rxe_sched_task(task).
Link: https://lore.kernel.org/r/20221021200118.2163-5-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Before the testing, we already passed it to rxe_mr_copy() where mr could
be dereferenced. so this checking is not needed.
The only way that mr is NULL is when it reaches below line 780 with
'qp->resp.mr = NULL', which is not possible in Bob's explanation[1].
778 if (res->state == rdatm_res_state_new) {
779 if (!res->replay) {
780 mr = qp->resp.mr;
781 qp->resp.mr = NULL;
782 } else {
[1] https://lore.kernel.org/lkml/30ff25c4-ce66-eac4-eaa2-64c0db203a19@gmail.com/
Link: https://lore.kernel.org/r/1666582315-2-1-git-send-email-lizhijian@fujitsu.com
CC: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Currently, responder can reply packets with invalid payloads if it fails
to copy messages to the packets. Add an error handling in read_reply() to
inform a requesting node of the failure.
Link: https://lore.kernel.org/r/20221013014724.3786212-1-matsuda-daisuke@fujitsu.com
Suggested-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Most code in send_ack() and send_atomic_ack() are duplicate, move them to
a new helper send_common_ack().
In newer IBA spec, some opcodes require acknowledge with a zero-length
read response, with this new helper, we can easily implement it later.
Link: https://lore.kernel.org/r/1659335010-2-1-git-send-email-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
An incoming Read request causes multiple Read responses. If a user MR to
copy data from is unavailable or responder cannot send a reply, then the
error messages can be printed for each response attempt, resulting in
message overflow.
Link: https://lore.kernel.org/r/20220829071218.1639065-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Make changes to the three tasklets so that the exit logic from each is the
same. This makes the code easier to understand.
Link: https://lore.kernel.org/r/20220630190425.2251-8-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The qp parameter in free_rd_atomic_resource() has become
unused so remove it directly.
Fixes: 15ae1375ea ("RDMA/rxe: Fix qp reference counting for atomic ops")
Link: https://lore.kernel.org/all/20220708035547.6592-1-yangx.jy@fujitsu.com/
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
It's redundant to prepare resources for Read and Atomic
requests by different functions. Replace them by a common
rxe_prepare_res() with different parameters. In addition,
the common rxe_prepare_res() can also be used by new Flush
and Atomic Write requests in the future.
Link: https://lore.kernel.org/r/20220705145212.12014-1-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
The pyverbs test suite generates a few dmesg traces from intentional error
tests. This patch replaces those messages with pr_debug() calls which
improves the usefullness of the tests.
Link: https://lore.kernel.org/r/20220630190425.2251-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Make the execution of the atomic operation in rxe_atomic_reply()
conditional on res->replay and make duplicate_request() call into
rxe_atomic_reply() to merge the two flows. This is modeled on the behavior
of read reply. Delete the skb from the atomic responder resource since it
is no longer used. Adjust the reference counting of the qp in
send_atomic_ack() for this flow.
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220606143836.3323-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Move the saved original value to the atomic responder resource. This
replaces saving it in the qp. In preparation for merging the normal and
retry atomic responder flows.
Link: https://lore.kernel.org/r/20220606143836.3323-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Move the allocation of the atomic responder resource up into
rxe_atomic_reply() from send_atomic_ack(). In preparation for merging the
normal and retry atomic responder flows.
Link: https://lore.kernel.org/r/20220606143836.3323-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add a responder state for atomic reply similar to read reply and rename
process_atomic() rxe_atomic_reply(). In preparation for merging the normal
and retry atomic responder flows.
Link: https://lore.kernel.org/r/20220606143836.3323-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Separate the code that prepares the atomic responder resource into a
subroutine. This is preparation for merging the normal and retry atomic
responder flows.
Link: https://lore.kernel.org/r/20220606143836.3323-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The pkt parameters in prepare_ack_packet(), send_ack() and
send_atomic_ack() have become useless by the following commits. So remove
them directly.
Fixes: bf139b58af ("RDMA/rxe: Remove unused pkt->offset")
Fixes: 3896bde92d ("RDMA/rxe: Fix extra copy in prepare_ack_packet")
Link: https://lore.kernel.org/r/20220623131627.18903-1-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmKKlIAeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGC3oH/iPm/fLG2sJut8My
sU0RC9K+6ESV5h2Qy6k00/lqKstlu4EvBjw4V8vYpx3Q2+hbSFMn2SeWqqqT3Lkk
Zb8KINCFuuyMtdCBb42PV0zhUf5pCQF7ocm/Ae4jllDHtPmqk3WJ6IGtZBK5JBlw
z6RR/wKt0y0MRj9eZyPyYjOee2L2vuVh4tgnexK/4L8g2ZtMMRThhvUzSMWG4zxR
STYYNp0uFcfT1Vt85+ODevFH4TvdECAj+SqAegN+seHLM17YY7M0/WiIYpxGRv8P
lIpDQl4PBU8EBkpI5hkpJ/3qPincbuVOMLsYfxFtpcjjG12vGjFp2krGpS3TedZQ
3mvaJ7c=
=vLke
-----END PGP SIGNATURE-----
Merge tag 'v5.18' into rdma.git for-next
Following patches have dependencies.
Resolve the merge conflict in
drivers/net/ethernet/mellanox/mlx5/core/main.c by keeping the new names
for the fs functions following linux-next:
https://lore.kernel.org/r/20220519113529.226bc3e2@canb.auug.org.au/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In the tasklets (completer, responder, and requester) check the return
value from rxe_get() to detect failures to get a reference. This only
occurs if the qp has had its reference count drop to zero which indicates
that it no longer should be used.
The ref is never 0 today because the tasklets are flushed before the ref
is dropped. The next patch changes this so that the ref is dropped then
the tasklets are flushed.
Link: https://lore.kernel.org/r/20220421014042.26985-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The rping benchmark fails on long runs. The root cause of this failure has
been traced to a failure to compute a nonzero value of mr in rare
situations.
Fix this failure by correctly handling the computation of mr in
read_reply() in rxe_resp.c in the replay flow.
Fixes: 8a1a0be894 ("RDMA/rxe: Replace mr by rkey in responder resources")
Link: https://lore.kernel.org/r/20220418174103.3040-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The referenced commit generates a reference counting error if the rkey has
the same index but the wrong key. In this case the reference taken by
rxe_pool_get_index() is not dropped.
Drop the reference if the keys don't match in rxe_recheck_mr(). Check
that the mw and mr are still valid.
Fixes: 8a1a0be894 ("RDMA/rxe: Replace mr by rkey in responder resources")
Link: https://lore.kernel.org/r/20220411030647.20011-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The rdma_rxe driver does not actually support the reliable datagram
transport but contains two references to RD opcodes in driver code. This
commit removes these references to RD transport opcodes which are never
used.
Link: https://lore.kernel.org/r/cce0f07d-25fc-5880-69e7-001d951750b7@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently the rdma_rxe driver supports SMI type QPs in a few places which
is incorrect. RoCE devices never should support SMI QPs. This commit
removes SMI QP support from the driver.
Link: https://lore.kernel.org/r/20220407185416.16372-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Rename rxe_add_ref() to rxe_get() and rxe_drop_ref() to rxe_put().
Significantly improves readability for new readers.
Link: https://lore.kernel.org/r/20220304000808.225811-10-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently rxe saves a copy of MR in responder resources for RDMA reads.
Since the responder resources are never freed just over written if more
are needed this MR may not have a reference freed until the QP is
destroyed. This patch uses the rkey instead of the MR and on subsequent
packets of a multipacket read reply message it looks up the MR from the
rkey for each packet. This makes it possible for a user to deregister an
MR or unbind a MW on the fly and get correct behaviour.
Link: https://lore.kernel.org/r/20220304000808.225811-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The commit referenced below can take a reference to the AH which is never
dropped. This only happens in the UD request path. This patch optionally
passes that AH back to the caller so that it can hold the reference while
the AV is being accessed and then drop it. Code to do this is added to
rxe_req.c. The AV is also passed to rxe_prepare in rxe_net.c as an
optimization.
Fixes: e2fe06c908 ("RDMA/rxe: Lookup kernel AH from ah index in UD WQEs")
Link: https://lore.kernel.org/r/20220304000808.225811-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
A previous patch replaced all irqsave locks in rxe with bh locks. This
ran into problems because rdmacm has a bad habit of calling rdma verbs
APIs while disabling irqs. This is not allowed during spin_unlock_bh()
causing programs that use rdmacm to fail. This patch reverts the changes
to locks that had this problem or got dragged into the same mess. After
this patch blktests/check -q srp now runs correctly.
Link: https://lore.kernel.org/r/20220215194448.44369-1-rpearsonhpe@gmail.com
Fixes: 21adfa7a3c ("RDMA/rxe: Replace irqsave locks with bh locks")
Reported-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
It's wrong to check the last packet by RXE_COMP_MASK because the flag is
to indicate if responder needs to generate a completion.
Fixes: 9fcd67d177 ("IB/rxe: increment msn only when completing a request")
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20211229034438.1854908-1-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
1) Replace (RXE_READ_MASK | RXE_WRITE_MASK) with RXE_READ_OR_WRITE_MASK.
2) Change (RXE_READ_MASK | RXE_WRITE_OR_SEND) to RXE_READ_OR_WRITE_MASK
because we don't need to check RETH for RXE_SEND_MASK.
Link: https://lore.kernel.org/r/20210914080253.1145353-2-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Earlier patches added memory barriers to protect user space to kernel
space communications. The user space queues were previously shown to have
occasional memory synchonization errors which were removed by adding
smp_load_acquire, smp_store_release barriers. This patch extends that to
the case where queues are used between kernel space threads.
This patch also extends the queue types to include kernel ULP queues which
access the other end of the queues in kernel verbs calls like poll_cq and
post_send/recv.
Link: https://lore.kernel.org/r/20210914164206.19768-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
From Maor Gottlieb
====================
Fix the use of nents and orig_nents in the sg table append helpers. The
nents should be used by the DMA layer to store the number of DMA mapped
sges, the orig_nents is the number of CPU sges.
Since the sg append logic doesn't always create a SGL with exactly
orig_nents entries store a total_nents as well to allow the table to be
properly free'd and reorganize the freeing logic to share across all the
use cases.
====================
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* 'sg_nents':
RDMA: Use the sg_table directly and remove the opencoded version from umem
lib/scatterlist: Fix wrong update of orig_nents
lib/scatterlist: Provide a dedicated function to support table append
The memcpy() that copies a WQE from a SRQ the QP uses an incorrect size.
The size should have been the size of the rxe_send_wqe struct not the size
of a pointer to it. The result is that IO operations using a SRQ on the
responder side will fail.
Fixes: ec0fa2445c ("RDMA/rxe: Fix over copying in get_srq_wqe")
Link: https://lore.kernel.org/r/20210729220039.18549-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Isolate ICRC generation into a single subroutine named rxe_generate_icrc()
in rxe_icrc.c. Remove scattered crc generation code from elsewhere.
Link: https://lore.kernel.org/r/20210707040040.15434-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This error path needs to unlock before returning.
Fixes: ec0fa2445c ("RDMA/rxe: Fix over copying in get_srq_wqe")
Link: https://lore.kernel.org/r/YNXUCmnPsSkPyhkm@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Majd Dibbiny <majd@nvidia.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently prepare_ack_packet writes almost all the fields of the BTH in
the ack packet twice. Replace code with the subroutine init_bth().
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20210618045742.204195-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently get_srq_wqe() in rxe_resp.c copies the maximum possible number
of bytes from the wqe into the QPs copy of the SRQ wqe. This is usually
extra work and risks reading past the end of the SRQ circular buffer if
the SRQ is configured with less than the maximum possible number of SGEs.
Check the number of SGEs is not too large.
Compute the actual number of bytes in the WR and copy only those.
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20210618045742.204195-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
build_rdma_network_hdr() in rxe_resp.c does more copying than is
needed. Remove this subroutine and eliminate the extra copies for IPV6 and
reduce the extra copying for IPV4.
Fixes: e404f945a6 ("IB/rxe: improved debug prints & code cleanup")
Link: https://lore.kernel.org/r/20210618045742.204195-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In send_atomic_ack() in rxe_resp.c there is code copying ack_pkt into the
skb->cb[]. This doesn't do anything useful because the cb[] is not used in
the transmit path by the rxe driver.
Remove this code.
Fixes: 4c93496f18 ("IB/rxe: do not copy extra stack memory to skb")
Link: https://lore.kernel.org/r/20210618045742.204195-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Implement invalidate MW and cleaned up invalidate MR operations.
Added code to perform remote invalidate for send with invalidate. Added
code to perform local invalidation. Deleted some blank lines in rxe_loc.h.
Link: https://lore.kernel.org/r/20210608042552.33275-9-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>