RXE_POOL_ALIGN is only used in rxe_pool.c so move RXE_POOL_ALIGN to
rxe_pool.c from rxe_pool.h. RXE_POOL_CACHE_FLAGS is never used so it is
deleted from rxe_pool.h
Link: https://lore.kernel.org/r/20211103050241.61293-8-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In rxe_pool.c currently there are many cases where it is necessary to
compute the offset from a pool element struct to the object containing it
in a type independent way where the offset is different for each type. By
saving a pointer to the object when they are created extra work can be
saved.
Link: https://lore.kernel.org/r/20211103050241.61293-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In rxe_pool.c copy remaining pool setup parameters from rxe_pool_info into
rxe_pool. This saves looking up rxe_pool_info in the performance path.
Link: https://lore.kernel.org/r/20211103050241.61293-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently three different names are used to describe rxe pool elements.
They are referred to as entries, elems or pelems. This patch chooses one
'elem' and changes the other ones.
Link: https://lore.kernel.org/r/20211103050241.61293-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Make struct rxe_type_info static const and local to the only uses. Moves
a bit of data to text.
$ size drivers/infiniband/sw/rxe/rxe_pool.o* (defconfig w/ infiniband swe)
text data bss dec hex filename
4456 12 0 4468 1174 drivers/infiniband/sw/rxe/rxe_pool.o.new
3817 652 0 4469 1175 drivers/infiniband/sw/rxe/rxe_pool.o.old
Link: https://lore.kernel.org/r/166b715d71f98336e8ecab72b0dbdd266eee9193.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Modify rxe_add_index() and rxe_add_key() to return an error if the index
or key is aleady present in the pool. Currently they print a warning and
silently fail with bad consequences to the caller.
Link: https://lore.kernel.org/r/20210608042552.33275-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
rxe_pool.c uses the field pool->state to mark a pool as invalid when it is
shut down and checks it in several pool APIs to verify that the pool has
not been shut down. This is unneeded because the pools are not marked
invalid unless the entire driver is being removed at which point no
functional APIs should or could be executing. This patch removes this
field and associated code.
Link: https://lore.kernel.org/r/20210125211641.2694-6-rpearson@hpe.com
Suggested-by: zyjzyj2000@gmail.c
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
rxe_pool.c takes references to the pool and ib_device structs for each
object allocated and also keeps an atomic num_elem count in each
pool. This is more work than is needed. Pool allocation is only called
from verbs APIs which already have references to ib_device and pools are
only diasbled when the driver is removed so no protection of the pool
addresses are needed. The elem count is used to warn if elements are still
present in a pool when it is cleaned up which is useful.
This patch eliminates the references to the ib_device and pool structs.
Link: https://lore.kernel.org/r/20210125211641.2694-5-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
rxe_alloc() used the RXE_POOL_ATOMIC flag in rxe_type_info to select
GFP_ATOMIC in calls to kzalloc(). This was intended to handle cases where
an object could be created in interrupt context. This no longer occurs
since allocating those objects has moved into the core so this flag is not
necessary. An incorrect use of this flag was still present for rxe_mc_elem
objects and is removed.
Link: https://lore.kernel.org/r/20210125211641.2694-4-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The names and comments of the 'unlocked' pool APIs are very misleading and
not what was intended. This patch replaces 'rxe_xxx_nl' with
'rxe_xxx_locked' with comments indicating that the caller is expected to
hold the rxe pool lock.
Link: https://lore.kernel.org/r/20210125211641.2694-3-rpearson@hpe.com
Reported-by: Hillf Danton <hdanton@sina.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The existing pool APIs use the rw_lock pool_lock to protect critical
sections that change the pool state. This does not correctly implement a
typical sequence like the following
elem = <lookup key in pool>
if found use elem else
elem = <alloc new elem in pool>
<add key to elem>
Which is racy if multiple threads are attempting to perform this at the
same time. We want the second thread to use the elem created by the first
thread not create two equivalent elems.
This patch adds new APIs that are the same as existing APIs but do not
take the pool_lock. A caller can then take the lock and perform a sequence
of pool operations and then release the lock.
Link: https://lore.kernel.org/r/20201216231550.27224-7-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Replace 'void *' parameters with 'struct rxe_pool_entry *' and use a macro
to allow:
rxe_add_index,
rxe_drop_index,
rxe_add_key,
rxe_drop_key and
rxe_add_to_pool
APIs to be type safe against changing the position of pelem in the
objects.
Link: https://lore.kernel.org/r/20201216231550.27224-6-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The rxe verbs objects each include an rdma-core object 'ib_xxx'
and a rxe_pool_entry 'pelem' in addition to rxe specific data.
Originally these all had pelem first and ib_xxx second. Currently
about half have ib_xxx first and half have pelem first. Saving
the offset of the pelem field in rxe_type info will enable making
the rxe_pool APIs type safe as the pelem field continues to vary.
Link: https://lore.kernel.org/r/20201216231550.27224-4-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Allow both indices and keys to exist for objects in pools. Previously you
were limited to one or the other.
This is required for later implementing rxe memory windows.
Link: https://lore.kernel.org/r/20201216231550.27224-3-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Change rxe pools to use kzalloc instead of kmem_cache to allocate memory
for rxe objects. The pools are not really necessary and they trigger
hardened user copy warnings as the ioctl framework copies the QP number
directly to userspace.
Also the general project to move object alloation to the core code will
eventually clean these out anyhow.
Link: https://lore.kernel.org/r/20200827163535.2632-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The PD allocations in IB/core allows us to simplify drivers and their
error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand
in had with relevant update in .dealloc_pd().
We will use this opportunity and convert .dealloc_pd() to don't fail, as
it was suggested a long time ago, failures are not happening as we have
never seen a WARN_ON print.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Since the function always returns 0 make it void.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Normal practice is to have enum defines in capital letters.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Concurrent readers which read rb tree are protected using read lock.
Concurrent writers which add element to pool are protected
using write lock.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Change the argument type of these functions from void * into
struct rxe_pool_entry *.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Soft RoCE (RXE) - The software RoCE driver
ib_rxe implements the RDMA transport and registers to the RDMA core
device as a kernel verbs provider. It also implements the packet IO
layer. On the other hand ib_rxe registers to the Linux netdev stack
as a udp encapsulating protocol, in that case RDMA, for sending and
receiving packets over any Ethernet device. This yields a RDMA
transport over the UDP/Ethernet network layer forming a RoCEv2
compatible device.
The configuration procedure of the Soft RoCE drivers requires
binding to any existing Ethernet network device. This is done with
/sys interface.
A userspace Soft RoCE library (librxe) provides user applications
the ability to run with Soft RoCE devices. The use of rxe verbs ins
user space requires the inclusion of librxe as a device specifics
plug-in to libibverbs. librxe is packaged separately.
Architecture:
+-----------------------------------------------------------+
| Application |
+-----------------------------------------------------------+
+-----------------------------------+
| libibverbs |
User +-----------------------------------+
+----------------+ +----------------+
| librxe | | HW RoCE lib |
+----------------+ +----------------+
+---------------------------------------------------------------+
+--------------+ +------------+
| Sockets | | RDMA ULP |
+--------------+ +------------+
+--------------+ +---------------------+
| TCP/IP | | ib_core |
+--------------+ +---------------------+
+------------+ +----------------+
Kernel | ib_rxe | | HW RoCE driver |
+------------+ +----------------+
+------------------------------------+
| NIC driver |
+------------------------------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+-----------------------------------------------------------+
| Application |
+-----------------------------------------------------------+
+-----------------------------------+
| libibverbs |
User +-----------------------------------+
+----------------+ +----------------+
| librxe | | HW RoCE lib |
+----------------+ +----------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+--------------+ +------------+
| Sockets | | RDMA ULP |
+--------------+ +------------+
+--------------+ +---------------------+
| TCP/IP | | ib_core |
+--------------+ +---------------------+
+------------+ +----------------+
Kernel | ib_rxe | | HW RoCE driver |
+------------+ +----------------+
+------------------------------------+
| NIC driver |
+------------------------------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Soft RoCE resources:
[1[ https://github.com/SoftRoCE/librxe-dev librxe - source code in
Github
[2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE
Wiki page
[3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>