1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

56 commits

Author SHA1 Message Date
Vaishali Thakkar
f7ecac6a09 RDMA/rtrs-srv: Rename rtrs_srv to rtrs_srv_sess
Structure rtrs_srv is used for sessions so in order to avoid confusions
rename it to rtrs_srv_sess.

All changes were done with the help of following Coccinelle script:

@@
@@
struct
- rtrs_srv
+ rtrs_srv_sess

Link: https://lore.kernel.org/r/20220105180708.7774-5-jinpu.wang@ionos.com
Signed-off-by: Vaishali Thakkar <vaishali.thakkar@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-07 10:07:31 -04:00
Vaishali Thakkar
ae4c81644e RDMA/rtrs-srv: Rename rtrs_srv_sess to rtrs_srv_path
rtrs_srv_sess is used for paths and not sessions on the server side. This
creates confusion so let's rename it to rtrs_srv_path. Also, rename
related variables and functions.

Coccinelle is used to do the transformations for most of the occurrences
and remaining ones were handled manually.

Link: https://lore.kernel.org/r/20220105180708.7774-3-jinpu.wang@ionos.com
Signed-off-by: Vaishali Thakkar <vaishali.thakkar@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-07 10:07:31 -04:00
Vaishali Thakkar
d937279471 RDMA/rtrs: Rename rtrs_sess to rtrs_path
rtrs_sess is in fact a path. This makes it confusing and difficult to get
into the code. So let's rename the structure and related use cases of it.

Coccinelle was used to do the transformation for most of the occurrences
and remaining ones were handled manually.

Link: https://lore.kernel.org/r/20220105180708.7774-2-jinpu.wang@ionos.com
Signed-off-by: Vaishali Thakkar <vaishali.thakkar@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-07 10:07:30 -04:00
Md Haris Iqbal
dea7bb3ad3 RDMA/rtrs: Do not allow sessname to contain special symbols / and .
Allowing these characters in sessname can lead to unexpected results,
particularly because / is used as a separator between files in a path, and
. points to the current directory.

Link: https://lore.kernel.org/r/20210922125333.351454-7-haris.iqbal@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-04 16:46:12 -03:00
Jack Wang
4b6afe9bc9 RDMA/rtrs: Fix warning when use poll mode on client side.
When testing with poll mode, it will fail and lead to warning below on
client side:

$ echo "sessname=bla path=gid:fe80::2:c903:4e:d0b3@gid:fe80::2:c903:8:ca17 device_path=/dev/nullb2 nr_poll_queues=-1" | \
  sudo tee /sys/devices/virtual/rnbd-client/ctl/map_device

rnbd_client L597: Mapping device /dev/nullb2 on session bla, (access_mode: rw, nr_poll_queues: 8)
WARNING: CPU: 3 PID: 9886 at drivers/infiniband/core/cq.c:447 ib_cq_pool_get+0x26f/0x2a0 [ib_core]

The problem is in case of poll queue, we need to still call
ib_alloc_cq/ib_free_cq, we can't use cq_poll api for poll queue.

As both client and server use shared function from rtrs, set irq_con_num
to con_num on server side, which is number of total connection of the
session, this way we can differ if the rtrs_con requires pollqueue.

Following up patches will replace the duplicate code with helpers.

Link: https://lore.kernel.org/r/20210922125333.351454-4-haris.iqbal@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-04 16:46:12 -03:00
Gioh Kim
cbe2de395c RDMA/rtrs: Remove (void) casting for functions
Casting to (void) does nothing, remove them.

Link: https://lore.kernel.org/r/20210806112112.124313-7-haris.iqbal@ionos.com
Suggested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-22 19:22:59 -03:00
Gioh Kim
4693d6b767 RDMA/rtrs: Remove all likely and unlikely
The IO performance test with fio after swapping the likely and unlikely
macros in all if-statement shows no difference.  They do not help for the
performance of rtrs.

Thanks to Haakon Bugge for the test scenario.

The fio test did random read on 32 rnbd devices and 64 processes.
Test environment:
- Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
- 376G memory
- kernel version: 5.4.86
- gcc version: gcc (Debian 8.3.0-6) 8.3.0
- Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]

Test result:
- before swapping:       IOPS=829k, BW=3239MiB/s
- after swapping:        IOPS=829k, BW=3238MiB/s
- remove all (un)likely: IOPS=829k, BW=3238MiB/s

Link: https://lore.kernel.org/r/20210806112112.124313-5-haris.iqbal@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-22 19:22:59 -03:00
Jack Wang
cfcdbd9dd7 RDMA/rtrs: Move sq_wr_avail to rtrs_con
In order to account HB for sq_wr_avail properly, move sq_wr_avail from
rtrs_srv_con to rtrs_con.

Although rtrs-clt do not care sq_wr_avail, but still init it to
max_send_wr.

Fixes: b38041d50a ("RDMA/rtrs: Do not signal for heatbeat")
Link: https://lore.kernel.org/r/20210712060750.16494-7-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-15 14:24:14 -03:00
Jack Wang
e2d98504c6 RDMA/rtrs: Enable the same selective signal for heartbeat and IO
On idle session, because we do not do signal for heartbeat, it will
overflow the send queue after sometime.

To avoid that, we need to enable the signal for heartbeat. To do that, add
a new member signal_interval in rtrs_path, which will set min of
queue_depth and SERVICE_CON_QUEUE_DEPTH, and track it for both heartbeat
and IO, so the sq queue full accounting is correct.

Fixes: b38041d50a ("RDMA/rtrs: Do not signal for heatbeat")
Link: https://lore.kernel.org/r/20210712060750.16494-4-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-15 14:24:14 -03:00
Jack Wang
a10431eff1 RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con
We need to track also the wr used for heatbeat. This is a preparation for
that, will be used in later patch.

The io_cnt in rtrs_clt is removed, use wr_cnt instead.

Link: https://lore.kernel.org/r/20210712060750.16494-3-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-15 14:24:14 -03:00
Leon Romanovsky
bf194997c7 RDMA: Fix kernel-doc warnings about wrong comment
Compilation with W=1 produces warnings similar to the below.

  drivers/infiniband/ulp/ipoib/ipoib_main.c:320: warning: This comment
	starts with '/**', but isn't a kernel-doc comment. Refer
	Documentation/doc-guide/kernel-doc.rst

All such occurrences were found with the following one line
 git grep -A 1 "\/\*\*" drivers/infiniband/

Link: https://lore.kernel.org/r/e57d5f4ddd08b7a19934635b44d6d632841b9ba7.1623823612.git.leonro@nvidia.com
Reviewed-by: Jack Wang <jinpu.wang@ionos.com> #rtrs
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 20:32:50 -03:00
Jack Wang
a95fbe2aba RDMA/rtrs: Check device max_qp_wr limit when create QP
Currently we only check device max_qp_wr limit for IO connection, but not
for service connection. We should check for both.

So save the max_qp_wr device limit in wr_limit, and use it for both IO
connections and service connections.

While at it, also remove an outdated comments.

Link: https://lore.kernel.org/r/20210614090337.29557-6-jinpu.wang@ionos.com
Suggested-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-18 13:47:13 -03:00
Guoqing Jiang
354462eb7f RDMA/rtrs: Rename cq_size/queue_size to cq_num/queue_num
Those variables are passed to create_cq, create_qp, rtrs_iu_alloc and
rtrs_iu_free, so these *_size means the num of unit. And cq_size also
means number of cq element.

Also move the setting of cq_num to common path.

Link: https://lore.kernel.org/r/20210614090337.29557-5-jinpu.wang@ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-18 13:47:13 -03:00
Md Haris Iqbal
b012f0ad53 RDMA/rtrs: RDMA_RXE requires more number of WR
When using rdma_rxe, post_one_recv() returns ENOMEM error due to the full
recv queue.  This patch increase the number of WR for receive queue to
support all devices.

Link: https://lore.kernel.org/r/20210614090337.29557-4-jinpu.wang@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-18 13:47:13 -03:00
Jack Wang
5e91eabf66 RDMA/rtrs-srv: Set minimal max_send_wr and max_recv_wr
Currently rtrs when create_qp use a coarse numbers (bigger in general),
which leads to hardware create more resources which only waste memory with
no benefits.

For max_send_wr, we don't really need alway max_qp_wr size when creating
qp, reduce it to cq_size.

For max_recv_wr,  cq_size is enough.

With the patch when sess_queue_depth=128, per session (2 paths) memory
consumption reduced from 188 MB to 65MB

When always_invalidate is enabled, we need send more wr, so treat it
special.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20210614090337.29557-2-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-18 13:47:12 -03:00
Gioh Kim
2371c40354 RDMA/rtrs-srv: Fix memory leak of unfreed rtrs_srv_stats object
When closing a session, currently the rtrs_srv_stats object in the
closing session is freed by kobject release. But if it failed
to create a session by various reasons, it must free the rtrs_srv_stats
object directly because kobject is not created yet.

This problem is found by kmemleak as below:

1. One client machine maps /dev/nullb0 with session name 'bla':
root@test1:~# echo "sessname=bla path=ip:192.168.122.190 \
device_path=/dev/nullb0" > /sys/devices/virtual/rnbd-client/ctl/map_device

2. Another machine failed to create a session with the same name 'bla':
root@test2:~# echo "sessname=bla path=ip:192.168.122.190 \
device_path=/dev/nullb1" > /sys/devices/virtual/rnbd-client/ctl/map_device
-bash: echo: write error: Connection reset by peer

3. The kmemleak on server machine reported an error:
unreferenced object 0xffff888033cdc800 (size 128):
  comm "kworker/2:1", pid 83, jiffies 4295086585 (age 2508.680s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000a72903b2>] __alloc_sess+0x1d4/0x1250 [rtrs_server]
    [<00000000d1e5321e>] rtrs_srv_rdma_cm_handler+0xc31/0xde0 [rtrs_server]
    [<00000000bb2f6e7e>] cma_ib_req_handler+0xdc5/0x2b50 [rdma_cm]
    [<00000000e896235d>] cm_process_work+0x2d/0x100 [ib_cm]
    [<00000000b6866c5f>] cm_req_handler+0x11bc/0x1c40 [ib_cm]
    [<000000005f5dd9aa>] cm_work_handler+0xe65/0x3cf2 [ib_cm]
    [<00000000610151e7>] process_one_work+0x4bc/0x980
    [<00000000541e0f77>] worker_thread+0x78/0x5c0
    [<00000000423898ca>] kthread+0x191/0x1e0
    [<000000005a24b239>] ret_from_fork+0x3a/0x50

Fixes: 39c2d639ca ("RDMA/rtrs-srv: Set .release function for rtrs srv device during device init")
Link: https://lore.kernel.org/r/20210528113018.52290-18-jinpu.wang@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Gioh Kim
07c1402729 RDMA/rtrs-srv: Duplicated session name is not allowed
If two clients try to use the same session name, rtrs-server generates a
kernel error that it failed to create the sysfs because the filename
is duplicated.

This patch adds code to check if there already exists the same session
name with the different UUID. If a client tries to add more session,
it sends the UUID and the session name. Therefore it is ok if there is
already same session name with the same UUID. The rtrs-server must fail
only-if there is the same session name with the different UUID.

Link: https://lore.kernel.org/r/20210528113018.52290-17-jinpu.wang@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Aleksei Marov <aleksei.marov@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Md Haris Iqbal
0cdfb3b207 RDMA/rtrs-srv: Replace atomic_t with percpu_ref for ids_inflight
ids_inflight is used to track the inflight IOs. But the use of atomic_t
variable can cause performance drops and can also become a performance
bottleneck.

This commit replaces the use of atomic_t with a percpu_ref structure. The
advantage it offers is, it doesn't check if the reference has fallen to 0,
until the user explicitly signals it to; and that is done by the
percpu_ref_kill() function call. After that, the percpu_ref structure
behaves like an atomic_t and for every put call, checks whether the
reference has fallen to 0 or not.

rtrs_srv_stats_rdma_to_str shows the count of ids_inflight as 0
for user-mode tools not to be confused.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20210528113018.52290-14-jinpu.wang@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Guoqing Jiang
0aedfb695f RDMA/rtrs-srv: Kill __rtrs_srv_change_state
No need since the only user is rtrs_srv_change_state.

Link: https://lore.kernel.org/r/20210528113018.52290-11-jinpu.wang@ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Dima Stepanov
2d612f0d3d RDMA/rtrs: Use strscpy instead of strlcpy
During checkpatch analyzing the following warning message was found:
  WARNING:STRLCPY: Prefer strscpy over strlcpy - see:
  https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Fix it by using strscpy calls instead of strlcpy.

Link: https://lore.kernel.org/r/20210528113018.52290-8-jinpu.wang@ionos.com
Signed-off-by: Dima Stepanov <dmitrii.stepanov@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Gioh Kim
3f3d0eabc1 RDMA/rtrs: Define MIN_CHUNK_SIZE
Define MIN_CHUNK_SIZE to replace the hard-coding number.
We need 4k for metadata, so MIN_CHUNK_SIZE should be at least 8k.

Link: https://lore.kernel.org/r/20210528113018.52290-7-jinpu.wang@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Guoqing Jiang
485f2fb1a0 RDMA/rtrs-srv: Clean up the code in __rtrs_srv_change_state
No need to use double switch to check the change of state everywhere,
let's change them to "if" to reduce size.

Link: https://lore.kernel.org/r/20210528113018.52290-5-jinpu.wang@ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:58 -03:00
Md Haris Iqbal
6564b11031 RDMA/rtrs-srv: Add error messages for cases when failing RDMA connection
It was difficult to find out why it failed to establish RDMA
connection. This patch adds some messages to show which function
has failed why.

Link: https://lore.kernel.org/r/20210528113018.52290-4-jinpu.wang@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:52:41 -03:00
Guoqing Jiang
cfbeb0b9bb RDMA/rtrs-srv: Kill reject_w_econnreset label
We can goto reject_w_err label after initialize err with -ECONNRESET.

Link: https://lore.kernel.org/r/20210528113018.52290-2-jinpu.wang@ionos.com
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28 20:43:38 -03:00
Linus Torvalds
f34b2cf178 RDMA merge window pull request
This is significantly bug fixes and general cleanups. The noteworthy new
 features are fairly small:
 
 - XRC support for HNS and improves RQ operations
 
 - Bug fixes and updates for hns, mlx5, bnxt_re, hfi1, i40iw, rxe, siw and
   qib
 
 - Quite a few general cleanups on spelling, error handling, static checker
   detections, etc
 
 - Increase the number of device ports supported beyond 255. High port
   count software switches now exist
 
 - Several bug fixes for rtrs
 
 - mlx5 Device Memory support for host controlled atomics
 
 - Report SRQ tables through to rdma-tool
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmCMMHEACgkQOG33FX4g
 mxri3Q//RAgIExCGHebQ9xkptZHVyTLLJMpiMl2cqk3ZVRdDZ7QdiQjIqY2KqlUK
 nxBj7EXJeX6rV5a1xqCcOO1gBetB28TSwnCNE2ZqrXP5B59ISW8D052IWza3UkUz
 WmHLARxHQlyKBWA4+ZAgfoUGL0NmWA8QPf56t/RK/3/OsuYnGzcnWmmFbt8XKFcH
 NtO3KC45mKWDqqG0A0XRrLbEQz/ElO3OuPBqlBKgB3ZgGPzgsOUTOGkm1tCcZ89L
 /pvZGB7SklKZdCX8TxdpVGd9h0zHl8pqh1yEzvTA1ypNAYSUId2mvZXluU8J5yJl
 FLk7E1IxE5050FNEc7T5uZdUVntulYiqL2558coRI34l5w26pKGjIMxw/nTB8hg8
 4ZfBtKVemIG6yzW5Up6iBpK7qWYpvLWVShwYAWhbNsjN7JGzJuh1gJnjbmYgyz2P
 RTMU9wjFPLL2wZxg4LDHACVJNBb82j6KKuE+kZWpk11ro7INw9+7YwRuTo7/ezxC
 BwXKu8wF4igwSigV55jM+WnGXLhxdC3qmx/2cbtWyLM/PzdRL96tM0RWW5v8/Nv7
 teFhkt+f3RVqcfYH5K1qCXy3UFrxG6bxFSvcHHSBx2bdIrqhuTY5FqszAYImeW2j
 iHoyIsuSuGu79HQgOzAQZsEyksWi6OYDvA9Q9VBoPP4bJ3DOAa4=
 =vsXA
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This is significantly bug fixes and general cleanups. The noteworthy
  new features are fairly small:

   - XRC support for HNS and improves RQ operations

   - Bug fixes and updates for hns, mlx5, bnxt_re, hfi1, i40iw, rxe, siw
     and qib

   - Quite a few general cleanups on spelling, error handling, static
     checker detections, etc

   - Increase the number of device ports supported beyond 255. High port
     count software switches now exist

   - Several bug fixes for rtrs

   - mlx5 Device Memory support for host controlled atomics

   - Report SRQ tables through to rdma-tool"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (145 commits)
  IB/qib: Remove redundant assignment to ret
  RDMA/nldev: Add copy-on-fork attribute to get sys command
  RDMA/bnxt_re: Fix a double free in bnxt_qplib_alloc_res
  RDMA/siw: Fix a use after free in siw_alloc_mr
  IB/hfi1: Remove redundant variable rcd
  RDMA/nldev: Add QP numbers to SRQ information
  RDMA/nldev: Return SRQ information
  RDMA/restrack: Add support to get resource tracking for SRQ
  RDMA/nldev: Return context information
  RDMA/core: Add CM to restrack after successful attachment to a device
  RDMA/cma: Skip device which doesn't support CM
  RDMA/rxe: Fix a bug in rxe_fill_ip_info()
  RDMA/mlx5: Expose private query port
  RDMA/mlx4: Remove an unused variable
  RDMA/mlx5: Fix type assignment for ICM DM
  IB/mlx5: Set right RoCE l3 type and roce version while deleting GID
  RDMA/i40iw: Fix error unwinding when i40iw_hmc_sd_one fails
  RDMA/cxgb4: add missing qpid increment
  IB/ipoib: Remove unnecessary struct declaration
  RDMA/bnxt_re: Get rid of custom module reference counting
  ...
2021-05-01 09:15:05 -07:00
Gioh Kim
c81cba8551 block/rnbd-srv: Remove unused arguments of rnbd_srv_rdma_ev
struct rtrs_srv is not used when handling rnbd_srv_rdma_ev messages, so
cleaned up
rdma_ev function pointer in rtrs_srv_ops also is changed.

Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Aleksei Marov <aleksei.marov@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20210419073722.15351-16-gi-oh.kim@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-20 08:59:04 -06:00
Gioh Kim
42cdc1909d RDMA/rtrs-srv: More debugging info when fail to send reply
It does not help to debug if it only print error message
without any debugging information which session and connection
the error happened.

Link: https://lore.kernel.org/r/20210406123639.202899-3-gi-oh.kim@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-13 19:51:34 -03:00
Jack Wang
11b74cbf8e RDMA/rtrs: Cleanup unused 's' variable in __alloc_sess
Link: https://lore.kernel.org/r/20210325153308.1214057-18-gi-oh.kim@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-01 15:59:46 -03:00
Gioh Kim
88e2f10564 RDMA/rtrs-srv: Report temporary sessname for error message
Before receiving the session name, the error message cannot include any
information about which connection generates the error.

This patch stores the addresses of source and target in the sessname field
to show which generates the error. That field will be over-written
when receiving the session name from client.

Link: https://lore.kernel.org/r/20210325153308.1214057-17-gi-oh.kim@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-01 15:59:46 -03:00
Guoqing Jiang
57dae8baa6 RDMA/rtrs: Cleanup the code in rtrs_srv_rdma_cm_handler
Let these cases share the same path since all of them need to close
session.

Link: https://lore.kernel.org/r/20210325153308.1214057-11-gi-oh.kim@ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-01 15:40:43 -03:00
Jack Wang
3b89e92c2a RDMA/rtrs: Use new shared CQ mechanism
Have the driver use shared CQs which provids a ~10%-20% improvement during
test.

Instead of opening a CQ for each QP per connection, a CQ for each QP will
be provided by the RDMA core driver that will be shared between the QPs on
that core reducing interrupt overhead.

Link: https://lore.kernel.org/r/20210222141551.54345-1-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11 14:50:15 -04:00
Jack Wang
ed40852967 RDMA/rtrs-srv: Do not pass a valid pointer to PTR_ERR()
smatch gives the warning:

  drivers/infiniband/ulp/rtrs/rtrs-srv.c:1805 rtrs_rdma_connect() warn: passing zero to 'PTR_ERR'

Which is trying to say smatch has shown that srv is not an error pointer
and thus cannot be passed to PTR_ERR.

The solution is to move the list_add() down after full initilization of
rtrs_srv. To avoid holding the srv_mutex too long, only hold it during the
list operation as suggested by Leon.

Fixes: 03e9b33a0f ("RDMA/rtrs: Only allow addition of path to an already established session")
Link: https://lore.kernel.org/r/20210216143807.65923-1-jinpu.wang@cloud.ionos.com
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16 20:01:56 -04:00
Gioh Kim
f7452a7e96 RDMA/rtrs-srv: fix memory leak by missing kobject free
kmemleak reported an error as below:

  unreferenced object 0xffff8880674b7640 (size 64):
    comm "kworker/4:1H", pid 113, jiffies 4296403507 (age 507.840s)
    hex dump (first 32 bytes):
      69 70 3a 31 39 32 2e 31 36 38 2e 31 32 32 2e 31  ip:192.168.122.1
      31 30 40 69 70 3a 31 39 32 2e 31 36 38 2e 31 32  10@ip:192.168.12
    backtrace:
      [<0000000054413611>] kstrdup+0x2e/0x60
      [<0000000078e3120a>] kobject_set_name_vargs+0x2f/0xb0
      [<00000000ca2be3ee>] kobject_init_and_add+0xb0/0x120
      [<0000000062ba5e78>] rtrs_srv_create_sess_files+0x14c/0x314 [rtrs_server]
      [<00000000b45b7217>] rtrs_srv_info_req_done+0x5b1/0x800 [rtrs_server]
      [<000000008fc5aa8f>] __ib_process_cq+0x94/0x100 [ib_core]
      [<00000000a9599cb4>] ib_cq_poll_work+0x32/0xc0 [ib_core]
      [<00000000cfc376be>] process_one_work+0x4bc/0x980
      [<0000000016e5c96a>] worker_thread+0x78/0x5c0
      [<00000000c20b8be0>] kthread+0x191/0x1e0
      [<000000006c9c0003>] ret_from_fork+0x3a/0x50

It is caused by the not-freed kobject of rtrs_srv_sess.  The kobject
embedded in rtrs_srv_sess has ref-counter 2 after calling
process_info_req(). Therefore it must call kobject_put twice.  Currently
it calls kobject_put only once at rtrs_srv_destroy_sess_files because
kobject_del removes the state_in_sysfs flag and then kobject_put in
free_sess() is not called.

This patch moves kobject_del() into free_sess() so that the kobject of
rtrs_srv_sess can be freed. And also this patch adds the missing call of
sysfs_remove_group() to clean-up the sysfs directory.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20210212134525.103456-4-jinpu.wang@cloud.ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16 14:42:59 -04:00
Md Haris Iqbal
03e9b33a0f RDMA/rtrs: Only allow addition of path to an already established session
While adding a path from the client side to an already established
session, it was possible to provide the destination IP to a different
server. This is dangerous.

This commit adds an extra member to the rtrs_msg_conn_req structure, named
first_conn; which is supposed to notify if the connection request is the
first for that session or not.

On the server side, if a session does not exist but the first_conn
received inside the rtrs_msg_conn_req structure is 1, the connection
request is failed. This signifies that the connection request is for an
already existing session, and since the server did not find one, it is an
wrong connection request.

Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20210212134525.103456-3-jinpu.wang@cloud.ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Reviewed-by: Lutz Pogrell <lutz.pogrell@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16 14:42:59 -04:00
Jack Wang
e6daa8f61d RDMA/rtrs-srv: Fix stack-out-of-bounds
BUG: KASAN: stack-out-of-bounds in _mlx4_ib_post_send+0x1bd2/0x2770 [mlx4_ib]
  Read of size 4 at addr ffff8880d5a7f980 by task kworker/0:1H/565

  CPU: 0 PID: 565 Comm: kworker/0:1H Tainted: G           O      5.4.84-storage #5.4.84-1+feature+linux+5.4.y+dbg+20201216.1319+b6b887b~deb10
  Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.00       09/04/2012
  Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
  Call Trace:
   dump_stack+0x96/0xe0
   print_address_description.constprop.4+0x1f/0x300
   ? irq_work_claim+0x2e/0x50
   __kasan_report.cold.8+0x78/0x92
   ? _mlx4_ib_post_send+0x1bd2/0x2770 [mlx4_ib]
   kasan_report+0x10/0x20
   _mlx4_ib_post_send+0x1bd2/0x2770 [mlx4_ib]
   ? check_chain_key+0x1d7/0x2e0
   ? _mlx4_ib_post_recv+0x630/0x630 [mlx4_ib]
   ? lockdep_hardirqs_on+0x1a8/0x290
   ? stack_depot_save+0x218/0x56e
   ? do_profile_hits.isra.6.cold.13+0x1d/0x1d
   ? check_chain_key+0x1d7/0x2e0
   ? save_stack+0x4d/0x80
   ? save_stack+0x19/0x80
   ? __kasan_slab_free+0x125/0x170
   ? kfree+0xe7/0x3b0
   rdma_write_sg+0x5b0/0x950 [rtrs_server]

The problem is when we send imm_wr, the type should be ib_rdma_wr, so hw
driver like mlx4 can do rdma_wr(wr), so fix it by use the ib_rdma_wr as
type for imm_wr.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20210212134525.103456-2-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16 14:42:59 -04:00
Jack Wang
6f5d1b3016 RDMA/rtrs-srv: Init wr_cnt as 1
Fix up wr_avail accounting. if wr_cnt is 0, then we do SIGNAL for first
wr, in completion we add queue_depth back, which is not right in the
sense of tracking for available wr.

So fix it by init wr_cnt to 1.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20201217141915.56989-19-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15 15:25:10 -04:00
Jack Wang
e8ae7ddb48 RDMA/rtrs-srv: Do not signal REG_MR
We do not need to wait for REG_MR completion, so remove the
SIGNAL flag.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20201217141915.56989-18-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15 15:25:09 -04:00
Jack Wang
b38041d50a RDMA/rtrs: Do not signal for heatbeat
For HB, there is no need to generate signal for completion.

Also remove a comment accordingly.

Fixes: c0894b3ea6 ("RDMA/rtrs: core: lib functions shared between client and server modules")
Link: https://lore.kernel.org/r/20201217141915.56989-16-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reported-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15 15:25:09 -04:00
Jack Wang
8537f2de65 RDMA/rtrs-srv: Fix missing wr_cqe
We had a few places wr_cqe is not set, which could lead to NULL pointer
deref or GPF in error case.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20201217141915.56989-14-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15 15:25:09 -04:00
Guoqing Jiang
f77c4839ee RDMA/rtrs-srv: Jump to dereg_mr label if allocate iu fails
The rtrs_iu_free is called in rtrs_iu_alloc if memory is limited, so we
don't need to free the same iu again.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20201217141915.56989-7-jinpu.wang@cloud.ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15 15:25:09 -04:00
Jack Wang
99f0c38079 RDMA/rtrs-srv: Release lock before call into close_sess
In this error case, we don't need hold mutex to call close_sess.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20201217141915.56989-4-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Tested-by: Lutz Pogrell <lutz.pogrell@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15 15:25:08 -04:00
Jack Wang
7490fd1fe8 RDMA/rtrs: Extend ibtrs_cq_qp_create
rtrs does not have same limit for both max_send_wr and max_recv_wr,
To allow client and server set different values, export in a separate
parameter for rtrs_cq_qp_create.

Also fix the type accordingly, u32 should be used instead of u16.

Fixes: c0894b3ea6 ("RDMA/rtrs: core: lib functions shared between client and server modules")
Link: https://lore.kernel.org/r/20201217141915.56989-2-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15 15:24:50 -04:00
Guoqing Jiang
ffea6ad133 RDMA/rtrs-srv: Kill rtrs_srv_change_state_get_old
This function isn't needed since no caller checks the old_state of sess.

Link: https://lore.kernel.org/r/20201023074353.21946-11-jinpu.wang@cloud.ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:17:41 -03:00
Gioh Kim
8bd372ace3 RDMA/rtrs: Remove unnecessary argument dir of rtrs_iu_free
The direction of DMA operation is already in the rtrs_iu

Link: https://lore.kernel.org/r/20201023074353.21946-8-jinpu.wang@cloud.ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:17:40 -03:00
Guoqing Jiang
d715ff8acb RDMA/rtrs-srv: Don't guard the whole __alloc_srv with srv_mutex
The purpose of srv_mutex is to protect srv_list as in put_srv, so no need
to hold it when allocate memory for srv since it could be time consuming.

Otherwise if one machine has limited memory, rsrv_close_work could be
blocked for a longer time due to the mutex is held by get_or_create_srv
since it can't get memory in time.

  INFO: task kworker/1:1:27478 blocked for more than 120 seconds.
        Tainted: G           O    4.14.171-1-storage #4.14.171-1.3~deb9
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  kworker/1:1     D    0 27478      2 0x80000000
  Workqueue: rtrs_server_wq rtrs_srv_close_work [rtrs_server]
  Call Trace:
   ? __schedule+0x38c/0x7e0
   schedule+0x32/0x80
   schedule_preempt_disabled+0xa/0x10
   __mutex_lock.isra.2+0x25e/0x4d0
   ? put_srv+0x44/0x100 [rtrs_server]
   put_srv+0x44/0x100 [rtrs_server]
   rtrs_srv_close_work+0x16c/0x280 [rtrs_server]
   process_one_work+0x1c5/0x3c0
   worker_thread+0x47/0x3e0
   kthread+0xfc/0x130
   ? trace_event_raw_event_workqueue_execute_start+0xa0/0xa0
   ? kthread_create_on_node+0x70/0x70
   ret_from_fork+0x1f/0x30

Let's move all the logics from __find_srv_and_get and __alloc_srv to
get_or_create_srv, and remove the two functions. Then it should be safe
for multiple processes to access the same srv since it is protected with
srv_mutex.

And since we don't want to allocate chunks with srv_mutex held, let's
check the srv->refcount after get srv because the chunks could not be
allocated yet.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20201023074353.21946-6-jinpu.wang@cloud.ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:17:40 -03:00
Jason Gunthorpe
5dee5872f8 Merge branch 'mlx5_active_speed' into rdma.git for-next
Leon Romanovsky says:

====================
IBTA declares speed as 16 bits, but kernel stores it in u8. This series
fixes in-kernel declaration while keeping external interface intact.
====================

Based on the mlx5-next branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_active_speed':
  RDMA: Fix link active_speed size
  RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
  net/mlx5: Refactor query port speed functions
2020-09-18 10:31:45 -03:00
Md Haris Iqbal
558d52b297 RDMA/rtrs-srv: Incorporate ib_register_client into rtrs server init
The rnbd_server module's communication manager (cm) initialization depends
on the registration of the "network namespace subsystem" of the RDMA CM
agent module. As such, when the kernel is configured to load the
rnbd_server and the RDMA cma module during initialization; and if the
rnbd_server module is initialized before RDMA cma module, a null ptr
dereference occurs during the RDMA bind operation.

Call trace:

  Call Trace:
   ? xas_load+0xd/0x80
   xa_load+0x47/0x80
   cma_ps_find+0x44/0x70
   rdma_bind_addr+0x782/0x8b0
   ? get_random_bytes+0x35/0x40
   rtrs_srv_cm_init+0x50/0x80
   rtrs_srv_open+0x102/0x180
   ? rnbd_client_init+0x6e/0x6e
   rnbd_srv_init_module+0x34/0x84
   ? rnbd_client_init+0x6e/0x6e
   do_one_initcall+0x4a/0x200
   kernel_init_freeable+0x1f1/0x26e
   ? rest_init+0xb0/0xb0
   kernel_init+0xe/0x100
   ret_from_fork+0x22/0x30
  Modules linked in:
  CR2: 0000000000000015

All this happens cause the cm init is in the call chain of the module
init, which is not a preferred practice.

So remove the call to rdma_create_id() from the module init call chain.
Instead register rtrs-srv as an ib client, which makes sure that the
rdma_create_id() is called only when an ib device is added.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20200907103106.104530-1-haris.iqbal@cloud.ionos.com
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 13:31:08 -03:00
Md Haris Iqbal
39c2d639ca RDMA/rtrs-srv: Set .release function for rtrs srv device during device init
The device .release function was not being set during the device
initialization. This was leading to the below warning, in error cases when
put_srv was called before device_add was called.

Warning:

Device '(null)' does not have a release() function, it is broken and must
be fixed. See Documentation/kobject.txt.

So, set the device .release function during device initialization in the
__alloc_srv() function.

Fixes: baa5b28b7a ("RDMA/rtrs-srv: Replace device_register with device_initialize and device_add")
Link: https://lore.kernel.org/r/20200907102216.104041-1-haris.iqbal@cloud.ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 13:28:14 -03:00
Md Haris Iqbal
baa5b28b7a RDMA/rtrs-srv: Replace device_register with device_initialize and device_add
There are error cases when we will call free_srv before device kobject is
initialized; in such cases calling put_device generates the following
warning:

 kobject: '(null)' (000000009f5445ed): is not initialized, yet
 kobject_put() is being called.

So call device_initialize() only once when the server is allocated. If we
end up calling put_srv() and subsequently free_srv(), our call to
put_device() would result in deletion of the obj. Call device_add() later
when we actually have a connection. Correspondingly, call device_del()
instead of device_unregister() when srv->dev_ref falls to 0.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20200811092722.2450-1-haris.iqbal@cloud.ionos.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 13:44:53 -03:00
Jack Wang
03ed5a8cda RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq
lockdep triggers a warning from time to time when running a regression
test:

 rnbd_client L685: </dev/nullb0@bla> Device disconnected.
 rnbd_client L1756: Unloading module

 workqueue: WQ_MEM_RECLAIM rtrs_client_wq:rtrs_clt_reconnect_work [rtrs_client] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]
 WARNING: CPU: 2 PID: 18824 at kernel/workqueue.c:2517 check_flush_dependency+0xad/0x130

The root cause is workqueue core expect flushing should not be done for a
!WQ_MEM_RECLAIM wq from a WQ_MEM_RECLAIM workqueue.

In above case ib_addr workqueue without WQ_MEM_RECLAIM, but rtrs_wq
WQ_MEM_RECLAIM.

To avoid the warning, remove the WQ_MEM_RECLAIM flag.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20200724111508.15734-4-haris.iqbal@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 14:26:53 -03:00