1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

2975 commits

Author SHA1 Message Date
Jason Gunthorpe
a4e63bce14 RDMA/odp: Ensure the mm is still alive before creating an implicit child
Registration of a mmu_notifier requires the caller to hold a mmget() on
the mm as registration is not permitted to race with exit_mmap(). There is
a BUG_ON inside the mmu_notifier to guard against this.

Normally creating a umem is done against current which implicitly holds
the mmget(), however an implicit ODP child is created from a pagefault
work queue and is not guaranteed to have a mmget().

Call mmget() around this registration and abort faulting if the MM has
gone to exit_mmap().

Before the patch below the notifier was registered when the implicit ODP
parent was created, so there was no chance to register a notifier outside
of current.

Fixes: c571feca2d ("RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'")
Link: https://lore.kernel.org/r/20200227114118.94736-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 13:56:07 -04:00
Maor Gottlieb
e38b55ea04 RDMA/core: Fix protection fault in ib_mr_pool_destroy
Fix NULL pointer dereference in the error flow of ib_create_qp_user
when accessing to uninitialized list pointers - rdma_mrs and sig_mrs.
The following crash from syzkaller revealed it.

  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault: 0000 [#1] SMP KASAN PTI
  CPU: 1 PID: 23167 Comm: syz-executor.1 Not tainted 5.5.0-rc5 #2
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
  rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:ib_mr_pool_destroy+0x81/0x1f0
  Code: 00 00 fc ff df 49 c1 ec 03 4d 01 fc e8 a8 ea 72 fe 41 80 3c 24 00
  0f 85 62 01 00 00 48 8b 13 48 89 d6 4c 8d 6a c8 48 c1 ee 03 <42> 80 3c
  3e 00 0f 85 34 01 00 00 48 8d 7a 08 4c 8b 02 48 89 fe 48
  RSP: 0018:ffffc9000951f8b0 EFLAGS: 00010046
  RAX: 0000000000040000 RBX: ffff88810f268038 RCX: ffffffff82c41628
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffc9000951f850
  RBP: ffff88810f268020 R08: 0000000000000004 R09: fffff520012a3f0a
  R10: 0000000000000001 R11: fffff520012a3f0a R12: ffffed1021e4d007
  R13: ffffffffffffffc8 R14: 0000000000000246 R15: dffffc0000000000
  FS:  00007f54bc788700(0000) GS:ffff88811b100000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000116920002 CR4: 0000000000360ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   rdma_rw_cleanup_mrs+0x15/0x30
   ib_destroy_qp_user+0x674/0x7d0
   ib_create_qp_user+0xb01/0x11c0
   create_qp+0x1517/0x2130
   ib_uverbs_create_qp+0x13e/0x190
   ib_uverbs_write+0xaa5/0xdf0
   __vfs_write+0x7c/0x100
   vfs_write+0x168/0x4a0
   ksys_write+0xc8/0x200
   do_syscall_64+0x9c/0x390
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x465b49
  Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89
  f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
  f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007f54bc787c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000465b49
  RDX: 0000000000000040 RSI: 0000000020000540 RDI: 0000000000000003
  RBP: 00007f54bc787c70 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f54bc7886bc
  R13: 00000000004ca2ec R14: 000000000070ded0 R15: 0000000000000005

Fixes: a060b5629a ("IB/core: generic RDMA READ/WRITE API")
Link: https://lore.kernel.org/r/20200227112708.93023-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 13:43:02 -04:00
Jason Gunthorpe
c13cac2a21 Linux 5.6-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl5cOX8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGw0AH/0nex1uEpzUTm+Gw
 D8QPFr3y61sYLu7sIMVt39+Zl6OSxvOOX14QIM/mrNrzjjRI8EXGvYgES5gSO4on
 6NLS6/64c1oQThDHCsxusKoSWLZ9KqP2vRPt7tZjn7DZMzEsuLhlINKBeupcqALX
 FnOBr768P+if/j0WcDR2pBaMg3ch+XC5sfYav7kapjgWUqCx9BvrHKLXXdlEGUC0
 7Ku7PH+nF7CIHiTay+i89odvOd8aLGsa/SUf5XGauKkH65VgQkmksgPeZUPqTnyC
 MEsyLJLfn4AP3ySwqzfSLac8jqZG8FGBt4DgM2MQBHibctzfeMIznfcfh/A8+Edx
 jqLKLAs=
 =4075
 -----END PGP SIGNATURE-----

Merge tag 'v5.6-rc4' into rdma.git for-next

Required due to dependencies in following patches.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 13:11:06 -04:00
Maor Gottlieb
801b67f3ea RDMA/core: Fix pkey and port assignment in get_new_pps
When port is part of the modify mask, then we should take it from the
qp_attr and not from the old pps. Same for PKEY. Otherwise there are
panics in some configurations:

  RIP: 0010:get_pkey_idx_qp_list+0x50/0x80 [ib_core]
  Code: c7 18 e8 13 04 30 ef 0f b6 43 06 48 69 c0 b8 00 00 00 48 03 85 a0 04 00 00 48 8b 50 20 48 8d 48 20 48 39 ca 74 1a 0f b7 73 04 <66> 39 72 10 75 08 eb 10 66 39 72 10 74 0a 48 8b 12 48 39 ca 75 f2
  RSP: 0018:ffffafb3480932f0 EFLAGS: 00010203
  RAX: ffff98059ababa10 RBX: ffff980d926e8cc0 RCX: ffff98059ababa30
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff98059ababa28
  RBP: ffff98059b940000 R08: 00000000000310c0 R09: ffff97fe47c07480
  R10: 0000000000000036 R11: 0000000000000200 R12: 0000000000000071
  R13: ffff98059b940000 R14: ffff980d87f948a0 R15: 0000000000000000
  FS:  00007f88deb31740(0000) GS:ffff98059f600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000010 CR3: 0000000853e26001 CR4: 00000000001606e0
  Call Trace:
   port_pkey_list_insert+0x3d/0x1b0 [ib_core]
   ? kmem_cache_alloc_trace+0x215/0x220
   ib_security_modify_qp+0x226/0x3a0 [ib_core]
   _ib_modify_qp+0xcf/0x390 [ib_core]
   ipoib_init_qp+0x7f/0x200 [ib_ipoib]
   ? rvt_modify_port+0xd0/0xd0 [rdmavt]
   ? ib_find_pkey+0x99/0xf0 [ib_core]
   ipoib_ib_dev_open_default+0x1a/0x200 [ib_ipoib]
   ipoib_ib_dev_open+0x96/0x130 [ib_ipoib]
   ipoib_open+0x44/0x130 [ib_ipoib]
   __dev_open+0xd1/0x160
   __dev_change_flags+0x1ab/0x1f0
   dev_change_flags+0x23/0x60
   do_setlink+0x328/0xe30
   ? __nla_validate_parse+0x54/0x900
   __rtnl_newlink+0x54e/0x810
   ? __alloc_pages_nodemask+0x17d/0x320
   ? page_fault+0x30/0x50
   ? _cond_resched+0x15/0x30
   ? kmem_cache_alloc_trace+0x1c8/0x220
   rtnl_newlink+0x43/0x60
   rtnetlink_rcv_msg+0x28f/0x350
   ? kmem_cache_alloc+0x1fb/0x200
   ? _cond_resched+0x15/0x30
   ? __kmalloc_node_track_caller+0x24d/0x2d0
   ? rtnl_calcit.isra.31+0x120/0x120
   netlink_rcv_skb+0xcb/0x100
   netlink_unicast+0x1e0/0x340
   netlink_sendmsg+0x317/0x480
   ? __check_object_size+0x48/0x1d0
   sock_sendmsg+0x65/0x80
   ____sys_sendmsg+0x223/0x260
   ? copy_msghdr_from_user+0xdc/0x140
   ___sys_sendmsg+0x7c/0xc0
   ? skb_dequeue+0x57/0x70
   ? __inode_wait_for_writeback+0x75/0xe0
   ? fsnotify_grab_connector+0x45/0x80
   ? __dentry_kill+0x12c/0x180
   __sys_sendmsg+0x58/0xa0
   do_syscall_64+0x5b/0x200
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f88de467f10

Link: https://lore.kernel.org/r/20200227125728.100551-1-leon@kernel.org
Cc: <stable@vger.kernel.org>
Fixes: 1dd017882e ("RDMA/core: Fix protection fault in get_pkey_idx_qp_list")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-28 11:16:08 -04:00
Jason Gunthorpe
c14dfddbd8 RMDA/cm: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen()
The algorithm pre-allocates a cm_id since allocation cannot be done while
holding the cm.lock spinlock, however it doesn't free it on one error
path, leading to a memory leak.

Fixes: 067b171b86 ("IB/cm: Share listening CM IDs")
Link: https://lore.kernel.org/r/20200221152023.GA8680@ziepe.ca
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-27 16:43:29 -04:00
Jason Gunthorpe
7c11910783 RDMA/ucma: Put a lock around every call to the rdma_cm layer
The rdma_cm must be used single threaded.

This appears to be a bug in the design, as it does have lots of locking
that seems like it should allow concurrency. However, when it is all said
and done every single place that uses the cma_exch() scheme is broken, and
all the unlocked reads from the ucma of the cm_id data are wrong too.

syzkaller has been finding endless bugs related to this.

Fixing this in any elegant way is some enormous amount of work. Take a
very big hammer and put a mutex around everything to do with the
ucma_context at the top of every syscall.

Fixes: 7521663857 ("RDMA/cma: Export rdma cm interface to userspace")
Link: https://lore.kernel.org/r/20200218210432.GA31966@ziepe.ca
Reported-by: syzbot+adb15cf8c2798e4e0db4@syzkaller.appspotmail.com
Reported-by: syzbot+e5579222b6a3edd96522@syzkaller.appspotmail.com
Reported-by: syzbot+4b628fcc748474003457@syzkaller.appspotmail.com
Reported-by: syzbot+29ee8f76017ce6cf03da@syzkaller.appspotmail.com
Reported-by: syzbot+6956235342b7317ec564@syzkaller.appspotmail.com
Reported-by: syzbot+b358909d8d01556b790b@syzkaller.appspotmail.com
Reported-by: syzbot+6b46b135602a3f3ac99e@syzkaller.appspotmail.com
Reported-by: syzbot+8458d13b13562abf6b77@syzkaller.appspotmail.com
Reported-by: syzbot+bd034f3fdc0402e942ed@syzkaller.appspotmail.com
Reported-by: syzbot+c92378b32760a4eef756@syzkaller.appspotmail.com
Reported-by: syzbot+68b44a1597636e0b342c@syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-27 16:40:40 -04:00
Gustavo A. R. Silva
5b361328ca RDMA: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Link: https://lore.kernel.org/r/20200213010425.GA13068@embeddedor.com
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> # added a few more
2020-02-20 13:33:51 -04:00
Max Gurtovoy
6affca140c RDMA/rw: Fix error flow during RDMA context initialization
In case the SGL was mapped for P2P DMA operation, we must unmap it using
pci_p2pdma_unmap_sg during the error unwind of rdma_rw_ctx_init()

Fixes: 7f73eac3a7 ("PCI/P2PDMA: Introduce pci_p2pdma_unmap_sg()")
Link: https://lore.kernel.org/r/20200220100819.41860-1-maxg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-20 13:19:21 -04:00
Selvin Xavier
779820c2e1 RDMA/core: Add helper function to retrieve driver gid context from gid attr
Adding a helper function to retrieve the driver gid context from the gid
attr.

Link: https://lore.kernel.org/r/1582107594-5180-2-git-send-email-selvin.xavier@broadcom.com
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-19 16:54:25 -04:00
Nathan Chancellor
4ca501d6aa RDMA/core: Fix use of logical OR in get_new_pps
Clang warns:

../drivers/infiniband/core/security.c:351:41: warning: converting the
enum constant to a boolean [-Wint-in-bool-context]
        if (!(qp_attr_mask & (IB_QP_PKEY_INDEX || IB_QP_PORT)) && qp_pps) {
                                               ^
1 warning generated.

A bitwise OR should have been used instead.

Fixes: 1dd017882e ("RDMA/core: Fix protection fault in get_pkey_idx_qp_list")
Link: https://lore.kernel.org/r/20200217204318.13609-1-natechancellor@gmail.com
Link: https://github.com/ClangBuiltLinux/linux/issues/889
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-19 16:42:12 -04:00
Jason Gunthorpe
167b95ec88 RDMA/ucma: Use refcount_t for the ctx->ref
Don't use an atomic as a refcount.

Link: https://lore.kernel.org/r/20200218191657.GA29724@ziepe.ca
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-19 16:41:00 -04:00
Parav Pandit
e4103312d7 Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"
This reverts commit 219d2e9dfd.

The call chain below requires the cm_id_priv's destination address to be
setup before performing rdma_bind_addr(). Otherwise source port allocation
fails as cma_port_is_unique() no longer sees the correct tuple to allow
duplicate users of the source port.

rdma_resolve_addr()
  cma_bind_addr()
    rdma_bind_addr()
      cma_get_port()
        cma_alloc_any_port()
          cma_port_is_unique() <- compared with zero daddr

This can result in false failures to connect, particularly if the source
port range is restricted.

Fixes: 219d2e9dfd ("RDMA/cma: Simplify rdma_resolve_addr() error flow")
Link: https://lore.kernel.org/r/20200212072635.682689-4-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-19 14:25:52 -04:00
Jason Gunthorpe
b72bfc965e RDMA/core: Get rid of ib_create_qp_user
This function accepts a udata but does nothing with it, and is never
passed a !NULL udata. Rename it to ib_create_qp which was the only caller
and remove the udata.

Link: https://lore.kernel.org/r/20200213191911.GA9898@ziepe.ca
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-18 20:27:37 -04:00
Michael Guralnik
f03d9fadfe RDMA/core: Add weak ordering dma attr to dma mapping
For memory regions registered with IB_ACCESS_RELAXED_ORDERING will be dma
mapped with the DMA_ATTR_WEAK_ORDERING.

This will allow reads and writes to the mapping to be weakly ordered, such
change can enhance performance on some supporting architectures.

Link: https://lore.kernel.org/r/20200212073559.684139-1-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13 13:38:02 -04:00
Leon Romanovsky
1dd017882e RDMA/core: Fix protection fault in get_pkey_idx_qp_list
We don't need to set pkey as valid in case that user set only one of pkey
index or port number, otherwise it will be resulted in NULL pointer
dereference while accessing to uninitialized pkey list.  The following
crash from Syzkaller revealed it.

  kasan: CONFIG_KASAN_INLINE enabled
  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault: 0000 [#1] SMP KASAN PTI
  CPU: 1 PID: 14753 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
  rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:get_pkey_idx_qp_list+0x161/0x2d0
  Code: 01 00 00 49 8b 5e 20 4c 39 e3 0f 84 b9 00 00 00 e8 e4 42 6e fe 48
  8d 7b 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04
  02 84 c0 74 08 3c 01 0f 8e d0 00 00 00 48 8d 7d 04 48 b8
  RSP: 0018:ffffc9000bc6f950 EFLAGS: 00010202
  RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff82c8bdec
  RDX: 0000000000000002 RSI: ffffc900030a8000 RDI: 0000000000000010
  RBP: ffff888112c8ce80 R08: 0000000000000004 R09: fffff5200178df1f
  R10: 0000000000000001 R11: fffff5200178df1f R12: ffff888115dc4430
  R13: ffff888115da8498 R14: ffff888115dc4410 R15: ffff888115da8000
  FS:  00007f20777de700(0000) GS:ffff88811b100000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000001b2f721000 CR3: 00000001173ca002 CR4: 0000000000360ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   port_pkey_list_insert+0xd7/0x7c0
   ib_security_modify_qp+0x6fa/0xfc0
   _ib_modify_qp+0x8c4/0xbf0
   modify_qp+0x10da/0x16d0
   ib_uverbs_modify_qp+0x9a/0x100
   ib_uverbs_write+0xaa5/0xdf0
   __vfs_write+0x7c/0x100
   vfs_write+0x168/0x4a0
   ksys_write+0xc8/0x200
   do_syscall_64+0x9c/0x390
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: d291f1a652 ("IB/core: Enforce PKey security on QPs")
Link: https://lore.kernel.org/r/20200212080651.GB679970@unreal
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Message-Id: <20200212080651.GB679970@unreal>
2020-02-13 12:31:56 -04:00
Leon Romanovsky
ca750d4a9c RDMA/ucma: Mask QPN to be 24 bits according to IBTA
IBTA declares QPN as 24bits, mask input to ensure that kernel
doesn't get higher bits and ensure by adding WANR_ONCE() that
other CM users do the same.

Link: https://lore.kernel.org/r/20200212072635.682689-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13 11:38:19 -04:00
Yonatan Cohen
9ea04d0df6 IB/umad: Fix kernel crash while unloading ib_umad
When disassociating a device from umad we must ensure that the sysfs
access is prevented before blocking the fops, otherwise assumptions in
syfs don't hold:

	    CPU0            	        CPU1
	 ib_umad_kill_port()        ibdev_show()
	    port->ib_dev = NULL
                                      dev_name(port->ib_dev)

The prior patch made an error in moving the device_destroy(), it should
have been split into device_del() (above) and put_device() (below). At
this point we already have the split, so move the device_del() back to its
original place.

  kernel stack
  PF: error_code(0x0000) - not-present page
  Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
  RIP: 0010:ibdev_show+0x18/0x50 [ib_umad]
  RSP: 0018:ffffc9000097fe40 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: ffffffffa0441120 RCX: ffff8881df514000
  RDX: ffff8881df514000 RSI: ffffffffa0441120 RDI: ffff8881df1e8870
  RBP: ffffffff81caf000 R08: ffff8881df1e8870 R09: 0000000000000000
  R10: 0000000000001000 R11: 0000000000000003 R12: ffff88822f550b40
  R13: 0000000000000001 R14: ffffc9000097ff08 R15: ffff8882238bad58
  FS:  00007f1437ff3740(0000) GS:ffff888236940000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000000004e8 CR3: 00000001e0dfc001 CR4: 00000000001606e0
  Call Trace:
   dev_attr_show+0x15/0x50
   sysfs_kf_seq_show+0xb8/0x1a0
   seq_read+0x12d/0x350
   vfs_read+0x89/0x140
   ksys_read+0x55/0xd0
   do_syscall_64+0x55/0x1b0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9:

Fixes: cf7ad30302 ("IB/umad: Avoid destroying device while it is accessed")
Link: https://lore.kernel.org/r/20200212072635.682689-9-leon@kernel.org
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13 10:00:50 -04:00
Michael Guralnik
a0767da777 RDMA/core: Add missing list deletion on freeing event queue
When the uobject file scheme was revised to allow device disassociation
from the file it became possible for read() to still happen the driver
destroys the uobject.

The old clode code was not tolerant to concurrent read, and when it was
moved to the driver destroy it creates a bug.

Ensure the event_list is empty after driver destroy by adding the missing
list_del(). Otherwise read() can trigger a use after free and double
kfree.

Fixes: f7c8416cce ("RDMA/core: Simplify destruction of FD uobjects")
Link: https://lore.kernel.org/r/20200212072635.682689-6-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13 09:44:49 -04:00
Avihai Horon
a72f4ac1d7 RDMA/core: Fix invalid memory access in spec_filter_size
Add a check that the size specified in the flow spec header doesn't cause
an overflow when calculating the filter size, and thus prevent access to
invalid memory.  The following crash from syzkaller revealed it.

  kasan: CONFIG_KASAN_INLINE enabled
  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault: 0000 [#1] SMP KASAN PTI
  CPU: 1 PID: 17834 Comm: syz-executor.3 Not tainted 5.5.0-rc5 #2
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
  rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:memchr_inv+0xd3/0x330
  Code: 89 f9 89 f5 83 e1 07 0f 85 f9 00 00 00 49 89 d5 49 c1 ed 03 45 85
  ed 74 6f 48 89 d9 48 b8 00 00 00 00 00 fc ff df 48 c1 e9 03 <80> 3c 01
  00 0f 85 0d 02 00 00 44 0f b6 e5 48 b8 01 01 01 01 01 01
  RSP: 0018:ffffc9000a13fa50 EFLAGS: 00010202
  RAX: dffffc0000000000 RBX: 7fff88810de9d820 RCX: 0ffff11021bd3b04
  RDX: 000000000000fff8 RSI: 0000000000000000 RDI: 7fff88810de9d820
  RBP: 0000000000000000 R08: ffff888110d69018 R09: 0000000000000009
  R10: 0000000000000001 R11: ffffed10236267cc R12: 0000000000000004
  R13: 0000000000001fff R14: ffff88810de9d820 R15: 0000000000000040
  FS:  00007f9ee0e51700(0000) GS:ffff88811b100000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000115ea0006 CR4: 0000000000360ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   spec_filter_size.part.16+0x34/0x50
   ib_uverbs_kern_spec_to_ib_spec_filter+0x691/0x770
   ib_uverbs_ex_create_flow+0x9ea/0x1b40
   ib_uverbs_write+0xaa5/0xdf0
   __vfs_write+0x7c/0x100
   vfs_write+0x168/0x4a0
   ksys_write+0xc8/0x200
   do_syscall_64+0x9c/0x390
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x465b49
  Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89
  f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
  f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007f9ee0e50c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000465b49
  RDX: 00000000000003a0 RSI: 00000000200007c0 RDI: 0000000000000004
  RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ee0e516bc
  R13: 00000000004ca2da R14: 000000000070deb8 R15: 00000000ffffffff
  Modules linked in:
  Dumping ftrace buffer:
     (ftrace buffer empty)

Fixes: 94e03f11ad ("IB/uverbs: Add support for flow tag")
Link: https://lore.kernel.org/r/20200126171500.4623-1-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 14:14:52 -04:00
Parav Pandit
43fb5892cd RDMA/cma: Use refcount API to reflect refcount
Use a refcount_t for atomics being used as a refcount.

Link: https://lore.kernel.org/r/20200126142652.104803-8-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 14:00:40 -04:00
Parav Pandit
e368d23f57 RDMA/cma: Rename cma_device ref/deref helpers to to get/put
Helper functions which increment/decrement reference count of a
structure read better when they are named with the get/put suffix.

Hence, rename cma_ref/deref_id() to cma_id_get/put().  Also use
cma_get_id() wrapper to find the balancing put() calls.

Link: https://lore.kernel.org/r/20200126142652.104803-7-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 14:00:40 -04:00
Parav Pandit
be439912e7 RDMA/cma: Use refcount API to reflect refcount
Use the refcount variant to capture the reference counting of the cma
device structure.

Link: https://lore.kernel.org/r/20200126142652.104803-6-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 14:00:39 -04:00
Parav Pandit
5ff8c8fa44 RDMA/cma: Rename cma_device ref/deref helpers to to get/put
Helper functions which increment/decrement reference count of the
structure read better when they are named with the get/put suffix.

Hence, rename cma_ref/deref_dev() to cma_dev_get/put().

Link: https://lore.kernel.org/r/20200126142652.104803-5-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 13:55:13 -04:00
Parav Pandit
cc055dd3a7 RDMA/cma: Use RDMA device port iterator
Use RDMA device port iterator to avoid open coding.

Link: https://lore.kernel.org/r/20200126142652.104803-4-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 13:55:13 -04:00
Parav Pandit
081ea5195a RDMA/cma: Use a helper function to enqueue resolve work items
To avoid errors, with attaching ownership of work item and its cm_id
refcount which is decremented in work handler, tie them up in single
helper function. Also avoid code duplication.

Link: https://lore.kernel.org/r/20200126142652.104803-3-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 13:55:13 -04:00
Linus Torvalds
8fdd4019bc RDMA subsystem updates for 5.6
- Driver updates and cleanup for qedr, bnxt_re, hns, siw, mlx5, mlx4, rxe,
   i40iw
 
 - Larger series doing cleanup and rework for hns and hfi1.
 
 - Some general reworking of the CM code to make it a little more
   understandable
 
 - Unify the different code paths connected to the uverbs FD scheme
 
 - New UAPI ioctls conversions for get context and get async fd
 
 - Trace points for CQ and CM portions of the RDMA stack
 
 - mlx5 driver support for virtio-net formatted rings as RDMA raw ethernet QPs
 
 - verbs support for setting the PCI-E relaxed ordering bit on DMA traffic
   connected to a MR
 
 - A couple of bug fixes that came too late to make rc7
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl4zPwQACgkQOG33FX4g
 mxoURw//fuQmuJ7aTMH+0qrhaZUmzXOcI/WKvY0YMyYLvxolRcIO+uCL239wxezR
 9iTHPO7HeYXUQ4W8Hi/fTyuQ9hzaPOP3wgOJfQhm4QT/XDpRW0H3Mb+hTLHTUAcA
 rgKc9suAn+5BbIDOz7hEfeOTssx1wYrLsaHDc11NZ42JuG6uvPR33lhXiKWG+5tH
 2MpfeTU6BjL035dm3YZXCo+ouobpdMuvzJItYIsB2E5Nl0s91SMzsymIYiD0gb3t
 yUJ3wqPW3pchfAl8VEn+W5AHTUYYgGjmEblL8WdVq5JRrkQgQzj8QtCRT9NOPAT0
 LivCvgBrm0kscaQS2TjtG56Ojbwz8z1QPE/4shf0pj/G2lZfacYDAeaUc/2VafxY
 y/KG+3dB1DxtYY3eXJUxbB7Vpk7kfr35p5b75NdMhd2t49oPgV7EKoZMLYGzfX4S
 PtyNyNSiwx8qsRTr4lznOMswmrDLfG4XiywWgYo6NGOWyKYlARWIYBAEQZ0DPTiE
 9mqJ19gusdSdAgm8LGDInPmH6/AojGOVzYonJFWdlOtwCXGNXL4Gx02x4WYHykDG
 w+oy5NMJbU3b6+MWEagkuQNcrwqv02MT1mB/Lgv4GPm6rS0UXR7zUPDeccE50fSL
 X36k28UlftlPlaD7PeJdTOAhyBv5DxfpL5rbB2TfpUTpNxjayuU=
 =hepK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A very quiet cycle with few notable changes. Mostly the usual list of
  one or two patches to drivers changing something that isn't quite rc
  worthy. The subsystem seems to be seeing a larger number of rework and
  cleanup style patches right now, I feel that several vendors are
  prepping their drivers for new silicon.

  Summary:

   - Driver updates and cleanup for qedr, bnxt_re, hns, siw, mlx5, mlx4,
     rxe, i40iw

   - Larger series doing cleanup and rework for hns and hfi1.

   - Some general reworking of the CM code to make it a little more
     understandable

   - Unify the different code paths connected to the uverbs FD scheme

   - New UAPI ioctls conversions for get context and get async fd

   - Trace points for CQ and CM portions of the RDMA stack

   - mlx5 driver support for virtio-net formatted rings as RDMA raw
     ethernet QPs

   - verbs support for setting the PCI-E relaxed ordering bit on DMA
     traffic connected to a MR

   - A couple of bug fixes that came too late to make rc7"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (108 commits)
  RDMA/core: Make the entire API tree static
  RDMA/efa: Mask access flags with the correct optional range
  RDMA/cma: Fix unbalanced cm_id reference count during address resolve
  RDMA/umem: Fix ib_umem_find_best_pgsz()
  IB/mlx4: Fix leak in id_map_find_del
  IB/opa_vnic: Spelling correction of 'erorr' to 'error'
  IB/hfi1: Fix logical condition in msix_request_irq
  RDMA/cm: Remove CM message structs
  RDMA/cm: Use IBA functions for complex structure members
  RDMA/cm: Use IBA functions for simple structure members
  RDMA/cm: Use IBA functions for swapping get/set acessors
  RDMA/cm: Use IBA functions for simple get/set acessors
  RDMA/cm: Add SET/GET implementations to hide IBA wire format
  RDMA/cm: Add accessors for CM_REQ transport_type
  IB/mlx5: Return the administrative GUID if exists
  RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence
  IB/mlx4: Fix memory leak in add_gid error flow
  IB/mlx5: Expose RoCE accelerator counters
  RDMA/mlx5: Set relaxed ordering when requested
  RDMA/core: Add the core support field to METHOD_GET_CONTEXT
  ...
2020-01-31 14:40:36 -08:00
John Hubbard
f1f6a7dd9b mm, tree-wide: rename put_user_page*() to unpin_user_page*()
In order to provide a clearer, more symmetric API for pinning and
unpinning DMA pages.  This way, pin_user_pages*() calls match up with
unpin_user_pages*() calls, and the API is a lot closer to being
self-explanatory.

Link: http://lkml.kernel.org/r/20200107224558.2362728-23-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:38 -08:00
John Hubbard
dfa0a4fff1 IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP
Convert infiniband to use the new pin_user_pages*() calls.

Also, revert earlier changes to Infiniband ODP that had it using
put_user_page().  ODP is "Case 3" in
Documentation/core-api/pin_user_pages.rst, which is to say, normal
get_user_pages() and put_page() is the API to use there.

The new pin_user_pages*() calls replace corresponding get_user_pages*()
calls, and set the FOLL_PIN flag.  The FOLL_PIN flag requires that the
caller must return the pages via put_user_page*() calls, but infiniband
was already doing that as part of an earlier commit.

Link: http://lkml.kernel.org/r/20200107224558.2362728-14-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:37 -08:00
John Hubbard
4789fcdd14 IB/umem: use get_user_pages_fast() to pin DMA pages
And get rid of the mmap_sem calls, as part of that.  Note that
get_user_pages_fast() will, if necessary, fall back to
__gup_longterm_unlocked(), which takes the mmap_sem as needed.

Link: http://lkml.kernel.org/r/20200107224558.2362728-10-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:37 -08:00
Jason Gunthorpe
8889f6fa35 RDMA/core: Make the entire API tree static
Compilation of mlx5 driver without CONFIG_INFINIBAND_USER_ACCESS generates
the following error.

on x86_64:

 ld: drivers/infiniband/hw/mlx5/main.o: in function `mlx5_ib_handler_MLX5_IB_METHOD_VAR_OBJ_ALLOC':
 main.c:(.text+0x186d): undefined reference to `ib_uverbs_get_ucontext_file'
 ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x2480): undefined reference to `uverbs_idr_class'
 ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x24d8): undefined reference to `uverbs_destroy_def_handler'

This is happening because some parts of the UAPI description are not
static. This is a hold over from earlier code that relied on struct
pointers to refer to object types, now object types are referenced by
number. Remove the unused globals and add statics to the remaining UAPI
description elements.

Remove the redundent #ifdefs around mlx5_ib_*defs and obsolete
mlx5_ib_get_devx_tree().

The compiler now trims alot more unused code, including the above
problematic definitions when !CONFIG_INFINIBAND_USER_ACCESS.

Fixes: 7be76bef32 ("IB/mlx5: Introduce VAR object and its alloc/destroy methods")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-30 16:28:52 -04:00
Parav Pandit
b4fb4cc5ba RDMA/cma: Fix unbalanced cm_id reference count during address resolve
Below commit missed the AF_IB and loopback code flow in
rdma_resolve_addr().  This leads to an unbalanced cm_id refcount in
cma_work_handler() which puts the refcount which was not incremented prior
to queuing the work.

A call trace is observed with such code flow:

 BUG: unable to handle kernel NULL pointer dereference at (null)
 [<ffffffff96b67e16>] __mutex_lock_slowpath+0x166/0x1d0
 [<ffffffff96b6715f>] mutex_lock+0x1f/0x2f
 [<ffffffffc0beabb5>] cma_work_handler+0x25/0xa0
 [<ffffffff964b9ebf>] process_one_work+0x17f/0x440
 [<ffffffff964baf56>] worker_thread+0x126/0x3c0

Hence, hold the cm_id reference when scheduling the resolve work item.

Fixes: 722c7b2bfe ("RDMA/{cma, core}: Avoid callback on rdma_addr_cancel()")
Link: https://lore.kernel.org/r/20200126142652.104803-2-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-28 14:15:23 -04:00
Artemy Kovalyov
36798d5ae1 RDMA/umem: Fix ib_umem_find_best_pgsz()
Except for the last entry, the ending iova alignment sets the maximum
possible page size as the low bits of the iova must be zero when starting
the next chunk.

Fixes: 4a35339958 ("RDMA/umem: Add API to find best driver supported page size in an MR")
Link: https://lore.kernel.org/r/20200128135612.174820-1-leon@kernel.org
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-28 14:10:54 -04:00
Jason Gunthorpe
13e0af1801 RDMA/cm: Remove CM message structs
All accesses now use the new IBA acessor scheme, so delete the structs
entirely and generate the structures from the schema file.

Link: https://lore.kernel.org/r/20200116170037.30109-8-jgg@ziepe.ca
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 15:11:37 -04:00
Jason Gunthorpe
4ca662a30a RDMA/cm: Use IBA functions for complex structure members
Use a Coccinelle spatch to replace CM structure members used as
structures, arrays, or pointers with IBA_GET/SET versions. Applied with

$ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c

The spatch file was generated using the template pattern:

@@
expression src;
expression len;
{struct} *msg;
@@
- memcpy(msg->{old_name}, src, len)
+ IBA_SET_MEM({new_name}, msg, src, len)
@@
{struct} *msg;
identifier x;
@@
- msg->{old_name}.x
+ IBA_GET_MEM_PTR({new_name}, msg)->x
@@
{struct} *msg;
@@
- &msg->{old_name}
+ IBA_GET_MEM_PTR({new_name}, msg)

For GIDs:
@@
{struct} *msg;
@@
- msg->{old_name}
+ *IBA_GET_MEM_PTR({new_name}, msg)

For non-GIDs:
@@
{struct} *msg;
@@
- msg->{old_name}
+ IBA_GET_MEM_PTR({new_name}, msg)

Iterated for every remaining IBA_CHECK_OFF()/IBA_CHECK_GET()
pairing. Touched up with clang-format after.

Link: https://lore.kernel.org/r/20200116170037.30109-7-jgg@ziepe.ca
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 15:06:01 -04:00
Jason Gunthorpe
91b60a7128 RDMA/cm: Use IBA functions for simple structure members
Use a Coccinelle spatch script to replace use of simple CM structure
members with IBA_GET/SET versions. Applied with

$ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c

The spatch file was generated using the template pattern:

@@
expression val;
{struct} *msg;
@@
- msg->{old_name} = val
+ IBA_SET({new_name}, msg, be{bits}_to_cpu(val))
@@
{struct} *msg;
@@
- msg->{old_name}
+ cpu_to_be{bits}(IBA_GET({new_name}, msg))

Iterated for every IBA_CHECK_OFF that isn't a CM_FIELD_MLOC.

And the below iterated over all byte sizes to remove doubled byte swaps:

@@
expression val;
@@
-be{bits}_to_cpu(cpu_to_be{bits}(val))
+val

(and __be_to_cpu and ntoh varients)

Touched up with clang-format after.

Link: https://lore.kernel.org/r/20200116170037.30109-6-jgg@ziepe.ca
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 15:06:00 -04:00
Jason Gunthorpe
01adb7f46f RDMA/cm: Use IBA functions for swapping get/set acessors
Use a Coccinelle spatch script to replace CM helper functions that
return/accept BE values with IBA_GET/SET versions. Applied with

$ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c

The spatch file was generated using the template pattern:

@@
expression val;
{struct} *msg;
@@
- {old_setter}(msg, val)
+ IBA_SET({new_name}, msg, be{bits}_to_cpu(val))
@@
{struct} *msg;
@@
- {old_getter}(msg)
+ cpu_to_be{bits}(IBA_GET({new_name}, msg))

Iterated for every IBA_CHECK_GET_BE()/IBA_CHECK_SET_BE() pairing.

And the below iterated over all byte sizes to remove doubled byte swaps:

@@
expression val;
@@
-be{bits}_to_cpu(cpu_to_be{bits}(val))
+val

(and __be_to_cpu and ntoh varients)

Touched up with clang-format after.

Link: https://lore.kernel.org/r/20200116170037.30109-5-jgg@ziepe.ca
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 15:06:00 -04:00
Jason Gunthorpe
b6bbee6889 RDMA/cm: Use IBA functions for simple get/set acessors
Use a Coccinelle spatch to replace CM helper functions with IBA_GET/SET
versions. Applied with

$ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c

The spatch file was generated using the template pattern:

@@
expression val;
{struct} *msg;
@@
- {old_setter}
+ IBA_SET({new_name}, msg, val)
@@
{struct} *msg;
@@
- {old_getter}
+ IBA_GET({new_name}, msg)

Iterated for every IBA_CHECK_GET()/IBA_CHECK_GET() pairing. Touched up
with clang-format after.

Link: https://lore.kernel.org/r/20200116170037.30109-4-jgg@ziepe.ca
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 15:05:59 -04:00
Leon Romanovsky
d05d4ac4c9 RDMA/cm: Add SET/GET implementations to hide IBA wire format
There is no separation between RDMA-CM wire format as it is declared in
IBTA and kernel logic which implements needed support. Such situation
causes to many mistakes in conversion between big-endian (wire format)
and CPU format used by kernel. It also mixes RDMA core code with
combination of uXX and beXX variables.

The idea that all accesses to IBA definitions will go through special
GET/SET macros to ensure that no conversion mistakes are made. The
shifting and masking required to read the value is automatically deduced
using the field offset description from the tables in the IBA
specification.

This starts with the CM MADs described in IBTA release 1.3 volume 1.

To confirm that the new macros behave the same as the old accessors a
self-test is included in this patch.

Each macro replacing a straightforward struct field compile-time tests
that the new field has the same offsetof() and width as the old field.

For the fields with accessor functions a runtime test, the 'all ones'
value is placed in a dummy message and read back in several ways to
confirm that both approaches give identical results.

Later patches in this series delete the self test.

This creates a tested table of new field name, old field name(s) and some
meta information like BE coding for the functions which will be used in
the next patches.

Link: https://lore.kernel.org/r/20200116170037.30109-3-jgg@ziepe.ca
Link: https://lore.kernel.org/r/20191212093830.316934-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 15:05:59 -04:00
Jason Gunthorpe
792a7c1f2e RDMA/cm: Add accessors for CM_REQ transport_type
Access the two fields through wrappers, like all other fields, to make it
clearer what is happening.

Link: https://lore.kernel.org/r/20200116170037.30109-2-jgg@ziepe.ca
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 15:05:59 -04:00
Jason Gunthorpe
6b3712c024 RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence
The set of entry->driver_removed is missing locking, protect it with
xa_lock() which is held by the only reader.

Otherwise readers may continue to see driver_removed = false after
rdma_user_mmap_entry_remove() returns and may continue to try and
establish new mmaps.

Fixes: 3411f9f01b ("RDMA/core: Create mmap database and cookie helper functions")
Link: https://lore.kernel.org/r/20200115202041.GA17199@ziepe.ca
Reviewed-by: Gal Pressman <galpress@amazon.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 14:48:33 -04:00
Jason Gunthorpe
e8b3a426fb Use ODP MRs for kernel ULPs
The following series extends MR creation routines to allow creation of
 user MRs through kernel ULPs as a proxy. The immediate use case is to
 allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
 MRs to be created and such MRs were not possible to create prior this
 series.
 
 The first part of this patchset extends RDMA to have special verb
 ib_reg_user_mr(). The common use case that uses this function is a
 userspace application that allocates memory for HCA access but the
 responsibility to register the memory at the HCA is on an kernel ULP.
 This ULP acts as an agent for the userspace application.
 
 The second part provides advise MR functionality for ULPs. This is
 integral part of ODP flows and used to trigger pagefaults in advance
 to prepare memory before running working set.
 
 The third part is actual user of those in-kernel APIs.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQT1m3YD37UfMCUQBNwp8NhrnBAZsQUCXiVO8AAKCRAp8NhrnBAZ
 scTrAP9gb0d3qv0IOtHw5aGI1DAgjTUn/SzUOnsjDEn7DIoh9gEA2+ZmaEyLXKrl
 +UcZb31auy5P8ueJYokRLhLAyRcOIAg=
 =yaHb
 -----END PGP SIGNATURE-----

Merge tag 'rds-odp-for-5.5' into rdma.git for-next

From https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma

Leon Romanovsky says:

====================
Use ODP MRs for kernel ULPs

The following series extends MR creation routines to allow creation of
user MRs through kernel ULPs as a proxy. The immediate use case is to
allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
MRs to be created and such MRs were not possible to create prior this
series.

The first part of this patchset extends RDMA to have special verb
ib_reg_user_mr(). The common use case that uses this function is a
userspace application that allocates memory for HCA access but the
responsibility to register the memory at the HCA is on an kernel ULP.
This ULP acts as an agent for the userspace application.

The second part provides advise MR functionality for ULPs. This is
integral part of ODP flows and used to trigger pagefaults in advance
to prepare memory before running working set.

The third part is actual user of those in-kernel APIs.
====================

* tag 'rds-odp-for-5.5':
  net/rds: Use prefetch for On-Demand-Paging MR
  net/rds: Handle ODP mr registration/unregistration
  net/rds: Detect need of On-Demand-Paging memory registration
  RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths
  IB/mlx5: Mask out unsupported ODP capabilities for kernel QPs
  RDMA/mlx5: Don't fake udata for kernel path
  IB/mlx5: Add ODP WQE handlers for kernel QPs
  IB/core: Add interface to advise_mr for kernel users
  IB/core: Introduce ib_reg_user_mr
  IB: Allow calls to ib_umem_get from kernel ULPs

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-21 09:55:04 -04:00
Michael Guralnik
811646998e RDMA/core: Add the core support field to METHOD_GET_CONTEXT
Add the core support field to METHOD_GET_CONTEXT, this field should
represent capabilities that are not device-specific.

Return support for optional access flags for memory regions. User-space
will use this capability to mask the optional access flags for
unsupporting kernels.

Link: https://lore.kernel.org/r/1578506740-22188-10-git-send-email-yishaih@mellanox.com
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16 15:55:46 -04:00
Jason Gunthorpe
a1123418ba RDMA/uverbs: Add ioctl command to get a device context
Allow future extensions of the get context command through the uverbs
ioctl kabi.

Unlike the uverbs version this does not return an async_fd as well, that
has to be done with another command.

Link: https://lore.kernel.org/r/1578506740-22188-5-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16 15:55:45 -04:00
Jason Gunthorpe
da57db2567 RDMA/core: Remove ucontext_lock from the uverbs_destry_ufile_hw() path
This lock only serializes ucontext creation. Instead of checking the
ucontext_lock during destruction hold the existing hw_destroy_rwsem during
creation, which is the standard pattern for object creation.

The simplification of locking is needed for the next patch.

Link: https://lore.kernel.org/r/1578506740-22188-4-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16 15:55:45 -04:00
Jason Gunthorpe
d680e88e20 RDMA/core: Add UVERBS_METHOD_ASYNC_EVENT_ALLOC
Allow the async FD to be allocated separately from the context.

This is necessary to introduce the ioctl to create a context, as an ioctl
should only ever create a single uobject at a time.

If multiple async FDs are created then the first one is used to deliver
affiliated events from any ib_uevent_object, with all subsequent ones will
receive only unaffiliated events.

Link: https://lore.kernel.org/r/1578506740-22188-3-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16 15:55:45 -04:00
Moni Shoua
87d8069f6b IB/core: Add interface to advise_mr for kernel users
Allow ULPs to call advise_mr, so they can control ODP regions
in the same way as user space applications.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 16:14:42 +02:00
Moni Shoua
33006bd4f3 IB/core: Introduce ib_reg_user_mr
Add ib_reg_user_mr() for kernel ULPs to register user MRs.

The common use case that uses this function is a userspace application
that allocates memory for HCA access but the responsibility to register
the memory at the HCA is on an kernel ULP. This ULP that acts as an agent
for the userspace application.

This function is intended to be used without a user context so vendor
drivers need to be aware of calling reg_user_mr() device operation with
udata equal to NULL.

Among all drivers, i40iw is the only driver which relies on presence
of udata, so check udata existence for that driver.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 16:14:36 +02:00
Moni Shoua
c320e527e1 IB: Allow calls to ib_umem_get from kernel ULPs
So far the assumption was that ib_umem_get() and ib_umem_odp_get()
are called from flows that start in UVERBS and therefore has a user
context. This assumption restricts flows that are initiated by ULPs
and need the service that ib_umem_get() provides.

This patch changes ib_umem_get() and ib_umem_odp_get() to get IB device
directly by relying on the fact that both UVERBS and ULPs sets that
field correctly.

Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 16:14:28 +02:00
Jason Gunthorpe
5c55cfd6a5 RDMA/core: Use READ_ONCE for ib_ufile.async_file
The writer for async_file holds the ucontext_lock, while the readers are
left unlocked. Most readers rely on an implicit locking, either by having
a uobject (which cannot be created before a context) or by holding the
ib_ufile kref.

However ib_uverbs_free_hw_resources() has no implicit lock and has a
possible race. Make this all clear and sane by using READ_ONCE
consistently.

Link: https://lore.kernel.org/r/1578504126-9400-15-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:20:16 -04:00
Jason Gunthorpe
3e032c0e92 RDMA/core: Make ib_uverbs_async_event_file into a uobject
This makes async events aligned with completion events as both are full
uobjects of FD type and use the same uobject lifecycle.

A bunch of duplicate code is consolidated and the general flow between the
two FDs is now very similar.

Link: https://lore.kernel.org/r/1578504126-9400-14-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:20:16 -04:00