1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

2399 commits

Author SHA1 Message Date
Matan Barak
dbf727de74 IB/core: Use GID table in AH creation and dmac resolution
Previously, vlan id and source MAC were used from QP attributes. Since
the net device is now stored in the GID attributes, they could be used
instead of getting this information from the QP attributes.

IB_QP_SMAC, IB_QP_ALT_SMAC, IB_QP_VID and IB_QP_ALT_VID were removed
because there is no known libibverbs that uses them.

This commit also modifies the vendors (mlx4, ocrdma) drivers in order
to use the new approach.

ocrdma driver changes were done by Somnath Kotur <Somnath.Kotur@Avagotech.Com>

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:48:17 -04:00
Matan Barak
99b27e3b5d IB/cache: Add ib_find_gid_by_filter cache API
GID cache API users might want to search for GIDs with specific
attributes rather than just specifying GID, net device and port.
This is used in a later patch, where we find the sgid index by
L2 Ethernet attributes.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:48:17 -04:00
Matan Barak
abae1b71dd IB/cma: cma_validate_port should verify the port and netdevice
Previously, cma_validate_port searched for GIDs in IB cache and then
tried to verify the found port. This could fail when there are
identical GIDs on both ports. In addition, netdevice should be taken
into account when searching the GID table.
Fixing cma_validate_port to search only the relevant port's cache
and netdevice.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:48:17 -04:00
Matan Barak
c2c6ff1345 IB/cm: cm_init_av_by_path should find a GID by its netdevice
Previously, the CM has searched the cache for any sgid_index whose
GID matches the path's GID. Since the path record stores the net
device, the CM should now search only for GIDs which originated from
this net device.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:48:17 -04:00
Matan Barak
ba36e37fd3 IB/core: Add netdev to path record
In order to find the sgid_index, one could just query the IB cache
with the correct GID and netdevice. Therefore, instead of storing
the L2 attributes directly in the path, we only store the
ifindex and net and use them later to get the sgid_index.
The vlan_id and smac L2 attributes are removed in a later patch.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:48:17 -04:00
Matan Barak
d300ec528b IB/core: Expose and rename ib_find_cached_gid_by_port cache API
Sometime consumers might want to search for a GID in a specific port.
For example, when a WC arrives and we want to search the GID
that matches that port - it's better to search only the relevant
port.
Exposing and renaming ib_cache_gid_find_by_port in order to match
the naming convention of the module.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:48:17 -04:00
Matan Barak
55ee3ab2e4 IB/core: Add netdev and gid attributes paramteres to cache
Adding an ability to query the IB cache by a netdev and get the
attributes of a GID. These parameters are necessary in order to
successfully resolve the required GID (when the netdevice is known)
and get the Ethernet L2 attributes from a GID.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:48:17 -04:00
Eran Ben Elisha
ddf9529be1 IB/core: Allow setting create flags in QP init attribute
Allow setting IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK at create_flags in
ib_uverbs_create_qp_ex.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:16:46 -04:00
Eran Ben Elisha
6d8a74972b IB/core: Extend ib_uverbs_create_qp
ib_uverbs_ex_create_qp follows the extension verbs
mechanism. New features (for example, QP creation flags
field which is added in a downstream patch) could used
via user-space libraries without breaking the ABI.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:16:46 -04:00
Arnd Bergmann
5d1e623591 IB/core: avoid 32-bit warning
The INIT_UDATA() macro requires a pointer or unsigned long argument for
both input and output buffer, and all callers had a cast from when
the code was merged until a recent restructuring, so now we get

core/uverbs_cmd.c: In function 'ib_uverbs_create_cq':
core/uverbs_cmd.c:1481:66: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

This makes the code behave as before by adding back the cast to
unsigned long.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 565197dd8f ("IB/core: Extend ib_uverbs_create_cq")
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 16:56:44 -04:00
Doron Tsur
0ca81a2840 IB/cm: Fix rb-tree duplicate free and use-after-free
ib_send_cm_sidr_rep could sometimes erase the node from the sidr
(depending on errors in the process). Since ib_send_cm_sidr_rep is
called both from cm_sidr_req_handler and cm_destroy_id, cm_id_priv
could be either erased from the rb_tree twice or not erased at all.
Fixing that by making sure it's erased only once before freeing
cm_id_priv.

Fixes: a977049dac ('[PATCH] IB: Add the kernel CM implementation')
Signed-off-by: Doron Tsur <doront@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 15:43:12 -04:00
Haggai Eran
ab3964ad2a IB/cma: Use inner P_Key to determine netdev
When discussing the patches to demux ids in rdma_cm instead of ib_cm, it
was decided that it is best to use the P_Key value in the packet headers.
However, the mlx5 and ipath drivers are currently unable to send correct
P_Key values in GMP headers. They always send using a single P_Key that is
set during the GSI QP initialization.

Change the rdma_cm code to look at the P_Key value that is part of the
packet payload as a workaround. Once the drivers are fixed this patch can
be reverted.

Fixes: 4c21b5bcef ("IB/cma: Add net_dev and private data checks to
RDMA CM")
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-20 14:16:51 -04:00
Sasha Levin
0174b381ca IB/ucma: check workqueue allocation before usage
Allocating a workqueue might fail, which wasn't checked so far and would
lead to NULL ptr derefs when an attempt to use it was made.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-20 13:35:51 -04:00
Haggai Eran
b3b51f9f6f IB/cma: Potential NULL dereference in cma_id_from_event
If the lookup of a listening ID failed for an AF_IB request, the code
would try to call dev_put() on a NULL net_dev.

Fixes: be688195bd ("IB/cma: Fix net_dev reference leak with failed
requests")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-20 13:13:42 -04:00
Matan Barak
3909642034 IB/core: Fix use after free of ifa
When using ifup/ifdown while executing enum_netdev_ipv4_ips,
ifa could become invalid and cause use after free error.
Fixing it by protecting with RCU lock.

Fixes: 03db3a2d81 ('IB/core: Add RoCE GID table management')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-20 13:10:46 -04:00
Doron Tsur
17b38fb890 IB/core: Fix memory corruption in ib_cache_gid_set_default_gid
When ib_cache_gid_set_default_gid is called from several threads,
updating the table could make find_gid fail, therefore a negative
index will be retruned and an invalid table entry will be used.
Locking find_gid as well fixes this problem.

Fixes: 03db3a2d81 ('IB/core: Add RoCE GID table management')
Signed-off-by: Doron Tsur <doront@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-15 12:35:54 -04:00
Christoph Hellwig
e622f2f4ad IB: split struct ib_send_wr
This patch split up struct ib_send_wr so that all non-trivial verbs
use their own structure which embedds struct ib_send_wr.  This dramaticly
shrinks the size of a WR for most common operations:

sizeof(struct ib_send_wr) (old):	96

sizeof(struct ib_send_wr):		48
sizeof(struct ib_rdma_wr):		64
sizeof(struct ib_atomic_wr):		96
sizeof(struct ib_ud_wr):		88
sizeof(struct ib_fast_reg_wr):		88
sizeof(struct ib_bind_mw_wr):		96
sizeof(struct ib_sig_handover_wr):	80

And with Sagi's pending MR rework the fast registration WR will also be
down to a reasonable size:

sizeof(struct ib_fastreg_wr):		64

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt]
Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc]
Tested-by: Haggai Eran <haggaie@mellanox.com>
Tested-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
2015-10-08 11:09:10 +01:00
Haggai Eran
b8cab5dab1 IB/cma: Accept connection without a valid netdev on RoCE
The netdev checks recently added to RDMA CM expect a valid netdev to be
found for both InfiniBand and RoCE, but the code that find a netdev is
only implemented for InfiniBand.

Currently RoCE doesn't provide an API to find the netdev matching a
given set of parameters, so this patch just disables the netdev enforcement
for each incoming connections when the link layer is RoCE.

Fixes: 4c21b5bcef ("IB/cma: Add net_dev and private data checks to RDMA CM")
Reported-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-06 14:25:16 -04:00
Linus Torvalds
26d2177e97 Changes for 4.3
- Create drivers/staging/rdma
 - Move amso1100 driver to staging/rdma and schedule for deletion
 - Move ipath driver to staging/rdma and schedule for deletion
 - Add hfi1 driver to staging/rdma and set TODO for move to regular tree
 - Initial support for namespaces to be used on RDMA devices
 - Add RoCE GID table handling to the RDMA core caching code
 - Infrastructure to support handling of devices with differing
   read and write scatter gather capabilities
 - Various iSER updates
 - Kill off unsafe usage of global mr registrations
 - Update SRP driver
 - Misc. mlx4 driver updates
 - Support for the mr_alloc verb
 - Support for a netlink interface between kernel and user space cache
   daemon to speed path record queries and route resolution
 - Ininitial support for safe hot removal of verbs devices
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV7v8wAAoJELgmozMOVy/d2dcP/3PXnGFPgFGJODKE6VCZtTvj
 nooNXRKXjxv470UT5DiAX7SNcBxzzS7Zl/Lj+831H9iNXUyzuH31KtBOAZ3W03vZ
 yXwCB2caOStSldTRSUUvPe2aIFPnyNmSpC4i6XcJLJMCFijKmxin5pAo8qE44BQU
 yjhT+wC9P6LL5wZXsn/nFIMLjOFfu0WBFHNp3gs5j59paxlx5VeIAZk16aQZH135
 m7YCyicwrS8iyWQl2bEXRMon2vlCHlX2RHmOJ4f/P5I0quNcGF2+d8Yxa+K1VyC5
 zcb3OBezz+wZtvh16yhsDfSPqHWirljwID2VzOgRSzTJWvQjju8VkwHtkq6bYoBW
 egIxGCHcGWsD0R5iBXLYr/tB+BmjbDObSm0AsR4+JvSShkeVA1IpeoO+19162ixE
 n6CQnk2jCee8KXeIN4PoIKsjRSbIECM0JliWPLoIpuTuEhhpajftlSLgL5hf1dzp
 HrSy6fXmmoRj7wlTa7DnYIC3X+ffwckB8/t1zMAm2sKnIFUTjtQXF7upNiiyWk4L
 /T1QEzJ2bLQckQ9yY4v528SvBQwA4Dy1amIQB7SU8+2S//bYdUvhysWPkdKC4oOT
 WlqS5PFDCI31MvNbbM3rUbMAD8eBAR8ACw9ZpGI/Rffm5FEX5W3LoxA8gfEBRuqt
 30ZYFuW8evTL+YQcaV65
 =EHLg
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull inifiniband/rdma updates from Doug Ledford:
 "This is a fairly sizeable set of changes.  I've put them through a
  decent amount of testing prior to sending the pull request due to
  that.

  There are still a few fixups that I know are coming, but I wanted to
  go ahead and get the big, sizable chunk into your hands sooner rather
  than waiting for those last few fixups.

  Of note is the fact that this creates what is intended to be a
  temporary area in the drivers/staging tree specifically for some
  cleanups and additions that are coming for the RDMA stack.  We
  deprecated two drivers (ipath and amso1100) and are waiting to hear
  back if we can deprecate another one (ehca).  We also put Intel's new
  hfi1 driver into this area because it needs to be refactored and a
  transfer library created out of the factored out code, and then it and
  the qib driver and the soft-roce driver should all be modified to use
  that library.

  I expect drivers/staging/rdma to be around for three or four kernel
  releases and then to go away as all of the work is completed and final
  deletions of deprecated drivers are done.

  Summary of changes for 4.3:

   - Create drivers/staging/rdma
   - Move amso1100 driver to staging/rdma and schedule for deletion
   - Move ipath driver to staging/rdma and schedule for deletion
   - Add hfi1 driver to staging/rdma and set TODO for move to regular
     tree
   - Initial support for namespaces to be used on RDMA devices
   - Add RoCE GID table handling to the RDMA core caching code
   - Infrastructure to support handling of devices with differing read
     and write scatter gather capabilities
   - Various iSER updates
   - Kill off unsafe usage of global mr registrations
   - Update SRP driver
   - Misc  mlx4 driver updates
   - Support for the mr_alloc verb
   - Support for a netlink interface between kernel and user space cache
     daemon to speed path record queries and route resolution
   - Ininitial support for safe hot removal of verbs devices"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits)
  IB/ipoib: Suppress warning for send only join failures
  IB/ipoib: Clean up send-only multicast joins
  IB/srp: Fix possible protection fault
  IB/core: Move SM class defines from ib_mad.h to ib_smi.h
  IB/core: Remove unnecessary defines from ib_mad.h
  IB/hfi1: Add PSM2 user space header to header_install
  IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY
  mlx5: Fix incorrect wc pkey_index assignment for GSI messages
  IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow
  IB/uverbs: reject invalid or unknown opcodes
  IB/cxgb4: Fix if statement in pick_local_ip6adddrs
  IB/sa: Fix rdma netlink message flags
  IB/ucma: HW Device hot-removal support
  IB/mlx4_ib: Disassociate support
  IB/uverbs: Enable device removal when there are active user space applications
  IB/uverbs: Explicitly pass ib_dev to uverbs commands
  IB/uverbs: Fix race between ib_uverbs_open and remove_one
  IB/uverbs: Fix reference counting usage of event files
  IB/core: Make ib_dealloc_pd return void
  IB/srp: Create an insecure all physical rkey only if needed
  ...
2015-09-09 08:33:31 -07:00
Christoph Hellwig
b632ffa7ce IB/uverbs: reject invalid or unknown opcodes
We have many WR opcodes that are only supported in kernel space
and/or require optional information to be copied into the WR
structure.  Reject all those not explicitly handled so that we
can't pass invalid information to drivers.

Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-03 14:25:24 -04:00
Kaike Wan
ba13b5f8f8 IB/sa: Fix rdma netlink message flags
The flags to ibnl_put_msg should be NLM_F_REQUEST instead of GFP_KERNEL.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-02 13:58:54 -04:00
Yishai Hadas
e1c30298cc IB/ucma: HW Device hot-removal support
Currently, IB/cma remove_one flow blocks until all user descriptor managed by
IB/ucma are released. This prevents hot-removal of IB devices. This patch
allows IB/cma to remove devices regardless of user space activity. Upon getting
the RDMA_CM_EVENT_DEVICE_REMOVAL event we close all the underlying HW resources
for the given ucontext. The ucontext itself is still alive till its explicit
destroying by its creator.

Running applications at that time will have some zombie device, further
operations may fail.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:41 -04:00
Yishai Hadas
036b106357 IB/uverbs: Enable device removal when there are active user space applications
Enables the uverbs_remove_one to succeed despite the fact that there are
running IB applications working with the given ib device.  This
functionality enables a HW device to be unbind/reset despite the fact that
there are running user space applications using it.

It exposes a new IB kernel API named 'disassociate_ucontext' which lets
a driver detaching its HW resources from a given user context without
crashing/terminating the application. In case a driver implemented the
above API and registered with ib_uverb there will be no dependency between its
device to its uverbs_device. Upon calling remove_one of ib_uverbs the call
should return after disassociating the open HW resources without waiting to
clients disconnecting. In case driver didn't implement this API there will be no
change to current behaviour and uverbs_remove_one will return only when last
client has disconnected and reference count on uverbs device became 0.

In case the lower driver device was removed any application will
continue working over some zombie HCA, further calls will ended with an
immediate error.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:40 -04:00
Yishai Hadas
057aec0d23 IB/uverbs: Explicitly pass ib_dev to uverbs commands
Done in preparation for deploying RCU for the device removal
flow. Allows isolating the RCU handling to the uverb_main layer and
keeping the uverbs_cmd code as is.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:40 -04:00
Yishai Hadas
35d4a0b63d IB/uverbs: Fix race between ib_uverbs_open and remove_one
Fixes: 2a72f21226 ("IB/uverbs: Remove dev_table")

Before this commit there was a device look-up table that was protected
by a spin_lock used by ib_uverbs_open and by ib_uverbs_remove_one. When
it was dropped and container_of was used instead, it enabled the race
with remove_one as dev might be freed just after:
dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev) but
before the kref_get.

In addition, this buggy patch added some dead code as
container_of(x,y,z) can never be NULL and so dev can never be NULL.
As a result the comment above ib_uverbs_open saying "the open method
will either immediately run -ENXIO" is wrong as it can never happen.

The solution follows Jason Gunthorpe suggestion from below URL:
https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25692.html

cdev will hold a kref on the parent (the containing structure,
ib_uverbs_device) and only when that kref is released it is
guaranteed that open will never be called again.

In addition, fixes the active count scheme to use an atomic
not a kref to prevent WARN_ON as pointed by above comment
from Jason.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:40 -04:00
Yishai Hadas
03c40442a0 IB/uverbs: Fix reference counting usage of event files
Fix the reference counting usage to be handled in the event file
creation/destruction function, instead of being done by the caller.
This is done for both async/non-async event files.

Based on Jason Gunthorpe report at https://www.mail-archive.com/
linux-rdma@vger.kernel.org/msg24680.html:
"The existing code for this is broken, in ib_uverbs_get_context all
the error paths between ib_uverbs_alloc_event_file and the
kref_get(file->ref) are wrong - this will result in fput() which will
call ib_uverbs_event_close, which will try to do kref_put and
ib_unregister_event_handler - which are no longer paired."

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:39 -04:00
Jason Gunthorpe
7dd78647a2 IB/core: Make ib_dealloc_pd return void
The majority of callers never check the return value, and even if they
did, they can't do anything about a failure.

All possible failure cases represent a bug in the caller, so just
WARN_ON inside the function instead.

This fixes a few random errors:
 net/rd/iw.c infinite loops while it fails. (racing with EBUSY?)

This also lays the ground work to get rid of error return from the
drivers. Most drivers do not error, the few that do are broken since
it cannot be handled.

Since uverbs can legitimately make use of EBUSY, open code the
check.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:39 -04:00
Jason Gunthorpe
4be90bc60d IB/mad: Remove ib_get_dma_mr calls
The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:33 -04:00
Jason Gunthorpe
96249d70dd IB/core: Guarantee that a local_dma_lkey is available
Every single ULP requires a local_dma_lkey to do anything with
a QP, so let us ensure one exists for every PD created.

If the driver can supply a global local_dma_lkey then use that, otherwise
ask the driver to create a local use all physical memory MR associated
with the new PD.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Acked-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:33 -04:00
Kaike Wan
2ca546b92a IB/sa: Route SA pathrecord query through netlink
This patch routes a SA pathrecord query to netlink first and processes the
response appropriately. If a failure is returned, the request will be sent
through IB. The decision whether to route the request to netlink first is
determined by the presence of a listener for the local service netlink
multicast group. If the user-space local service netlink multicast group
listener is not present, the request will be sent through IB, just like
what is currently being done.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:26 -04:00
Kaike Wan
5d2657708e IB/sa: Allocate SA query with kzalloc
Replace kmalloc with kzalloc so that all uninitialized fields in SA query
will be zero-ed out to avoid unintentional consequence. This prepares the
SA query structure to accept new fields in the future.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:25 -04:00
Kaike Wan
bc10ed7d3d IB/core: Add rdma netlink helper functions
This patch adds a function to check if listeners for a netlink multicast
group are present. It also adds a function to receive netlink response
messages.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:25 -04:00
Doug Ledford
b8071ad893 IB/core: Remove needless bracketization
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:21 -04:00
Moni Shoua
e26be1bfef IB/mlx4: Implement ib_device callbacks
get_netdev: get the net_device on the physical port of the IB transport port. In
port aggregation mode it is required to return the netdev of the active port.

modify_gid: note for a change in the RoCE gid cache. Handle this by writing to
the harsware GID table. It is possible that indexes in cahce and hardware tables
won't match so a translation is required when modifying a QP or creating an
address handle.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:20 -04:00
Matan Barak
238fdf48f2 IB/core: Add RoCE table bonding support
Handling bonding and other devices require us to all all GIDs of the
net-devices which are upper-devices of the RoCE port related
net-device.

Active-backup configurations imposes even more challenges as the
default GID should only be set on the active devices (this is
necessary as otherwise the same MAC could be used for several
slaves and thus several slaves will have identical GIDs).

Managing these configurations are done by listening to:
(a) NETDEV_CHANGEUPPER event
	(1) if a related net-device is linked, delete all inactive
	    slaves default GIDs and add the upper device GIDs.
	(2) if a related net-device is unlinked, delete all upper GIDs
	    and add the default GIDs.
(b) NETDEV_BONDING_FAILOVER:
	(1) delete the bond GIDs from inactive slaves
	(2) delete the inactive slave's default GIDs
	(3) Add the bond GIDs to the active slave.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:20 -04:00
Dan Carpenter
98d25afa97 IB/core: missing curly braces in ib_find_gid()
Smatch says that, based on the indenting, we should probably add curly
braces here.

Fixes: 03db3a2d81 ('IB/core: Add RoCE GID table management')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:09 -04:00
Matan Barak
03db3a2d81 IB/core: Add RoCE GID table management
RoCE GIDs are based on IP addresses configured on Ethernet net-devices
which relate to the RDMA (RoCE) device port.

Currently, each of the low-level drivers that support RoCE (ocrdma,
mlx4) manages its own RoCE port GID table. As there's nothing which is
essentially vendor specific, we generalize that, and enhance the RDMA
core GID cache to do this job.

In order to populate the GID table, we listen for events:

(a) netdev up/down/change_addr events - if a netdev is built onto
    our RoCE device, we need to add/delete its IPs. This involves
    adding all GIDs related to this ndev, add default GIDs, etc.

(b) inet events - add new GIDs (according to the IP addresses)
    to the table.

For programming the port RoCE GID table, providers must implement
the add_gid and del_gid callbacks.

RoCE GID management requires us to state the associated net_device
alongside the GID. This information is necessary in order to manage
the GID table. For example, when a net_device is removed, its
associated GIDs need to be removed as well.

RoCE mandates generating a default GID for each port, based on the
related net-device's IPv6 link local. In contrast to the GID based on
the regular IPv6 link-local (as we generate GID per IP address),
the default GID is also available when the net device is down (in
order to support loopback).

Locking is done as follows:
The patch modify the GID table code both for new RoCE drivers
implementing the add_gid/del_gid callbacks and for current RoCE and
IB drivers that do not. The flows for updating the table are
different, so the locking requirements are too.

While updating RoCE GID table, protection against multiple writers is
achieved via mutex_lock(&table->lock). Since writing to a table
requires us to find an entry (possible a free entry) in the table and
then modify it, this mutex protects both the find_gid and write_gid
ensuring the atomicity of the action.
Each entry in the GID cache is protected by rwlock. In RoCE, writing
(usually results from netdev notifier) involves invoking the vendor's
add_gid and del_gid callbacks, which could sleep.
Therefore, an invalid flag is added for each entry. Updates for RoCE are
done via a workqueue, thus sleeping is permitted.

In IB, updates are done in write_lock_irq(&device->cache.lock), thus
write_gid isn't allowed to sleep and add_gid/del_gid are not called.

When passing net-device into/out-of the GID cache, the device
is always passed held (dev_hold).

The code uses a single work item for updating all RDMA devices,
following a netdev or inet notifier.

The patch moves the cache from being a client (which was incorrect,
as the cache is part of the IB infrastructure) to being explicitly
initialized/freed when a device is registered/removed.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:08:50 -04:00
Jason Gunthorpe
55aeed0654 IB/core: Make ib_alloc_device init the kobject
This gets rid of the weird in-between state where struct ib_device
was allocated but the kobject didn't work.

Consequently ib_device_release is now guaranteed to be called in
all situations and we needn't duplicate its kfrees on error paths.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:08:50 -04:00
Sagi Grimberg
d9f272c523 IB/core: Drop ib_alloc_fast_reg_mr
Fully replaced by a more generic and suitable
ib_alloc_mr.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:08:49 -04:00
Sagi Grimberg
9bee178b4f IB: Modify ib_create_mr API
Use ib_alloc_mr with specific parameters.
Change the existing callers.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:08:44 -04:00
Sagi Grimberg
8b91ffc1cf IB/core: Get rid of redundant verb ib_destroy_mr
This was added in a thought of uniting all mr allocation
and deallocation routines but the fact is we have a single
deallocation routine already, ib_dereg_mr.

And, move mlx5_ib_destroy_mr specific logic into mlx5_ib_dereg_mr
(includes only signature stuff for now).

And, fixup the only callers (iser/isert) accordingly.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:08:44 -04:00
Haggai Eran
be688195bd IB/cma: Fix net_dev reference leak with failed requests
When no matching listening ID is found for a given request, the net_dev
that was used to find the request isn't released.

Fixes: 0b3ca768fc ("IB/cma: Use found net_dev for passive connections")
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:08:28 -04:00
Haggai Eran
73fec7fd04 IB/cm: Remove compare_data checks
Now that there are no ib_cm clients using the compare_data feature for
matching IB CM requests' private data, remove the compare_data parameter of
ib_cm_listen and remove the code implementing the feature.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 15:48:24 -04:00
Haggai Eran
51efe394bc IB/cma: Share ib_cm_ids between rdma_cm_ids
Use ib_cm_insert_listen to create listening IB CM IDs or share existing
ones if needed. When given a request on a specific CM ID, the code now
matches the request to the RDMA CM ID based on the request parameters, so
it no longer needs to rely on the ib_cm's private data matching
capabilities.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 15:48:24 -04:00
Haggai Eran
0b3ca768fc IB/cma: Use found net_dev for passive connections
When receiving a new connection in cma_req_handler, we actually already
know the net_dev that is used for the connection's creation. Instead of
calling cma_translate_addr to resolve the new connection id's source
address, just use the net_dev that was found.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 15:48:24 -04:00
Haggai Eran
f887f2ac87 IB/cma: Validate routing of incoming requests
Pass incoming request parameters through the relevant IPv4/IPv6 routing
tables and make sure the network stack is configured to handle such
requests.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 15:48:23 -04:00
Haggai Eran
4c21b5bcef IB/cma: Add net_dev and private data checks to RDMA CM
Instead of relying on a the ib_cm module to check an incoming CM request's
private data header, add these checks to the RDMA CM module. This allows a
following patch to to clean up the ib_cm interface and remove the code that
looks into the private headers. It will also allow supporting namespaces in
RDMA CM by making these checks namespace aware later on.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 15:48:23 -04:00
Haggai Eran
24cad9a7e8 IB/cm: Expose BTH P_Key in CM and SIDR request events
The rdma_cm module will later use the P_Key from the BTH to de-mux
requests.

See discussion at:
  http://www.spinics.net/lists/netdev/msg336067.html

Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Liran Liss <liranl@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 15:48:23 -04:00
Haggai Eran
aac978e152 IB/cma: Helper functions to access port space IDRs
Add helper functions to access the IDRs by port-space and port number.

Pass around the port-space enum in cma.c instead of using pointers to
port-space IDRs.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 15:48:23 -04:00
Haggai Eran
0c505f70a2 IB/cma: Refactor RDMA IP CM private-data parsing code
When receiving a connection request, rdma_cm needs to associate the request
with a network device, in order to disambiguate requests. To do this, it
needs to know the request's destination IP. For this the module needs to
allow getting this information from the private data in the request packet,
instead of relying on the information already being in the listening RDMA
CM ID.

When creating a new incoming connection ID, the code in
cma_save_ip{4,6}_info can no longer rely on the listener's private data to
find the port number, so it reads it from the requested service ID.

Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 15:48:22 -04:00