1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

739 commits

Author SHA1 Message Date
Jason Gunthorpe
33331a728c Linux 6.0
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmM5/fMeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG+Y0H/jG0HQjUHfIOGjPQ
 T33eaF5POoZKzZmsTNITbtya7B3/OTZZRGdSq9B8t5tElyPnZrdIfaXds17mCI8y
 5DJUEQGdv6fqRttYLGKf6rk1kzABhhaTS8n9BgDFEsmvdqwG4AV6dLQr3HL09gTV
 Lu+Is86RPwmpgH0Z9u7DyFCF8yLjefyu0vl6eFm/SXjCE8gQM/LZQSi9mv5/YDxa
 MVKlsZKIkQm6P8a6r8wbGedKYBped4+4gYedsf/IcS0lvKdLIs/P7YgR5wKklSbM
 tqECvBOmKq1Fwj/oxH+fx0KLX/ZD1xxQwoQd+a9DOSo+BuPBt6KGojYT9gQRyFJp
 R7gyUCo=
 =2UOD
 -----END PGP SIGNATURE-----

Merge tag 'v6.0' into rdma.git for-next

Trvial merge conflicts against rdma.git for-rc resolved matching
linux-next:
            drivers/infiniband/hw/hns/hns_roce_hw_v2.c
            drivers/infiniband/hw/hns/hns_roce_main.c

https://lore.kernel.org/r/20220929124005.105149-1-broonie@kernel.org

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-06 19:48:45 -03:00
Maher Sanalla
9ca05b0f27 RDMA/mlx5: Rely on RoCE fw cap instead of devlink when setting profile
When the RDMA auxiliary driver probes, it sets its profile based on
devlink driverinit value. The latter might not be in sync with FW yet
(In case devlink reload is not performed), thus causing a mismatch
between RDMA driver and FW. This results in the following FW syndrome
when the RDMA driver tries to adjust RoCE state, which fails the probe:

"0xC1F678 | modify_nic_vport_context: roce_en set on a vport that
doesn't support roce"

To prevent this, select the PF profile based on FW RoCE capability
instead of relying on devlink driverinit value.
To provide backward compatibility of the RoCE disable feature, on older
FW's where roce_rw is not set (FW RoCE capability is read-only), keep
the current behavior e.g., rely on devlink driverinit value.

Fixes: fbfa97b4d7 ("net/mlx5: Disable roce at HCA level")
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Link: https://lore.kernel.org/r/cb34ce9a1df4a24c135cb804db87f7d2418bd6cc.1661763459.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-05 14:49:46 +03:00
Daisuke Matsuda
ca7ef7adad IB/mlx5: Remove duplicate header inclusion related to ODP
rdma/ib_umem.h and rdma/ib_verbs.h are included by rdma/ib_umem_odp.h.
This patch removes the redundant entries.

Link: https://lore.kernel.org/r/20220823025131.862811-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-23 11:22:13 +03:00
Mark Bloch
4b83c3caf2 RDMA/mlx5: Use the proper number of ports
The cited commit allowed the driver to operate over HCAs that have
4 physical ports. Use the number of ports of the RDMA device in the for
loop instead of using the struct size.

Fixes: 4cd14d44b1 ("net/mlx5: Support devices with more than 2 ports")
Link: https://lore.kernel.org/r/a54a56c2ede16044a29d119209b35189c662ac72.1659944855.git.leonro@nvidia.com
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-16 16:14:19 +03:00
Aharon Landau
13ad1125b9 RDMA/mlx5: Don't compare mkey tags in DEVX indirect mkey
According to the ib spec:
If the CI supports the Base Memory Management Extensions defined in this
specification, the L_Key format must consist of:
24 bit index in the most significant bits of the R_Key, and
8 bit key in the least significant bits of the R_Key
Through a successful Allocate L_Key verb invocation, the CI must let the
consumer own the key portion of the returned R_Key

Therefore, when creating a mkey using DEVX, the consumer is allowed to
change the key part. The kernel should compare only the index part of a
R_Key to determine equality with another R_Key.

Adding capability in order not to break backward compatibility.

Fixes: 534fd7aac5 ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Link: https://lore.kernel.org/r/3d669aacea85a3a15c3b3b953b3eaba3f80ef9be.1659255945.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-16 15:21:52 +03:00
Aharon Landau
0113780870 RDMA/mlx5: Rename the mkey cache variables and functions
After replacing the MR cache with an Mkey cache, rename the variables and
functions to fit the new meaning.

Link: https://lore.kernel.org/r/20220726071911.122765-6-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27 14:45:48 -03:00
Linus Torvalds
780d8ce716 v5.19 pull request
Small collection of incremental improvement patches:
 
 - Minor code cleanup patches, comment improvements, etc from static tools
 
 - Clean the some of the kernel caps, reducing the historical stealth uAPI
   leftovers
 
 - Bug fixes and minor changes for rdmavt, hns, rxe, irdma
 
 - Remove unimplemented cruft from rxe
 
 - Reorganize UMR QP code in mlx5 to avoid going through the IB verbs layer
 
 - flush_workqueue(system_unbound_wq) removal
 
 - Ensure rxe waits for objects to be unused before allowing the core to
   free them
 
 - Several rc quality bug fixes for hfi1
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCYo+NxgAKCRCFwuHvBreF
 YbqSAQDJ+QolaATUvOQUPLbuLopUCJLe95VS15Kl3SNXiVUUFAEA8DLL1s6+WShd
 AgypUxGHipx5BAytrn45/WiwuDeEbQ8=
 =jgTl
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Small collection of incremental improvement patches:

   - Minor code cleanup patches, comment improvements, etc from static
     tools

   - Clean the some of the kernel caps, reducing the historical stealth
     uAPI leftovers

   - Bug fixes and minor changes for rdmavt, hns, rxe, irdma

   - Remove unimplemented cruft from rxe

   - Reorganize UMR QP code in mlx5 to avoid going through the IB verbs
     layer

   - flush_workqueue(system_unbound_wq) removal

   - Ensure rxe waits for objects to be unused before allowing the core
     to free them

   - Several rc quality bug fixes for hfi1"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (67 commits)
  RDMA/rtrs-clt: Fix one kernel-doc comment
  RDMA/hfi1: Remove all traces of diagpkt support
  RDMA/hfi1: Consolidate software versions
  RDMA/hfi1: Remove pointless driver version
  RDMA/hfi1: Fix potential integer multiplication overflow errors
  RDMA/hfi1: Prevent panic when SDMA is disabled
  RDMA/hfi1: Prevent use of lock before it is initialized
  RDMA/rxe: Fix an error handling path in rxe_get_mcg()
  IB/core: Fix typo in comment
  RDMA/core: Fix typo in comment
  IB/hf1: Fix typo in comment
  IB/qib: Fix typo in comment
  IB/iser: Fix typo in comment
  RDMA/mlx4: Avoid flush_scheduled_work() usage
  IB/isert: Avoid flush_scheduled_work() usage
  RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr()
  RDMA/qedr: Remove unnecessary synchronize_irq() before free_irq()
  RDMA/hns: Use hr_reg_read() instead of remaining roce_get_xxx()
  RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx()
  RDMA/irdma: Add SW mechanism to generate completions on error
  ...
2022-05-26 21:08:40 -07:00
Mark Bloch
34a30d7635 net/mlx5: Lag, expose number of lag ports
Downstream patches will add support for hardware lag with
more than 2 ports. Add a way for users to query the number of lag ports.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:00 -07:00
Aharon Landau
04876c12c1 RDMA/mlx5: Move init and cleanup of UMR to umr.c
The first patch in a series to split UMR logic to a dedicated file.  As a
start, move the init and cleanup of UMR resources to umr.c.

Link: https://lore.kernel.org/r/849e632dd1945a2534712a320cc5779f2149ba96.1649747695.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25 11:52:59 -03:00
Jason Gunthorpe
7bf5323b05 Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:

====================
Mellanox shared branch that includes:

 * Removal of FPGA TLS code https://lore.kernel.org/all/cover.1649073691.git.leonro@nvidia.com

  Mellanox INNOVA TLS cards are EOL in May, 2018 [1]. As such, the code
  is unmaintained, untested and not in-use by any upstream/distro oriented
  customers. In order to reduce code complexity, drop the kernel code,
  clean build config options and delete useless kTLS vs. TLS separation.

  [1] https://network.nvidia.com/related-docs/eol/LCR-000286.pdf

 * Removal of FPGA IPsec code https://lore.kernel.org/all/cover.1649232994.git.leonro@nvidia.com

  Together with FPGA TLS, the IPsec went to EOL state in the November of
  2019 [1]. Exactly like FPGA TLS, no active customers exist for this
  upstream code and all the complexity around that area can be deleted.

  [2] https://network.nvidia.com/related-docs/eol/LCR-000535.pdf

 * Fix to undefined behavior from Borislav https://lore.kernel.org/all/20220405151517.29753-11-bp@alien8.de
====================

* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Remove not-implemented IPsec capabilities
  net/mlx5: Remove ipsec_ops function table
  net/mlx5: Reduce kconfig complexity while building crypto support
  net/mlx5: Move IPsec file to relevant directory
  net/mlx5: Remove not-needed IPsec config
  net/mlx5: Align flow steering allocation namespace to common style
  net/mlx5: Unify device IPsec capabilities check
  net/mlx5: Remove useless IPsec device checks
  net/mlx5: Remove ipsec vs. ipsec offload file separation
  RDMA/core: Delete IPsec flow action logic from the core
  RDMA/mlx5: Drop crypto flow steering API
  RDMA/mlx5: Delete never supported IPsec flow action
  net/mlx5: Remove FPGA ipsec specific statistics
  net/mlx5: Remove XFRM no_trailer flag
  net/mlx5: Remove not-used IDA field from IPsec struct
  net/mlx5: Delete metadata handling logic
  net/mlx5_fpga: Drop INNOVA IPsec support
  IB/mlx5: Fix undefined behavior due to shift overflowing the constant
  net/mlx5: Cleanup kTLS function names and their exposure
  net/mlx5: Remove tls vs. ktls separation as it is the same
  net/mlx5: Remove indirection in TLS build
  net/mlx5: Reliably return TLS device capabilities
  net/mlx5_fpga: Drop INNOVA TLS support

Link: https://lore.kernel.org/r/20220409055303.1223644-1-leon@kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-12 10:43:36 -03:00
Leon Romanovsky
de8bdb4769 RDMA/mlx5: Drop crypto flow steering API
The mlx5 flow steering crypto API was intended to be used in FPGA
devices, which is not supported for years already. The removal of
mlx5 crypto FPGA code together with inability to configure encryption
keys makes the low steering API completely unusable.

So delete the code, so any ESP flow steering requests will fail with
not supported error, as it is happening now anyway as no device support
this type of API.

Link: https://lore.kernel.org/r/634a5face7734381463d809bfb89850f6998deac.1649232994.git.leonro@nvidia.com
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09 08:25:06 +03:00
Leon Romanovsky
74ec29bdb0 RDMA/mlx5: Delete never supported IPsec flow action
The IPSEC_REQUIRED_METADATA capability bit is never set, and can be
safely removed from the flow action flags.

Link: https://lore.kernel.org/r/697cd60bd5c9b6a004c449c1a41c2798fac844ff.1649232994.git.leonro@nvidia.com
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09 08:23:47 +03:00
Jason Gunthorpe
e945c653c8 RDMA: Split kernel-only global device caps from uverbs device caps
Split out flags from ib_device::device_cap_flags that are only used
internally to the kernel into kernel_cap_flags that is not part of the
uapi. This limits the device_cap_flags to being the same bitmap that will
be copied to userspace.

This cleanly splits out the uverbs flags from the kernel flags to avoid
confusion in the flags bitmap.

Add some short comments describing which each of the kernel flags is
connected to. Remove unused kernel flags.

Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-06 15:02:13 -03:00
Moshe Shemesh
66771a1c72 net/mlx5: Move debugfs entries to separate struct
Move the debugfs entry pointers under priv to their own struct.
Add get function for device debugfs root.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09 13:33:02 -08:00
David E. Box
27963d3da4 RDMA/irdma: Use auxiliary_device driver data helpers
Use auxiliary_get_drvdata and auxiliary_set_drvdata helpers.

Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20211221235852.323752-2-david.e.box@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-22 13:59:01 +01:00
Meir Lichtinger
d2c8a1554c IB/mlx5: Enable UAR to have DevX UID
UID field was added to alloc_uar and dealloc_uar PRM command, to specify
DevX UID for UAR. This change enables firmware validating user access to
its own UAR resources.

For the kernel allocated UARs the UID will stay 0 as of today.

Signed-off-by: Meir Lichtinger <meirl@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-09-28 18:31:21 +03:00
Linus Torvalds
23852bec53 RDMA v5.15 merge window Pull Request
- Various cleanup and small features for rtrs
 
 - kmap_local_page() conversions
 
 - Driver updates and fixes for: efa, rxe, mlx5, hfi1, qed, hns
 
 - Cache the IB subnet prefix
 
 - Rework how CRC is calcuated in rxe
 
 - Clean reference counting in iwpm's netlink
 
 - Pull object allocation and lifecycle for user QPs to the uverbs core
   code
 
 - Several small hns features and continued general code cleanups
 
 - Fix the scatterlist confusion of orig_nents/nents introduced in an
   earlier patch creating the append operation
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmEudRgACgkQOG33FX4g
 mxraJA//c6bMxrrTVrzmrtrkyYD4tYWE8RDfgvoyZtleZnnEOJeunCQWakQrpJSv
 ukSnOGCA3PtnmRMdV54f/11YJ/7otxOJodSO7jWsIoBrqG/lISAdX8mn2iHhrvJ0
 dIaFEFPLy0WqoMLCJVIYIupR0IStVHb/mWx0uYL4XnnoYKyt7f7K5JMZpNWMhDN2
 ieJw0jfrvEYm8pipWuxUvB16XARlzAWQrjqLpMRI+jFRpbDVBY21dz2/LJvOJPrA
 LcQ+XXsV/F659ibOAGm6bU4BMda8fE6Lw90B/gmhSswJ205NrdziF5cNYHP0QxcN
 oMjrjSWWHc9GEE7MTipC2AH8e36qob16Q7CK+zHEJ+ds7R6/O/8XmED1L8/KFpNA
 FGqnjxnxsl1y27mUegfj1Hh8PfoDp2oVq0lmpEw0CYo4cfVzHSMRrbTR//XmW628
 Ie/mJddpFK4oLk+QkSNjSLrnxOvdTkdA58PU0i84S5eUVMNm41jJDkxg2J7vp0Zn
 sclZsclhUQ9oJ5Q2so81JMWxu4JDn7IByXL0ULBaa6xwQTiVEnyvSxSuPlflhLRW
 0vI2ylATYKyWkQqyX7VyWecZJzwhwZj5gMMWmoGsij8bkZhQ/VaQMaesByzSth+h
 NV5UAYax4GqyOQ/tg/tqT6e5nrI1zof87H64XdTCBpJ7kFyQ/oA=
 =ZwOe
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This is quite a small cycle, no major series stands out. The HNS and
  rxe drivers saw the most activity this cycle, with rxe being broken
  for a good chunk of time. The significant deleted line count is due to
  a SPDX cleanup series.

  Summary:

   - Various cleanup and small features for rtrs

   - kmap_local_page() conversions

   - Driver updates and fixes for: efa, rxe, mlx5, hfi1, qed, hns

   - Cache the IB subnet prefix

   - Rework how CRC is calcuated in rxe

   - Clean reference counting in iwpm's netlink

   - Pull object allocation and lifecycle for user QPs to the uverbs
     core code

   - Several small hns features and continued general code cleanups

   - Fix the scatterlist confusion of orig_nents/nents introduced in an
     earlier patch creating the append operation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (90 commits)
  RDMA/mlx5: Relax DCS QP creation checks
  RDMA/hns: Delete unnecessary blank lines.
  RDMA/hns: Encapsulate the qp db as a function
  RDMA/hns: Adjust the order in which irq are requested and enabled
  RDMA/hns: Remove RST2RST error prints for hw v1
  RDMA/hns: Remove dqpn filling when modify qp from Init to Init
  RDMA/hns: Fix QP's resp incomplete assignment
  RDMA/hns: Fix query destination qpn
  RDMA/hfi1: Convert to SPDX identifier
  IB/rdmavt: Convert to SPDX identifier
  RDMA/hns: Bugfix for incorrect association between dip_idx and dgid
  RDMA/hns: Bugfix for the missing assignment for dip_idx
  RDMA/hns: Bugfix for data type of dip_idx
  RDMA/hns: Fix incorrect lsn field
  RDMA/irdma: Remove the repeated declaration
  RDMA/core/sa_query: Retry SA queries
  RDMA: Use the sg_table directly and remove the opencoded version from umem
  lib/scatterlist: Fix wrong update of orig_nents
  lib/scatterlist: Provide a dedicated function to support table append
  RDMA/hns: Delete unused hns bitmap interface
  ...
2021-09-02 14:47:21 -07:00
Jakub Kicinski
97c78d0af5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/wwan/mhi_wwan_mbim.c - drop the extra arg.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 17:57:57 -07:00
Jason Gunthorpe
1a010d73ef Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================

This pulls mlx5-next branch which includes patches already reviewed on
net-next and rdma mailing lists.

1) mlx5 single E-Switch FDB for lag

2) IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq

3) Add DCS caps & fields support

We need this in net-next as multiple features are dependent on the
single FDB feature.

====================

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

* mellanox/mlx5-next:
  net/mlx5: Lag, Create shared FDB when in switchdev mode
  net/mlx5: E-Switch, add logic to enable shared FDB
  net/mlx5: Lag, move lag destruction to a workqueue
  net/mlx5: Lag, properly lock eswitch if needed
  net/mlx5: Add send to vport rules on paired device
  net/mlx5: E-Switch, Add event callback for representors
  net/mlx5e: Use shared mappings for restoring from metadata
  net/mlx5e: Add an option to create a shared mapping
  net/mlx5: E-Switch, set flow source for send to uplink rule
  RDMA/mlx5: Add shared FDB support
  {net, RDMA}/mlx5: Extend send to vport rules
  RDMA/mlx5: Fill port info based on the relevant eswitch
  net/mlx5: Lag, add initial logic for shared FDB
  net/mlx5: Return mdev from eswitch
  IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
2021-08-22 19:22:58 -03:00
Maor Gottlieb
da78fe5fb3 RDMA/mlx5: Fix crash when unbind multiport slave
Fix the below crash when deleting a slave from the unaffiliated list
twice. First time when the slave is bound to the master and the second
when the slave is unloaded.

Fix it by checking if slave is unaffiliated (doesn't have ib device)
before removing from the list.

  RIP: 0010:mlx5r_mp_remove+0x4e/0xa0 [mlx5_ib]
  Call Trace:
   auxiliary_bus_remove+0x18/0x30
   __device_release_driver+0x177/x220
   device_release_driver+0x24/0x30
   bus_remove_device+0xd8/0x140
   device_del+0x18a/0x3e0
   mlx5_rescan_drivers_locked+0xa9/0x210 [mlx5_core]
   mlx5_unregister_device+0x34/0x60 [mlx5_core]
   mlx5_uninit_one+0x32/0x100 [mlx5_core]
   remove_one+0x6e/0xe0 [mlx5_core]
   pci_device_remove+0x36/0xa0
   __device_release_driver+0x177/0x220
   device_driver_detach+0x3c/0xa0
   unbind_store+0x113/0x130
   kernfs_fop_write_iter+0x110/0x1a0
   new_sync_write+0x116/0x1a0
   vfs_write+0x1ba/0x260
   ksys_write+0x5f/0xe0
   do_syscall_64+0x3d/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 93f8244431 ("RDMA/mlx5: Convert mlx5_ib to use auxiliary bus")
Link: https://lore.kernel.org/r/17ec98989b0ba88f7adfbad68eb20bce8d567b44.1628587493.git.leonro@nvidia.com
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-19 09:59:20 -03:00
Mark Bloch
c446d9da64 RDMA/mlx5: Add shared FDB support
Shared FDB allows to create a single RDMA device that holds representors
from both eswitches. As shared FDB is only active when both uplink
representors are enslaved there is a single RDMA port that represents
both uplinks.

The number of ports is the number of vports on both eswitches minus one
as we only need 1 port for both uplinks.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:24 -07:00
Leon Romanovsky
20da44dfe8 RDMA/mlx5: Drop in-driver verbs object creations
There is no real value in bypassing IB/core APIs for creating standard
objects with standard types. The open-coded variant didn't have any
restrack task management calls and caused to such objects to be not
present when running rdmatoool.

Link: https://lore.kernel.org/r/f745590e5fb7d56f90fdb25f64ee3983ba17e1e4.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:28 -03:00
Leon Romanovsky
514aee660d RDMA: Globally allocate and release QP memory
Convert QP object to follow IB/core general allocation scheme.  That
change allows us to make sure that restrack properly kref the memory.

Link: https://lore.kernel.org/r/48e767124758aeecc433360ddd85eaa6325b34d9.1627040189.git.leonro@nvidia.com
Reviewed-by: Gal Pressman <galpress@amazon.com> #efa
Tested-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> #rdma and core
Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:27 -03:00
Leon Romanovsky
0dc0da15ed RDMA/mlx5: Rework custom driver QP type creation
Starting from commit 2b1f747071 ("RDMA/core: Allow drivers to disable
restrack DB") the restrack is able to handle non-standard QP types either.

That change allows us to rewrite custom QP calls to their IB/core
counterparts, so we will use general QP creation flow even for the driver
QP types.

Link: https://lore.kernel.org/r/51682ab82298748941f38bd23ee3bf77ef1cab7b.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:27 -03:00
Leon Romanovsky
8c9e7f0325 RDMA/mlx5: Delete device resource mutex that didn't protect anything
The dev->devr.mutex was intended to protect GSI QP pointer change in the
struct mlx5_ib_port_resources when it is accessed from the
pkey_change_work. However that pointer isn't changed during the runtime
and once IB/core adds MAD, it stays stable.

Link: https://lore.kernel.org/r/6e338c561033df20d92e1371fc6a7a0d93aad945.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:27 -03:00
Leon Romanovsky
b0791dbf12 RDMA/mlx5: Cancel pkey work before destroying device resources
In the driver release flow, we are ensuring that notifier is disabled and
no new works can be added to pkey_change_handler. It means that we can
cancel that handler before destroying resources to make sure that our
unwind routine is symmetrical to the allocation one.

Link: https://lore.kernel.org/r/f2b1ea1bad952e4e7a48a6f731de9e0344986b29.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:26 -03:00
Lior Nahmanson
11656f593a RDMA/mlx5: Add DCS offload support
DCS is an offload to SW load balancing of DC initiator work requests.

A single DCI can be connected to only one target at the time and can't
start new connection until the previous work request is completed.  This
limitation will cause to delay when the initiator process needs to
transfer data to multiple targets at the same time.  The SW solution is to
use a process that handling and spreading the work request on many DCIs
according to destinations.

This feature is an offload to this process and coming to reduce the load
from the CPU and improve the performance.

Link: https://lore.kernel.org/r/491c2c2afdb5b07de7f03eab3f93cf0704549dbc.1624258894.git.leonro@nvidia.com
Reviewed-by: Meir Lichtinger <meirl@nvidia.com>
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-20 15:04:14 -03:00
Leon Romanovsky
4a754d7637 RDMA/mlx5: Don't access NULL-cleared mpi pointer
The "dev->port[i].mp.mpi" is set to NULL during mlx5_ib_unbind_slave_port()
execution, however that field is needed to add device to unaffiliated list.

Such flow causes to the following kernel panic while unloading mlx5_ib
module in multi-port mode, hence the device should be added to the list
prior to unbind call.

 RPC: Unregistered rdma transport module.
 RPC: Unregistered rdma backchannel transport module.
 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 PGD 0 P4D 0
 Oops: 0002 [#1] SMP NOPTI
 CPU: 4 PID: 1904 Comm: modprobe Not tainted 5.13.0-rc7_for_upstream_min_debug_2021_06_24_12_08 #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 RIP: 0010:mlx5_ib_cleanup_multiport_master+0x18b/0x2d0 [mlx5_ib]
 Code: 00 04 0f 85 c4 00 00 00 48 89 df e8 ef fa ff ff 48 8b 83 40 0d 00 00 48 8b 15 b9 e8 05 00 4a 8b 44 28 20 48 89 05 ad e8 05 00 <48> c7 00 d0 57 c5 a0 48 89 50 08 48 89 02 39 ab 88 0a 00 00 0f 86
 RSP: 0018:ffff888116ee3df8 EFLAGS: 00010296
 RAX: 0000000000000000 RBX: ffff8881154f6000 RCX: 0000000000000080
 RDX: ffffffffa0c557d0 RSI: ffff88810b69d200 RDI: 000000000002d8a0
 RBP: 0000000000000002 R08: ffff888110780408 R09: 0000000000000000
 R10: ffff88812452e1c0 R11: fffffffffff7e028 R12: 0000000000000000
 R13: 0000000000000080 R14: ffff888102c58000 R15: 0000000000000000
 FS:  00007f884393a740(0000) GS:ffff8882f5a00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000001249f6004 CR4: 0000000000370ea0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  mlx5_ib_stage_init_cleanup+0x16/0xd0 [mlx5_ib]
  __mlx5_ib_remove+0x33/0x90 [mlx5_ib]
  mlx5r_remove+0x22/0x30 [mlx5_ib]
  auxiliary_bus_remove+0x18/0x30
  __device_release_driver+0x177/0x220
  driver_detach+0xc4/0x100
  bus_remove_driver+0x58/0xd0
  auxiliary_driver_unregister+0x12/0x20
  mlx5_ib_cleanup+0x13/0x897 [mlx5_ib]
  __x64_sys_delete_module+0x154/0x230
  ? exit_to_user_mode_prepare+0x104/0x140
  do_syscall_64+0x3f/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f8842e095c7
 Code: 73 01 c3 48 8b 0d d9 48 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 48 2c 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffc68f6e758 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
 RAX: ffffffffffffffda RBX: 00005638207929c0 RCX: 00007f8842e095c7
 RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000563820792a28
 RBP: 00005638207929c0 R08: 00007ffc68f6d701 R09: 0000000000000000
 R10: 00007f8842e82880 R11: 0000000000000206 R12: 0000563820792a28
 R13: 0000000000000001 R14: 0000563820792a28 R15: 00007ffc68f6fb40
 Modules linked in: xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter overlay rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_ipoib ib_cm ib_umad mlx5_ib(-) mlx4_ib ib_uverbs ib_core mlx4_en mlx4_core mlx5_core ptp pps_core [last unloaded: rpcrdma]
 CR2: 0000000000000000
 ---[ end trace a0bb7e20804e9e9b ]---

Fixes: 7ce6095e3b ("RDMA/mlx5: Don't add slave port to unaffiliated list")
Link: https://lore.kernel.org/r/899ac1b33a995be5ec0e16a4765c4e43c2b1ba5b.1624956444.git.leonro@nvidia.com
Reviewed-by: Itay Aveksis <itayav@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-29 19:57:02 -03:00
Jason Gunthorpe
2833c977c3 Merge branch 'mlx5_realtime_ts' into rdma.git for-next
Aharon Landau says:

====================
In case device supports only real-time timestamp, the kernel will fail to
create QP despite rdma-core requested such timestamp type.

It is because device returns free-running timestamp, and the conversion
from free-running to real-time is performed in the user space.

This series fixes it, by returning real-time timestamp.
====================

* mlx5_realtime_ts:
  RDMA/mlx5: Support real-time timestamp directly from the device
  RDMA/mlx5: Refactor get_ts_format functions to simplify code

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22 15:08:39 -03:00
Jason Gunthorpe
fdcebbc2ac Linux 5.13-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmDPuyMeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGvxgH/RKvSuRPwkJ2Jcp9
 VLi5kCbqtJlYLq6tB6peSJ8otKgxkcRwC0pIY4LlYIAWYboktLQ5RKp/9nB2h2FN
 aMZUMu6AI/lVJyFMI5MnKnJIUiUq+WXR3lSSlw68vwFLFdzqUZFNq+bqeiVvnIy1
 yqA6naj24Tu/RbYffQoPvdSJcU2SLXRMxwD8HRGiU2d51RaFsOvsZvF+P5TVcsEV
 ZmttJeER21CaI/A809eqaFmyGrUOcZZK9roZEbMwanTZOMw18biEsLu/UH4kBX01
 JC4+RlGxcWjQ5YNZgChsgoOK/CHzc6ITztTntdeDWAvwZjQFzV7pCy4/3BWne3O+
 5178yHM=
 =o8cN
 -----END PGP SIGNATURE-----

Merge tag 'v5.13-rc7' into rdma.git for-next

Linux 5.13-rc7

Needed for dependencies in following patches. Merge conflict in rxe_cmop.c
resolved by compining both patches.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22 14:43:51 -03:00
Aharon Landau
336529518e RDMA/mlx5: Support real-time timestamp directly from the device
Currently, if the user asks for a real-time timestamp, the device will
return a free-running one, and the timestamp will be translated to
real-time in the user-space.

When the device supports only real-time timestamp and not free-running,
the creation of the QP will fail even though the user needs supported the
real-time one. To prevent this, we will return the real-time timestamp
directly from the device.

Link: https://lore.kernel.org/r/c6cfc8e6f038575c5c2de6505830f7e74e4de80d.1623829775.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22 14:23:50 -03:00
Jason Gunthorpe
915e4af59f RDMA: Remove rdma_set_device_sysfs_group()
The driver's device group can be specified as part of the ops structure
like the device's port group. No need for the complicated API.

Link: https://lore.kernel.org/r/8964785a34fd3a29ff5b6693493f575b717e594d.1623427137.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16 20:58:32 -03:00
Leon Romanovsky
7ce6095e3b RDMA/mlx5: Don't add slave port to unaffiliated list
The mlx5_ib_bind_slave_port() doesn't remove multiport device from the
unaffiliated list, but mlx5_ib_unbind_slave_port() did it. This unbalanced
flow caused to the situation where mlx5_ib_unaffiliated_port_list was
changed during iteration.

Fixes: 32f69e4be2 ("{net, IB}/mlx5: Manage port association for multiport RoCE")
Link: https://lore.kernel.org/r/2726e6603b1e6ecfe76aa5a12a063af72173bcf7.1622477058.git.leonro@nvidia.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16 15:26:01 -03:00
Sergey Gorenko
c906b86e9c RDMA/mlx5: Add SQD2RTS bit to the alloc ucontext response
The new bit in the comp_mask is needed to mark that kernel supports
SQD2RTS transition for the modify QP command.

Link: https://lore.kernel.org/r/7ce705fedac1b2b8e3a2f4013e04244dc5946344.1620641808.git.leonro@nvidia.com
Reviewed-by: Evgenii Kochetov <evgeniik@nvidia.com>
Signed-off-by: Sergey Gorenko <sergeygo@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-20 11:41:07 -03:00
Maor Gottlieb
97f30d324c RDMA/mlx5: Recover from fatal event in dual port mode
When there is fatal event on the slave port, the device is marked as not
active. We need to mark it as active again when the slave is recovered to
regain full functionality.

Fixes: d69a24e036 ("IB/mlx5: Move IB event processing onto a workqueue")
Link: https://lore.kernel.org/r/8906754455bb23019ef223c725d2c0d38acfb80b.1620711734.git.leonro@nvidia.com
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-17 13:52:47 -03:00
Parav Pandit
dedbc2d358 IB/mlx5: Set right RoCE l3 type and roce version while deleting GID
Currently when GID is deleted, it zero out all the fields of the RoCE
address in the SET_ROCE_ADDRESS command for a specified index.

roce_version = 0 means RoCEv1 in the SET_ROCE_ADDRESS command.

This assumes that device has RoCEv1 always enabled which is not always
correct. For example Subfunction does not support RoCEv1.

Due to this assumption a previously added RoCEv2 GID is always deleted as
RoCEv1 GID. This results in a below syndrome:

   mlx5_core.sf mlx5_core.sf.4: mlx5_cmd_check:777:(pid 4256): SET_ROCE_ADDRESS(0x761) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x12822d)

Hence set the right RoCE version during GID deletion provided by the core.

Link: https://lore.kernel.org/r/d3f54129c90ca329caf438dbe31875d8ad08d91a.1618753425.git.leonro@nvidia.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-20 09:41:10 -03:00
Jason Gunthorpe
fe73f96e7b Merge branch 'mlx5_memic_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Maor Gottlieb says:
====================
This series from Maor extends MEMIC to support atomic operations from the
host in addition to already supported regular read/write.
====================

* 'memic_ops':
  RDMA/mlx5: Expose UAPI to query DM
  RDMA/mlx5: Add support in MEMIC operations
  RDMA/mlx5: Add support to MODIFY_MEMIC command
  RDMA/mlx5: Re-organize the DM code
  RDMA/mlx5: Move all DM logic to separate file
  RDMA/uverbs: Make UVERBS_OBJECT_METHODS to consider line number
  net/mlx5: Add MEMIC operations related bits
2021-04-13 19:37:17 -03:00
Maor Gottlieb
cea85fa5db RDMA/mlx5: Add support in MEMIC operations
MEMIC buffer, in addition to regular read and write operations, can
support atomic operations from the host.

Introduce and implement new UAPI to allocate address space for MEMIC
operations such as atomic. This includes:

1. Expose new IOCTL for request mapping of MEMIC operation.
2. Hold the operations address in a list, so same operation to same DM
   will be allocated only once.
3. Manage refcount on the mlx5_ib_dm object, so it would be keep valid
   until all addresses were unmapped.

Link: https://lore.kernel.org/r/20210411122924.60230-7-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-13 19:36:36 -03:00
Maor Gottlieb
251b9d7887 RDMA/mlx5: Re-organize the DM code
1. Inline the checks from check_dm_type_support() into their
   respective allocation functions.
2. Fix use after free when driver fails to copy the MEMIC address to the
   user by moving the allocation code into their respective functions,
   hence we avoid the explicit call to free the DM in the error flow.
3. Split mlx5_ib_dm struct to memic and icm proper typesafety
   throughout.

Fixes: dc2316eba7 ("IB/mlx5: Fix device memory flows")
Link: https://lore.kernel.org/r/20210411122924.60230-5-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-13 19:36:36 -03:00
Maor Gottlieb
831df88381 RDMA/mlx5: Move all DM logic to separate file
Move all device memory related code to a separate file.

Link: https://lore.kernel.org/r/20210411122924.60230-4-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-13 19:36:36 -03:00
Jason Gunthorpe
a0354d2308 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================

This pr contains changes from  mlx5-next branch,
already reviewed on netdev and rdma mailing lists, links below.

1) From Leon, Dynamically assign MSI-X vectors count
Already Acked by Bjorn Helgaas.
https://patchwork.kernel.org/project/netdevbpf/cover/20210314124256.70253-1-leon@kernel.org/

2) Cleanup series:
https://patchwork.kernel.org/project/netdevbpf/cover/20210311070915.321814-1-saeed@kernel.org/

From Mark, E-Switch cleanups and refactoring, and the addition
of single FDB mode needed HW bits.

From Mikhael, Remove unused struct field

From Saeed, Cleanup W=1 prototype warning

From Zheng, Esw related cleanup

From Tariq, User order-0 page allocation for EQs

====================

* mlx5-next:
  net/mlx5: Implement sriov_get_vf_total_msix/count() callbacks
  net/mlx5: Dynamically assign MSI-X vectors count
  net/mlx5: Add dynamic MSI-X capabilities bits
  PCI/IOV: Add sysfs MSI-X vector assignment interface
  net/mlx5: Use order-0 allocations for EQs
  net/mlx5: Add IFC bits needed for single FDB mode
  net/mlx5: E-Switch, Refactor send to vport to be more generic
  RDMA/mlx5: Use representor E-Switch when getting netdev and metadata
  net/mlx5: E-Switch, Add eswitch pointer to each representor
  net/mlx5: E-Switch, Add match on vhca id to default send rules
  net/mlx5: Remove unused mlx5_core_health member recover_work
  net/mlx5: simplify the return expression of mlx5_esw_offloads_pair()
  net/mlx5: Cleanup prototype warning

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12 13:49:48 -03:00
Shay Drory
e5dc370bd9 RDMA/mlx5: Set ODP caps only if device profile support ODP
Currently, ODP caps are set during the init stage of mlx5_ib_dev,
regardless of whether the device profile supports ODP or not.  There is no
point in setting ODP caps if the device profile doesn't support
ODP. Hence, move setting the ODP caps to the odp_init stage.

Link: https://lore.kernel.org/r/20210318135259.681264-1-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-26 11:57:38 -03:00
Mark Bloch
1fb7f8973f RDMA: Support more than 255 rdma ports
Current code uses many different types when dealing with a port of a RDMA
device: u8, unsigned int and u32. Switch to u32 to clean up the logic.

This allows us to make (at least) the core view consistent and use the
same type. Unfortunately not all places can be converted. Many uverbs
functions expect port to be u8 so keep those places in order not to break
UAPIs.  HW/Spec defined values must also not be changed.

With the switch to u32 we now can support devices with more than 255
ports. U32_MAX is reserved to make control logic a bit easier to deal
with. As a device with U32_MAX ports probably isn't going to happen any
time soon this seems like a non issue.

When a device with more than 255 ports is created uverbs will report the
RDMA device as having 255 ports as this is the max currently supported.

The verbs interface is not changed yet because the IBTA spec limits the
port size in too many places to be u8 and all applications that relies in
verbs won't be able to cope with this change. At this stage, we are
extending the interfaces that are using vendor channel solely

Once the limitation is lifted mlx5 in switchdev mode will be able to have
thousands of SFs created by the device. As the only instance of an RDMA
device that reports more than 255 ports will be a representor device and
it exposes itself as a RAW Ethernet only device CM/MAD/IPoIB and other
ULPs aren't effected by this change and their sysfs/interfaces that are
exposes to userspace can remain unchanged.

While here cleanup some alignment issues and remove unneeded sanity
checks (mainly in rdmavt),

Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-26 09:31:21 -03:00
Mark Bloch
658cfceb62 RDMA/mlx5: Use representor E-Switch when getting netdev and metadata
Now that a pointer to the managing E-Switch is stored in the representor
use it.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 13:07:23 -08:00
Yishai Hadas
2904bb37b3 IB/core: Split uverbs_get_const/default to consider target type
Change uverbs_get_const/uverbs_get_const_default to work properly with
both signed/unsigned parameters.

Current APIs mix s64 and u64 which leads to incorrect check when u64
value was supplied and its upper bit was set. In that case
uverbs_get_const() / uverbs_get_const_default() lower bound check may
fail unexpectedly, target is unsigned (lower bound is 0) but value
became negative as of the s64 usage.

Split to have two different APIs, no change to callers as the required
API will be called internally according to the target type.

Link: https://lore.kernel.org/r/20210304130501.1102577-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11 20:20:36 -04:00
Maor Gottlieb
7852546f52 RDMA/mlx5: Fix query RoCE port
mlx5_is_roce_enabled returns the devlink RoCE init value, therefore it
should be used only when driver is loaded. Instead we just need to read
the roce_en field.

In addition, rename mlx5_is_roce_enabled to mlx5_is_roce_init_enabled.

Fixes: 7a58779edd ("IB/mlx5: Improve query port for representor port")
Link: https://lore.kernel.org/r/20210304124517.1100608-2-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11 20:12:39 -04:00
Jason Gunthorpe
7289e26f39 Linux 5.11
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmAppPgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGeXYH/imZPBd4A1jIMehN
 5HV2A53Z+MXmmaMuGj9X1KV6vsf55/xB+IhOoFdtRAIsO8c2yYSCO8i4+4R0XfYA
 +/YFJeq672rojQnmh6XbpR8dugaAV7CUHy6n7KDsyvtT6EOCpwFSwkOb4X3tBRX6
 TlYgm2d/xgV/wRHSgLVugK0MdFCLMAnyb7mkPfar9QrMgG1BiDKLq07xmwnS23On
 TkqpJ9yZ/rJpUrrUqQYPShSO/FmA+fSfWs0CDv7EIrJ40LUScD6PZxSHWTIHtjLk
 E4jFda6wuqLRVWsBwaBzUIdD0zk7X5quHRzEpbC5ga16SK6yrWvE5YJJXCguIEuZ
 f3FMRYs=
 =CAjn
 -----END PGP SIGNATURE-----

Merge tag 'v5.11' into rdma.git for-next

Linux 5.11

Merged to resolve conflicts with RDMA rc commits

- drivers/infiniband/sw/rxe/rxe_net.c
  The final logic is to call rxe_get_dev_from_net() again with the master
  netdev if the packet was rx'd on a vlan. To keep the elimination of the
  local variables requires a trivial edit to the code in -rc

Link: https://lore.kernel.org/r/20210210131542.215ea67c@canb.auug.org.au
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-18 11:19:29 -04:00
Yishai Hadas
db72438c93 RDMA/mlx5: Cleanup the synchronize_srcu() from the ODP flow
Cleanup the synchronize_srcu() from the ODP flow as it was found to be a
very heavy time consumer as part of dereg_mr.

For example de-registration of 10000 ODP MRs each with size of 2M hugepage
took 19.6 sec comparing de-registration of same number of non ODP MRs that
took 172 ms.

The new locking scheme uses the wait_event() mechanism which follows the
use count of the MR instead of using synchronize_srcu().

By that change, the time required for the above test took 95 ms which is
even better than the non ODP flow.

Once fully dropped the srcu usage, had to come with a lock to protect the
XA access.

As part of using the above mechanism we could also clean the
num_deferred_work stuff and follow the use count instead.

Link: https://lore.kernel.org/r/20210202071309.2057998-1-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-08 20:31:11 -04:00
Parav Pandit
131796524f IB/mlx5: Use rdma_for_each_port for port iteration
Instead of open coding the loop for port iteration, use rdma_for_each_port
macro provided by core.

To use such macro, early initialization of phys_port_cnt is needed.
Hence, initialize such constant early enough in the init stage.

Whichever functions are called with port using rdma_for_each_port(),
convert their port type from u8 to unsigned int to match the core API.

Link: https://lore.kernel.org/r/20210203130133.4057329-6-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-05 12:06:01 -04:00
Parav Pandit
7416790e22 RDMA/core: Introduce and use API to read port immutable data
Currently mlx5 driver caches port GID table length for 2 ports.  It is
also cached by IB core as port immutable data.

When mlx5 representor ports are present, which are usually more than 2,
invalid access to port_caps array can happen while validating the GID
table length which is only for 2 ports.

To avoid this, take help of the IB cores port immutable data by exposing
an API to read the port immutable fields.

Remove mlx5 driver's internal cache, thereby reduce code and data.

Link: https://lore.kernel.org/r/20210203130133.4057329-5-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-05 12:06:01 -04:00