1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

19 commits

Author SHA1 Message Date
Mustafa Ismail
81091d7696 RDMA/irdma: Add SW mechanism to generate completions on error
HW flushes after QP in error state is not reliable. This can lead to
   application hang waiting on a completion for outstanding WRs.  Implement a
SW mechanism to generate completions for any outstanding WR's after the QP
is modified to error.

This is accomplished by starting a delayed worker after the QP is modified
to error and the HW flush is performed. The worker will generate
completions that will be returned to the application when it polls the
CQ. This mechanism only applies to Kernel applications.

Link: https://lore.kernel.org/r/20220425181624.1617-1-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-11 15:58:40 -03:00
Jason Gunthorpe
e945c653c8 RDMA: Split kernel-only global device caps from uverbs device caps
Split out flags from ib_device::device_cap_flags that are only used
internally to the kernel into kernel_cap_flags that is not part of the
uapi. This limits the device_cap_flags to being the same bitmap that will
be copied to userspace.

This cleanly splits out the uverbs flags from the kernel flags to avoid
confusion in the flags bitmap.

Add some short comments describing which each of the kernel flags is
connected to. Remove unused kernel flags.

Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-06 15:02:13 -03:00
Mustafa Ismail
b200189626 RDMA/irdma: Fix Passthrough mode in VM
Using PCI_FUNC macro in a VM, when the device is in passthrough mode does
not provide the real function instance. This means that currently, devices
will not probe unless the instance in the VM matches the instance in the
host.

Fix this by getting the pf_id from the LAN during the probe.

Fixes: 8498a30e1b ("RDMA/irdma: Register auxiliary driver and implement private channel OPs")
Link: https://lore.kernel.org/r/20220225163211.127-3-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-02-28 12:07:40 -04:00
Shiraz Saleem
2322d17abf RDMA/irdma: Remove excess error variables
As irdma_status_code is replaced with an int, there is no need for two
variables to hold error codes.

Remove the excess variable in functions where this occurs.  Also, remove
any redundant initializations which are no longer needed.

Link: https://lore.kernel.org/r/20220217151851.1518-4-shiraz.saleem@intel.com
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-02-23 15:24:19 -04:00
Shiraz Saleem
45225a93cc RDMA/irdma: Propagate error codes
All functions now return linux error codes. Propagate the return from
these functions as opposed to converting them to generic values.

Link: https://lore.kernel.org/r/20220217151851.1518-3-shiraz.saleem@intel.com
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-02-23 15:24:18 -04:00
Shiraz Saleem
2c4b14ea95 RDMA/irdma: Remove enum irdma_status_code
Replace use of custom irdma_status_code with linux error codes.

Remove enum irdma_status_code and header in which its defined.

Link: https://lore.kernel.org/r/20220217151851.1518-2-shiraz.saleem@intel.com
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-02-23 15:24:18 -04:00
Linus Torvalds
3689f9f8b0 bitmap patches for 5.17-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQHJBAABCgAzFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmHi+xgVHHl1cnkubm9y
 b3ZAZ21haWwuY29tAAoJELFEgP06H77IxdoMAMf3E+L51Ys/4iAiyJQNVoT3aIBC
 A8ZVOB9he1OA3o3wBNIRKmICHk+ovnfCWcXTr9fG/Ade2wJz88NAsGPQ1Phywb+s
 iGlpySllFN72RT9ZqtJhLEzgoHHOL0CzTW07TN9GJy4gQA2h2G9CTP+OmsQdnVqE
 m9Fn3PSlJ5lhzePlKfnln8rGZFgrriJakfEFPC79n/7an4+2Hvkb5rWigo7KQc4Z
 9YNqYUcHWZFUgq80adxEb9LlbMXdD+Z/8fCjOrAatuwVkD4RDt6iKD0mFGjHXGL7
 MZ9KRS8AfZXawmetk3jjtsV+/QkeS+Deuu7k0FoO0Th2QV7BGSDhsLXAS5By/MOC
 nfSyHhnXHzCsBMyVNrJHmNhEZoN29+tRwI84JX9lWcf/OLANcCofnP6f2UIX7tZY
 CAZAgVELp+0YQXdybrfzTQ8BT3TinjS/aZtCrYijRendI1GwUXcyl69vdOKqAHuk
 5jy8k/xHyp+ZWu6v+PyAAAEGowY++qhL0fmszA==
 =RKW4
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - introduce for_each_set_bitrange()

 - use find_first_*_bit() instead of find_next_*_bit() where possible

 - unify for_each_bit() macros

* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
  vsprintf: rework bitmap_list_string
  lib: bitmap: add performance test for bitmap_print_to_pagebuf
  bitmap: unify find_bit operations
  mm/percpu: micro-optimize pcpu_is_populated()
  Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
  find: micro-optimize for_each_{set,clear}_bit()
  include/linux: move for_each_bit() macros from bitops.h to find.h
  cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
  tools: sync tools/bitmap with mother linux
  all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
  cpumask: use find_first_and_bit()
  lib: add find_first_and_bit()
  arch: remove GENERIC_FIND_FIRST_BIT entirely
  include: move find.h from asm_generic to linux
  bitops: move find_bit_*_le functions from le.h to find.h
  bitops: protect find_first_{,zero}_bit properly
2022-01-23 06:20:44 +02:00
Yury Norov
b5c7e7ec7d all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
find_first{,_zero}_bit is a more effective analogue of 'next' version if
start == 0. This patch replaces 'next' with 'first' where things look
trivial.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2022-01-15 08:47:31 -08:00
Ingo Molnar
0422fe2666 Merge branch 'linus' into irq/core, to fix conflict
Conflicts:
	drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-01-08 10:53:57 +01:00
Nitesh Narayan Lal
fb5bd85471 RDMA/irdma: Use irq_update_affinity_hint()
The driver uses irq_set_affinity_hint() to update the affinity_hint mask
that is consumed by the userspace to distribute the interrupts. However,
under the hood irq_set_affinity_hint() also applies the provided cpumask
(if not NULL) as the affinity for the given interrupt which is an
undocumented side effect.

To remove this side effect irq_set_affinity_hint() has been marked
as deprecated and new interfaces have been introduced. Hence, replace the
irq_set_affinity_hint() with the new interface irq_update_affinity_hint()
that only updates the affinity_hint pointer.

Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://lore.kernel.org/r/20210903152430.244937-7-nitesh@redhat.com
2021-12-10 20:47:38 +01:00
Tatyana Nikolova
10467ce09f RDMA/irdma: Don't arm the CQ more than two times if no CE for this CQ
Completion events (CEs) are lost if the application is allowed to arm the
CQ more than two times when no new CE for this CQ has been generated by
the HW.

Check if arming has been done for the CQ and if not, arm the CQ for any
event otherwise promote to arm the CQ for any event only when the last arm
event was solicited.

Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://lore.kernel.org/r/20211201231509.1930-2-shiraz.saleem@intel.com
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:53:01 -04:00
Shiraz Saleem
25b5d6fd6d RDMA/irdma: Report correct WC errors
Return IBV_WC_REM_OP_ERR for responder QP errors instead of
IBV_WC_REM_ACCESS_ERR.

Return IBV_WC_LOC_QP_OP_ERR for errors detected on the SQ with bad opcodes

Fixes: 44d9e52977 ("RDMA/irdma: Implement device initialization definitions")
Link: https://lore.kernel.org/r/20211201231509.1930-1-shiraz.saleem@intel.com
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:53:01 -04:00
Jakub Kicinski
fd92213e9a RDMA: Constify netdev->dev_addr accesses
netdev->dev_addr will become const soon, make sure drivers propagate the
qualifier.

Link: https://lore.kernel.org/r/20211019182604.1441387-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-25 14:33:09 -03:00
Sindhu Devale
9f7fa37a6b RDMA/irdma: Report correct WC error when there are MW bind errors
Report the correct WC error when MW bind error related asynchronous events
are generated by HW.

Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://lore.kernel.org/r/20210916191222.824-5-shiraz.saleem@intel.com
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-20 14:13:23 -03:00
Sindhu Devale
d3bdcd5963 RDMA/irdma: Report correct WC error when transport retry counter is exceeded
When the retry counter exceeds, as the remote QP didn't send any Ack or
Nack an asynchronous event (AE) for too many retries is generated. Add
code to handle the AE and set the correct IB WC error code
IB_WC_RETRY_EXC_ERR.

Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://lore.kernel.org/r/20210916191222.824-4-shiraz.saleem@intel.com
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-20 14:13:23 -03:00
Sindhu Devale
5b1e985f76 RDMA/irdma: Skip CQP ring during a reset
Due to duplicate reset flags, CQP commands are processed during reset.

This leads CQP failures such as below:

 irdma0: [Delete Local MAC Entry Cmd Error][op_code=49] status=-27 waiting=1 completion_err=0 maj=0x0 min=0x0

Remove the redundant flag and set the correct reset flag so CPQ is paused
during reset

Fixes: 8498a30e1b ("RDMA/irdma: Register auxiliary driver and implement private channel OPs")
Link: https://lore.kernel.org/r/20210916191222.824-2-shiraz.saleem@intel.com
Reported-by: LiLiang <liali@redhat.com>
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-20 14:13:22 -03:00
Zhu Yanjun
41f5fa9fa7 RDMA/irdma: Change the returned type of irdma_set_hw_rsrc to void
Since the function irdma_set_hw_rsrc always returns zero, change the
returned type to void and remove all the related source code.

Link: https://lore.kernel.org/r/20210714031130.1511109-3-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-15 15:14:10 -03:00
Shiraz Saleem
2db7b2eac7 RDMA/irdma: Store PBL info address a pointer type
The level1 PBL info address is stored as u64.  This requires casting
through a uinptr_t before used as a pointer type.

And this leads to sparse warning such as this when uinptr_t is missing:

drivers/infiniband/hw/irdma/hw.c: In function 'irdma_destroy_virt_aeq':
drivers/infiniband/hw/irdma/hw.c:579:23: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
  579 |  dma_addr_t *pg_arr = (dma_addr_t *)aeq->palloc.level1.addr;

This can be fixed using an intermediate uintptr_t, but rather it is better
to fix the structure irdm_pble_info to store the address as u64* and the
VA it is assigned in irdma_chunk as a void*. This greatly reduces the
casting on this address.

Fixes: 44d9e52977 ("RDMA/irdma: Implement device initialization definitions")
Link: https://lore.kernel.org/r/20210609234924.938-1-shiraz.saleem@intel.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-10 09:39:27 -03:00
Mustafa Ismail
44d9e52977 RDMA/irdma: Implement device initialization definitions
Implement device initialization routines, interrupt set-up,
and allocate object bit-map tracking structures.
Also, add device specific attributes and register definitions.

Link: https://lore.kernel.org/r/20210602205138.889-3-shiraz.saleem@intel.com
[flexible array transformation]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 19:55:16 -03:00