1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

749 commits

Author SHA1 Message Date
Alex Vesker
41e684ef3f IB/mlx5: Replace tunnel mpls capability bits for tunnel_offloads
Until now the flex parser capability was used in ib_query_device() to
indicate tunnel_offloads_caps support for mpls_over_gre/mpls_over_udp.

Newer devices and firmware will have configurations with the flexparser
but without mpls support.

Testing for the flex parser capability was a mistake, the tunnel_stateless
capability was intended for detecting mpls and was introduced at the same
time as the flex parser capability.

Otherwise userspace will be incorrectly informed that a future device
supports MPLS when it does not.

Link: https://lore.kernel.org/r/20200305123841.196086-1-leon@kernel.org
Cc: <stable@vger.kernel.org> # 4.17
Fixes: e818e255a5 ("IB/mlx5: Expose MPLS related tunneling offloads")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 14:41:35 -03:00
Mark Zhang
ec16b6bbda RDMA/mlx5: Fix the number of hwcounters of a dynamic counter
When we read the global counter and there's any dynamic counter allocated,
the value of a hwcounter is the sum of the default counter and all dynamic
counters. So the number of hwcounters of a dynamically allocated counter
must be same as of the default counter, otherwise there will be read
violations.

This fixes the KASAN slab-out-of-bounds bug:

  BUG: KASAN: slab-out-of-bounds in rdma_counter_get_hwstat_value+0x36d/0x390 [ib_core]
  Read of size 8 at addr ffff8884192a5778 by task rdma/10138

  CPU: 7 PID: 10138 Comm: rdma Not tainted 5.5.0-for-upstream-dbg-2020-02-06_18-30-19-27 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0xb7/0x10b
   print_address_description.constprop.4+0x1e2/0x400
   ? rdma_counter_get_hwstat_value+0x36d/0x390 [ib_core]
   __kasan_report+0x15c/0x1e0
   ? mlx5_ib_query_q_counters+0x13f/0x270 [mlx5_ib]
   ? rdma_counter_get_hwstat_value+0x36d/0x390 [ib_core]
   kasan_report+0xe/0x20
   rdma_counter_get_hwstat_value+0x36d/0x390 [ib_core]
   ? rdma_counter_query_stats+0xd0/0xd0 [ib_core]
   ? memcpy+0x34/0x50
   ? nla_put+0xe2/0x170
   nldev_stat_get_doit+0x9c7/0x14f0 [ib_core]
   ...
   do_syscall_64+0x95/0x490
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
  RIP: 0033:0x7fcc457fe65a
  Code: bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 05 fa f1 2b 00 45 89 c9 4c 63 d1 48 63 ff 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 f3 c3 0f 1f 40 00 41 55 41 54 4d 89 c5 55
  RSP: 002b:00007ffc0586f868 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcc457fe65a
  RDX: 0000000000000020 RSI: 00000000013db920 RDI: 0000000000000003
  RBP: 00007ffc0586fa90 R08: 00007fcc45ac10e0 R09: 000000000000000c
  R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004089c0
  R13: 0000000000000000 R14: 00007ffc0586fab0 R15: 00000000013dc9a0

  Allocated by task 9700:
   save_stack+0x19/0x80
   __kasan_kmalloc.constprop.7+0xa0/0xd0
   mlx5_ib_counter_alloc_stats+0xd1/0x1d0 [mlx5_ib]
   rdma_counter_alloc+0x16d/0x3f0 [ib_core]
   rdma_counter_bind_qpn_alloc+0x216/0x4e0 [ib_core]
   nldev_stat_set_doit+0x8c2/0xb10 [ib_core]
   rdma_nl_rcv_msg+0x3d2/0x730 [ib_core]
   rdma_nl_rcv+0x2a8/0x400 [ib_core]
   netlink_unicast+0x448/0x620
   netlink_sendmsg+0x731/0xd10
   sock_sendmsg+0xb1/0xf0
   __sys_sendto+0x25d/0x2c0
   __x64_sys_sendto+0xdd/0x1b0
   do_syscall_64+0x95/0x490
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 18d422ce8c ("IB/mlx5: Add counter_alloc_stats() and counter_update_stats() support")
Link: https://lore.kernel.org/r/20200305124052.196688-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 14:36:47 -03:00
Jason Gunthorpe
3e3cf2e82c Merge branch 'mlx5_packet_pacing' into rdma.git for-next
Yishai Hadas Says:

====================
Expose raw packet pacing APIs to be used by DEVX based applications.  The
existing code was refactored to have a single flow with the new raw APIs.
====================

Based on the mlx5-next branch at
 git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies

* branch 'mlx5_packet_pacing':
  IB/mlx5: Introduce UAPIs to manage packet pacing
  net/mlx5: Expose raw packet pacing APIs
2020-03-10 11:54:17 -03:00
Yishai Hadas
30f2fe40c7 IB/mlx5: Introduce UAPIs to manage packet pacing
Introduce packet pacing uobject and its alloc and destroy
methods.

This uobject holds mlx5 packet pacing context according to the device
specification and enables managing packet pacing device entries that are
needed by DEVX applications.

Link: https://lore.kernel.org/r/20200219190518.200912-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 11:53:52 -03:00
Parav Pandit
79db784e79 IB/mlx5: Fix missing congestion control debugfs on rep rdma device
Cited commit missed to include low level congestion control related
debugfs stage initialization.  This resulted in missing debugfs entries
for cc_params of a RDMA device.

Add them back.

Fixes: b5ca15ad7e ("IB/mlx5: Add proper representors support")
Link: https://lore.kernel.org/r/20200227125407.99803-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 14:20:14 -04:00
Alexander Lobakin
91b74bf531 IB/mlx5: Optimize u64 division on 32-bit arches
Commit f164be8c03 ("IB/mlx5: Extend caps stage to handle VAR
capabilities") introduced a straight "/" division of the u64 variable
"bar_size".

This was fixed with commit 685eff5131 ("IB/mlx5: Use div64_u64 for
num_var_hw_entries calculation"). However, div64_u64() is redundant here
as mlx5_var_table::stride_size is of type u32.  Make the actual code way
more optimized on 32-bit kernels using div_u64() and fix 80 chars
break-through by the way.

Fixes: 685eff5131 ("IB/mlx5: Use div64_u64 for num_var_hw_entries calculation")
Link: https://lore.kernel.org/r/20200217073629.8051-1-alobakin@dlink.ru
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 13:11:49 -04:00
Paul Blakey
0f0d3827c0 net/mlx5: E-Switch, Move source port on reg_c0 to the upper 16 bits
Multi chain support requires the miss path to continue the processing
from the last chain id, and for that we need to save the chain
miss tag (a mapping for 32bit chain id) on reg_c0 which will
come in a next patch.

Currently reg_c0 is exclusively used to store the source port
metadata, giving it 32bit, it is created from 16bits of vcha_id,
and 16bits of vport number.

We will move this source port metadata to upper 16bits, and leave the
lower bits for the chain miss tag. We compress the reg_c0 source port
metadata to 16bits by taking 8 bits from vhca_id, and 8bits from
the vport number.

Since we compress the vport number to 8bits statically, and leave two
top ids for special PF/ECPF numbers, we will only support a max of 254
vports with this strategy.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-19 17:49:48 -08:00
Jason Gunthorpe
685eff5131 IB/mlx5: Use div64_u64 for num_var_hw_entries calculation
On i386:

ERROR: "__udivdi3" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined!
ERROR: "__divdi3" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined!

Fixes: f164be8c03 ("IB/mlx5: Extend caps stage to handle VAR capabilities")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reported-by: Alexander Lobakin <alobakin@dlink.ru>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-14 15:21:52 -04:00
Leon Romanovsky
9b6d3bbc13 RDMA/mlx5: Prevent overflow in mmap offset calculations
The cmd and index variables declared as u16 and the result is supposed to
be stored in u64. The C arithmetic rules doesn't promote "(index >> 8) <<
16" to be u64 and leaves the end result to be u16.

Fixes: 7be76bef32 ("IB/mlx5: Introduce VAR object and its alloc/destroy methods")
Link: https://lore.kernel.org/r/20200212072635.682689-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13 10:39:23 -04:00
Linus Torvalds
8fdd4019bc RDMA subsystem updates for 5.6
- Driver updates and cleanup for qedr, bnxt_re, hns, siw, mlx5, mlx4, rxe,
   i40iw
 
 - Larger series doing cleanup and rework for hns and hfi1.
 
 - Some general reworking of the CM code to make it a little more
   understandable
 
 - Unify the different code paths connected to the uverbs FD scheme
 
 - New UAPI ioctls conversions for get context and get async fd
 
 - Trace points for CQ and CM portions of the RDMA stack
 
 - mlx5 driver support for virtio-net formatted rings as RDMA raw ethernet QPs
 
 - verbs support for setting the PCI-E relaxed ordering bit on DMA traffic
   connected to a MR
 
 - A couple of bug fixes that came too late to make rc7
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl4zPwQACgkQOG33FX4g
 mxoURw//fuQmuJ7aTMH+0qrhaZUmzXOcI/WKvY0YMyYLvxolRcIO+uCL239wxezR
 9iTHPO7HeYXUQ4W8Hi/fTyuQ9hzaPOP3wgOJfQhm4QT/XDpRW0H3Mb+hTLHTUAcA
 rgKc9suAn+5BbIDOz7hEfeOTssx1wYrLsaHDc11NZ42JuG6uvPR33lhXiKWG+5tH
 2MpfeTU6BjL035dm3YZXCo+ouobpdMuvzJItYIsB2E5Nl0s91SMzsymIYiD0gb3t
 yUJ3wqPW3pchfAl8VEn+W5AHTUYYgGjmEblL8WdVq5JRrkQgQzj8QtCRT9NOPAT0
 LivCvgBrm0kscaQS2TjtG56Ojbwz8z1QPE/4shf0pj/G2lZfacYDAeaUc/2VafxY
 y/KG+3dB1DxtYY3eXJUxbB7Vpk7kfr35p5b75NdMhd2t49oPgV7EKoZMLYGzfX4S
 PtyNyNSiwx8qsRTr4lznOMswmrDLfG4XiywWgYo6NGOWyKYlARWIYBAEQZ0DPTiE
 9mqJ19gusdSdAgm8LGDInPmH6/AojGOVzYonJFWdlOtwCXGNXL4Gx02x4WYHykDG
 w+oy5NMJbU3b6+MWEagkuQNcrwqv02MT1mB/Lgv4GPm6rS0UXR7zUPDeccE50fSL
 X36k28UlftlPlaD7PeJdTOAhyBv5DxfpL5rbB2TfpUTpNxjayuU=
 =hepK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A very quiet cycle with few notable changes. Mostly the usual list of
  one or two patches to drivers changing something that isn't quite rc
  worthy. The subsystem seems to be seeing a larger number of rework and
  cleanup style patches right now, I feel that several vendors are
  prepping their drivers for new silicon.

  Summary:

   - Driver updates and cleanup for qedr, bnxt_re, hns, siw, mlx5, mlx4,
     rxe, i40iw

   - Larger series doing cleanup and rework for hns and hfi1.

   - Some general reworking of the CM code to make it a little more
     understandable

   - Unify the different code paths connected to the uverbs FD scheme

   - New UAPI ioctls conversions for get context and get async fd

   - Trace points for CQ and CM portions of the RDMA stack

   - mlx5 driver support for virtio-net formatted rings as RDMA raw
     ethernet QPs

   - verbs support for setting the PCI-E relaxed ordering bit on DMA
     traffic connected to a MR

   - A couple of bug fixes that came too late to make rc7"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (108 commits)
  RDMA/core: Make the entire API tree static
  RDMA/efa: Mask access flags with the correct optional range
  RDMA/cma: Fix unbalanced cm_id reference count during address resolve
  RDMA/umem: Fix ib_umem_find_best_pgsz()
  IB/mlx4: Fix leak in id_map_find_del
  IB/opa_vnic: Spelling correction of 'erorr' to 'error'
  IB/hfi1: Fix logical condition in msix_request_irq
  RDMA/cm: Remove CM message structs
  RDMA/cm: Use IBA functions for complex structure members
  RDMA/cm: Use IBA functions for simple structure members
  RDMA/cm: Use IBA functions for swapping get/set acessors
  RDMA/cm: Use IBA functions for simple get/set acessors
  RDMA/cm: Add SET/GET implementations to hide IBA wire format
  RDMA/cm: Add accessors for CM_REQ transport_type
  IB/mlx5: Return the administrative GUID if exists
  RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence
  IB/mlx4: Fix memory leak in add_gid error flow
  IB/mlx5: Expose RoCE accelerator counters
  RDMA/mlx5: Set relaxed ordering when requested
  RDMA/core: Add the core support field to METHOD_GET_CONTEXT
  ...
2020-01-31 14:40:36 -08:00
Jason Gunthorpe
8889f6fa35 RDMA/core: Make the entire API tree static
Compilation of mlx5 driver without CONFIG_INFINIBAND_USER_ACCESS generates
the following error.

on x86_64:

 ld: drivers/infiniband/hw/mlx5/main.o: in function `mlx5_ib_handler_MLX5_IB_METHOD_VAR_OBJ_ALLOC':
 main.c:(.text+0x186d): undefined reference to `ib_uverbs_get_ucontext_file'
 ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x2480): undefined reference to `uverbs_idr_class'
 ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x24d8): undefined reference to `uverbs_destroy_def_handler'

This is happening because some parts of the UAPI description are not
static. This is a hold over from earlier code that relied on struct
pointers to refer to object types, now object types are referenced by
number. Remove the unused globals and add statics to the remaining UAPI
description elements.

Remove the redundent #ifdefs around mlx5_ib_*defs and obsolete
mlx5_ib_get_devx_tree().

The compiler now trims alot more unused code, including the above
problematic definitions when !CONFIG_INFINIBAND_USER_ACCESS.

Fixes: 7be76bef32 ("IB/mlx5: Introduce VAR object and its alloc/destroy methods")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-30 16:28:52 -04:00
Linus Torvalds
bd2463ac7d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:

 1) Add WireGuard

 2) Add HE and TWT support to ath11k driver, from John Crispin.

 3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.

 4) Add variable window congestion control to TIPC, from Jon Maloy.

 5) Add BCM84881 PHY driver, from Russell King.

 6) Start adding netlink support for ethtool operations, from Michal
    Kubecek.

 7) Add XDP drop and TX action support to ena driver, from Sameeh
    Jubran.

 8) Add new ipv4 route notifications so that mlxsw driver does not have
    to handle identical routes itself. From Ido Schimmel.

 9) Add BPF dynamic program extensions, from Alexei Starovoitov.

10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.

11) Add support for macsec HW offloading, from Antoine Tenart.

12) Add initial support for MPTCP protocol, from Christoph Paasch,
    Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.

13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
    Cherian, and others.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
  net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
  udp: segment looped gso packets correctly
  netem: change mailing list
  qed: FW 8.42.2.0 debug features
  qed: rt init valid initialization changed
  qed: Debug feature: ilt and mdump
  qed: FW 8.42.2.0 Add fw overlay feature
  qed: FW 8.42.2.0 HSI changes
  qed: FW 8.42.2.0 iscsi/fcoe changes
  qed: Add abstraction for different hsi values per chip
  qed: FW 8.42.2.0 Additional ll2 type
  qed: Use dmae to write to widebus registers in fw_funcs
  qed: FW 8.42.2.0 Parser offsets modified
  qed: FW 8.42.2.0 Queue Manager changes
  qed: FW 8.42.2.0 Expose new registers and change windows
  qed: FW 8.42.2.0 Internal ram offsets modifications
  MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
  Documentation: net: octeontx2: Add RVU HW and drivers overview
  octeontx2-pf: ethtool RSS config support
  octeontx2-pf: Add basic ethtool support
  ...
2020-01-28 16:02:33 -08:00
Jason Gunthorpe
e8b3a426fb Use ODP MRs for kernel ULPs
The following series extends MR creation routines to allow creation of
 user MRs through kernel ULPs as a proxy. The immediate use case is to
 allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
 MRs to be created and such MRs were not possible to create prior this
 series.
 
 The first part of this patchset extends RDMA to have special verb
 ib_reg_user_mr(). The common use case that uses this function is a
 userspace application that allocates memory for HCA access but the
 responsibility to register the memory at the HCA is on an kernel ULP.
 This ULP acts as an agent for the userspace application.
 
 The second part provides advise MR functionality for ULPs. This is
 integral part of ODP flows and used to trigger pagefaults in advance
 to prepare memory before running working set.
 
 The third part is actual user of those in-kernel APIs.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQT1m3YD37UfMCUQBNwp8NhrnBAZsQUCXiVO8AAKCRAp8NhrnBAZ
 scTrAP9gb0d3qv0IOtHw5aGI1DAgjTUn/SzUOnsjDEn7DIoh9gEA2+ZmaEyLXKrl
 +UcZb31auy5P8ueJYokRLhLAyRcOIAg=
 =yaHb
 -----END PGP SIGNATURE-----

Merge tag 'rds-odp-for-5.5' into rdma.git for-next

From https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma

Leon Romanovsky says:

====================
Use ODP MRs for kernel ULPs

The following series extends MR creation routines to allow creation of
user MRs through kernel ULPs as a proxy. The immediate use case is to
allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
MRs to be created and such MRs were not possible to create prior this
series.

The first part of this patchset extends RDMA to have special verb
ib_reg_user_mr(). The common use case that uses this function is a
userspace application that allocates memory for HCA access but the
responsibility to register the memory at the HCA is on an kernel ULP.
This ULP acts as an agent for the userspace application.

The second part provides advise MR functionality for ULPs. This is
integral part of ODP flows and used to trigger pagefaults in advance
to prepare memory before running working set.

The third part is actual user of those in-kernel APIs.
====================

* tag 'rds-odp-for-5.5':
  net/rds: Use prefetch for On-Demand-Paging MR
  net/rds: Handle ODP mr registration/unregistration
  net/rds: Detect need of On-Demand-Paging memory registration
  RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths
  IB/mlx5: Mask out unsupported ODP capabilities for kernel QPs
  RDMA/mlx5: Don't fake udata for kernel path
  IB/mlx5: Add ODP WQE handlers for kernel QPs
  IB/core: Add interface to advise_mr for kernel users
  IB/core: Introduce ib_reg_user_mr
  IB: Allow calls to ib_umem_get from kernel ULPs

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-21 09:55:04 -04:00
David S. Miller
ad063075d4 Use ODP MRs for kernel ULPs
The following series extends MR creation routines to allow creation of
 user MRs through kernel ULPs as a proxy. The immediate use case is to
 allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
 MRs to be created and such MRs were not possible to create prior this
 series.
 
 The first part of this patchset extends RDMA to have special verb
 ib_reg_user_mr(). The common use case that uses this function is a
 userspace application that allocates memory for HCA access but the
 responsibility to register the memory at the HCA is on an kernel ULP.
 This ULP acts as an agent for the userspace application.
 
 The second part provides advise MR functionality for ULPs. This is
 integral part of ODP flows and used to trigger pagefaults in advance
 to prepare memory before running working set.
 
 The third part is actual user of those in-kernel APIs.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQT1m3YD37UfMCUQBNwp8NhrnBAZsQUCXiVO8AAKCRAp8NhrnBAZ
 scTrAP9gb0d3qv0IOtHw5aGI1DAgjTUn/SzUOnsjDEn7DIoh9gEA2+ZmaEyLXKrl
 +UcZb31auy5P8ueJYokRLhLAyRcOIAg=
 =yaHb
 -----END PGP SIGNATURE-----

Merge tag 'rds-odp-for-5.5' of https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma

Leon Romanovsky says:

====================
Use ODP MRs for kernel ULPs

The following series extends MR creation routines to allow creation of
user MRs through kernel ULPs as a proxy. The immediate use case is to
allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
MRs to be created and such MRs were not possible to create prior this
series.

The first part of this patchset extends RDMA to have special verb
ib_reg_user_mr(). The common use case that uses this function is a
userspace application that allocates memory for HCA access but the
responsibility to register the memory at the HCA is on an kernel ULP.
This ULP acts as an agent for the userspace application.

The second part provides advise MR functionality for ULPs. This is
integral part of ODP flows and used to trigger pagefaults in advance
to prepare memory before running working set.

The third part is actual user of those in-kernel APIs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21 10:22:51 +01:00
Saeed Mahameed
12e9e0d0d9 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
This merge syncs with mlx5-next latest HW bits and layout updates for next
features, in addition one patch that improves
mlx5_create_auto_grouped_flow_table() API across all mlx5 users.

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Refactor mlx5_create_auto_grouped_flow_table
  net/mlx5e: Add discard counters per priority
  net/mlx5e: Expose FEC feilds and related capability bit
  net/mlx5: Add mlx5_ifc definitions for connection tracking support
  net/mlx5: Add copy header action struct layout
  net/mlx5: Expose resource dump register mapping
  net/mlx5: Add structures and defines for MIRC register
  net/mlx5: Read MCAM register groups 1 and 2
  net/mlx5: Add structures layout for new MCAM access reg groups
  net/mlx5: Expose vDPA emulation device capabilities
  net/mlx5: Add Virtio Emulation related device capabilities

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 15:48:24 -08:00
Paul Blakey
61dc7b0141 net/mlx5: Refactor mlx5_create_auto_grouped_flow_table
Refactor mlx5_create_auto_grouped_flow_table() to use ft_attr param
which already carries the max_fte, prio and flags memebers, and is
used the same in similar mlx5_create_flow_table() function.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 15:41:59 -08:00
Avihai Horon
d7fab91637 IB/mlx5: Expose RoCE accelerator counters
Introduce the following RoCE accelerator counters:
* roce_adp_retrans - number of adaptive retransmission for RoCE traffic.
* roce_adp_retrans_to - number of times RoCE traffic reached time out
  due to adaptive retransmission.
* roce_slow_restart - number of times RoCE slow restart was used.
* roce_slow_restart_cnps - number of times RoCE slow restart
  generate CNP packets.
* roce_slow_restart_trans - number of times RoCE slow restart change
  state to slow restart.

Link: https://lore.kernel.org/r/20200115145459.83280-3-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16 16:11:52 -04:00
Moni Shoua
a73a895588 IB/mlx5: Mask out unsupported ODP capabilities for kernel QPs
The ODP handler for WQEs in RQ or SRQ is not implented for kernel QPs.
Therefore don't report support in these if query comes from a kernel user.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 16:15:00 +02:00
Leon Romanovsky
4835709176 RDMA/mlx5: Don't fake udata for kernel path
Kernel paths must not set udata and provide NULL pointer,
instead of faking zeroed udata struct.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 16:14:53 +02:00
Yishai Hadas
3f59b6c3e6 IB/mlx5: Add mmap support for VAR
Add mmap support for VAR, it uses the 'offset' command mode with
involvement of IB core APIs to find the previously allocated mmap entry.

Link: https://lore.kernel.org/r/20191212110928.334995-6-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-12 19:49:13 -04:00
Yishai Hadas
7be76bef32 IB/mlx5: Introduce VAR object and its alloc/destroy methods
Introduce VAR object and its alloc/destroy KABI methods. The internal
implementation uses the IB core API to manage mmap/munamp calls.

Link: https://lore.kernel.org/r/20191212110928.334995-5-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-12 19:49:13 -04:00
Yishai Hadas
f164be8c03 IB/mlx5: Extend caps stage to handle VAR capabilities
Extend caps stage to handle VAR capabilities.

Link: https://lore.kernel.org/r/20191212110928.334995-4-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-12 19:49:13 -04:00
Ingo Molnar
57ad87ddce Merge branch 'x86/mm' into efi/core, to pick up dependencies
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-10 18:53:14 +01:00
Parav Pandit
4cca96a8d9 IB/mlx5: Do reverse sequence during device removal
When IB device profile initialization completes, device is marked as
active.

However, IB device is not marked inactive, during device removal flow. It
should be the mirror of the add flow.

Hence, mark it inactive during remove sequence.

Link: https://lore.kernel.org/r/20191212113024.336702-2-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07 20:18:10 -04:00
Yishai Hadas
dc2316eba7 IB/mlx5: Fix device memory flows
Fix device memory flows so that only once there will be no live mmaped
VA to a given allocation the matching object will be destroyed.

This prevents a potential scenario that existing VA that was mmaped by
one process might still be used post its deallocation despite that it's
owned now by other process.

The above is achieved by integrating with IB core APIs to manage
mmap/munmap. Only once the refcount will become 0 the DM object and its
underlay area will be freed.

Fixes: 3b113a1ec3 ("IB/mlx5: Support device memory type attribute")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191212100237.330654-3-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-12 16:55:36 -05:00
Maor Gottlieb
ed9085fed9 IB/mlx5: Fix steering rule of drop and count
There are two flow rule destinations: QP and packet. While users are
setting DROP packet rule, the QP should not be set as a destination.

Fixes: 3b3233fbf0 ("IB/mlx5: Add flow counters binding support")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191212091214.315005-4-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-12 15:38:16 -05:00
Ingo Molnar
eb243d1d28 x86/mm/pat: Rename <asm/pat.h> => <asm/memtype.h>
pat.h is a file whose main purpose is to provide the memtype_*() APIs.

PAT is the low level hardware mechanism - but the high level abstraction
is memtype.

So name the header <memtype.h> as well - this goes hand in hand with memtype.c
and memtype_interval.c.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:12:55 +01:00
Linus Torvalds
9a3d7fd275 Driver core patches for 5.5-rc1
Here is the "big" set of driver core patches for 5.5-rc1
 
 There's a few minor cleanups and fixes in here, but the majority of the
 patches in here fall into two buckets:
   - debugfs api cleanups and fixes
   - driver core device link support for boot dependancy issues
 
 The debugfs api cleanups are working to slowly refactor the debugfs apis
 so that it is even harder to use incorrectly.  That work has been
 happening for the past few kernel releases and will continue over time,
 it's a long-term project/goal
 
 The driver core device link support missed 5.4 by just a bit, so it's
 been sitting and baking for many months now.  It's from Saravana Kannan
 to help resolve the problems that DT-based systems have at boot time
 with dependancy graphs and kernel modules.  Turns out that no one has
 actually tried to build a generic arm64 kernel with loads of modules and
 have it "just work" for a variety of platforms (like a distro kernel)
 The big problem turned out to be a lack of depandancy information
 between different areas of DT entries, and the work here resolves that
 problem and now allows devices to boot properly, and quicker than a
 monolith kernel.
 
 All of these patches have been in linux-next for a long time with no
 reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXd6m6Q8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yntJQCcCqg6RQ7LTdHuZv1ETeefXlsfk00An1Jtean6
 42bWGx52bGFvAcpjWy8R
 =P7hq
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the "big" set of driver core patches for 5.5-rc1

  There's a few minor cleanups and fixes in here, but the majority of
  the patches in here fall into two buckets:

   - debugfs api cleanups and fixes

   - driver core device link support for boot dependancy issues

  The debugfs api cleanups are working to slowly refactor the debugfs
  apis so that it is even harder to use incorrectly. That work has been
  happening for the past few kernel releases and will continue over
  time, it's a long-term project/goal

  The driver core device link support missed 5.4 by just a bit, so it's
  been sitting and baking for many months now. It's from Saravana Kannan
  to help resolve the problems that DT-based systems have at boot time
  with dependancy graphs and kernel modules. Turns out that no one has
  actually tried to build a generic arm64 kernel with loads of modules
  and have it "just work" for a variety of platforms (like a distro
  kernel). The big problem turned out to be a lack of dependency
  information between different areas of DT entries, and the work here
  resolves that problem and now allows devices to boot properly, and
  quicker than a monolith kernel.

  All of these patches have been in linux-next for a long time with no
  reported issues"

* tag 'driver-core-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (68 commits)
  tracing: Remove unnecessary DEBUG_FS dependency
  of: property: Add device link support for interrupt-parent, dmas and -gpio(s)
  debugfs: Fix !DEBUG_FS debugfs_create_automount
  of: property: Add device link support for "iommu-map"
  of: property: Fix the semantics of of_is_ancestor_of()
  i2c: of: Populate fwnode in of_i2c_get_board_info()
  drivers: base: Fix Kconfig indentation
  firmware_loader: Fix labels with comma for builtin firmware
  driver core: Allow device link operations inside sync_state()
  driver core: platform: Declare ret variable only once
  cpu-topology: declare parse_acpi_topology in <linux/arch_topology.h>
  crypto: hisilicon: no need to check return value of debugfs_create functions
  driver core: platform: use the correct callback type for bus_find_device
  firmware_class: make firmware caching configurable
  driver core: Clarify documentation for fwnode_operations.add_links()
  mailbox: tegra: Fix superfluous IRQ error message
  net: caif: Fix debugfs on 64-bit platforms
  mac80211: Use debugfs_create_xul() helper
  media: c8sectpfe: no need to check return value of debugfs_create functions
  of: property: Add device link support for iommus, mboxes and io-channels
  ...
2019-11-27 11:06:20 -08:00
Jason Gunthorpe
3694e41e41 Merge branch 'ib-guids' into rdma.git for-next
Danit Goldberg says:

====================
This series extends RTNETLINK to provide IB port and node GUIDs, which
were configured for Infiniband VFs.

The functionality to set VF GUIDs already existed for a long time, and
here we are adding the missing "get" so that netlink will be symmetric and
various cloud orchestration tools will be able to manage such VFs more
naturally.

The iproute2 was extended too to present those GUIDs.

- ip link show <device>

For example:
- ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33
- ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10
- ip link show ib4
    ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256
    link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    vf 0     link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
    spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off
====================

Based on the mlx5-next branch from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
dependencies

* branch 'ib-guids': (35 commits)
  IB/mlx5: Implement callbacks for getting VFs GUID attributes
  IB/ipoib: Add ndo operation for getting VFs GUID attributes
  IB/core: Add interfaces to get VF node and port GUIDs
  net/core: Add support for getting VF GUIDs

  net/mlx5: Add new chain for netfilter flow table offload
  net/mlx5: Refactor creating fast path prio chains
  net/mlx5: Accumulate levels for chains prio namespaces
  net/mlx5: Define fdb tc levels per prio
  net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines
  net/mlx5: Simplify fdb chain and prio eswitch defines
  IB/mlx5: Load profile according to RoCE enablement state
  IB/mlx5: Rename profile and init methods
  net/mlx5: Handle "enable_roce" devlink param
  net/mlx5: Document flow_steering_mode devlink param
  devlink: Add new "enable_roce" generic device param
  net/mlx5: fix spelling mistake "metdata" -> "metadata"
  net/mlx5: fix kvfree of uninitialized pointer spec
  IB/mlx5: Introduce and use mlx5_core_is_vf()
  net/mlx5: E-switch, Enable metadata on own vport
  net/mlx5: Refactor ingress acl configuration
  ...

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25 10:31:47 -04:00
Danit Goldberg
9c0015ef09 IB/mlx5: Implement callbacks for getting VFs GUID attributes
Implement the IB defined callback mlx5_ib_get_vf_guid used to query FW
for VFs attributes and return node and port GUIDs.

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-11-22 18:17:24 +02:00
Mark Zhang
c16339b69c IB/mlx5: Support extended number of strides for Striding RQ
Extends the minimum single WQE strides from 64 to 8, which is exposed
by the "min_single_wqe_log_num_of_strides" field of striding_rq_caps.
Choose right number of strides based on FW capability.

Link: https://lore.kernel.org/r/20191115154555.247856-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-19 16:05:41 -04:00
Michael Guralnik
94de879c28 IB/mlx5: Load profile according to RoCE enablement state
When RoCE is disabled load mlx5_ib in raw_eth profile.
Clean pf_profile roce capability checks as it will not be used without
roce capability.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-11 12:15:29 -08:00
Michael Guralnik
b5a498baf9 IB/mlx5: Rename profile and init methods
Rename uplink_rep_profile and its unique init and cleanup stages to
suit its upcoming use as the profile when RoCE is disabled.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-11 12:15:29 -08:00
Michal Kalderon
c043ff2cfb RDMA: Connect between the mmap entry and the umap_priv structure
The rdma_user_mmap_io interface created a common interface for drivers to
correctly map hw resources and zap them once the ucontext is destroyed
enabling the drivers to safely free the hw resources.

However, this meant the drivers need to delay freeing the resource to the
ucontext destroy phase to ensure they were no longer mapped.  The new
mechanism for a common way of handling user/driver address mapping enabled
notifying the driver if all umap_priv mappings were removed, and enabled
freeing the hw resources when they are done with and not delay it until
ucontext destroy.

Since not all drivers use the mechanism, NULL can be sent to the
rdma_user_mmap_io interface to continue working as before.  Drivers that
use the mmap_xa interface can pass the entry being mapped to the
rdma_user_mmap_io function to be linked together.

Link: https://lore.kernel.org/r/20191030094417.16866-4-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:08:01 -04:00
Greg Kroah-Hartman
09b0965ee8 IB: mlx5: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Cc: Leon Romanovsky <leon@kernel.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-04 08:38:58 +01:00
Parav Pandit
e53a9d26cf IB/mlx5: Introduce and use mlx5_core_is_vf()
Instead of deciding a given device is virtual function or
not based on a device is PF or not, use already defined
MLX5_COREDEV_VF by introducing an helper API mlx5_core_is_vf().

This enables to clearly identify PF, VF and non virtual functions.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:40:27 -07:00
Michael Guralnik
11f552e217 IB/mlx5: Test write combining support
Linux can run in all sorts of physical machines and VMs where write
combining may or may not be supported. Currently there is no way to
reliably tell if the system supports WC, or not. The driver uses WC to
optimize posting work to the HCA, and getting this wrong in either
direction can cause a significant performance loss.

Add a test in mlx5_ib initialization process to test whether
write-combining is supported on the machine.  The test will run as part of
the enable_driver callback to ensure that the test runs after the device
is setup and can create and modify the QP needed, but runs before the
device is exposed to the users.

The test opens UD QP and posts NOP WQEs, the WQE written to the BlueFlame
is different from the WQE in memory, requesting CQE only on the BlueFlame
WQE. By checking whether we received a completion on one of these WQEs we
can know if BlueFlame succeeded and this write-combining must be
supported.

Change reporting of BlueFlame support to be dependent on write-combining
support instead of the FW's guess as to what the machine can do.

Link: https://lore.kernel.org/r/20191027062234.10993-1-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-31 15:52:51 -03:00
Jason Gunthorpe
bb3dba3300 Merge branch 'odp_rework' into rdma.git for-next
Jason Gunthorpe says:

====================
In order to hoist the interval tree code out of the drivers and into the
mmu_notifiers it is necessary for the drivers to not use the interval tree
for other things.

This series replaces the interval tree with an xarray and along the way
re-aligns all the locking to use a sensible SRCU model where the 'update'
step is done by modifying an xarray.

The result is overall much simpler and with less locking in the critical
path. Many functions were reworked for clarity and small details like
using 'imr' to refer to the implicit MR make the entire code flow here
more readable.

This also squashes at least two race bugs on its own, and quite possibily
more that haven't been identified.
====================

Merge conflicts with the odp statistics patch resolved.

* branch 'odp_rework':
  RDMA/odp: Remove broken debugging call to invalidate_range
  RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy
  RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray
  RDMA/mlx5: Rework implicit ODP destroy
  RDMA/mlx5: Avoid double lookups on the pagefault path
  RDMA/mlx5: Reduce locking in implicit_mr_get_data()
  RDMA/mlx5: Use an xarray for the children of an implicit ODP
  RDMA/mlx5: Split implicit handling from pagefault_mr
  RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree
  RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it
  RDMA/mlx5: Rework implicit_mr_get_data
  RDMA/mlx5: Delete struct mlx5_priv->mkey_table
  RDMA/mlx5: Use a dedicated mkey xarray for ODP
  RDMA/mlx5: Split sig_err MR data into its own xarray
  RDMA/mlx5: Use SRCU properly in ODP prefetch

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:47:52 -03:00
Jason Gunthorpe
5256edcb98 RDMA/mlx5: Rework implicit ODP destroy
Use SRCU in a sensible way by removing all MRs in the implicit tree from
the two xarrays (the update operation), then a synchronize, followed by a
normal single threaded teardown.

This is only a little unusual from the normal pattern as there can still
be some work pending in the unbound wq that may also require a workqueue
flush. This is tracked with a single atomic, consolidating the redundant
existing atomics and wait queue.

For understand-ability the entire ODP implicit create/destroy flow now
largely exists in a single pair of functions within odp.c, with a few
support functions for tearing down an unused child.

Link: https://lore.kernel.org/r/20191009160934.3143-13-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
806b101b2b RDMA/mlx5: Use a dedicated mkey xarray for ODP
There is a per device xarray storing mkeys that is used to store every
mkey in the system. However, this xarray is now only read by ODP for
certain ODP designated MRs (ODP, implicit ODP, MW, DEVX_INDIRECT).

Create an xarray only for use by ODP, that only contains ODP related
MKeys. This xarray is protected by SRCU and all erases are protected by a
synchronize.

This improves performance:

 - All MRs in the odp_mkeys xarray are ODP MRs, so some tests for is_odp()
   can be deleted. The xarray will also consume fewer nodes.

 - normal MR's are never mixed with ODP MRs in a SRCU data structure so
   performance sucking synchronize_srcu() on every MR destruction is not
   needed.

 - No smp_load_acquire(live) and xa_load() double barrier on read

Due to the SRCU locking scheme care must be taken with the placement of
the xa_store(). Once it completes the MR is immediately visible to other
threads and only through a xa_erase() & synchronize_srcu() cycle could it
be destroyed.

Link: https://lore.kernel.org/r/20191009160934.3143-4-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
50211ec944 RDMA/mlx5: Split sig_err MR data into its own xarray
The locking model for signature is completely different than ODP, do not
share the same xarray that relies on SRCU locking to support ODP.

Simply store the active mlx5_core_sig_ctx's in an xarray when signature
MRs are created and rely on trivial xarray locking to serialize
everything.

The overhead of storing only a handful of SIG related MRs is going to be
much less than an xarray full of every mkey.

Link: https://lore.kernel.org/r/20191009160934.3143-3-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Ran Rozenstein
68abaa765e IB/mlx5: Remove dead code
mlx5_ib_dc_atomic_is_supported function is not used anywhere. Remove the
dead code.

Fixes: a60109dc9a ("IB/mlx5: Add support for extended atomic operations")
Link: https://lore.kernel.org/r/20191020064454.8551-1-leon@kernel.org
Signed-off-by: Ran Rozenstein <ranro@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 16:37:48 -03:00
Erez Alfasi
4061ff7aa3 RDMA/nldev: Provide MR statistics
Add RDMA nldev netlink interface for dumping MR statistics information.

Output example:

$ ./ibv_rc_pingpong -o -P -s 500000000
  local address:  LID 0x0001, QPN 0x00008a, PSN 0xf81096, GID ::

$ rdma stat show mr
dev mlx5_0 mrn 2 page_faults 122071 page_invalidations 0

Link: https://lore.kernel.org/r/20191016062308.11886-5-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 15:33:31 -03:00
Erez Alfasi
e1b95ae0b0 RDMA/mlx5: Return ODP type per MR
Provide an ODP explicit/implicit type as part of 'rdma -dd resource show
mr' dump.

For example:

$ rdma -dd resource show mr
dev mlx5_0 mrn 1 rkey 0xa99a lkey 0xa99a mrlen 50000000
pdn 9 pid 7372 comm ibv_rc_pingpong drv_odp explicit

For non-ODP MRs, we won't print "drv_odp ..." at all.

Link: https://lore.kernel.org/r/20191016062308.11886-4-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 15:22:47 -03:00
Jason Gunthorpe
a2aca4d7f0 Merge branch 'mlx5-rd-sgl' into rdma.git for-next
From Yamin Friedman:

====================
This series from Yamin implements long standing "TODO" existed in rw.c. It
allows the driver to specify a cut-over point where it is faster to build
a lkey MR rather than do a large SGL for RDMA READ operations.

mlx5 HW gets a notable performane boost by switching to MRs.
====================

Based on the mlx5-next branch from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
dependencies

* branch 'mlx5-rd-sgl': (3 commits)
  RDMA/mlx5: Add capability for max sge to get optimized performance
  RDMA/rw: Support threshold for registration vs scattering to local pages
  net/mlx5: Expose optimal performance scatter entries capability
2019-10-22 14:27:25 -03:00
Yamin Friedman
366090564b RDMA/mlx5: Add capability for max sge to get optimized performance
Allows the IB device to provide a value of maximum scatter gather entries
per RDMA READ.

In certain cases it may be preferable for a device to perform UMR memory
registration rather than have many scatter entries in a single RDMA READ.
This provides a significant performance increase in devices capable of
using different memory registration schemes based on the number of scatter
gather entries. This general capability allows each device vendor to fine
tune when it is better to use memory registration.

Link: https://lore.kernel.org/r/20191007135933.12483-4-leon@kernel.org
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:27:00 -03:00
Erez Alfasi
6f26b2ac69 IB/mlx5: Remove unnecessary else statement
'else' is not generally useful after a break or return. Remove this
unnecessary statement.

Link: https://lore.kernel.org/r/20191002122517.17721-4-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:38:16 -03:00
Linus Torvalds
018c6837f3 RDMA subsystem updates for 5.4
This cycle mainly saw lots of bug fixes and clean up code across the core
 code and several drivers, few new functional changes were made.
 
 - Many cleanup and bug fixes for hns
 
 - Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed,
   bnxt_re, efa
 
 - Share the query_port code between all the iWarp drivers
 
 - General rework and cleanup of the ODP MR umem code to fit better with
   the mmu notifier get/put scheme
 
 - Support rdma netlink in non init_net name spaces
 
 - mlx5 support for XRC devx and DC ODP
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl2A1ugACgkQOG33FX4g
 mxp+EQ//Ug8CyyDs40SGZztItoGghuQ4TVA2pXtuZ9LkFVJRWhYPJGOadHIYXqGO
 KQJJpZPQ02HTZUPWBZNKmD5bwHfErm4cS73/mVmUpximnUqu/UsLRJp8SIGmBg1D
 W1Lz1BJX24MdV8aUnChYvdL5Hbl52q+BaE99Z0haOvW7I3YnKJC34mR8m/A5MiRf
 rsNIZNPHdq2U6pKLgCbOyXib8yBcJQqBb8x4WNAoB1h4MOx+ir5QLfr3ozrZs1an
 xXgqyiOBmtkUgCMIpXC4juPN/6gw3Y5nkk2VIWY+MAY1a7jZPbI+6LJZZ1Uk8R44
 Lf2KSzabFMMYGKJYE1Znxk+JWV8iE+m+n6wWEfRM9l0b4gXXbrtKgaouFbsLcsQA
 CvBEQuiFSO9Kq01JPaAN1XDmhqyTarG6lHjXnW7ifNlLXnPbR1RJlprycExOhp0c
 axum5K2fRNW2+uZJt+zynMjk2kyjT1lnlsr1Rbgc4Pyionaiydg7zwpiac7y/bdS
 F7/WqdmPiff78KMi187EF5pjFqMWhthvBtTbWDuuxaxc2nrXSdiCUN+96j1amQUH
 yU/7AZzlnKeKEQQCR4xddsSs2eTrXiLLFRLk9GCK2eh4cUN212eHTrPLKkQf1cN+
 ydYbR2pxw3B38LCCNBy+bL+u7e/Tyehs4ynATMpBuEgc5iocTwE=
 =zHXW
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull RDMA subsystem updates from Jason Gunthorpe:
 "This cycle mainly saw lots of bug fixes and clean up code across the
  core code and several drivers, few new functional changes were made.

   - Many cleanup and bug fixes for hns

   - Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed,
     bnxt_re, efa

   - Share the query_port code between all the iWarp drivers

   - General rework and cleanup of the ODP MR umem code to fit better
     with the mmu notifier get/put scheme

   - Support rdma netlink in non init_net name spaces

   - mlx5 support for XRC devx and DC ODP"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits)
  RDMA: Fix double-free in srq creation error flow
  RDMA/efa: Fix incorrect error print
  IB/mlx5: Free mpi in mp_slave mode
  IB/mlx5: Use the original address for the page during free_pages
  RDMA/bnxt_re: Fix spelling mistake "missin_resp" -> "missing_resp"
  RDMA/hns: Package operations of rq inline buffer into separate functions
  RDMA/hns: Optimize cmd init and mode selection for hip08
  IB/hfi1: Define variables as unsigned long to fix KASAN warning
  IB/{rdmavt, hfi1, qib}: Add a counter for credit waits
  IB/hfi1: Add traces for TID RDMA READ
  RDMA/siw: Relax from kmap_atomic() use in TX path
  IB/iser: Support up to 16MB data transfer in a single command
  RDMA/siw: Fix page address mapping in TX path
  RDMA: Fix goto target to release the allocated memory
  RDMA/usnic: Avoid overly large buffers on stack
  RDMA/odp: Add missing cast for 32 bit
  RDMA/hns: Use devm_platform_ioremap_resource() to simplify code
  Documentation/infiniband: update name of some functions
  RDMA/cma: Fix false error message
  RDMA/hns: Fix wrong assignment of qp_access_flags
  ...
2019-09-21 10:26:24 -07:00
Linus Torvalds
84da111de0 hmm related patches for 5.4
This is more cleanup and consolidation of the hmm APIs and the very
 strongly related mmu_notifier interfaces. Many places across the tree
 using these interfaces are touched in the process. Beyond that a cleanup
 to the page walker API and a few memremap related changes round out the
 series:
 
 - General improvement of hmm_range_fault() and related APIs, more
   documentation, bug fixes from testing, API simplification &
   consolidation, and unused API removal
 
 - Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE, and
   make them internal kconfig selects
 
 - Hoist a lot of code related to mmu notifier attachment out of drivers by
   using a refcount get/put attachment idiom and remove the convoluted
   mmu_notifier_unregister_no_release() and related APIs.
 
 - General API improvement for the migrate_vma API and revision of its only
   user in nouveau
 
 - Annotate mmu_notifiers with lockdep and sleeping region debugging
 
 Two series unrelated to HMM or mmu_notifiers came along due to
 dependencies:
 
 - Allow pagemap's memremap_pages family of APIs to work without providing
   a struct device
 
 - Make walk_page_range() and related use a constant structure for function
   pointers
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl1/nnkACgkQOG33FX4g
 mxqaRg//c6FqowV1pQlLutvAOAgMdpzfZ9eaaDKngy9RVQxz+k/MmJrdRH/p/mMA
 Pq93A1XfwtraGKErHegFXGEDk4XhOustVAVFwvjyXO41dTUdoFVUkti6ftbrl/rS
 6CT+X90jlvrwdRY7QBeuo7lxx7z8Qkqbk1O1kc1IOracjKfNJS+y6LTamy6weM3g
 tIMHI65PkxpRzN36DV9uCN5dMwFzJ73DWHp1b0acnDIigkl6u5zp6orAJVWRjyQX
 nmEd3/IOvdxaubAoAvboNS5CyVb4yS9xshWWMbH6AulKJv3Glca1Aa7QuSpBoN8v
 wy4c9+umzqRgzgUJUe1xwN9P49oBNhJpgBSu8MUlgBA4IOc3rDl/Tw0b5KCFVfkH
 yHkp8n6MP8VsRrzXTC6Kx0vdjIkAO8SUeylVJczAcVSyHIo6/JUJCVDeFLSTVymh
 EGWJ7zX2iRhUbssJ6/izQTTQyCH3YIyZ5QtqByWuX2U7ZrfkqS3/EnBW1Q+j+gPF
 Z2yW8iT6k0iENw6s8psE9czexuywa/Lttz94IyNlOQ8rJTiQqB9wLaAvg9hvUk7a
 kuspL+JGIZkrL3ouCeO/VA6xnaP+Q7nR8geWBRb8zKGHmtWrb5Gwmt6t+vTnCC2l
 olIDebrnnxwfBQhEJ5219W+M1pBpjiTpqK/UdBd92A4+sOOhOD0=
 =FRGg
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull hmm updates from Jason Gunthorpe:
 "This is more cleanup and consolidation of the hmm APIs and the very
  strongly related mmu_notifier interfaces. Many places across the tree
  using these interfaces are touched in the process. Beyond that a
  cleanup to the page walker API and a few memremap related changes
  round out the series:

   - General improvement of hmm_range_fault() and related APIs, more
     documentation, bug fixes from testing, API simplification &
     consolidation, and unused API removal

   - Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE,
     and make them internal kconfig selects

   - Hoist a lot of code related to mmu notifier attachment out of
     drivers by using a refcount get/put attachment idiom and remove the
     convoluted mmu_notifier_unregister_no_release() and related APIs.

   - General API improvement for the migrate_vma API and revision of its
     only user in nouveau

   - Annotate mmu_notifiers with lockdep and sleeping region debugging

  Two series unrelated to HMM or mmu_notifiers came along due to
  dependencies:

   - Allow pagemap's memremap_pages family of APIs to work without
     providing a struct device

   - Make walk_page_range() and related use a constant structure for
     function pointers"

* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (75 commits)
  libnvdimm: Enable unit test infrastructure compile checks
  mm, notifier: Catch sleeping/blocking for !blockable
  kernel.h: Add non_block_start/end()
  drm/radeon: guard against calling an unpaired radeon_mn_unregister()
  csky: add missing brackets in a macro for tlb.h
  pagewalk: use lockdep_assert_held for locking validation
  pagewalk: separate function pointers from iterator data
  mm: split out a new pagewalk.h header from mm.h
  mm/mmu_notifiers: annotate with might_sleep()
  mm/mmu_notifiers: prime lockdep
  mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end
  mm/mmu_notifiers: remove the __mmu_notifier_invalidate_range_start/end exports
  mm/hmm: hmm_range_fault() infinite loop
  mm/hmm: hmm_range_fault() NULL pointer bug
  mm/hmm: fix hmm_range_fault()'s handling of swapped out pages
  mm/mmu_notifiers: remove unregister_no_release
  RDMA/odp: remove ib_ucontext from ib_umem
  RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'
  RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
  RDMA/mlx5: Use ib_umem_start instead of umem.address
  ...
2019-09-21 10:07:42 -07:00
Danit Goldberg
5d44adebbb IB/mlx5: Free mpi in mp_slave mode
ib_add_slave_port() allocates a multiport struct but never frees it.
Don't leak memory, free the allocated mpi struct during driver unload.

Cc: <stable@vger.kernel.org>
Fixes: 32f69e4be2 ("{net, IB}/mlx5: Manage port association for multiport RoCE")
Link: https://lore.kernel.org/r/20190916064818.19823-3-leon@kernel.org
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-16 14:08:18 -03:00