Introduce packet pacing uobject and its alloc and destroy
methods.
This uobject holds mlx5 packet pacing context according to the device
specification and enables managing packet pacing device entries that are
needed by DEVX applications.
Link: https://lore.kernel.org/r/20200219190518.200912-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add two debugfs parameters described below.
np_min_time_between_cnps - Minimum time between sending CNPs from the
port.
Unit = microseconds.
Default = 0 (no min wait time; generated
based on incoming ECN marked packets).
rp_max_rate - Maximum rate at which reaction point node can transmit.
Once this limit is reached, RP is no longer rate limited.
Unit = Mbits/sec
Default = 0 (full speed)
Link: https://lore.kernel.org/r/20200227125246.99472-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Following race may occur because of the call_srcu and the placement of
the synchronize_srcu vs the xa_erase.
CPU0 CPU1
mlx5_ib_free_implicit_mr: destroy_unused_implicit_child_mr:
xa_erase(odp_mkeys)
synchronize_srcu()
xa_lock(implicit_children)
if (still in xarray)
atomic_inc()
call_srcu()
xa_unlock(implicit_children)
xa_erase(implicit_children):
xa_lock(implicit_children)
__xa_erase()
xa_unlock(implicit_children)
flush_workqueue()
[..]
free_implicit_child_mr_rcu:
(via call_srcu)
queue_work()
WARN_ON(atomic_read())
[..]
free_implicit_child_mr_work:
(via wq)
free_implicit_child_mr()
mlx5_mr_cache_invalidate()
mlx5_ib_update_xlt() <-- UMR QP fail
atomic_dec()
The wait_event() solves the race because it blocks until
free_implicit_child_mr_work() completes.
Fixes: 5256edcb98 ("RDMA/mlx5: Rework implicit ODP destroy")
Link: https://lore.kernel.org/r/20200227113918.94432-1-leon@kernel.org
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Relaxed ordering is not supported in UMR so we are disabling UMR usage
when user passes relaxed ordering access flag.
Enable using UMR when user requested relaxed ordering but there are no
relaxed ordering capabilities.
This will prevent user from unnecessarily registering a new mkey.
Fixes: d6de0bb185 ("RDMA/mlx5: Set relaxed ordering when requested")
Link: https://lore.kernel.org/r/20200227113834.94233-1-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Compilation of mlx5 driver without CONFIG_INFINIBAND_USER_ACCESS generates
the following error.
on x86_64:
ld: drivers/infiniband/hw/mlx5/main.o: in function `mlx5_ib_handler_MLX5_IB_METHOD_VAR_OBJ_ALLOC':
main.c:(.text+0x186d): undefined reference to `ib_uverbs_get_ucontext_file'
ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x2480): undefined reference to `uverbs_idr_class'
ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x24d8): undefined reference to `uverbs_destroy_def_handler'
This is happening because some parts of the UAPI description are not
static. This is a hold over from earlier code that relied on struct
pointers to refer to object types, now object types are referenced by
number. Remove the unused globals and add statics to the remaining UAPI
description elements.
Remove the redundent #ifdefs around mlx5_ib_*defs and obsolete
mlx5_ib_get_devx_tree().
The compiler now trims alot more unused code, including the above
problematic definitions when !CONFIG_INFINIBAND_USER_ACCESS.
Fixes: 7be76bef32 ("IB/mlx5: Introduce VAR object and its alloc/destroy methods")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The following series extends MR creation routines to allow creation of
user MRs through kernel ULPs as a proxy. The immediate use case is to
allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
MRs to be created and such MRs were not possible to create prior this
series.
The first part of this patchset extends RDMA to have special verb
ib_reg_user_mr(). The common use case that uses this function is a
userspace application that allocates memory for HCA access but the
responsibility to register the memory at the HCA is on an kernel ULP.
This ULP acts as an agent for the userspace application.
The second part provides advise MR functionality for ULPs. This is
integral part of ODP flows and used to trigger pagefaults in advance
to prepare memory before running working set.
The third part is actual user of those in-kernel APIs.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQT1m3YD37UfMCUQBNwp8NhrnBAZsQUCXiVO8AAKCRAp8NhrnBAZ
scTrAP9gb0d3qv0IOtHw5aGI1DAgjTUn/SzUOnsjDEn7DIoh9gEA2+ZmaEyLXKrl
+UcZb31auy5P8ueJYokRLhLAyRcOIAg=
=yaHb
-----END PGP SIGNATURE-----
Merge tag 'rds-odp-for-5.5' into rdma.git for-next
From https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma
Leon Romanovsky says:
====================
Use ODP MRs for kernel ULPs
The following series extends MR creation routines to allow creation of
user MRs through kernel ULPs as a proxy. The immediate use case is to
allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
MRs to be created and such MRs were not possible to create prior this
series.
The first part of this patchset extends RDMA to have special verb
ib_reg_user_mr(). The common use case that uses this function is a
userspace application that allocates memory for HCA access but the
responsibility to register the memory at the HCA is on an kernel ULP.
This ULP acts as an agent for the userspace application.
The second part provides advise MR functionality for ULPs. This is
integral part of ODP flows and used to trigger pagefaults in advance
to prepare memory before running working set.
The third part is actual user of those in-kernel APIs.
====================
* tag 'rds-odp-for-5.5':
net/rds: Use prefetch for On-Demand-Paging MR
net/rds: Handle ODP mr registration/unregistration
net/rds: Detect need of On-Demand-Paging memory registration
RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths
IB/mlx5: Mask out unsupported ODP capabilities for kernel QPs
RDMA/mlx5: Don't fake udata for kernel path
IB/mlx5: Add ODP WQE handlers for kernel QPs
IB/core: Add interface to advise_mr for kernel users
IB/core: Introduce ib_reg_user_mr
IB: Allow calls to ib_umem_get from kernel ULPs
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Enable relaxed ordering in the mkey context when requested. As relaxed
ordering is not currently supported in UMR, disable UMR usage for relaxed
ordering MRs.
Link: https://lore.kernel.org/r/1578506740-22188-11-git-send-email-yishaih@mellanox.com
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
One of the steps in ODP page fault handler for WQEs is to read a WQE
from a QP send queue or receive queue buffer at a specific index.
Since the implementation of this buffer is different between kernel and
user QP the implementation of the handler needs to be aware of that and
handle it in a different way.
ODP for kernel MRs is currently supported only for RDMA_READ
and RDMA_WRITE operations so change the handler to
- read a WQE from a kernel QP send queue
- fail if access to receive queue or shared receive queue is
required for a kernel QP
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Introduce VAR object and its alloc/destroy KABI methods. The internal
implementation uses the IB core API to manage mmap/munamp calls.
Link: https://lore.kernel.org/r/20191212110928.334995-5-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Building MR translation table in the ODP case requires additional
flexibility, namely random access to DMA addresses. Make both direct and
indirect ODP MR use same code path, separated from the non-ODP MR code
path.
With the restructuring the correct page_shift is now used around
__mlx5_ib_populate_pas().
Fixes: d2183c6f19 ("RDMA/umem: Move page_shift from ib_umem to ib_odp_umem")
Link: https://lore.kernel.org/r/20191222124649.52300-2-leon@kernel.org
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Fix device memory flows so that only once there will be no live mmaped
VA to a given allocation the matching object will be destroyed.
This prevents a potential scenario that existing VA that was mmaped by
one process might still be used post its deallocation despite that it's
owned now by other process.
The above is achieved by integrating with IB core APIs to manage
mmap/munmap. Only once the refcount will become 0 the DM object and its
underlay area will be freed.
Fixes: 3b113a1ec3 ("IB/mlx5: Support device memory type attribute")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191212100237.330654-3-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
This is another round of bug fixing and cleanup. This time the focus is on
the driver pattern to use mmu notifiers to monitor a VA range. This code
is lifted out of many drivers and hmm_mirror directly into the
mmu_notifier core and written using the best ideas from all the driver
implementations.
This removes many bugs from the drivers and has a very pleasing
diffstat. More drivers can still be converted, but that is for another
cycle.
- A shared branch with RDMA reworking the RDMA ODP implementation
- New mmu_interval_notifier API. This is focused on the use case of
monitoring a VA and simplifies the process for drivers
- A common seq-count locking scheme built into the mmu_interval_notifier
API usable by drivers that call get_user_pages() or hmm_range_fault()
with the VA range
- Conversion of mlx5 ODP, hfi1, radeon, nouveau, AMD GPU, and Xen GntDev
drivers to the new API. This deletes a lot of wonky driver code.
- Two improvements for hmm_range_fault(), from testing done by Ralph
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl3cCjQACgkQOG33FX4g
mxpp8xAAiR9iOdT28m/tx1GF31XludrMhRZVIiz0vmCIxIiAkWekWEfAEVm9PDnh
wdrxTJohSs+B65AK3sfToOM3AIuNCuFVWmbbHI5qmOO76vaSvcZa905Z++pNsawO
Bn8mgRCprYoFHcxWLvTvnA5U0g1S2BSSOwBSZI43CbEnVvHjYAR6MnvRqfGMk+NF
bf8fTk/x+fl0DCemhynlBLuJkogzoE2Hgl0yPY5bFna4PktOxdpa1yPaQsiqZ7e6
2s2NtM3pbMBJk0W42q5BU+aPhiqfxFFszasPSLBduXrD2xDsG76HJdHj5VydKmfL
nelG4BvqJozXTEZWvTEePYhCqaZ41eJZ7Asw8BXtmacVqE5mDlTXo/Zdgbz7yEOR
mI5MVyjD5rauZJldUOWXbwrPoWVFRvboauehiSgqvxvT9HvlFp9GKObSuu4gubBQ
mzxs4t48tPhA7bswLmw0/pETSogFuVDfaB7hsyY0gi8EwxMFMpw2qFypm1PEEF+C
BuUxCSShzvNKrraNe5PWaNNFd3AzIwAOWJHE+poH4bCoXQVr5nA+rq2gnHkdY5vq
/xrBCyxkf0U05YoFGYembPVCInMehzp9Xjy8V+SueSvCg2/TYwGDCgGfsbe9dNOP
Bc40JpS7BDn5w9nyLUJmOx7jfruNV6kx1QslA7NDDrB/rzOlsEc=
=Hj8a
-----END PGP SIGNATURE-----
Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull hmm updates from Jason Gunthorpe:
"This is another round of bug fixing and cleanup. This time the focus
is on the driver pattern to use mmu notifiers to monitor a VA range.
This code is lifted out of many drivers and hmm_mirror directly into
the mmu_notifier core and written using the best ideas from all the
driver implementations.
This removes many bugs from the drivers and has a very pleasing
diffstat. More drivers can still be converted, but that is for another
cycle.
- A shared branch with RDMA reworking the RDMA ODP implementation
- New mmu_interval_notifier API. This is focused on the use case of
monitoring a VA and simplifies the process for drivers
- A common seq-count locking scheme built into the
mmu_interval_notifier API usable by drivers that call
get_user_pages() or hmm_range_fault() with the VA range
- Conversion of mlx5 ODP, hfi1, radeon, nouveau, AMD GPU, and Xen
GntDev drivers to the new API. This deletes a lot of wonky driver
code.
- Two improvements for hmm_range_fault(), from testing done by Ralph"
* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
mm/hmm: remove hmm_range_dma_map and hmm_range_dma_unmap
mm/hmm: make full use of walk_page_range()
xen/gntdev: use mmu_interval_notifier_insert
mm/hmm: remove hmm_mirror and related
drm/amdgpu: Use mmu_interval_notifier instead of hmm_mirror
drm/amdgpu: Use mmu_interval_insert instead of hmm_mirror
drm/amdgpu: Call find_vma under mmap_sem
nouveau: use mmu_interval_notifier instead of hmm_mirror
nouveau: use mmu_notifier directly for invalidate_range_start
drm/radeon: use mmu_interval_notifier_insert
RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcv
RDMA/odp: Use mmu_interval_notifier_insert()
mm/hmm: define the pre-processor related parts of hmm.h even if disabled
mm/hmm: allow hmm_range to be used with a mmu_interval_notifier or hmm_mirror
mm/mmu_notifier: add an interval tree notifier
mm/mmu_notifier: define the header pre-processor parts even if disabled
mm/hmm: allow snapshot of the special zero page
Here is the "big" set of driver core patches for 5.5-rc1
There's a few minor cleanups and fixes in here, but the majority of the
patches in here fall into two buckets:
- debugfs api cleanups and fixes
- driver core device link support for boot dependancy issues
The debugfs api cleanups are working to slowly refactor the debugfs apis
so that it is even harder to use incorrectly. That work has been
happening for the past few kernel releases and will continue over time,
it's a long-term project/goal
The driver core device link support missed 5.4 by just a bit, so it's
been sitting and baking for many months now. It's from Saravana Kannan
to help resolve the problems that DT-based systems have at boot time
with dependancy graphs and kernel modules. Turns out that no one has
actually tried to build a generic arm64 kernel with loads of modules and
have it "just work" for a variety of platforms (like a distro kernel)
The big problem turned out to be a lack of depandancy information
between different areas of DT entries, and the work here resolves that
problem and now allows devices to boot properly, and quicker than a
monolith kernel.
All of these patches have been in linux-next for a long time with no
reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXd6m6Q8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yntJQCcCqg6RQ7LTdHuZv1ETeefXlsfk00An1Jtean6
42bWGx52bGFvAcpjWy8R
=P7hq
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the "big" set of driver core patches for 5.5-rc1
There's a few minor cleanups and fixes in here, but the majority of
the patches in here fall into two buckets:
- debugfs api cleanups and fixes
- driver core device link support for boot dependancy issues
The debugfs api cleanups are working to slowly refactor the debugfs
apis so that it is even harder to use incorrectly. That work has been
happening for the past few kernel releases and will continue over
time, it's a long-term project/goal
The driver core device link support missed 5.4 by just a bit, so it's
been sitting and baking for many months now. It's from Saravana Kannan
to help resolve the problems that DT-based systems have at boot time
with dependancy graphs and kernel modules. Turns out that no one has
actually tried to build a generic arm64 kernel with loads of modules
and have it "just work" for a variety of platforms (like a distro
kernel). The big problem turned out to be a lack of dependency
information between different areas of DT entries, and the work here
resolves that problem and now allows devices to boot properly, and
quicker than a monolith kernel.
All of these patches have been in linux-next for a long time with no
reported issues"
* tag 'driver-core-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (68 commits)
tracing: Remove unnecessary DEBUG_FS dependency
of: property: Add device link support for interrupt-parent, dmas and -gpio(s)
debugfs: Fix !DEBUG_FS debugfs_create_automount
of: property: Add device link support for "iommu-map"
of: property: Fix the semantics of of_is_ancestor_of()
i2c: of: Populate fwnode in of_i2c_get_board_info()
drivers: base: Fix Kconfig indentation
firmware_loader: Fix labels with comma for builtin firmware
driver core: Allow device link operations inside sync_state()
driver core: platform: Declare ret variable only once
cpu-topology: declare parse_acpi_topology in <linux/arch_topology.h>
crypto: hisilicon: no need to check return value of debugfs_create functions
driver core: platform: use the correct callback type for bus_find_device
firmware_class: make firmware caching configurable
driver core: Clarify documentation for fwnode_operations.add_links()
mailbox: tegra: Fix superfluous IRQ error message
net: caif: Fix debugfs on 64-bit platforms
mac80211: Use debugfs_create_xul() helper
media: c8sectpfe: no need to check return value of debugfs_create functions
of: property: Add device link support for iommus, mboxes and io-channels
...
Danit Goldberg says:
====================
This series extends RTNETLINK to provide IB port and node GUIDs, which
were configured for Infiniband VFs.
The functionality to set VF GUIDs already existed for a long time, and
here we are adding the missing "get" so that netlink will be symmetric and
various cloud orchestration tools will be able to manage such VFs more
naturally.
The iproute2 was extended too to present those GUIDs.
- ip link show <device>
For example:
- ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33
- ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10
- ip link show ib4
ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256
link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
vf 0 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off
====================
Based on the mlx5-next branch from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
dependencies
* branch 'ib-guids': (35 commits)
IB/mlx5: Implement callbacks for getting VFs GUID attributes
IB/ipoib: Add ndo operation for getting VFs GUID attributes
IB/core: Add interfaces to get VF node and port GUIDs
net/core: Add support for getting VF GUIDs
net/mlx5: Add new chain for netfilter flow table offload
net/mlx5: Refactor creating fast path prio chains
net/mlx5: Accumulate levels for chains prio namespaces
net/mlx5: Define fdb tc levels per prio
net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines
net/mlx5: Simplify fdb chain and prio eswitch defines
IB/mlx5: Load profile according to RoCE enablement state
IB/mlx5: Rename profile and init methods
net/mlx5: Handle "enable_roce" devlink param
net/mlx5: Document flow_steering_mode devlink param
devlink: Add new "enable_roce" generic device param
net/mlx5: fix spelling mistake "metdata" -> "metadata"
net/mlx5: fix kvfree of uninitialized pointer spec
IB/mlx5: Introduce and use mlx5_core_is_vf()
net/mlx5: E-switch, Enable metadata on own vport
net/mlx5: Refactor ingress acl configuration
...
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Replace the internal interval tree based mmu notifier with the new common
mmu_interval_notifier_insert() API. This removes a lot of code and fixes a
deadlock that can be triggered in ODP:
zap_page_range()
mmu_notifier_invalidate_range_start()
[..]
ib_umem_notifier_invalidate_range_start()
down_read(&per_mm->umem_rwsem)
unmap_single_vma()
[..]
__split_huge_page_pmd()
mmu_notifier_invalidate_range_start()
[..]
ib_umem_notifier_invalidate_range_start()
down_read(&per_mm->umem_rwsem) // DEADLOCK
mmu_notifier_invalidate_range_end()
up_read(&per_mm->umem_rwsem)
mmu_notifier_invalidate_range_end()
up_read(&per_mm->umem_rwsem)
The umem_rwsem is held across the range_start/end as the ODP algorithm for
invalidate_range_end cannot tolerate changes to the interval
tree. However, due to the nested invalidation regions the second
down_read() can deadlock if there are competing writers. The new core code
provides an alternative scheme to solve this problem.
Fixes: ca748c39ea ("RDMA/umem: Get rid of per_mm->notifier_count")
Link: https://lore.kernel.org/r/20191112202231.3856-6-jgg@ziepe.ca
Tested-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Implement the IB defined callback mlx5_ib_get_vf_guid used to query FW
for VFs attributes and return node and port GUIDs.
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Extends the minimum single WQE strides from 64 to 8, which is exposed
by the "min_single_wqe_log_num_of_strides" field of striding_rq_caps.
Choose right number of strides based on FW capability.
Link: https://lore.kernel.org/r/20191115154555.247856-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add support for flow steering counters action with a non-base counter
ID (offset) for bulk counters.
When creating a flow counter object, save the bulk value. This value is
used when a flow action with a non-base counter ID is requested - to
validate that the required offset is in the range of the allocated bulk.
Link: https://lore.kernel.org/r/20191103140723.77411-1-leon@kernel.org
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
All users of process_mad() converts input pointers from ib_mad_hdr to be
ib_mad, update the function declaration to use ib_mad directly.
Also remove not used input MAD size parameter.
Link: https://lore.kernel.org/r/20191029062745.7932-17-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-By: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linux can run in all sorts of physical machines and VMs where write
combining may or may not be supported. Currently there is no way to
reliably tell if the system supports WC, or not. The driver uses WC to
optimize posting work to the HCA, and getting this wrong in either
direction can cause a significant performance loss.
Add a test in mlx5_ib initialization process to test whether
write-combining is supported on the machine. The test will run as part of
the enable_driver callback to ensure that the test runs after the device
is setup and can create and modify the QP needed, but runs before the
device is exposed to the users.
The test opens UD QP and posts NOP WQEs, the WQE written to the BlueFlame
is different from the WQE in memory, requesting CQE only on the BlueFlame
WQE. By checking whether we received a completion on one of these WQEs we
can know if BlueFlame succeeded and this write-combining must be
supported.
Change reporting of BlueFlame support to be dependent on write-combining
support instead of the FW's guess as to what the machine can do.
Link: https://lore.kernel.org/r/20191027062234.10993-1-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe says:
====================
In order to hoist the interval tree code out of the drivers and into the
mmu_notifiers it is necessary for the drivers to not use the interval tree
for other things.
This series replaces the interval tree with an xarray and along the way
re-aligns all the locking to use a sensible SRCU model where the 'update'
step is done by modifying an xarray.
The result is overall much simpler and with less locking in the critical
path. Many functions were reworked for clarity and small details like
using 'imr' to refer to the implicit MR make the entire code flow here
more readable.
This also squashes at least two race bugs on its own, and quite possibily
more that haven't been identified.
====================
Merge conflicts with the odp statistics patch resolved.
* branch 'odp_rework':
RDMA/odp: Remove broken debugging call to invalidate_range
RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy
RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray
RDMA/mlx5: Rework implicit ODP destroy
RDMA/mlx5: Avoid double lookups on the pagefault path
RDMA/mlx5: Reduce locking in implicit_mr_get_data()
RDMA/mlx5: Use an xarray for the children of an implicit ODP
RDMA/mlx5: Split implicit handling from pagefault_mr
RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree
RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it
RDMA/mlx5: Rework implicit_mr_get_data
RDMA/mlx5: Delete struct mlx5_priv->mkey_table
RDMA/mlx5: Use a dedicated mkey xarray for ODP
RDMA/mlx5: Split sig_err MR data into its own xarray
RDMA/mlx5: Use SRCU properly in ODP prefetch
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
For creation, as soon as the umem_odp is created the notifier can be
called, however the underlying MR may not have been setup yet. This would
cause problems if mlx5_ib_invalidate_range() runs. There is some
confusing/ulocked/racy code that might by trying to solve this, but
without locks it isn't going to work right.
Instead trivially solve the problem by short-circuiting the invalidation
if there are not yet any DMA mapped pages. By definition there is nothing
to invalidate in this case.
The create code will have the umem fully setup before anything is DMA
mapped, and npages is fully locked by the umem_mutex.
For destroy, invalidate the entire MR at the HW to stop DMA then DMA unmap
the pages before destroying the MR. This drives npages to zero and
prevents similar racing with invalidate while the MR is undergoing
destruction.
Arguably it would be better if the umem was created after the MR and
destroyed before, but that would require a big rework of the MR code.
Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191009160934.3143-15-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use SRCU in a sensible way by removing all MRs in the implicit tree from
the two xarrays (the update operation), then a synchronize, followed by a
normal single threaded teardown.
This is only a little unusual from the normal pattern as there can still
be some work pending in the unbound wq that may also require a workqueue
flush. This is tracked with a single atomic, consolidating the redundant
existing atomics and wait queue.
For understand-ability the entire ODP implicit create/destroy flow now
largely exists in a single pair of functions within odp.c, with a few
support functions for tearing down an unused child.
Link: https://lore.kernel.org/r/20191009160934.3143-13-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently the child leaves are stored in the shared interval tree and
every lookup for a child must be done under the interval tree rwsem.
This is further complicated by dropping the rwsem during iteration (ie the
odp_lookup(), odp_next() pattern), which requires a very tricky an
difficult to understand locking scheme with SRCU.
Instead reserve the interval tree for the exclusive use of the mmu
notifier related code in umem_odp.c and give each implicit MR a xarray
containing all the child MRs.
Since the size of each child is 1GB of VA, a 1 level xarray will index 64G
of VA, and a 2 level will index 2TB, making xarray a much better
data structure choice than an interval tree.
The locking properties of xarray will be used in the next patches to
rework the implicit ODP locking scheme into something simpler.
At this point, the xarray is locked by the implicit MR's umem_mutex, and
read can also be locked by the odp_srcu.
Link: https://lore.kernel.org/r/20191009160934.3143-10-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is a per device xarray storing mkeys that is used to store every
mkey in the system. However, this xarray is now only read by ODP for
certain ODP designated MRs (ODP, implicit ODP, MW, DEVX_INDIRECT).
Create an xarray only for use by ODP, that only contains ODP related
MKeys. This xarray is protected by SRCU and all erases are protected by a
synchronize.
This improves performance:
- All MRs in the odp_mkeys xarray are ODP MRs, so some tests for is_odp()
can be deleted. The xarray will also consume fewer nodes.
- normal MR's are never mixed with ODP MRs in a SRCU data structure so
performance sucking synchronize_srcu() on every MR destruction is not
needed.
- No smp_load_acquire(live) and xa_load() double barrier on read
Due to the SRCU locking scheme care must be taken with the placement of
the xa_store(). Once it completes the MR is immediately visible to other
threads and only through a xa_erase() & synchronize_srcu() cycle could it
be destroyed.
Link: https://lore.kernel.org/r/20191009160934.3143-4-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The locking model for signature is completely different than ODP, do not
share the same xarray that relies on SRCU locking to support ODP.
Simply store the active mlx5_core_sig_ctx's in an xarray when signature
MRs are created and rely on trivial xarray locking to serialize
everything.
The overhead of storing only a handful of SIG related MRs is going to be
much less than an xarray full of every mkey.
Link: https://lore.kernel.org/r/20191009160934.3143-3-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl210Z8eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGv+kIAKRpO7EuDokQL4qp
hxEEaCMJA1T055EMlNU6FVAq/ZbmapzreUyNYiRMpPWKGTWNMkhIcZQfysYeGZz5
y/KRxAiVxlcB+3v3yRmoZd/XoQmhgvJmqD4zhaGI2Utonow4f/SGSEFFZqqs9WND
4HJROjZHgQ4JBxg9Z+QMo0FxbV/DCZpEOUq51N9WJywyyDRb18zotE83stpU+pE2
fjqT7mk0NLrnYXuDRAbFC1Aau9ed4H6LlwLmxaqxq/Pt5Rz7wIKwKL9HIT4Dm/0a
qpani6phhHWL7MwUpa2wkEonFCD03rJFl3DUVJo64Ijh4up5D/jyXQ+GKV2P4WKJ
275Rb5Q=
=WiZZ
-----END PGP SIGNATURE-----
Merge tag 'v5.4-rc5' into rdma.git for-next
Linux 5.4-rc5
For dependencies in the next patches
Conflict resolved by keeping the delete of the unlock.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is little value in keeping separate function for one flag, provide
it directly like any other mlx5 define.
Link: https://lore.kernel.org/r/20191020064400.8344-2-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
mlx5_ib_dc_atomic_is_supported function is not used anywhere. Remove the
dead code.
Fixes: a60109dc9a ("IB/mlx5: Add support for extended atomic operations")
Link: https://lore.kernel.org/r/20191020064454.8551-1-leon@kernel.org
Signed-off-by: Ran Rozenstein <ranro@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Provide an ODP explicit/implicit type as part of 'rdma -dd resource show
mr' dump.
For example:
$ rdma -dd resource show mr
dev mlx5_0 mrn 1 rkey 0xa99a lkey 0xa99a mrlen 50000000
pdn 9 pid 7372 comm ibv_rc_pingpong drv_odp explicit
For non-ODP MRs, we won't print "drv_odp ..." at all.
Link: https://lore.kernel.org/r/20191016062308.11886-4-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Introduce ODP diagnostic counters and count the following
per MR within IB/mlx5 driver:
1) Page faults:
Total number of faulted pages.
2) Page invalidations:
Total number of pages invalidated by the OS during all
invalidation events. The translations can be no longer
valid due to either non-present pages or mapping changes.
Link: https://lore.kernel.org/r/20191016062308.11886-2-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
While MR uses live as the SRCU 'update', the MW case uses the xarray
directly, xa_erase() causes the MW to become inaccessible to the pagefault
thread.
Thus whenever a MW is removed from the xarray we must synchronize_srcu()
before freeing it.
This must be done before freeing the mkey as re-use of the mkey while the
pagefault thread is using the stale mkey is undesirable.
Add the missing synchronizes to MW and DEVX indirect mkey and delete the
bogus protection against double destroy in mlx5_core_destroy_mkey()
Fixes: 534fd7aac5 ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-7-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
live is used to signal to the pagefault thread that the MR is initialized
and ready for use. It should be after the umem is assigned and all other
setup is completed. This prevents races (at least) of the form:
CPU0 CPU1
mlx5_ib_alloc_implicit_mr()
implicit_mr_alloc()
live = 1
imr->umem = umem
num_pending_prefetch_inc()
if (live)
atomic_inc(num_pending_prefetch)
atomic_set(num_pending_prefetch,0) // Overwrites other thread's store
Further, live is being used with SRCU as the 'update' in an
acquire/release fashion, so it can not be read and written raw.
Move all live = 1's to after MR initialization is completed and use
smp_store_release/smp_load_acquire() for manipulating it.
Add a missing live = 0 when an implicit MR child is deleted, before
queuing work to do synchronize_srcu().
The barriers in update_odp_mr() were some broken attempt to create a
acquire/release, but were not even applied consistently and missed the
point, delete it as well.
Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-6-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Clean the code to store all boolean parameters inside one variable.
Link: https://lore.kernel.org/r/20191002122517.17721-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This cycle mainly saw lots of bug fixes and clean up code across the core
code and several drivers, few new functional changes were made.
- Many cleanup and bug fixes for hns
- Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed,
bnxt_re, efa
- Share the query_port code between all the iWarp drivers
- General rework and cleanup of the ODP MR umem code to fit better with
the mmu notifier get/put scheme
- Support rdma netlink in non init_net name spaces
- mlx5 support for XRC devx and DC ODP
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl2A1ugACgkQOG33FX4g
mxp+EQ//Ug8CyyDs40SGZztItoGghuQ4TVA2pXtuZ9LkFVJRWhYPJGOadHIYXqGO
KQJJpZPQ02HTZUPWBZNKmD5bwHfErm4cS73/mVmUpximnUqu/UsLRJp8SIGmBg1D
W1Lz1BJX24MdV8aUnChYvdL5Hbl52q+BaE99Z0haOvW7I3YnKJC34mR8m/A5MiRf
rsNIZNPHdq2U6pKLgCbOyXib8yBcJQqBb8x4WNAoB1h4MOx+ir5QLfr3ozrZs1an
xXgqyiOBmtkUgCMIpXC4juPN/6gw3Y5nkk2VIWY+MAY1a7jZPbI+6LJZZ1Uk8R44
Lf2KSzabFMMYGKJYE1Znxk+JWV8iE+m+n6wWEfRM9l0b4gXXbrtKgaouFbsLcsQA
CvBEQuiFSO9Kq01JPaAN1XDmhqyTarG6lHjXnW7ifNlLXnPbR1RJlprycExOhp0c
axum5K2fRNW2+uZJt+zynMjk2kyjT1lnlsr1Rbgc4Pyionaiydg7zwpiac7y/bdS
F7/WqdmPiff78KMi187EF5pjFqMWhthvBtTbWDuuxaxc2nrXSdiCUN+96j1amQUH
yU/7AZzlnKeKEQQCR4xddsSs2eTrXiLLFRLk9GCK2eh4cUN212eHTrPLKkQf1cN+
ydYbR2pxw3B38LCCNBy+bL+u7e/Tyehs4ynATMpBuEgc5iocTwE=
=zHXW
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull RDMA subsystem updates from Jason Gunthorpe:
"This cycle mainly saw lots of bug fixes and clean up code across the
core code and several drivers, few new functional changes were made.
- Many cleanup and bug fixes for hns
- Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed,
bnxt_re, efa
- Share the query_port code between all the iWarp drivers
- General rework and cleanup of the ODP MR umem code to fit better
with the mmu notifier get/put scheme
- Support rdma netlink in non init_net name spaces
- mlx5 support for XRC devx and DC ODP"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits)
RDMA: Fix double-free in srq creation error flow
RDMA/efa: Fix incorrect error print
IB/mlx5: Free mpi in mp_slave mode
IB/mlx5: Use the original address for the page during free_pages
RDMA/bnxt_re: Fix spelling mistake "missin_resp" -> "missing_resp"
RDMA/hns: Package operations of rq inline buffer into separate functions
RDMA/hns: Optimize cmd init and mode selection for hip08
IB/hfi1: Define variables as unsigned long to fix KASAN warning
IB/{rdmavt, hfi1, qib}: Add a counter for credit waits
IB/hfi1: Add traces for TID RDMA READ
RDMA/siw: Relax from kmap_atomic() use in TX path
IB/iser: Support up to 16MB data transfer in a single command
RDMA/siw: Fix page address mapping in TX path
RDMA: Fix goto target to release the allocated memory
RDMA/usnic: Avoid overly large buffers on stack
RDMA/odp: Add missing cast for 32 bit
RDMA/hns: Use devm_platform_ioremap_resource() to simplify code
Documentation/infiniband: update name of some functions
RDMA/cma: Fix false error message
RDMA/hns: Fix wrong assignment of qp_access_flags
...
To resolve dependencies in following patches
mlx5_ib.h conflict resolved by keeing both hunks
Linux 5.3-rc8
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add flow steering actions: modify header and packet reformat
to the fs_cmd shim layer. This allows each namespace to define
possibly different functionality for alloc/dealloc action commands.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Merge mlx5-next patches needed for upcoming mlx5 software steering.
1) Alex adds HW bits and definitions required for SW steering
2) Ariel moves device memory management to mlx5_core (From mlx5_ib)
3) Maor, Cleanups and fixups for eswitch mode and RoCE
4) Mark, Set only stag for match untagged packets
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Move the device memory allocation and deallocation commands
SW ICM memory to mlx5_core to expose this API for all
mlx5_core users.
This comes as preparation for supporting SW steering in kernel
where it will be required to allocate and register device
memory for direct rule insertion.
In addition, an API to register this device memory for future
remote access operations is introduced using the create_mkey
commands.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently user applications can only steer TCP/IP(NIC RX/RX) traffic.
This patch adds RDMA_RX as a new flow type to allow the user to insert
steering rules to control RDMA traffic.
Two destinations are supported(but not set at the same time): devx
flow table object and QP.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190819113626.20284-4-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
Introduce helper function to unify various use_umr checks.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190815083834.9245-6-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
The fix for IB port statistics initialization ("IB/core: Fix querying
total rdma stats") is needed before we take a follow-on patch to
for-next.
Signed-off-by: Doug Ledford <dledford@redhat.com>
When parent mlx5_core_dev is in switchdev mode, q_counters are not
applicable to multiple non uplink vports.
Hence, have make them limited to device level.
While at it, correct __mlx5_ib_qp_set_counter() and
__mlx5_ib_modify_qp() to use u16 set_id as defined by the device.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190723073117.7175-3-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fix unreg_umr to ignore the mkey state and do not fail if was freed. This
prevents a case that a user space application already changed the mkey
state to free and then the UMR operation will fail leaving the mkey in an
inappropriate state.
Link: https://lore.kernel.org/r/20190723065733.4899-3-leon@kernel.org
Cc: <stable@vger.kernel.org> # 3.19
Fixes: 968e78dd96 ("IB/mlx5: Enhance UMR support to allow partial page table update")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Support bind a qp with counter. If counter is null then bind the qp to the
default counter. Different QP state has different operation:
- RESET: Set the counter field so that it will take effective during
RST2INIT change;
- RTS: Issue an RTS2RTS change to update the QP counter;
- Other: Set the counter field and mark the counter_pending flag, when QP
is moved to RTS state and this flag is set, then issue an RTS2RTS
modification to update the counter.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Register DEVX with with mlx5_core to get async events. This will enable
to dispatch the applicable events to its consumers in down stream patches.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Required for dependencies in the next patches.
Resolved the conflicts:
- esw_destroy_offloads_acl_tables() use the newer mlx5_esw_for_all_vports()
version
- esw_offloads_steering_init() drop the cap test
- esw_offloads_init() drop the extra function arguments
* branch 'mlx5-next': (39 commits)
net/mlx5: Expose device definitions for object events
net/mlx5: Report EQE data upon CQ completion
net/mlx5: Report a CQ error event only when a handler was set
net/mlx5: mlx5_core_create_cq() enhancements
net/mlx5: Expose the API to register for ANY event
net/mlx5: Use event mask based on device capabilities
net/mlx5: Fix mlx5_core_destroy_cq() error flow
net/mlx5: E-Switch, Handle UC address change in switchdev mode
net/mlx5: E-Switch, Consider host PF for inline mode and vlan pop
net/mlx5: E-Switch, Use iterator for vlan and min-inline setups
net/mlx5: E-Switch, Reg/unreg function changed event at correct stage
net/mlx5: E-Switch, Consolidate eswitch function number of VFs
net/mlx5: E-Switch, Refactor eswitch SR-IOV interface
net/mlx5: Handle host PF vport mac/guid for ECPF
net/mlx5: E-Switch, Use correct flags when configuring vlan
net/mlx5: Reduce dependency on enabled_vfs counter and num_vfs
net/mlx5: Don't handle VF func change if host PF is disabled
net/mlx5: Limit scope of mlx5_get_next_phys_dev() to PCI PF devices
net/mlx5: Move pci status reg access mutex to mlx5_pci_init
net/mlx5: Rename mlx5_pci_dev_type to mlx5_coredev_type
...
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>