linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Maor Gottlieb	982b7c140e	RDMA/mlx5: Fix type assignment for ICM DM We should hold the UAPI DM type in the base struct and not the internal mlx5 type. Fixes: `251b9d7887` ("RDMA/mlx5: Re-organize the DM code") Link: https://lore.kernel.org/r/58dedbd5c132660f808e59166d434e2eaa6ecf7a.1618753425.git.leonro@nvidia.com Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-20 09:42:38 -03:00
Parav Pandit	dedbc2d358	IB/mlx5: Set right RoCE l3 type and roce version while deleting GID Currently when GID is deleted, it zero out all the fields of the RoCE address in the SET_ROCE_ADDRESS command for a specified index. roce_version = 0 means RoCEv1 in the SET_ROCE_ADDRESS command. This assumes that device has RoCEv1 always enabled which is not always correct. For example Subfunction does not support RoCEv1. Due to this assumption a previously added RoCEv2 GID is always deleted as RoCEv1 GID. This results in a below syndrome: mlx5_core.sf mlx5_core.sf.4: mlx5_cmd_check:777:(pid 4256): SET_ROCE_ADDRESS(0x761) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x12822d) Hence set the right RoCE version during GID deletion provided by the core. Link: https://lore.kernel.org/r/d3f54129c90ca329caf438dbe31875d8ad08d91a.1618753425.git.leonro@nvidia.com Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-20 09:41:10 -03:00
Jason Gunthorpe	fe73f96e7b	Merge branch 'mlx5_memic_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Maor Gottlieb says: ==================== This series from Maor extends MEMIC to support atomic operations from the host in addition to already supported regular read/write. ==================== * 'memic_ops': RDMA/mlx5: Expose UAPI to query DM RDMA/mlx5: Add support in MEMIC operations RDMA/mlx5: Add support to MODIFY_MEMIC command RDMA/mlx5: Re-organize the DM code RDMA/mlx5: Move all DM logic to separate file RDMA/uverbs: Make UVERBS_OBJECT_METHODS to consider line number net/mlx5: Add MEMIC operations related bits	2021-04-13 19:37:17 -03:00
Maor Gottlieb	18731642d4	RDMA/mlx5: Expose UAPI to query DM Expose UAPI to query MEMIC DM, this will let user space application that didn't allocate the DM but has access to by owning the matching command FD to retrieve its information. Link: https://lore.kernel.org/r/20210411122924.60230-8-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-13 19:36:37 -03:00
Maor Gottlieb	cea85fa5db	RDMA/mlx5: Add support in MEMIC operations MEMIC buffer, in addition to regular read and write operations, can support atomic operations from the host. Introduce and implement new UAPI to allocate address space for MEMIC operations such as atomic. This includes: 1. Expose new IOCTL for request mapping of MEMIC operation. 2. Hold the operations address in a list, so same operation to same DM will be allocated only once. 3. Manage refcount on the mlx5_ib_dm object, so it would be keep valid until all addresses were unmapped. Link: https://lore.kernel.org/r/20210411122924.60230-7-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-13 19:36:36 -03:00
Maor Gottlieb	39cc792ff2	RDMA/mlx5: Add support to MODIFY_MEMIC command Add two functions to allocate and deallocate MEMIC operations by using the MODIFY_MEMIC command. Link: https://lore.kernel.org/r/20210411122924.60230-6-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-13 19:36:36 -03:00
Maor Gottlieb	251b9d7887	RDMA/mlx5: Re-organize the DM code 1. Inline the checks from check_dm_type_support() into their respective allocation functions. 2. Fix use after free when driver fails to copy the MEMIC address to the user by moving the allocation code into their respective functions, hence we avoid the explicit call to free the DM in the error flow. 3. Split mlx5_ib_dm struct to memic and icm proper typesafety throughout. Fixes: `dc2316eba7` ("IB/mlx5: Fix device memory flows") Link: https://lore.kernel.org/r/20210411122924.60230-5-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-13 19:36:36 -03:00
Maor Gottlieb	831df88381	RDMA/mlx5: Move all DM logic to separate file Move all device memory related code to a separate file. Link: https://lore.kernel.org/r/20210411122924.60230-4-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-13 19:36:36 -03:00
Jason Gunthorpe	a0354d2308	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== This pr contains changes from mlx5-next branch, already reviewed on netdev and rdma mailing lists, links below. 1) From Leon, Dynamically assign MSI-X vectors count Already Acked by Bjorn Helgaas. https://patchwork.kernel.org/project/netdevbpf/cover/20210314124256.70253-1-leon@kernel.org/ 2) Cleanup series: https://patchwork.kernel.org/project/netdevbpf/cover/20210311070915.321814-1-saeed@kernel.org/ From Mark, E-Switch cleanups and refactoring, and the addition of single FDB mode needed HW bits. From Mikhael, Remove unused struct field From Saeed, Cleanup W=1 prototype warning From Zheng, Esw related cleanup From Tariq, User order-0 page allocation for EQs ==================== * mlx5-next: net/mlx5: Implement sriov_get_vf_total_msix/count() callbacks net/mlx5: Dynamically assign MSI-X vectors count net/mlx5: Add dynamic MSI-X capabilities bits PCI/IOV: Add sysfs MSI-X vector assignment interface net/mlx5: Use order-0 allocations for EQs net/mlx5: Add IFC bits needed for single FDB mode net/mlx5: E-Switch, Refactor send to vport to be more generic RDMA/mlx5: Use representor E-Switch when getting netdev and metadata net/mlx5: E-Switch, Add eswitch pointer to each representor net/mlx5: E-Switch, Add match on vhca id to default send rules net/mlx5: Remove unused mlx5_core_health member recover_work net/mlx5: simplify the return expression of mlx5_esw_offloads_pair() net/mlx5: Cleanup prototype warning Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-12 13:49:48 -03:00
Jakub Kicinski	95b5c29132	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-next 2021-04-09 This pr contains changes from mlx5-next branch, already reviewed on netdev and rdma mailing lists, links below. 1) From Leon, Dynamically assign MSI-X vectors count Already Acked by Bjorn Helgaas. https://patchwork.kernel.org/project/netdevbpf/cover/20210314124256.70253-1-leon@kernel.org/ 2) Cleanup series: https://patchwork.kernel.org/project/netdevbpf/cover/20210311070915.321814-1-saeed@kernel.org/ From Mark, E-Switch cleanups and refactoring, and the addition of single FDB mode needed HW bits. From Mikhael, Remove unused struct field From Saeed, Cleanup W=1 prototype warning From Zheng, Esw related cleanup From Tariq, User order-0 page allocation for EQs * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Implement sriov_get_vf_total_msix/count() callbacks net/mlx5: Dynamically assign MSI-X vectors count net/mlx5: Add dynamic MSI-X capabilities bits PCI/IOV: Add sysfs MSI-X vector assignment interface net/mlx5: Use order-0 allocations for EQs net/mlx5: Add IFC bits needed for single FDB mode net/mlx5: E-Switch, Refactor send to vport to be more generic RDMA/mlx5: Use representor E-Switch when getting netdev and metadata net/mlx5: E-Switch, Add eswitch pointer to each representor net/mlx5: E-Switch, Add match on vhca id to default send rules net/mlx5: Remove unused mlx5_core_health member recover_work net/mlx5: simplify the return expression of mlx5_esw_offloads_pair() net/mlx5: Cleanup prototype warning ==================== Link: https://lore.kernel.org/r/20210409200704.10886-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-04-09 18:07:21 -07:00
Praveen Kumar Kannoju	7e111bbff9	IB/mlx5: Reduce max order of memory allocated for xlt update To update xlt (during mlx5_ib_reg_user_mr()), the driver can request up to 1 MB (order-8) memory, depending on the size of the MR. This costly allocation can sometimes take very long to return (a few seconds). This causes the calling application to hang for a long time, especially when the system is fragmented. To avoid these long latency spikes, the calls the higher order allocations need to fail faster in case they are not available. In order to acheive this we need __GFP_NORETRY flag in the gfp_mask before during fetching the free pages. Allow the algorithm to automatically fall back to smaller page sizes. Link: https://lore.kernel.org/r/1617425635-35631-1-git-send-email-praveen.kannoju@oracle.com Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@oracle.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-07 20:25:17 -03:00
Shay Drory	e5dc370bd9	RDMA/mlx5: Set ODP caps only if device profile support ODP Currently, ODP caps are set during the init stage of mlx5_ib_dev, regardless of whether the device profile supports ODP or not. There is no point in setting ODP caps if the device profile doesn't support ODP. Hence, move setting the ODP caps to the odp_init stage. Link: https://lore.kernel.org/r/20210318135259.681264-1-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-26 11:57:38 -03:00
Maor Gottlieb	c73700806d	RDMA/mlx5: Fix drop packet rule in egress table Initial drop action support missed that drop action can be added to egress flow tables as well. Add the missing support. This requires making sure that dest_type isn't set to PORT which in turn exposes a possibility of passing dst while indicating number of dsts as zero. Explicitly check for number of dsts and pass the appropriate pointer. Fixes: `f29de9eee7` ("RDMA/mlx5: Add support for drop action in DV steering") Link: https://lore.kernel.org/r/20210318135123.680759-1-leon@kernel.org Reviewed-by: Mark Bloch <markb@nvidia.com> Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-26 11:52:47 -03:00
Mark Bloch	1fb7f8973f	RDMA: Support more than 255 rdma ports Current code uses many different types when dealing with a port of a RDMA device: u8, unsigned int and u32. Switch to u32 to clean up the logic. This allows us to make (at least) the core view consistent and use the same type. Unfortunately not all places can be converted. Many uverbs functions expect port to be u8 so keep those places in order not to break UAPIs. HW/Spec defined values must also not be changed. With the switch to u32 we now can support devices with more than 255 ports. U32_MAX is reserved to make control logic a bit easier to deal with. As a device with U32_MAX ports probably isn't going to happen any time soon this seems like a non issue. When a device with more than 255 ports is created uverbs will report the RDMA device as having 255 ports as this is the max currently supported. The verbs interface is not changed yet because the IBTA spec limits the port size in too many places to be u8 and all applications that relies in verbs won't be able to cope with this change. At this stage, we are extending the interfaces that are using vendor channel solely Once the limitation is lifted mlx5 in switchdev mode will be able to have thousands of SFs created by the device. As the only instance of an RDMA device that reports more than 255 ports will be a representor device and it exposes itself as a RAW Ethernet only device CM/MAD/IPoIB and other ULPs aren't effected by this change and their sysfs/interfaces that are exposes to userspace can remain unchanged. While here cleanup some alignment issues and remove unneeded sanity checks (mainly in rdmavt), Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-26 09:31:21 -03:00
Linus Torvalds	2ba9bea2d3	RDMA 5.12 second rc pull request - Typo causing a regression in mlx5 devx - Regression in the recent hns rework causing the HW to get out of sync - Longstanding cxgb4 adaptor crash when destroying cm ids -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmBcxe0ACgkQOG33FX4g mxruFQ/+JMfHtWowI9l32N+SI93zWAQVN6bcO5XiUeziCLEsktU2dh5Lu8ZWWVmB BX1US24oBuGKs+YNx1ayshlgQj7EgojnGZODGtL4O157IvvgMrm4gFN84EoN3gMA 8QzPVgZrrvyjDc8tSEINAW5crkpmhqdeg6XYM9BTUrVLqy+rXWv6V5E8Gnvtexmq h/+UgbIxW00SVVxpNyAFeuu1IlbgSYU0DvU4xpha/XKX6Ifyl9SeKmn6y+1UEU4q AYd/6UuYvK26G8tA3Zteh3lR8cUiLeorIwB6B5WoMDZXhyBz+PhaBS2ypHsrrnyr IIEDres5/zEm355nT8j0hTMv40ZUlj4UFRf2eWqcdrLD54yb6n8g4X2rmsASoe1A Z3CrhkV39dPEsB7JXmGB9j6W47PWGYBYUGpYTMRr69K7eB3viDIC+i6fDK7eKS0e u+fO3K9kU2B/PSJvWVCPKn2GCOw3jRuqwbTFUt0kvo8E90yzV6yqjgyoTKt+PgBn t0SUyPEGue+KEKgJd/sp62OL8LPulFksx+E7ksvqqbmSTNAjH+bs/FrbUYoN0r3v kmkxgxLzipu1aYtLwcAtNJpEya3c6s4ERBFeJMk01xUkiiWPG9RLa66Zg2IzDqWW J55irlJIqBm792vB+vH7+FHU3YjEGh66ycGRqsBWiw2oUxmlxEc= =vjhp -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes from Jason Gunthorpe: "Not much going on, just some small bug fixes: - Typo causing a regression in mlx5 devx - Regression in the recent hns rework causing the HW to get out of sync - Long-standing cxgb4 adaptor crash when destroying cm ids" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/cxgb4: Fix adapter LE hash errors while destroying ipv6 listening server RDMA/hns: Fix bug during CMDQ initialization RDMA/mlx5: Fix typo in destroy_mkey inbox	2021-03-25 11:23:35 -07:00
Shay Drory	ad50294d4d	RDMA/mlx5: Create ODP EQ only when ODP MR is created There is no need to create the ODP EQ if the user doesn't use ODP MRs. Hence, create it only when the first ODP MR is created. This EQ will be destroyed only when the device is unloaded. This will decrease the number of EQs created per device. for example: If we creates 1K devices (SF/VF/etc'), than we will decrease the num of EQs by 1K. Link: https://lore.kernel.org/r/20210314125418.179716-1-leon@kernel.org Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-23 17:00:14 -03:00
Leon Romanovsky	b5486430bb	RDMA/mlx5: Add missing returned error check of mlx5_ib_dereg_mr Fix the following smatch error: drivers/infiniband/hw/mlx5/mr.c:1950 mlx5_ib_dereg_mr() error: uninitialized symbol 'rc'. Fixes: `e6fb246cca` ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()") Link: https://lore.kernel.org/r/20210314082250.10143-1-leon@kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-22 09:28:51 -03:00
Mark Bloch	3a46f4fb55	net/mlx5: E-Switch, Refactor send to vport to be more generic Now that each representor stores a pointer to the managing E-Switch use that information when creating the send-to-vport rules. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-12 13:07:46 -08:00
Mark Bloch	658cfceb62	RDMA/mlx5: Use representor E-Switch when getting netdev and metadata Now that a pointer to the managing E-Switch is stored in the representor use it. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-12 13:07:23 -08:00
Jason Gunthorpe	7610ab57de	RDMA/mlx5: Allow larger pages in DevX umem The umem DMA list calculation was locked at 4k pages due to confusion around how this API works and is used when larger pages are present. The conclusion is: - umem's cannot extend past what is mapped into the process, so creating a lage page size and referring to a sub-range is not allowed - umem's must always have a page offset of zero, except for sub PAGE_SIZE umems - The feature of umem_offset to create multiple objects inside a umem is buggy and isn't used anyplace. Thus we can assume all users of the current API have umem_offset == 0 as well Provide a new page size calculator that limits the DMA list to the VA range and enforces umem_offset == 0. Allow user space to specify the page sizes which it can accept, this bitmap must be derived from the intended use of the umem, based on per-usage HW limitations. Link: https://lore.kernel.org/r/20210304130501.1102577-4-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-11 20:20:37 -04:00
Yishai Hadas	2904bb37b3	IB/core: Split uverbs_get_const/default to consider target type Change uverbs_get_const/uverbs_get_const_default to work properly with both signed/unsigned parameters. Current APIs mix s64 and u64 which leads to incorrect check when u64 value was supplied and its upper bit was set. In that case uverbs_get_const() / uverbs_get_const_default() lower bound check may fail unexpectedly, target is unsigned (lower bound is 0) but value became negative as of the s64 usage. Split to have two different APIs, no change to callers as the required API will be called internally according to the target type. Link: https://lore.kernel.org/r/20210304130501.1102577-3-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-11 20:20:36 -04:00
Mark Zhang	6fe6e56863	RDMA/mlx5: Fix mlx5 rates to IB rates map Correct the map between mlx5 rates and corresponding ib rates, as they don't always have a fixed offset between them. Fixes: `e126ba97db` ("mlx5: Add driver for Mellanox Connect-IB adapters") Link: https://lore.kernel.org/r/20210304124517.1100608-4-leon@kernel.org Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-11 20:12:39 -04:00
Maor Gottlieb	7852546f52	RDMA/mlx5: Fix query RoCE port mlx5_is_roce_enabled returns the devlink RoCE init value, therefore it should be used only when driver is loaded. Instead we just need to read the roce_en field. In addition, rename mlx5_is_roce_enabled to mlx5_is_roce_init_enabled. Fixes: `7a58779edd` ("IB/mlx5: Improve query port for representor port") Link: https://lore.kernel.org/r/20210304124517.1100608-2-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-11 20:12:39 -04:00
Jason Gunthorpe	14d05b552b	RDMA/mlx5: Rename mlx5_mr_cache_invalidate() to revoke_mr() Now that this is only used in a few places in mr.c give it a sensible name. It has nothing to do with the cache and can be invoked on any MR. DMA is stopped and the user cannot touch the MR any further once it completes. Link: https://lore.kernel.org/r/20210304120745.1090751-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-11 20:03:26 -04:00
Jason Gunthorpe	e6fb246cca	RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr() Now that the SRCU stuff has been removed the entire MR destroy logic can be made a lot simpler. Currently there are many different ways to destroy a MR and it makes it really hard to do this task correctly. Route all destruction through mlx5_ib_dereg_mr() and make it work for all situations. Since it turns out all the different MR types do basically the same thing this removes a lot of knowledge of MR internals from ODP and leaves ODP just exporting an operation to clean up children. This fixes a few weird corner cases bugs and firmly uses the correct ordering of the MR destruction: - Stop parallel access to the mkey via the ODP xarray - Stop DMA - Release the umem - Clean up ODP children - Free/Recycle the MR Link: https://lore.kernel.org/r/20210304120745.1090751-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-11 20:03:25 -04:00
Jason Gunthorpe	f18ec42231	RDMA/mlx5: Use a union inside mlx5_ib_mr The struct mlx5_ib_mr can be used for three different things, but only one at a time: - In the user MR cache - As a kernel MR - As a user MR Overlay the three things into a single union with the following rules: - If the mr is found on the cache_ent->head list then it is a cache MR and umem == NULL. The entire union is zero after the MR is removed from the cache. - If umem != NULL or type == IB_MR_TYPE_USER then it is a user MR. - If umem == NULL then it is a kernel MR This reduces the size of struct mlx5_ib_mr to 552 bytes from 702. The only place the three flows overlap in the code is during dereg, so add a few extra checks along there. Link: https://lore.kernel.org/r/20210304120745.1090751-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-11 20:03:25 -04:00
Jason Gunthorpe	a639e66703	RDMA/mlx5: Zero out ODP related items in the mlx5_ib_mr All of the ODP code assumes when it calls mlx5_mr_cache_alloc() the ODP related fields are zero'd. This is true if the MR was just allocated, but if the MR is recycled through the cache then the values are never zero'd. This causes a bug in the odp_stats, they don't reset when the MR is reallocated, also is_odp_implicit is never 0'd. So we can use memset on a block of the mlx5_ib_mr reorganize the structure to put all the data that can be zero'd by the cache at the end. It is organized as an anonymous struct because the next patch will make this a union. Delete the unused smr_info. Don't set the kernel only desc_size on the user path. No longer any need to zero mr->parent before freeing it, the memset() will get it now. Fixes: `a3de94e3d6` ("IB/mlx5: Introduce ODP diagnostic counters") Link: https://lore.kernel.org/r/20210304120745.1090751-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-11 20:03:25 -04:00
Mark Zhang	22053df0a3	RDMA/mlx5: Fix typo in destroy_mkey inbox Set mkey_index to the correct place when preparing the destroy_mkey inbox, this will fix the following syndrome: mlx5_core 0000:08:00.0: mlx5_cmd_check:782:(pid 980716): DEALLOC_PD(0x801) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x31b635) Link: https://lore.kernel.org/r/20210311142024.1282004-1-leon@kernel.org Fixes: `1368ead04c` ("RDMA/mlx5: Use strict get/set operations for obj_id") Reviewed-by: Ido Kalir <idok@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-11 13:35:48 -04:00
Maor Gottlieb	8256c69b2d	RDMA/mlx5: Fix timestamp default mode 1. Don't set the ts_format bit to default when it reserved - device is running in the old mode (free running). 2. XRC doesn't have a CQ therefore the ts format in the QP context should be default / free running. 3. Set ts_format to WQ. Fixes: `2fe8d4b878` ("RDMA/mlx5: Fail QP creation if the device can not support the CQE TS") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-10 11:01:57 -08:00
Leon Romanovsky	f91803998c	RDMA/mlx5: Set correct kernel-doc identifier The W=1 allmodconfig build produces the following warning: drivers/infiniband/hw/mlx5/odp.c:1086: warning: wrong kernel-doc identifier on line: * Parse a series of data segments for page fault handling. Fix it by changing /** to be /* as it is written in kernel-doc documentation. Fixes: `5e769e444d` ("RDMA/hw/mlx5/odp: Fix formatting and add missing descriptions in 'pagefault_data_segments()'") Link: https://lore.kernel.org/r/20210302074214.1054299-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-03 13:21:24 -04:00
YueHaibing	3a9b3d4536	IB/mlx5: Add missing error code Set err to -ENOMEM if kzalloc fails instead of 0. Fixes: `7597385371` ("IB/mlx5: Enable subscription for device events over DEVX") Link: https://lore.kernel.org/r/20210222122343.19720-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-01 14:49:09 -04:00
Jason Gunthorpe	7289e26f39	Linux 5.11 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmAppPgeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGeXYH/imZPBd4A1jIMehN 5HV2A53Z+MXmmaMuGj9X1KV6vsf55/xB+IhOoFdtRAIsO8c2yYSCO8i4+4R0XfYA +/YFJeq672rojQnmh6XbpR8dugaAV7CUHy6n7KDsyvtT6EOCpwFSwkOb4X3tBRX6 TlYgm2d/xgV/wRHSgLVugK0MdFCLMAnyb7mkPfar9QrMgG1BiDKLq07xmwnS23On TkqpJ9yZ/rJpUrrUqQYPShSO/FmA+fSfWs0CDv7EIrJ40LUScD6PZxSHWTIHtjLk E4jFda6wuqLRVWsBwaBzUIdD0zk7X5quHRzEpbC5ga16SK6yrWvE5YJJXCguIEuZ f3FMRYs= =CAjn -----END PGP SIGNATURE----- Merge tag 'v5.11' into rdma.git for-next Linux 5.11 Merged to resolve conflicts with RDMA rc commits - drivers/infiniband/sw/rxe/rxe_net.c The final logic is to call rxe_get_dev_from_net() again with the master netdev if the packet was rx'd on a vlan. To keep the elimination of the local variables requires a trivial edit to the code in -rc Link: https://lore.kernel.org/r/20210210131542.215ea67c@canb.auug.org.au Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-02-18 11:19:29 -04:00
Jason Gunthorpe	68ad4d1cc6	Merge branch 'mlx5_timestamp' into rdma.git for-next Leon Romanovsky says: ==================== Add an extra timestamp format for mlx5_ib device. ==================== Based on the mlx5-next branch at git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux due to dependencies. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> * branch 'mlx5_timestamp': RDMA/mlx5: Fail QP creation if the device can not support the CQE TS net/mlx5: Add new timestamp mode bits	2021-02-16 14:49:36 -04:00
Aharon Landau	2fe8d4b878	RDMA/mlx5: Fail QP creation if the device can not support the CQE TS In ConnectX6Dx device, HW can work in real time timestamp mode according to the device capabilities per RQ/SQ/QP. When the flag IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION is set, the user expect to get TS on the CQEs in free running format, so we need to fail the QP creation if the current mode of the device doesn't support it. Link: https://lore.kernel.org/r/20210209131107.698833-3-leon@kernel.org Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-02-16 14:48:47 -04:00
Tal Gilboa	7232c132d1	RDMA/mlx5: Allow CQ creation without attached EQs The traditional DevX CQ creation flow goes through mlx5_core_create_cq() which checks that the given EQN corresponds to an existing EQ and attaches a devx handler to the EQN for the CQ. In some cases the EQ will not be a kernel EQ, but will be controlled by modify CQ, don't block creating these just because the EQN can't be found in the kernel. Link: https://lore.kernel.org/r/20210211085549.1277674-1-leon@kernel.org Signed-off-by: Tal Gilboa <talgi@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-02-16 14:42:59 -04:00
Patrisious Haddad	c70f51de85	RDMA/mlx5: Support 400Gbps IB rate in mlx5 driver Support 400Gbps IB rate in mlx5 driver. Link: https://lore.kernel.org/r/20210209130429.698237-1-leon@kernel.org Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-02-09 14:00:35 -04:00
Yishai Hadas	db72438c93	RDMA/mlx5: Cleanup the synchronize_srcu() from the ODP flow Cleanup the synchronize_srcu() from the ODP flow as it was found to be a very heavy time consumer as part of dereg_mr. For example de-registration of 10000 ODP MRs each with size of 2M hugepage took 19.6 sec comparing de-registration of same number of non ODP MRs that took 172 ms. The new locking scheme uses the wait_event() mechanism which follows the use count of the MR instead of using synchronize_srcu(). By that change, the time required for the above test took 95 ms which is even better than the non ODP flow. Once fully dropped the srcu usage, had to come with a lock to protect the XA access. As part of using the above mechanism we could also clean the num_deferred_work stuff and follow the use count instead. Link: https://lore.kernel.org/r/20210202071309.2057998-1-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-02-08 20:31:11 -04:00
Parav Pandit	131796524f	IB/mlx5: Use rdma_for_each_port for port iteration Instead of open coding the loop for port iteration, use rdma_for_each_port macro provided by core. To use such macro, early initialization of phys_port_cnt is needed. Hence, initialize such constant early enough in the init stage. Whichever functions are called with port using rdma_for_each_port(), convert their port type from u8 to unsigned int to match the core API. Link: https://lore.kernel.org/r/20210203130133.4057329-6-leon@kernel.org Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-02-05 12:06:01 -04:00
Parav Pandit	7416790e22	RDMA/core: Introduce and use API to read port immutable data Currently mlx5 driver caches port GID table length for 2 ports. It is also cached by IB core as port immutable data. When mlx5 representor ports are present, which are usually more than 2, invalid access to port_caps array can happen while validating the GID table length which is only for 2 ports. To avoid this, take help of the IB cores port immutable data by exposing an API to read the port immutable fields. Remove mlx5 driver's internal cache, thereby reduce code and data. Link: https://lore.kernel.org/r/20210203130133.4057329-5-leon@kernel.org Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-02-05 12:06:01 -04:00
Parav Pandit	7a58779edd	IB/mlx5: Improve query port for representor port Improve query port functionality for representor port as below. 1. RoCE Qkey violation counters are not applicable for representor port. 2. Avoid setting gid_tbl_len twice for representor port. 3. Avoid setting ip_gids and IB_PORT_CM_SUP property for representor port as GID table is empty and CM support is not present in representor mode. Link: https://lore.kernel.org/r/20210203130133.4057329-4-leon@kernel.org Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-02-05 12:06:00 -04:00
Parav Pandit	2019d70e91	IB/mlx5: Avoid calling query device for reading pkey table length Pkey table length for all the ports of the device is the same. Currently get_ports_cap() reads and stores it for each port by querying the device which reads more than just pkey table length. For representor device ports which can be in range of hundreds, it queries is for each such port and end up returning same value for all the ports. When in representor mode, modify QP accesses pkey port caps for a port index that can be outside of the port_caps table. Hence, simplify the logic to query the max pkey table length only once during device initialization sequence. Link: https://lore.kernel.org/r/20210203130133.4057329-3-leon@kernel.org Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-02-05 12:06:00 -04:00
Parav Pandit	3ce60f443b	IB/mlx5: Move mlx5_port_caps from mlx5_core_dev to mlx5_ib_dev mlx5_port_caps are RDMA specific capabilities. These are not used by the mlx5_core_device at all. Move them to mlx5_ib_dev where it is used and reduce the scope of it to multiple drivers. Link: https://lore.kernel.org/r/20210203130133.4057329-2-leon@kernel.org Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-02-05 12:06:00 -04:00
Parav Pandit	d6fd59e14e	IB/mlx5: Support default partition key for representor port Representor port has only one default pkey. Hence have simpler query pkey callback or it. Link: https://lore.kernel.org/r/20210127150010.1876121-4-leon@kernel.org Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-02-02 19:31:44 -04:00
Parav Pandit	d286ac1d05	IB/mlx5: Return appropriate error code instead of ENOMEM When mlx5_ib_stage_init_init() fails, return the error code related to failure instead of -ENOMEM. Fixes: `16c1975f10` ("IB/mlx5: Create profile infrastructure to add and remove stages") Link: https://lore.kernel.org/r/20210127150010.1876121-8-leon@kernel.org Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-02-02 13:21:18 -04:00
Mark Bloch	2614488d1f	RDMA/mlx5: Allow creating all QPs even when non RDMA profile is used The cited commit disallowed creating any QP which isn't raw ethernet, reg umr or the special UD qp for testing WC, this proved too strict. While modify can't be done (no GIDS/GID table for example) just creating a QP is okay. This patch partially reverts the bellow mentioned commit and places the restriction at the modify QP stage and not at the creation. DEVX commands should be used to manipulate such QPs. Fixes: `42caf9cb59` ("RDMA/mlx5: Allow only raw Ethernet QPs when RoCE isn't enabled") Link: https://lore.kernel.org/r/20210125120709.836718-1-leon@kernel.org Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-01-28 15:19:43 -04:00
Lee Jones	30cd9fc5e7	RDMA/hw/mlx5/qp: Demote non-conformant kernel-doc header Fixes the following W=1 kernel build warning(s): drivers/infiniband/hw/mlx5/qp.c:5384: warning: Function parameter or member 'qp' not described in 'mlx5_ib_qp_set_counter' drivers/infiniband/hw/mlx5/qp.c:5384: warning: Function parameter or member 'counter' not described in 'mlx5_ib_qp_set_counter' Link: https://lore.kernel.org/r/20210121094519.2044049-3-lee.jones@linaro.org Cc: Leon Romanovsky <leon@kernel.org> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: linux-rdma@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-01-22 14:37:29 -04:00
Lee Jones	5e769e444d	RDMA/hw/mlx5/odp: Fix formatting and add missing descriptions in 'pagefault_data_segments()' Fixes the following W=1 kernel build warning(s): drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'dev' not described in 'pagefault_data_segments' drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'pfault' not described in 'pagefault_data_segments' drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'wqe' not described in 'pagefault_data_segments' drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'wqe_end' not described in 'pagefault_data_segments' drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'bytes_mapped' not described in 'pagefault_data_segments' drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'total_wqe_bytes' not described in 'pagefault_data_segments' drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'receive_queue' not described in 'pagefault_data_segments' Link: https://lore.kernel.org/r/20210121094519.2044049-2-lee.jones@linaro.org Cc: Leon Romanovsky <leon@kernel.org> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: linux-rdma@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-01-22 14:37:29 -04:00
Jianxin Xiong	90da7dc820	RDMA/mlx5: Support dma-buf based userspace memory region Implement the new driver method 'reg_user_mr_dmabuf'. Utilize the core functions to import dma-buf based memory region and update the mappings. Add code to handle dma-buf related page fault. Link: https://lore.kernel.org/r/1608067636-98073-5-git-send-email-jianxin.xiong@intel.com Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Acked-by: Christian Koenig <christian.koenig@amd.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-01-20 16:49:25 -04:00
Parav Pandit	de641d74fb	Revert "RDMA/mlx5: Fix devlink deadlock on net namespace deletion" This reverts commit `fbdd0049d9`. Due to commit in fixes tag, netdevice events were received only in one net namespace of mlx5_core_dev. Due to this when netdevice events arrive in net namespace other than net namespace of mlx5_core_dev, they are missed. This results in empty GID table due to RDMA device being detached from its net device. Hence, revert back to receive netdevice events in all net namespaces to restore back RDMA functionality in non init_net net namespace. The deadlock will have to be addressed in another patch. Fixes: `fbdd0049d9` ("RDMA/mlx5: Fix devlink deadlock on net namespace deletion") Link: https://lore.kernel.org/r/20210117092633.10690-1-leon@kernel.org Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-01-19 20:22:59 -04:00
Parav Pandit	559a3eacc4	IB/mlx5: Make function static mlx5_query_mad_ifc_smp_attr_node_info() is internal to mad.c Hence, make it static. Link: https://lore.kernel.org/r/20210113121703.559778-5-leon@kernel.org Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-01-19 20:14:14 -04:00

1 2 3 4 5 ...

1622 commits