There exist two different names for the doorbell records: db_info and
db_record. We use dbrec for cpu address of the doorbell record and
dbrec_dma for dma address of the doorbell recordi uniformly.
Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240311113821.22482-3-boshiyu@alibaba-inc.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Currently, the 8 byte doorbell record is allocated along with the queue
buffer, which may result in waste of dma space when the queue buffer is
page aligned. To address this issue, we introduce a dma pool named
db_pool and allocate doorbell record from it.
Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240311113821.22482-2-boshiyu@alibaba-inc.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
First, we add a new command to query hardware statistics, and then
implement two functions: ib_device_ops.alloc_hw_port_stats and
ib_device_ops.get_hw_stats to allow rdma tool can get the statistics
of erdma device.
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231227084800.99091-3-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
The erdma_create_scatter_mtt() function is supposed to return error
pointers. Returning NULL will lead to an Oops.
Fixes: ed10435d35 ("RDMA/erdma: Implement hierarchical MTT")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/1eb400d5-d8a3-4a8e-b3da-c43c6c377f86@moroto.mountain
Acked-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Hierarchical MTT allows large MR registration without the need of
continuous physical address. This commit adds the support of hierarchical
MTT support for erdma.
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230817102151.75964-4-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Currently our MTT only support inline mtt entries (0 level MTT) and
indirect MTT entries (1 level mtt), which will limit the maximum length
of MRs. In order to implement a multi-level MTT, we refactor the
structure of MTT first.
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230817102151.75964-3-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Currently, variable names and field names of struct erdma_mem contain
'mtt', which is not accurate. Renaming them with 'xxx_mem' or 'mem'.
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230817102151.75964-2-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
The original doorbell allocation mechanism is complex and does not meet
the isolation requirement. So we introduce a new doorbell mechanism and the
original mechanism (only be used with CAP_SYS_RAWIO if hardware does not
support the new mechanism) needs to be kept as simple as possible for
compatibility.
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230606055005.80729-5-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
For the isolation requirement, each QP/CQ can only issue doorbells from the
allocated mmio space. Configure the relationship between QPs/CQs and
mmio doorbell spaces to hardware in create_qp/create_cq interfaces.
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230606055005.80729-4-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Each ucontext will try to allocate doorbell resources in the extended bar
space from hardware. For compatibility, we change nothing for the original
bar space, and it will be used only for applications with CAP_SYS_RAWIO
authority in the older HW/FW environments.
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230606055005.80729-3-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Hardware's page size is 4096, but the kernel's page size may vary. Driver
should use hardware's page size when communicating with hardware.
Fixes: 1550557717 ("RDMA/erdma: Add verbs implementation")
Link: https://lore.kernel.org/r/20230307102924.70577-2-chengyou@linux.alibaba.com
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
rdma_user_mmap_entry_get() take reference, we should release it when not
need anymore, add the missing rdma_user_mmap_entry_put() in the error
path to fix it.
Fixes: 1550557717 ("RDMA/erdma: Add verbs implementation")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20221220121139.1540564-1-linmq006@gmail.com
Acked-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Introduce "capacity flags" field at where hardware put all zeros originally
in "query device" response. Using this field, hardware can report atomic
feature if supports.
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221107021845.44598-3-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
To support atomic operations, IB_ACCESS_REMOTE_ATOMIC right should be
passed to hardware for permission check. Since "access mode" field in FRMR
SQE and RegMr command is never used by hw, we remove the "access mode"
field, so that we can then have enough space to extend access fields.
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221107021845.44598-2-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Many of erdma's includes are redundant, because they are already included
indirectly by kernel headers or custom headers. So we remove all the
unnecessary direct-includes. Besides, add linux/pci.h to erdma.h because
it's also used in the file.
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220909093822.33868-3-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
erdma_post_cmd_wait does not use the 'u64 *req' input parameter directly.
So it is better to define it to 'void *req', and by this we can eliminate
the casting when calling erdma_post_cmd_wait.
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220909093822.33868-2-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
QP0 in HW is used for CMDQ, and the rest is for RDMA QPs. So the actual
max_qp capacity reported to core should be max_qp (reported by HW) - 1.
So does max_cq.
Fixes: 1550557717 ("RDMA/erdma: Add verbs implementation")
Link: https://lore.kernel.org/all/20220810014320.88026-1-chengyou@linux.alibaba.com
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
The RDMA verbs implementation of erdma is divided into three files:
erdma_qp.c, erdma_cq.c, and erdma_verbs.c. Internal used functions and
datapath functions of QP/CQ are put in erdma_qp.c and erdma_cq.c, the rest
is in erdma_verbs.c.
This commit also fixes some static check warnings.
Link: https://lore.kernel.org/r/20220727014927.76564-8-chengyou@linux.alibaba.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>