1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Linux kernel source tree
Find a file
Sagi Grimberg 3219378987 nvme-tcp: Fix I/O queue cpu spreading for multiple controllers
Since day-1 we are assigning the queue io_cpu very naively. We always
base the queue id (controller scope) and assign it its matching cpu
from the online mask. This works fine when the number of queues match
the number of cpu cores.

The problem starts when we have less queues than cpu cores. First, we
should take into account the mq_map and select a cpu within the cpus
that are assigned to this queue by the mq_map in order to minimize cross
numa cpu bouncing.

Second, even worse is that we don't take into account multiple
controllers may have assigned queues to a given cpu. As a result we may
simply compund more and more queues on the same set of cpus, which is
suboptimal.

We fix this by introducing global per-cpu counters that tracks the
number of queues assigned to each cpu, and we select the least used cpu
based on the mq_map and the per-cpu counters, and assign it as the queue
io_cpu.

The behavior for a single controller is slightly optimized by selecting
better cpu candidates by consulting with the mq_map, and multiple
controllers are spreading queues among cpu cores much better, resulting
in lower average cpu load, and less likelihood to hit hotspots.

Note that the accounting is not 100% perfect, but we don't need to be,
we're simply putting our best effort to select the best candidate cpu
core that we find at any given point.

Another byproduct is that every controller reset/reconnect may change
the queues io_cpu mapping, based on the current LRU accounting scheme.

Here is the baseline queue io_cpu assignment for 4 controllers, 2 queues
per controller, and 4 cpus on the host:
nvme1: queue 0: using cpu 0
nvme1: queue 1: using cpu 1
nvme2: queue 0: using cpu 0
nvme2: queue 1: using cpu 1
nvme3: queue 0: using cpu 0
nvme3: queue 1: using cpu 1
nvme4: queue 0: using cpu 0
nvme4: queue 1: using cpu 1

And this is the fixed io_cpu assignment:
nvme1: queue 0: using cpu 0
nvme1: queue 1: using cpu 2
nvme2: queue 0: using cpu 1
nvme2: queue 1: using cpu 3
nvme3: queue 0: using cpu 0
nvme3: queue 1: using cpu 2
nvme4: queue 0: using cpu 1
nvme4: queue 1: using cpu 3

Fixes: 3f2304f8c6 ("nvme-tcp: add NVMe over TCP host driver")
Suggested-by: Hannes Reinecke <hare@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[fixed kbuild reported errors]
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-01-10 19:30:04 -08:00
arch block: remove BLK_MQ_F_SHOULD_MERGE 2024-12-23 08:17:23 -07:00
block block: simplify tag allocation policy selection 2025-01-06 07:37:41 -07:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto crypto: rsassa-pkcs1 - Copy source data for SG list 2024-12-10 13:34:05 +08:00
Documentation Devicetree fixes for 6.13, part 1: 2024-12-22 08:40:23 -08:00
drivers nvme-tcp: Fix I/O queue cpu spreading for multiple controllers 2025-01-10 19:30:04 -08:00
fs block: Delete bio_set_prio() 2024-12-23 08:17:23 -07:00
include block: simplify tag allocation policy selection 2025-01-06 07:37:41 -07:00
init - The series "resource: A couple of cleanups" from Andy Shevchenko 2024-11-25 16:09:48 -08:00
io_uring io_uring: check if iowq is killed before queuing 2024-12-19 13:31:53 -07:00
ipc - The series "resource: A couple of cleanups" from Andy Shevchenko 2024-11-25 16:09:48 -08:00
kernel blktrace: remove redundant return at end of function 2024-12-23 08:17:23 -07:00
lib alloc_tag: fix module allocation tags populated area calculation 2024-12-18 19:04:46 -08:00
LICENSES LICENSES: add 0BSD license text 2024-09-01 20:43:24 -07:00
mm mm: huge_memory: handle strsep not finding delimiter 2024-12-18 19:04:47 -08:00
net BPF fixes: 2024-12-21 11:07:19 -08:00
rust rust: block: fix use of BLK_MQ_F_SHOULD_MERGE 2024-12-23 08:17:23 -07:00
samples BPF fixes: 2024-12-06 15:07:48 -08:00
scripts Kbuild fixes for v6.13 (2nd) 2024-12-21 11:24:32 -08:00
security selinux/stable-6.13 PR 20241217 2024-12-18 12:10:15 -08:00
sound soc: fixes for 6.13 2024-12-16 10:10:53 -08:00
tools 25 hotfixes. 16 are cc:stable. 19 are MM and 6 are non-MM. 2024-12-21 15:31:56 -08:00
usr kbuild: Drop support for include/asm-<arch> in headers_check.pl 2024-12-21 11:43:17 +09:00
virt VFIO updates for v6.13 2024-11-27 12:57:03 -08:00
.clang-format clang-format: Update with v6.11-rc1's for_each macro list 2024-08-02 13:20:31 +02:00
.clippy.toml rust: enable Clippy's check-private-items 2024-10-07 21:39:57 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore MAINTAINERS: Retire Ralf Baechle 2024-11-12 15:48:59 +01:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore Kbuild updates for v6.13 2024-11-30 13:41:50 -08:00
.mailmap mailmap: add entry for Ying Huang 2024-12-18 19:04:41 -08:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS cgroup: Changes for v6.13 2024-11-20 09:54:49 -08:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS drm fixes for 6.13-rc4 2024-12-20 10:17:53 -08:00
Makefile Linux 6.13-rc4 2024-12-22 13:22:21 -08:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.