1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Linux kernel source tree
Find a file
yangge 6268f0a166 mm: compaction: use the proper flag to determine watermarks
There are 4 NUMA nodes on my machine, and each NUMA node has 32GB of
memory.  I have configured 16GB of CMA memory on each NUMA node, and
starting a 32GB virtual machine with device passthrough is extremely slow,
taking almost an hour.

Long term GUP cannot allocate memory from CMA area, so a maximum of 16 GB
of no-CMA memory on a NUMA node can be used as virtual machine memory. 
There is 16GB of free CMA memory on a NUMA node, which is sufficient to
pass the order-0 watermark check, causing the __compaction_suitable()
function to consistently return true.

For costly allocations, if the __compaction_suitable() function always
returns true, it causes the __alloc_pages_slowpath() function to fail to
exit at the appropriate point.  This prevents timely fallback to
allocating memory on other nodes, ultimately resulting in excessively long
virtual machine startup times.

Call trace:
__alloc_pages_slowpath
    if (compact_result == COMPACT_SKIPPED ||
        compact_result == COMPACT_DEFERRED)
        goto nopage; // should exit __alloc_pages_slowpath() from here

We could use the real unmovable allocation context to have
__zone_watermark_unusable_free() subtract CMA pages, and thus we won't
pass the order-0 check anymore once the non-CMA part is exhausted.  There
is some risk that in some different scenario the compaction could in fact
migrate pages from the exhausted non-CMA part of the zone to the CMA part
and succeed, and we'll skip it instead.  But only __GFP_NORETRY
allocations should be affected in the immediate "goto nopage" when
compaction is skipped, others will attempt with DEF_COMPACT_PRIORITY
anyway and won't fail without trying to compact-migrate the non-CMA
pageblocks into CMA pageblocks first, so it should be fine.

After this fix, it only takes a few tens of seconds to start a 32GB
virtual machine with device passthrough functionality.

Link: https://lore.kernel.org/lkml/1736335854-548-1-git-send-email-yangge1116@126.com/
Link: https://lkml.kernel.org/r/1737788037-8439-1-git-send-email-yangge1116@126.com
Signed-off-by: yangge <yangge1116@126.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01 03:53:25 -08:00
arch s390 fixes for 6.14 merge window 2025-01-30 10:53:49 -08:00
block Driver core and debugfs updates 2025-01-28 12:25:12 -08:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
Documentation Two fixes for footnote-related warnings that appeared with Sphinx 8.x. 2025-01-30 10:57:19 -08:00
drivers mm/fake-numa: handle cases with no SRAT info 2025-02-01 03:53:25 -08:00
fs ocfs2: fix incorrect CPU endianness conversion causing mount failure 2025-02-01 03:53:24 -08:00
include mm/vmscan: fix hard LOCKUP in function isolate_lru_folios 2025-02-01 03:53:23 -08:00
init treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
io_uring treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
ipc treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
kernel kernel: be more careful about dup_mmap() failures and uprobe registering 2025-02-01 03:53:25 -08:00
lib CRC fixes for 6.14 2025-01-29 10:50:28 -08:00
LICENSES LICENSES: add 0BSD license text 2024-09-01 20:43:24 -07:00
mm mm: compaction: use the proper flag to determine watermarks 2025-02-01 03:53:25 -08:00
net NFS Client Updates for Linux 6.14 2025-01-28 14:23:46 -08:00
rust Driver core and debugfs updates 2025-01-28 12:25:12 -08:00
samples Driver core and debugfs updates 2025-01-28 12:25:12 -08:00
scripts scripts/gdb: fix aarch64 userspace detection in get_current_task 2025-02-01 03:53:24 -08:00
security treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
sound Driver core and debugfs updates 2025-01-28 12:25:12 -08:00
tools gpio fixes for v6.14-rc1 2025-01-30 10:19:30 -08:00
usr kbuild: Drop support for include/asm-<arch> in headers_check.pl 2024-12-21 11:43:17 +09:00
virt Merge branch 'kvm-mirror-page-tables' into HEAD 2025-01-20 07:15:58 -05:00
.clang-format clang-format: Update with v6.11-rc1's for_each macro list 2024-08-02 13:20:31 +02:00
.clippy.toml rust: give Clippy the minimum supported Rust version 2025-01-10 00:17:25 +01:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore MAINTAINERS: Retire Ralf Baechle 2024-11-12 15:48:59 +01:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore rust: use host dylib naming convention to support macOS 2025-01-10 01:01:24 +01:00
.mailmap mailmap: add an entry for Hamza Mahfooz 2025-02-01 03:53:24 -08:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS Mainly individually changelogged singleton patches. The patch series in 2025-01-26 17:50:53 -08:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS MAINTAINERS: mailmap: update Yosry Ahmed's email address 2025-02-01 03:53:24 -08:00
Makefile Networking changes for 6.14. 2025-01-22 08:28:57 -08:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.