mirror of synced 2025-03-06 20:59:54 +01:00
linux/mm
Ricardo Cañuelo Navarro 2ede647a6f mm,madvise,hugetlb: check for 0-length range after end address adjustment
Add a sanity check to madvise_dontneed_free() to address a corner case in
madvise where a race condition causes the current vma being processed to
be backed by a different page size.

During a madvise(MADV_DONTNEED) call on a memory region registered with a
userfaultfd, there's a period of time where the process mm lock is
temporarily released in order to send a UFFD_EVENT_REMOVE and let
userspace handle the event.  During this time, the vma covering the
current address range may change due to an explicit mmap done concurrently
by another thread.

If, after that change, the memory region, which was originally backed by
4KB pages, is now backed by hugepages, the end address is rounded down to
a hugepage boundary to avoid data loss (see "Fixes" below).  This rounding
may cause the end address to be truncated to the same address as the
start.

Make this corner case follow the same semantics as in other similar cases
where the requested region has zero length (i.e., return 0).
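
The rounding and the added zero-length check can be illustrated with a
small userspace model. This is only a sketch under stated assumptions, not
the kernel code: the 4 KiB and 2 MiB page sizes and every sketch_* name are
hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE  (4UL << 10)	/* assumed 4 KiB base pages */
#define SKETCH_HPAGE_SIZE (2UL << 20)	/* assumed 2 MiB huge pages */

/* Round addr down to a multiple of align, which must be a power of two. */
static unsigned long sketch_align_down(unsigned long addr, unsigned long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/*
 * Hypothetical model of re-validating the range once the mm lock is held
 * again. Returns -EINVAL for a misaligned start on a hugepage-backed
 * range, 0 when nothing is left to do, and 1 when a non-empty range
 * remains to be processed.
 */
static int sketch_revalidate(unsigned long start, unsigned long *end,
			     bool hugetlb)
{
	if (hugetlb) {
		if (start & (SKETCH_HPAGE_SIZE - 1))
			return -EINVAL;
		/* Round down so that no partial huge page is discarded. */
		*end = sketch_align_down(*end, SKETCH_HPAGE_SIZE);
	}
	if (start == *end)
		return 0;	/* models the added zero-length sanity check */
	return 1;
}
```

With a hugepage-aligned start and end = start + SKETCH_PAGE_SIZE, the model
rounds end down to start and returns 0, which is the corner case described
above.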

This will make madvise_walk_vmas() continue to the next vma in the range
(this time holding the process mm lock) which, due to the prev pointer
becoming stale because of the vma change, will be the same hugepage-backed
vma that was just checked before.  The next time madvise_dontneed_free()
runs for this vma, if the start address isn't aligned to a hugepage
boundary, it'll return -EINVAL, which is also in line with the madvise
API.

From the userspace perspective, madvise() will return EINVAL because the start
address isn't aligned according to the new vma alignment requirements
(hugepage), even though it was correctly page-aligned when the call was
issued.
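
For a caller, this means EINVAL from madvise(MADV_DONTNEED) does not
necessarily indicate a bad argument. A minimal userspace sketch follows,
assuming Linux with glibc; sketch_discard() is a hypothetical helper
introduced here for illustration and is not part of this patch.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: discard [addr, addr + len) and report the outcome.
 * Returns 0 on success, or the errno value set by madvise() on failure.
 */
static int sketch_discard(void *addr, size_t len)
{
	assert(addr != NULL);

	if (madvise(addr, len, MADV_DONTNEED) == 0)
		return 0;
	/*
	 * EINVAL may mean that the mapping changed underneath the caller
	 * (for example, it became hugepage-backed), not only that a bad
	 * address was passed, so a caller may prefer to re-check the
	 * mapping instead of treating this as a programming error.
	 */
	return errno;
}
```

A start address that is not page-aligned makes this helper return EINVAL in
the ordinary case as well, so the return value alone does not distinguish
the two situations.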

Link: https://lkml.kernel.org/r/20250203075206.1452208-1-rcn@igalia.com
Fixes: 8ebe0a5eaa ("mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs")
Signed-off-by: Ricardo Cañuelo Navarro <rcn@igalia.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Florent Revest <revest@google.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17 22:40:01 -08:00
damon mm/damon/core: use str_high_low() helper in damos_wmark_wait_us() 2025-01-25 20:22:46 -08:00
kasan kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags() 2025-01-25 20:22:46 -08:00
kfence kfence: skip __GFP_THISNODE allocations on NUMA systems 2025-02-01 03:53:26 -08:00
kmsan mm/memblock: add memblock_alloc_or_panic interface 2025-01-25 20:22:38 -08:00
backing-dev.c writeback: support retrieving per group debug writeback stats of bdi 2024-05-05 17:53:51 -07:00
balloon_compaction.c mm: remove MIGRATE_SYNC_NO_COPY mode 2024-07-03 19:30:00 -07:00
bootmem_info.c bootmem: stop using page->index 2024-11-07 14:38:07 -08:00
cma.c cma: enforce non-zero pageblock_order during cma_init_reserved_mem() 2024-11-14 22:49:19 -08:00
cma.h mm: change type of cma_area_count to unsigned int 2025-01-13 22:40:35 -08:00
cma_debug.c mm/cma_debug: show complete cma name in debugfs directories 2022-09-11 20:25:50 -07:00
cma_sysfs.c mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
compaction.c mm: compaction: use the proper flag to determine watermarks 2025-02-01 03:53:25 -08:00
debug.c mm/debug: introduce VM_WARN_ON_VMG() to dump VMA merge state 2025-01-25 20:22:23 -08:00
debug_page_alloc.c mm: page_alloc: consolidate free page accounting 2024-04-25 20:56:04 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries 2024-09-17 01:07:01 -07:00
dmapool.c mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs 2023-12-05 11:17:58 +01:00
dmapool_test.c mm/dmapool: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
early_ioremap.c mm/early_ioremap: add null pointer checks to prevent NULL-pointer dereference 2025-01-13 22:40:59 -08:00
execmem.c alloc_tag: populate memory for module tags as needed 2024-11-07 14:25:16 -08:00
fadvise.c fdget(), trivial conversions 2024-11-03 01:28:06 -05:00
fail_page_alloc.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
failslab.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
filemap.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
folio-compat.c mm/writeback: add folio_mark_dirty_lock() 2024-11-05 11:14:32 +01:00
gup.c mm: gup: fix infinite loop within __get_longterm_locked 2025-02-01 03:53:27 -08:00
gup_test.c Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes. 2023-06-23 16:58:19 -07:00
gup_test.h mm/gup_test: start/stop/read functionality for PIN LONGTERM test 2022-11-08 17:37:15 -08:00
highmem.c mm/highmem: make nr_free_highpages() return "unsigned long" 2024-07-03 19:30:06 -07:00
hmm.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
huge_memory.c mm/huge_memory: convert has_hwpoisoned into a pure folio flag 2025-01-25 20:22:41 -08:00
hugetlb.c mm/hugetlb: fix hugepage allocation for interleaved memory nodes 2025-02-01 03:53:27 -08:00
hugetlb_cgroup.c mm/hugetlb-cgroup: convert hugetlb_cgroup_css_offline() to work on folios 2025-01-25 20:22:42 -08:00
hugetlb_vmemmap.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: fix reference to nonexistent file 2023-10-25 16:47:14 -07:00
hwpoison-inject.c mm/hwpoison: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
init-mm.c mm: convert mm_lock_seq to a proper seqcount 2025-01-13 22:40:50 -08:00
internal.h mm/truncate: add folio_unmap_invalidate() helper 2025-01-25 20:22:43 -08:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed 2023-08-18 10:12:36 -07:00
Kconfig mm: add build-time option for hotplug memory default online type 2025-01-25 20:22:21 -08:00
Kconfig.debug slub: Introduce CONFIG_SLUB_RCU_DEBUG 2024-08-27 14:12:51 +02:00
khugepaged.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
kmemleak.c mm: kmemleak: fix upper boundary check for physical address objects 2025-02-01 03:53:25 -08:00
ksm.c ksm: add ksm involvement information for each process 2025-01-25 20:22:40 -08:00
list_lru.c mm/list_lru: fix false warning of negative counter 2024-12-30 17:59:10 -08:00
maccess.c kasan: migrate copy_user_test to kunit 2024-11-11 00:26:44 -08:00
madvise.c mm,madvise,hugetlb: check for 0-length range after end address adjustment 2025-02-17 22:40:01 -08:00
Makefile mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED) 2025-01-13 22:40:48 -08:00
mapping_dirty_helpers.c mm: fix clean_record_shared_mapping_range kernel-doc 2023-08-24 16:20:30 -07:00
memblock.c mm/memblock: add memblock_alloc_or_panic interface 2025-01-25 20:22:38 -08:00
memcontrol-v1.c mm: remove the non-useful else after a break in a if statement 2025-01-13 22:40:40 -08:00
memcontrol-v1.h mm: memcg: declare do_memsw_account inline 2024-12-05 19:54:46 -08:00
memcontrol.c memcg: fix soft lockup in the OOM process 2025-01-25 20:22:35 -08:00
memfd.c mm/memfd: use strncpy_from_user() to read memfd name 2025-01-25 20:22:40 -08:00
memory-failure.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
memory-tiers.c memory tiers: use default_dram_perf_ref_source in log message 2024-09-26 14:01:44 -07:00
memory.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
memory_hotplug.c mm: add build-time option for hotplug memory default online type 2025-01-25 20:22:21 -08:00
mempolicy.c mm/hugetlb: rename isolate_hugetlb() to folio_isolate_hugetlb() 2025-01-25 20:22:41 -08:00
mempool.c mm: fix xyz_noprof functions calling profiled functions 2024-06-05 19:19:26 -07:00
memremap.c mm: convert put_devmap_managed_page_refs() to put_devmap_managed_folio_refs() 2024-05-05 17:53:49 -07:00
memtest.c memtest: use {READ,WRITE}_ONCE in memory scanning 2024-03-13 12:12:21 -07:00
migrate.c mm: separate move/undo parts from migrate_pages_batch() 2025-01-25 20:22:45 -08:00
migrate_device.c mm: remap unused subpages to shared zeropage when splitting isolated thp 2024-09-09 16:39:03 -07:00
mincore.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
mlock.c mm/mlock: set the correct prev on failure 2024-11-07 14:14:58 -08:00
mm_init.c mm/memmap: prevent double scanning of memmap by kmemleak 2025-01-25 20:22:30 -08:00
mm_slot.h mm: introduce common struct mm_slot 2022-10-03 14:02:43 -07:00
mmap.c mm: make mmap_region() internal 2025-01-25 20:22:38 -08:00
mmap_lock.c mm: mmap_lock: optimize mmap_lock tracepoints 2025-01-13 22:40:34 -08:00
mmu_gather.c mm: pgtable: move __tlb_remove_table_one() in x86 to generic file 2025-01-25 20:22:23 -08:00
mmu_notifier.c mm: move internal core VMA manipulation functions to own file 2024-09-01 20:25:54 -07:00
mmzone.c mm: improve code consistency with zonelist_* helper functions 2024-09-01 20:25:55 -07:00
mprotect.c mm: add PTE_MARKER_GUARD PTE marker 2024-11-11 00:26:44 -08:00
mremap.c mm: clear uffd-wp PTE/PMD state on mremap() 2025-01-12 19:03:37 -08:00
mseal.c mseal: remove can_do_mseal() 2025-01-13 22:40:51 -08:00
msync.c mm/msync: use vma_find() instead of vma linked list 2022-09-26 19:46:25 -07:00
nommu.c fsnotify: generate pre-content permission event on page fault 2024-12-11 17:28:41 +01:00
numa.c mm/memblock: add memblock_alloc_or_panic interface 2025-01-25 20:22:38 -08:00
numa_emulation.c mm/fake-numa: allow later numa node hotplug 2025-01-25 20:22:29 -08:00
numa_memblks.c mm/fake-numa: allow later numa node hotplug 2025-01-25 20:22:29 -08:00
oom_kill.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
page-writeback.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
page_alloc.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
page_counter.c kernel/cgroup: Add "dmem" memory accounting cgroup 2025-01-06 17:24:38 +01:00
page_ext.c mm: don't account memmap per-node 2024-08-15 22:16:14 -07:00
page_frag_cache.c mm/page_alloc: export free_frozen_pages() instead of free_unref_page() 2025-01-13 22:40:31 -08:00
page_idle.c mm/page_idle: constify 'struct bin_attribute' 2025-01-25 20:22:19 -08:00
page_io.c mm, swap: clean up device availability check 2025-01-25 20:22:36 -08:00
page_isolation.c mm/page_isolation: don't pass gfp flags to start_isolate_page_range() 2025-01-13 22:40:44 -08:00
page_owner.c mm/page-owner: use gfp_nested_mask() instead of open coded masking 2024-05-19 14:40:44 -07:00
page_poison.c mm/page_poison: replace kmap_atomic() with kmap_local_page() 2023-12-10 16:51:50 -08:00
page_reporting.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
page_reporting.h
page_table_check.c mm/page_table_check: fix crash on ZONE_DEVICE 2024-06-15 10:43:04 -07:00
page_vma_mapped.c mm: mass constification of folio/page pointers 2024-11-07 14:38:07 -08:00
pagewalk.c mm: pagewalk: add the ability to install PTEs 2024-11-11 00:26:44 -08:00
percpu-internal.h mm: remove CONFIG_MEMCG_KMEM 2024-07-10 12:14:54 -07:00
percpu-km.c
percpu-stats.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
percpu-vm.c percpu: clean up all mappings when pcpu_map_pages() fails 2024-04-25 20:55:49 -07:00
percpu.c mm/memblock: add memblock_alloc_or_panic interface 2025-01-25 20:22:38 -08:00
pgalloc-track.h
pgtable-generic.c mm: add RCU annotation to pte_offset_map(_lock) 2024-12-18 19:04:43 -08:00
process_vm_access.c mm: refactor mm_access() to not return NULL 2024-11-05 16:56:23 -08:00
pt_reclaim.c mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED) 2025-01-13 22:40:48 -08:00
ptdump.c mm: ptdump: add check_wx_pages debugfs attribute 2024-02-22 10:24:47 -08:00
readahead.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
rmap.c mm: mass constification of folio/page pointers 2024-11-07 14:38:07 -08:00
rodata_test.c mm/rodata_test: verify test data is unchanged, rather than non-zero 2025-01-13 22:40:38 -08:00
secretmem.c add a string-to-qstr constructor 2025-01-27 19:25:45 -05:00
shmem.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
shmem_quota.c shmem_quota: build the object file conditionally to the config option 2024-09-01 20:25:45 -07:00
show_mem.c mm/show_mem: use str_yes_no() helper in show_free_areas() 2024-11-07 14:38:08 -08:00
shrinker.c mm: shrinker: avoid memleak in alloc_shrinker_info 2024-10-31 20:27:04 -07:00
shrinker_debug.c saner replacement for debugfs_rename() 2025-01-15 13:14:37 +01:00
shuffle.c mm/shuffle: convert module_param_call to module_param_cb 2022-10-03 14:03:07 -07:00
shuffle.h mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
slab.h mm/slab: fix kernel-doc func param names 2025-01-13 10:22:04 +01:00
slab_common.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
slub.c Driver core and debugfs updates 2025-01-28 12:25:12 -08:00
sparse-vmemmap.c mm/memmap: prevent double scanning of memmap by kmemleak 2025-01-25 20:22:30 -08:00
sparse.c mm/memblock: add memblock_alloc_or_panic interface 2025-01-25 20:22:38 -08:00
swap.c mm/filemap: add read support for RWF_DONTCACHE 2025-01-25 20:22:43 -08:00
swap.h mm: fix swap_read_folio_zeromap() for large folios with partial zeromap 2024-09-17 01:07:01 -07:00
swap_cgroup.c mm/swap_cgroup: decouple swap cgroup recording and clearing 2025-01-25 20:22:19 -08:00
swap_slots.c mm, swap_slots: remove slot cache for freeing path 2025-01-25 20:22:37 -08:00
swap_state.c mm: remove unnecessary calls to lru_add_drain 2025-01-25 20:22:21 -08:00
swapfile.c mm, swap: fix reclaim offset calculation error during allocation 2025-02-01 03:53:26 -08:00
truncate.c mm/truncate: add folio_unmap_invalidate() helper 2025-01-25 20:22:43 -08:00
usercopy.c mm: Fix copy_from_user_nofault(). 2023-04-12 17:36:23 -07:00
userfaultfd.c mm: userfaultfd: recheck dst_pmd entry in move_pages_pte() 2025-01-13 22:40:46 -08:00
util.c mm: add comments to do_mmap(), mmap_region() and vm_mmap() 2025-01-13 22:40:59 -08:00
vma.c mm: make mmap_region() internal 2025-01-25 20:22:38 -08:00
vma.h mm: make mmap_region() internal 2025-01-25 20:22:38 -08:00
vma_internal.h mm/vma: move brk() internals to mm/vma.c 2025-01-13 22:40:42 -08:00
vmalloc.c mm: alloc_pages_bulk: rename API 2025-01-25 20:22:31 -08:00
vmpressure.c eventfd: simplify eventfd_signal() 2023-11-28 14:08:38 +01:00
vmscan.c mm/vmscan: accumulate nr_demoted for accurate demotion statistics 2025-02-01 03:53:24 -08:00
vmstat.c vmstat: disable vmstat_work on vmstat_cpu_down_prep() 2025-01-12 19:03:38 -08:00
workingset.c mm/mglru: rework workingset protection 2025-01-25 20:22:39 -08:00
z3fold.c mm/z3fold: add __percpu annotation to *unbuddied pointer in struct z3fold_pool 2024-09-01 20:25:56 -07:00
zbud.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zpdesc.h mm/zsmalloc: introduce __zpdesc_clear/set_zsmalloc() 2025-01-25 20:22:35 -08:00
zpool.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zsmalloc.c mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc() 2025-02-01 03:53:23 -08:00
zswap.c mm/zswap: fix inconsistency when zswap_store_page() fails 2025-02-17 22:40:01 -08:00