1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
linux/mm
David Hildenbrand 41cddf83d8 mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize()
If migration succeeded, we called
folio_migrate_flags()->mem_cgroup_migrate() to migrate the memcg from the
old to the new folio.  This will set memcg_data of the old folio to 0.

Similarly, if migration failed, memcg_data of the dst folio is left unset.

If we call folio_putback_lru() on such folios (memcg_data == 0), we will
add the folio to be freed to the LRU, making memcg code unhappy.  Running
the hmm selftests:

  # ./hmm-tests
  ...
  #  RUN           hmm.hmm_device_private.migrate ...
  [  102.078007][T14893] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x7ff27d200 pfn:0x13cc00
  [  102.079974][T14893] anon flags: 0x17ff00000020018(uptodate|dirty|swapbacked|node=0|zone=2|lastcpupid=0x7ff)
  [  102.082037][T14893] raw: 017ff00000020018 dead000000000100 dead000000000122 ffff8881353896c9
  [  102.083687][T14893] raw: 00000007ff27d200 0000000000000000 00000001ffffffff 0000000000000000
  [  102.085331][T14893] page dumped because: VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled())
  [  102.087230][T14893] ------------[ cut here ]------------
  [  102.088279][T14893] WARNING: CPU: 0 PID: 14893 at ./include/linux/memcontrol.h:726 folio_lruvec_lock_irqsave+0x10e/0x170
  [  102.090478][T14893] Modules linked in:
  [  102.091244][T14893] CPU: 0 UID: 0 PID: 14893 Comm: hmm-tests Not tainted 6.13.0-09623-g6c216bc522fd #151
  [  102.093089][T14893] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
  [  102.094848][T14893] RIP: 0010:folio_lruvec_lock_irqsave+0x10e/0x170
  [  102.096104][T14893] Code: ...
  [  102.099908][T14893] RSP: 0018:ffffc900236c37b0 EFLAGS: 00010293
  [  102.101152][T14893] RAX: 0000000000000000 RBX: ffffea0004f30000 RCX: ffffffff8183f426
  [  102.102684][T14893] RDX: ffff8881063cb880 RSI: ffffffff81b8117f RDI: ffff8881063cb880
  [  102.104227][T14893] RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
  [  102.105757][T14893] R10: 0000000000000001 R11: 0000000000000002 R12: ffffc900236c37d8
  [  102.107296][T14893] R13: ffff888277a2bcb0 R14: 000000000000001f R15: 0000000000000000
  [  102.108830][T14893] FS:  00007ff27dbdd740(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
  [  102.110643][T14893] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  102.111924][T14893] CR2: 00007ff27d400000 CR3: 000000010866e000 CR4: 0000000000750ef0
  [  102.113478][T14893] PKRU: 55555554
  [  102.114172][T14893] Call Trace:
  [  102.114805][T14893]  <TASK>
  [  102.115397][T14893]  ? folio_lruvec_lock_irqsave+0x10e/0x170
  [  102.116547][T14893]  ? __warn.cold+0x110/0x210
  [  102.117461][T14893]  ? folio_lruvec_lock_irqsave+0x10e/0x170
  [  102.118667][T14893]  ? report_bug+0x1b9/0x320
  [  102.119571][T14893]  ? handle_bug+0x54/0x90
  [  102.120494][T14893]  ? exc_invalid_op+0x17/0x50
  [  102.121433][T14893]  ? asm_exc_invalid_op+0x1a/0x20
  [  102.122435][T14893]  ? __wake_up_klogd.part.0+0x76/0xd0
  [  102.123506][T14893]  ? dump_page+0x4f/0x60
  [  102.124352][T14893]  ? folio_lruvec_lock_irqsave+0x10e/0x170
  [  102.125500][T14893]  folio_batch_move_lru+0xd4/0x200
  [  102.126577][T14893]  ? __pfx_lru_add+0x10/0x10
  [  102.127505][T14893]  __folio_batch_add_and_move+0x391/0x720
  [  102.128633][T14893]  ? __pfx_lru_add+0x10/0x10
  [  102.129550][T14893]  folio_putback_lru+0x16/0x80
  [  102.130564][T14893]  migrate_device_finalize+0x9b/0x530
  [  102.131640][T14893]  dmirror_migrate_to_device.constprop.0+0x7c5/0xad0
  [  102.133047][T14893]  dmirror_fops_unlocked_ioctl+0x89b/0xc80

Likely, nothing else goes wrong: putting the last folio reference will
remove the folio from the LRU again.  So besides memcg complaining, adding
the folio to be freed to the LRU is just an unnecessary step.

The new flow resembles what we have in migrate_folio_move(): add the dst
to the lru, remove migration ptes, unlock and unref dst.

Link: https://lkml.kernel.org/r/20250210161317.717936-1-david@redhat.com
Fixes: 8763cb45ab ("mm/migrate: new memory migration helper for use with device memory")
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17 22:40:02 -08:00
..
damon mm/damon/core: use str_high_low() helper in damos_wmark_wait_us() 2025-01-25 20:22:46 -08:00
kasan kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags() 2025-01-25 20:22:46 -08:00
kfence kfence: skip __GFP_THISNODE allocations on NUMA systems 2025-02-01 03:53:26 -08:00
kmsan mm/memblock: add memblock_alloc_or_panic interface 2025-01-25 20:22:38 -08:00
backing-dev.c writeback: support retrieving per group debug writeback stats of bdi 2024-05-05 17:53:51 -07:00
balloon_compaction.c mm: remove MIGRATE_SYNC_NO_COPY mode 2024-07-03 19:30:00 -07:00
bootmem_info.c bootmem: stop using page->index 2024-11-07 14:38:07 -08:00
cma.c cma: enforce non-zero pageblock_order during cma_init_reserved_mem() 2024-11-14 22:49:19 -08:00
cma.h mm: change type of cma_area_count to unsigned int 2025-01-13 22:40:35 -08:00
cma_debug.c mm/cma_debug: show complete cma name in debugfs directories 2022-09-11 20:25:50 -07:00
cma_sysfs.c mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
compaction.c mm: compaction: use the proper flag to determine watermarks 2025-02-01 03:53:25 -08:00
debug.c mm/debug: introduce VM_WARN_ON_VMG() to dump VMA merge state 2025-01-25 20:22:23 -08:00
debug_page_alloc.c mm: page_alloc: consolidate free page accounting 2024-04-25 20:56:04 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries 2024-09-17 01:07:01 -07:00
dmapool.c mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs 2023-12-05 11:17:58 +01:00
dmapool_test.c mm/dmapool: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
early_ioremap.c mm/early_ioremap: add null pointer checks to prevent NULL-pointer dereference 2025-01-13 22:40:59 -08:00
execmem.c alloc_tag: populate memory for module tags as needed 2024-11-07 14:25:16 -08:00
fadvise.c fdget(), trivial conversions 2024-11-03 01:28:06 -05:00
fail_page_alloc.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
failslab.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
filemap.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
folio-compat.c mm/writeback: add folio_mark_dirty_lock() 2024-11-05 11:14:32 +01:00
gup.c mm: gup: fix infinite loop within __get_longterm_locked 2025-02-01 03:53:27 -08:00
gup_test.c Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes. 2023-06-23 16:58:19 -07:00
gup_test.h mm/gup_test: start/stop/read functionality for PIN LONGTERM test 2022-11-08 17:37:15 -08:00
highmem.c mm/highmem: make nr_free_highpages() return "unsigned long" 2024-07-03 19:30:06 -07:00
hmm.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
huge_memory.c mm/huge_memory: convert has_hwpoisoned into a pure folio flag 2025-01-25 20:22:41 -08:00
hugetlb.c mm/hugetlb: fix hugepage allocation for interleaved memory nodes 2025-02-01 03:53:27 -08:00
hugetlb_cgroup.c mm/hugetlb-cgroup: convert hugetlb_cgroup_css_offline() to work on folios 2025-01-25 20:22:42 -08:00
hugetlb_vmemmap.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: fix reference to nonexistent file 2023-10-25 16:47:14 -07:00
hwpoison-inject.c mm/hwpoison: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
init-mm.c mm: convert mm_lock_seq to a proper seqcount 2025-01-13 22:40:50 -08:00
internal.h mm/truncate: add folio_unmap_invalidate() helper 2025-01-25 20:22:43 -08:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed 2023-08-18 10:12:36 -07:00
Kconfig mm: add build-time option for hotplug memory default online type 2025-01-25 20:22:21 -08:00
Kconfig.debug slub: Introduce CONFIG_SLUB_RCU_DEBUG 2024-08-27 14:12:51 +02:00
khugepaged.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
kmemleak.c mm: kmemleak: fix upper boundary check for physical address objects 2025-02-01 03:53:25 -08:00
ksm.c ksm: add ksm involvement information for each process 2025-01-25 20:22:40 -08:00
list_lru.c mm/list_lru: fix false warning of negative counter 2024-12-30 17:59:10 -08:00
maccess.c kasan: migrate copy_user_test to kunit 2024-11-11 00:26:44 -08:00
madvise.c mm,madvise,hugetlb: check for 0-length range after end address adjustment 2025-02-17 22:40:01 -08:00
Makefile mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED) 2025-01-13 22:40:48 -08:00
mapping_dirty_helpers.c mm: fix clean_record_shared_mapping_range kernel-doc 2023-08-24 16:20:30 -07:00
memblock.c mm/memblock: add memblock_alloc_or_panic interface 2025-01-25 20:22:38 -08:00
memcontrol-v1.c mm: remove the non-useful else after a break in a if statement 2025-01-13 22:40:40 -08:00
memcontrol-v1.h mm: memcg: declare do_memsw_account inline 2024-12-05 19:54:46 -08:00
memcontrol.c memcg: fix soft lockup in the OOM process 2025-01-25 20:22:35 -08:00
memfd.c mm/memfd: use strncpy_from_user() to read memfd name 2025-01-25 20:22:40 -08:00
memory-failure.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
memory-tiers.c memory tiers: use default_dram_perf_ref_source in log message 2024-09-26 14:01:44 -07:00
memory.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
memory_hotplug.c mm: add build-time option for hotplug memory default online type 2025-01-25 20:22:21 -08:00
mempolicy.c mm/hugetlb: rename isolate_hugetlb() to folio_isolate_hugetlb() 2025-01-25 20:22:41 -08:00
mempool.c mm: fix xyz_noprof functions calling profiled functions 2024-06-05 19:19:26 -07:00
memremap.c mm: convert put_devmap_managed_page_refs() to put_devmap_managed_folio_refs() 2024-05-05 17:53:49 -07:00
memtest.c memtest: use {READ,WRITE}_ONCE in memory scanning 2024-03-13 12:12:21 -07:00
migrate.c mm: separate move/undo parts from migrate_pages_batch() 2025-01-25 20:22:45 -08:00
migrate_device.c mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize() 2025-02-17 22:40:02 -08:00
mincore.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
mlock.c mm/mlock: set the correct prev on failure 2024-11-07 14:14:58 -08:00
mm_init.c mm/memmap: prevent double scanning of memmap by kmemleak 2025-01-25 20:22:30 -08:00
mm_slot.h mm: introduce common struct mm_slot 2022-10-03 14:02:43 -07:00
mmap.c mm: make mmap_region() internal 2025-01-25 20:22:38 -08:00
mmap_lock.c mm: mmap_lock: optimize mmap_lock tracepoints 2025-01-13 22:40:34 -08:00
mmu_gather.c mm: pgtable: move __tlb_remove_table_one() in x86 to generic file 2025-01-25 20:22:23 -08:00
mmu_notifier.c mm: move internal core VMA manipulation functions to own file 2024-09-01 20:25:54 -07:00
mmzone.c mm: improve code consistency with zonelist_* helper functions 2024-09-01 20:25:55 -07:00
mprotect.c mm: add PTE_MARKER_GUARD PTE marker 2024-11-11 00:26:44 -08:00
mremap.c mm: clear uffd-wp PTE/PMD state on mremap() 2025-01-12 19:03:37 -08:00
mseal.c mseal: remove can_do_mseal() 2025-01-13 22:40:51 -08:00
msync.c mm/msync: use vma_find() instead of vma linked list 2022-09-26 19:46:25 -07:00
nommu.c fsnotify: generate pre-content permission event on page fault 2024-12-11 17:28:41 +01:00
numa.c mm/memblock: add memblock_alloc_or_panic interface 2025-01-25 20:22:38 -08:00
numa_emulation.c mm/fake-numa: allow later numa node hotplug 2025-01-25 20:22:29 -08:00
numa_memblks.c mm/fake-numa: allow later numa node hotplug 2025-01-25 20:22:29 -08:00
oom_kill.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
page-writeback.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
page_alloc.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
page_counter.c kernel/cgroup: Add "dmem" memory accounting cgroup 2025-01-06 17:24:38 +01:00
page_ext.c mm: don't account memmap per-node 2024-08-15 22:16:14 -07:00
page_frag_cache.c mm/page_alloc: export free_frozen_pages() instead of free_unref_page() 2025-01-13 22:40:31 -08:00
page_idle.c mm/page_idle: constify 'struct bin_attribute' 2025-01-25 20:22:19 -08:00
page_io.c mm, swap: clean up device availability check 2025-01-25 20:22:36 -08:00
page_isolation.c mm/page_isolation: don't pass gfp flags to start_isolate_page_range() 2025-01-13 22:40:44 -08:00
page_owner.c mm/page-owner: use gfp_nested_mask() instead of open coded masking 2024-05-19 14:40:44 -07:00
page_poison.c mm/page_poison: replace kmap_atomic() with kmap_local_page() 2023-12-10 16:51:50 -08:00
page_reporting.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
page_reporting.h mm/page_reporting: export reporting order as module parameter 2021-06-29 10:53:47 -07:00
page_table_check.c mm/page_table_check: fix crash on ZONE_DEVICE 2024-06-15 10:43:04 -07:00
page_vma_mapped.c mm: mass constification of folio/page pointers 2024-11-07 14:38:07 -08:00
pagewalk.c mm: pagewalk: add the ability to install PTEs 2024-11-11 00:26:44 -08:00
percpu-internal.h mm: remove CONFIG_MEMCG_KMEM 2024-07-10 12:14:54 -07:00
percpu-km.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu-stats.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
percpu-vm.c percpu: clean up all mappings when pcpu_map_pages() fails 2024-04-25 20:55:49 -07:00
percpu.c mm/memblock: add memblock_alloc_or_panic interface 2025-01-25 20:22:38 -08:00
pgalloc-track.h mm: fix typos in comments 2021-05-07 00:26:35 -07:00
pgtable-generic.c mm: add RCU annotation to pte_offset_map(_lock) 2024-12-18 19:04:43 -08:00
process_vm_access.c mm: refactor mm_access() to not return NULL 2024-11-05 16:56:23 -08:00
pt_reclaim.c mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED) 2025-01-13 22:40:48 -08:00
ptdump.c mm: ptdump: add check_wx_pages debugfs attribute 2024-02-22 10:24:47 -08:00
readahead.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
rmap.c mm: mass constification of folio/page pointers 2024-11-07 14:38:07 -08:00
rodata_test.c mm/rodata_test: verify test data is unchanged, rather than non-zero 2025-01-13 22:40:38 -08:00
secretmem.c add a string-to-qstr constructor 2025-01-27 19:25:45 -05:00
shmem.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
shmem_quota.c shmem_quota: build the object file conditionally to the config option 2024-09-01 20:25:45 -07:00
show_mem.c mm/show_mem: use str_yes_no() helper in show_free_areas() 2024-11-07 14:38:08 -08:00
shrinker.c mm: shrinker: avoid memleak in alloc_shrinker_info 2024-10-31 20:27:04 -07:00
shrinker_debug.c saner replacement for debugfs_rename() 2025-01-15 13:14:37 +01:00
shuffle.c mm/shuffle: convert module_param_call to module_param_cb 2022-10-03 14:03:07 -07:00
shuffle.h mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
slab.h mm/slab: fix kernel-doc func param names 2025-01-13 10:22:04 +01:00
slab_common.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
slub.c Driver core and debugfs updates 2025-01-28 12:25:12 -08:00
sparse-vmemmap.c mm/memmap: prevent double scanning of memmap by kmemleak 2025-01-25 20:22:30 -08:00
sparse.c mm/memblock: add memblock_alloc_or_panic interface 2025-01-25 20:22:38 -08:00
swap.c mm/filemap: add read support for RWF_DONTCACHE 2025-01-25 20:22:43 -08:00
swap.h mm: fix swap_read_folio_zeromap() for large folios with partial zeromap 2024-09-17 01:07:01 -07:00
swap_cgroup.c mm/swap_cgroup: decouple swap cgroup recording and clearing 2025-01-25 20:22:19 -08:00
swap_slots.c mm, swap_slots: remove slot cache for freeing path 2025-01-25 20:22:37 -08:00
swap_state.c mm: remove unnecessary calls to lru_add_drain 2025-01-25 20:22:21 -08:00
swapfile.c mm, swap: fix reclaim offset calculation error during allocation 2025-02-01 03:53:26 -08:00
truncate.c mm/truncate: add folio_unmap_invalidate() helper 2025-01-25 20:22:43 -08:00
usercopy.c mm: Fix copy_from_user_nofault(). 2023-04-12 17:36:23 -07:00
userfaultfd.c mm: userfaultfd: recheck dst_pmd entry in move_pages_pte() 2025-01-13 22:40:46 -08:00
util.c mm: add comments to do_mmap(), mmap_region() and vm_mmap() 2025-01-13 22:40:59 -08:00
vma.c mm: make mmap_region() internal 2025-01-25 20:22:38 -08:00
vma.h mm: make mmap_region() internal 2025-01-25 20:22:38 -08:00
vma_internal.h mm/vma: move brk() internals to mm/vma.c 2025-01-13 22:40:42 -08:00
vmalloc.c mm: alloc_pages_bulk: rename API 2025-01-25 20:22:31 -08:00
vmpressure.c eventfd: simplify eventfd_signal() 2023-11-28 14:08:38 +01:00
vmscan.c mm/vmscan: accumulate nr_demoted for accurate demotion statistics 2025-02-01 03:53:24 -08:00
vmstat.c vmstat: disable vmstat_work on vmstat_cpu_down_prep() 2025-01-12 19:03:38 -08:00
workingset.c mm/mglru: rework workingset protection 2025-01-25 20:22:39 -08:00
z3fold.c mm/z3fold: add __percpu annotation to *unbuddied pointer in struct z3fold_pool 2024-09-01 20:25:56 -07:00
zbud.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zpdesc.h mm/zsmalloc: introduce __zpdesc_clear/set_zsmalloc() 2025-01-25 20:22:35 -08:00
zpool.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zsmalloc.c mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc() 2025-02-01 03:53:23 -08:00
zswap.c mm/zswap: fix inconsistency when zswap_store_page() fails 2025-02-17 22:40:01 -08:00