Merge tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - More userfaultfd work from Peter Xu
 - Several convert-to-folios series from Sidhartha Kumar and Huang Ying
 - Some filemap cleanups from Vishal Moola
 - David Hildenbrand added the ability to selftest anon memory COW handling
 - Some cpuset simplifications from Liu Shixin
 - Addition of vmalloc tracing support by Uladzislau Rezki
 - Some pagecache folioifications and simplifications from Matthew Wilcox
 - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it
 - Miguel Ojeda contributed some cleanups for our use of the __no_sanitize_thread__ gcc keyword. This series should have been in the non-MM tree, my bad
 - Naoya Horiguchi improved the interaction between memory poisoning and memory section removal for huge pages
 - DAMON cleanups and tuneups from SeongJae Park
 - Tony Luck fixed the handling of COW faults against poisoned pages
 - Peter Xu utilized the PTE marker code for handling swapin errors
 - Hugh Dickins reworked compound page mapcount handling, simplifying it and making it more efficient
 - Removal of the autonuma savedwrite infrastructure from Nadav Amit and David Hildenbrand
 - zram support for multiple compression streams from Sergey Senozhatsky
 - David Hildenbrand reworked the GUP code's R/O long-term pinning so that drivers no longer need to use the FOLL_FORCE workaround which didn't work very well anyway
 - Mel Gorman altered the page allocator so that local IRQs can remain enabled during per-cpu page allocations
 - Vishal Moola removed the try_to_release_page() wrapper
 - Stefan Roesch added some per-BDI sysfs tunables which are used to prevent network block devices from dirtying excessive amounts of pagecache
 - David Hildenbrand did some cleanup and repair work on KSM COW breaking
 - Nhat Pham and Johannes Weiner have implemented writeback in zswap's zsmalloc backend
 - Brian Foster has fixed a longstanding corner-case oddity in file[map]_write_and_wait_range()
 - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang Chen
 - Shiyang Ruan has done some work on fsdax, to make its reflink mode work better under xfstests. Better, but still not perfect
 - Christoph Hellwig has removed the .writepage() method from several filesystems. They only need .writepages()
 - Yosry Ahmed wrote a series which fixes the memcg reclaim target beancounting
 - David Hildenbrand has fixed some of our MM selftests for 32-bit machines
 - Many singleton patches, as usual

* tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits)
  mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio
  mm: mmu_gather: allow more than one batch of delayed rmaps
  mm: fix typo in struct pglist_data code comment
  kmsan: fix memcpy tests
  mm: add cond_resched() in swapin_walk_pmd_entry()
  mm: do not show fs mm pc for VM_LOCKONFAULT pages
  selftests/vm: ksm_functional_tests: fixes for 32bit
  selftests/vm: cow: fix compile warning on 32bit
  selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions
  mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem
  mm,thp,rmap: fix races between updates of subpages_mapcount
  mm: memcg: fix swapcached stat accounting
  mm: add nodes= arg to memory.reclaim
  mm: disable top-tier fallback to reclaim on proactive reclaim
  selftests: cgroup: make sure reclaim target memcg is unprotected
  selftests: cgroup: refactor proactive reclaim code to reclaim_until()
  mm: memcg: fix stale protection of reclaim target memcg
  mm/mmap: properly unaccount memory on mas_preallocate() failure
  omfs: remove ->writepage
  jfs: remove ->writepage
  ...
195 lines
5.4 KiB
C
// SPDX-License-Identifier: GPL-2.0
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/err.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/pagemap.h>
#include <linux/sched.h>

#include <media/frame_vector.h>

/**
 * get_vaddr_frames() - map virtual addresses to pfns
 * @start:	starting user address
 * @nr_frames:	number of pages / pfns from start to map
 * @write:	whether the pages will be written to; if true, FOLL_WRITE is
 *		passed to pin_user_pages_fast()
 * @vec:	structure which receives pages / pfns of the addresses mapped.
 *		It should have space for at least @nr_frames entries.
 *
 * This function pins the user pages backing the virtual addresses starting
 * at @start with pin_user_pages_fast() (using FOLL_LONGTERM) and stores
 * pointers to the pinned pages in @vec, which is then marked as holding page
 * references. VM_IO | VM_PFNMAP mappings cannot be pinned and are therefore
 * no longer supported.
 *
 * The function returns the number of pages pinned, which may be less than
 * @nr_frames. If @nr_frames exceeds the capacity of @vec, it is clamped to
 * vec->nr_allocated with a one-time warning. When the function isn't able
 * to pin a single page, it emits a one-time warning and returns an error
 * (-EFAULT if pin_user_pages_fast() pinned nothing).
 *
 * This function takes care of grabbing mmap_lock as necessary.
 */
int get_vaddr_frames(unsigned long start, unsigned int nr_frames, bool write,
		     struct frame_vector *vec)
{
	int ret;
	unsigned int gup_flags = FOLL_LONGTERM;

	if (nr_frames == 0)
		return 0;

	if (WARN_ON_ONCE(nr_frames > vec->nr_allocated))
		nr_frames = vec->nr_allocated;

	start = untagged_addr(start);

	if (write)
		gup_flags |= FOLL_WRITE;

	ret = pin_user_pages_fast(start, nr_frames, gup_flags,
				  (struct page **)(vec->ptrs));
	vec->got_ref = true;
	vec->is_pfns = false;
	vec->nr_frames = ret;

	if (likely(ret > 0))
		return ret;

	/* This used to (racily) return non-refcounted pfns. Let people know */
	WARN_ONCE(1, "get_vaddr_frames() cannot follow VM_IO mapping");
	vec->nr_frames = 0;
	return ret ? ret : -EFAULT;
}
EXPORT_SYMBOL(get_vaddr_frames);

/**
 * put_vaddr_frames() - drop references to pages if get_vaddr_frames() acquired
 * them
 * @vec:	frame vector to put
 *
 * Drop references to pages if get_vaddr_frames() acquired them. We also
 * invalidate the frame vector so that it is prepared for the next call into
 * get_vaddr_frames().
 */
void put_vaddr_frames(struct frame_vector *vec)
{
	struct page **pages;

	if (!vec->got_ref)
		goto out;
	pages = frame_vector_pages(vec);
	/*
	 * frame_vector_pages() might need to do a conversion when
	 * get_vaddr_frames() got pages but vec was later converted to pfns.
	 * But it shouldn't really fail to convert pfns back...
	 */
	if (WARN_ON(IS_ERR(pages)))
		goto out;

	unpin_user_pages(pages, vec->nr_frames);
	vec->got_ref = false;
out:
	vec->nr_frames = 0;
}
EXPORT_SYMBOL(put_vaddr_frames);

/**
 * frame_vector_to_pages() - convert frame vector to contain page pointers
 * @vec:	frame vector to convert
 *
 * Convert @vec to contain an array of page pointers. If the conversion is
 * successful, return 0. If any pfn has no valid struct page, return -EINVAL
 * and leave @vec unchanged. Note that we do not grab page references for
 * the page structures.
 */
int frame_vector_to_pages(struct frame_vector *vec)
{
	int i;
	unsigned long *nums;
	struct page **pages;

	if (!vec->is_pfns)
		return 0;
	nums = frame_vector_pfns(vec);
	for (i = 0; i < vec->nr_frames; i++)
		if (!pfn_valid(nums[i]))
			return -EINVAL;
	pages = (struct page **)nums;
	for (i = 0; i < vec->nr_frames; i++)
		pages[i] = pfn_to_page(nums[i]);
	vec->is_pfns = false;
	return 0;
}
EXPORT_SYMBOL(frame_vector_to_pages);

/**
 * frame_vector_to_pfns() - convert frame vector to contain pfns
 * @vec:	frame vector to convert
 *
 * Convert @vec in place to contain an array of pfns. This is a no-op if
 * @vec already holds pfns.
 */
void frame_vector_to_pfns(struct frame_vector *vec)
{
	int i;
	unsigned long *nums;
	struct page **pages;

	if (vec->is_pfns)
		return;
	pages = (struct page **)(vec->ptrs);
	nums = (unsigned long *)pages;
	for (i = 0; i < vec->nr_frames; i++)
		nums[i] = page_to_pfn(pages[i]);
	vec->is_pfns = true;
}
EXPORT_SYMBOL(frame_vector_to_pfns);

/**
 * frame_vector_create() - allocate & initialize structure for pinned pfns
 * @nr_frames:	number of pfn slots to reserve
 *
 * Allocate and initialize struct frame_vector to be able to hold @nr_frames
 * pfns. Return NULL if @nr_frames is 0 or absurdly large, or if the
 * allocation fails.
 */
struct frame_vector *frame_vector_create(unsigned int nr_frames)
{
	struct frame_vector *vec;
	int size = sizeof(struct frame_vector) + sizeof(void *) * nr_frames;

	if (WARN_ON_ONCE(nr_frames == 0))
		return NULL;
	/*
	 * This is absurdly high. It's here just to avoid strange effects when
	 * the size arithmetic overflows.
	 */
	if (WARN_ON_ONCE(nr_frames > INT_MAX / sizeof(void *) / 2))
		return NULL;
	/*
	 * Avoid relying on higher order allocations: kvmalloc() falls back to
	 * vmalloc if kmalloc fails. It should be rare anyway.
	 */
	vec = kvmalloc(size, GFP_KERNEL);
	if (!vec)
		return NULL;
	vec->nr_allocated = nr_frames;
	vec->nr_frames = 0;
	return vec;
}
EXPORT_SYMBOL(frame_vector_create);

/**
 * frame_vector_destroy() - free memory allocated to carry frame vector
 * @vec:	Frame vector to free
 *
 * Free structure allocated by frame_vector_create() to carry frames.
 */
void frame_vector_destroy(struct frame_vector *vec)
{
	/* Make sure put_vaddr_frames() got called properly... */
	VM_BUG_ON(vec->nr_frames > 0);
	kvfree(vec);
}
EXPORT_SYMBOL(frame_vector_destroy);
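The bookkeeping that frame_vector_create(), get_vaddr_frames(), put_vaddr_frames() and frame_vector_destroy() perform on a struct frame_vector can be modeled in userspace. The sketch below is hypothetical, not kernel code: `struct model_frame_vector` mirrors the fields used in this file (its exact layout is an assumption), `fake_pin()` stands in for pin_user_pages_fast(), and malloc()/free() and assert() replace kvmalloc()/kvfree() and VM_BUG_ON(). It shows the size and overflow checks at creation, the clamp to nr_allocated, the `ret ? ret : -EFAULT` return convention, and why the vector must be put before it is destroyed.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model of struct frame_vector; field names follow
 * the kernel code above, the exact layout is an assumption.
 */
struct model_frame_vector {
	unsigned int nr_allocated;	/* number of slots in ptrs[] */
	unsigned int nr_frames;		/* slots currently filled */
	bool got_ref;			/* do we hold pins on the pages? */
	bool is_pfns;			/* ptrs[] holds pfns, not page pointers */
	void *ptrs[];			/* flexible array of pages / pfns */
};

/* Fake "pages" that fake_pin() hands out. */
static char fake_pages[16];

/*
 * Hypothetical stand-in for pin_user_pages_fast(): a negative @pinnable is
 * returned as the error, otherwise at most @pinnable pages are "pinned".
 */
static int fake_pin(unsigned int nr, int pinnable, void **pages)
{
	unsigned int i;

	if (pinnable < 0)
		return pinnable;
	if (nr > (unsigned int)pinnable)
		nr = (unsigned int)pinnable;
	if (nr > (unsigned int)sizeof(fake_pages))
		nr = (unsigned int)sizeof(fake_pages);
	for (i = 0; i < nr; i++)
		pages[i] = &fake_pages[i];
	return (int)nr;
}

/* Same checks as frame_vector_create(), with malloc() for kvmalloc(). */
static struct model_frame_vector *model_create(unsigned int nr_frames)
{
	struct model_frame_vector *vec;

	if (nr_frames == 0)
		return NULL;
	if (nr_frames > INT_MAX / sizeof(void *) / 2)
		return NULL;	/* keeps the size arithmetic far from overflow */
	vec = malloc(sizeof(*vec) + sizeof(void *) * nr_frames);
	if (!vec)
		return NULL;
	vec->nr_allocated = nr_frames;
	vec->nr_frames = 0;
	vec->got_ref = false;	/* the kernel leaves these to get_vaddr_frames() */
	vec->is_pfns = false;
	return vec;
}

/* Bookkeeping of get_vaddr_frames(); @pinnable drives fake_pin(). */
static int model_get(struct model_frame_vector *vec, unsigned int nr_frames,
		     int pinnable)
{
	int ret;

	if (nr_frames == 0)
		return 0;
	if (nr_frames > vec->nr_allocated)
		nr_frames = vec->nr_allocated;	/* kernel: WARN_ON_ONCE + clamp */

	ret = fake_pin(nr_frames, pinnable, vec->ptrs);
	vec->got_ref = true;
	vec->is_pfns = false;
	vec->nr_frames = ret > 0 ? (unsigned int)ret : 0;
	if (ret > 0)
		return ret;
	/* kernel: WARN_ONCE(); pinning nothing is reported as -EFAULT */
	return ret ? ret : -EFAULT;
}

/* Like put_vaddr_frames(): drop the pins (if any) and empty the vector. */
static void model_put(struct model_frame_vector *vec)
{
	if (vec->got_ref)
		vec->got_ref = false;	/* kernel: unpin_user_pages() here */
	vec->nr_frames = 0;
}

/* Like frame_vector_destroy(): assert() stands in for VM_BUG_ON(). */
static void model_destroy(struct model_frame_vector *vec)
{
	assert(vec->nr_frames == 0);
	free(vec);
}
```

For example, `model_get(vec, 4, 0)` on a vector from `model_create(8)` returns -EFAULT: a pin that pins nothing is reported as a fault rather than as 0, which mirrors the last line of get_vaddr_frames().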
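frame_vector_to_pfns() and frame_vector_to_pages() convert the vector in place, reusing the same slots for page pointers and pfns. The userspace sketch below models that round trip under stated assumptions: `model_mem_map`, `MODEL_PFN_BASE` and the `model_*` helpers are hypothetical stand-ins for the kernel's memory map, page_to_pfn(), pfn_to_page() and pfn_valid(). One deliberate difference: each slot is a union instead of a cast between `struct page **` and `unsigned long *`, because the kernel is built with -fno-strict-aliasing while a userspace compiler may assume strict aliasing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct page and the kernel's memory map. */
struct model_page {
	int dummy;
};

#define MODEL_PFN_BASE	0x1000UL	/* hypothetical first valid pfn */
#define MODEL_NR_PAGES	8

static struct model_page model_mem_map[MODEL_NR_PAGES];

/* Stand-in for page_to_pfn(). */
static unsigned long model_page_to_pfn(const struct model_page *page)
{
	return MODEL_PFN_BASE + (unsigned long)(page - model_mem_map);
}

/* Stand-in for pfn_valid(): only pfns backed by model_mem_map are valid. */
static bool model_pfn_valid(unsigned long pfn)
{
	return pfn >= MODEL_PFN_BASE && pfn < MODEL_PFN_BASE + MODEL_NR_PAGES;
}

/* Stand-in for pfn_to_page(); only call it on a valid pfn. */
static struct model_page *model_pfn_to_page(unsigned long pfn)
{
	return &model_mem_map[pfn - MODEL_PFN_BASE];
}

/*
 * One slot holds either a page pointer or a pfn. The kernel instead casts
 * the same void *ptrs[] array, relying on sizeof(void *) matching
 * sizeof(unsigned long).
 */
union model_slot {
	struct model_page *page;
	unsigned long pfn;
};

struct model_vec {
	unsigned int nr_frames;
	bool is_pfns;
	union model_slot slots[MODEL_NR_PAGES];
};

/* Like frame_vector_to_pfns(): rewrite page pointers as pfns in place. */
static void model_to_pfns(struct model_vec *vec)
{
	unsigned int i;

	if (vec->is_pfns)
		return;
	for (i = 0; i < vec->nr_frames; i++) {
		unsigned long pfn = model_page_to_pfn(vec->slots[i].page);

		vec->slots[i].pfn = pfn;
	}
	vec->is_pfns = true;
}

/*
 * Like frame_vector_to_pages(): check every pfn first so that a bad entry
 * leaves the vector untouched, then rewrite pfns as page pointers.
 */
static int model_to_pages(struct model_vec *vec)
{
	unsigned int i;

	if (!vec->is_pfns)
		return 0;
	for (i = 0; i < vec->nr_frames; i++)
		if (!model_pfn_valid(vec->slots[i].pfn))
			return -EINVAL;
	for (i = 0; i < vec->nr_frames; i++) {
		struct model_page *page = model_pfn_to_page(vec->slots[i].pfn);

		vec->slots[i].page = page;
	}
	vec->is_pfns = false;
	return 0;
}
```

Validating every pfn before rewriting any slot is what lets frame_vector_to_pages() fail with -EINVAL without leaving a half-converted vector behind.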