1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
linux/fs
Zhihao Cheng 3b67db8a6c ubifs: Fix to add refcount once page is set private
MM defined the rule [1] very clearly that once page was set with PG_private
flag, we should increment the refcount in that page, also main flows like
pageout(), migrate_page() will assume there is one additional page
reference count if page_has_private() returns true. Otherwise, we may
get a BUG in page migration:

  page:0000000080d05b9d refcount:-1 mapcount:0 mapping:000000005f4d82a8
  index:0xe2 pfn:0x14c12
  aops:ubifs_file_address_operations [ubifs] ino:8f1 dentry name:"f30e"
  flags: 0x1fffff80002405(locked|uptodate|owner_priv_1|private|node=0|
  zone=1|lastcpupid=0x1fffff)
  page dumped because: VM_BUG_ON_PAGE(page_count(page) != 0)
  ------------[ cut here ]------------
  kernel BUG at include/linux/page_ref.h:184!
  invalid opcode: 0000 [#1] SMP
  CPU: 3 PID: 38 Comm: kcompactd0 Not tainted 5.15.0-rc5
  RIP: 0010:migrate_page_move_mapping+0xac3/0xe70
  Call Trace:
    ubifs_migrate_page+0x22/0xc0 [ubifs]
    move_to_new_page+0xb4/0x600
    migrate_pages+0x1523/0x1cc0
    compact_zone+0x8c5/0x14b0
    kcompactd+0x2bc/0x560
    kthread+0x18c/0x1e0
    ret_from_fork+0x1f/0x30

Before the time, we should make clean a concept, what does refcount means
in page gotten from grab_cache_page_write_begin(). There are 2 situations:
Situation 1: refcount is 3, page is created by __page_cache_alloc.
  TYPE_A - the write process is using this page
  TYPE_B - page is assigned to one certain mapping by calling
	   __add_to_page_cache_locked()
  TYPE_C - page is added into pagevec list corresponding current cpu by
	   calling lru_cache_add()
Situation 2: refcount is 2, page is gotten from the mapping's tree
  TYPE_B - page has been assigned to one certain mapping
  TYPE_A - the write process is using this page (by calling
	   page_cache_get_speculative())
Filesystem releases one refcount by calling put_page() in xxx_write_end(),
the released refcount corresponds to TYPE_A (write task is using it). If
there are any processes using a page, page migration process will skip the
page by judging whether expected_page_refs() equals to page refcount.

The BUG is caused by following process:
    PA(cpu 0)                           kcompactd(cpu 1)
				compact_zone
ubifs_write_begin
  page_a = grab_cache_page_write_begin
    add_to_page_cache_lru
      lru_cache_add
        pagevec_add // put page into cpu 0's pagevec
  (refcnf = 3, for page creation process)
ubifs_write_end
  SetPagePrivate(page_a) // doesn't increase page count !
  unlock_page(page_a)
  put_page(page_a)  // refcnt = 2
				[...]

    PB(cpu 0)
filemap_read
  filemap_get_pages
    add_to_page_cache_lru
      lru_cache_add
        __pagevec_lru_add // traverse all pages in cpu 0's pagevec
	  __pagevec_lru_add_fn
	    SetPageLRU(page_a)
				isolate_migratepages
                                  isolate_migratepages_block
				    get_page_unless_zero(page_a)
				    // refcnt = 3
                                      list_add(page_a, from_list)
				migrate_pages(from_list)
				  __unmap_and_move
				    move_to_new_page
				      ubifs_migrate_page(page_a)
				        migrate_page_move_mapping
					  expected_page_refs get 3
                                  (migration[1] + mapping[1] + private[1])
	 release_pages
	   put_page_testzero(page_a) // refcnt = 3
                                          page_ref_freeze  // refcnt = 0
	     page_ref_dec_and_test(0 - 1 = -1)
                                          page_ref_unfreeze
                                            VM_BUG_ON_PAGE(-1 != 0, page)

UBIFS doesn't increase the page refcount after setting private flag, which
leads to page migration task believes the page is not used by any other
processes, so the page is migrated. This causes concurrent accessing on
page refcount between put_page() called by other process(eg. read process
calls lru_cache_add) and page_ref_unfreeze() called by migration task.

Actually zhangjun has tried to fix this problem [2] by recalculating page
refcnt in ubifs_migrate_page(). It's better to follow MM rules [1], because
just like Kirill suggested in [2], we need to check all users of
page_has_private() helper. Like f2fs does in [3], fix it by adding/deleting
refcount when setting/clearing private for a page. BTW, according to [4],
we set 'page->private' as 1 because ubifs just simply SetPagePrivate().
And, [5] provided a common helper to set/clear page private, ubifs can
use this helper following the example of iomap, afs, btrfs, etc.

Jump [6] to find a reproducer.

[1] https://lore.kernel.org/lkml/2b19b3c4-2bc4-15fa-15cc-27a13e5c7af1@aol.com
[2] https://www.spinics.net/lists/linux-mtd/msg04018.html
[3] http://lkml.iu.edu/hypermail/linux/kernel/1903.0/03313.html
[4] https://lore.kernel.org/linux-f2fs-devel/20210422154705.GO3596236@casper.infradead.org
[5] https://lore.kernel.org/all/20200517214718.468-1-guoqing.jiang@cloud.ionos.com
[6] https://bugzilla.kernel.org/show_bug.cgi?id=214961

Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2022-01-10 23:08:14 +01:00
..
9p netfs, 9p, afs, ceph: Use folios 2021-11-10 21:16:56 +00:00
adfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
affs affs: use bdev_nr_sectors instead of open coding it 2021-10-18 14:43:22 -06:00
afs afs: Fix mmap 2021-12-16 09:10:13 -08:00
autofs autofs: fix wait name hash calculation in autofs_wait() 2021-10-20 21:09:02 -04:00
befs isystem: ship and use stdarg.h 2021-08-19 09:02:55 +09:00
bfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
btrfs for-5.16-rc5-tag 2021-12-17 13:50:58 -08:00
cachefiles for-5.16/ki_complete-2021-10-29 2021-11-01 10:17:11 -07:00
ceph ceph: fix up non-directory creation in SGID directories 2021-12-01 17:08:27 +01:00
cifs cifs: sanitize multiple delimiters in prepath 2021-12-17 19:16:49 -06:00
coda coda: bump module version to 7.2 2021-11-09 10:02:51 -08:00
configfs configfs: fix a race in configfs_lookup() 2021-08-25 07:58:49 +02:00
cramfs cramfs: use bdev_nr_bytes instead of open coding it 2021-10-18 14:43:22 -06:00
crypto fscrypt: improve a few comments 2021-10-25 19:11:50 -07:00
debugfs debugfs: debugfs_create_file_size(): use IS_ERR to check for error 2021-09-21 09:09:06 +02:00
devpts
dlm fs: dlm: avoid comms shutdown delay in release_lockspace 2021-09-01 11:29:14 -05:00
ecryptfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
efivarfs efivars: convert to fileattr 2021-04-12 15:04:29 +02:00
efs
erofs erofs: fix deadlock when shrink erofs slab 2021-11-23 14:58:16 +08:00
exfat exfat: fix incorrect loading of i_blocks for large files 2021-11-01 07:49:21 +09:00
exportfs
ext2 ext2: fix sleeping in atomic bugs on error 2021-09-22 13:05:23 +02:00
ext4 Only bug fixes and cleanups for ext4 this merge window. Of note are 2021-11-10 17:05:37 -08:00
f2fs Update to zstd-1.4.10 2021-11-13 15:32:30 -08:00
fat for-5.16/inode-sync-2021-10-29 2021-11-01 10:25:27 -07:00
freevxfs
fscache fscache: Remove an unused static variable 2021-10-04 22:13:12 +01:00
fuse fuse: release pipe buf after last use 2021-11-25 14:05:18 +01:00
gfs2 gfs2: gfs2_create_inode rework 2021-12-02 12:41:10 +01:00
hfs Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
hfsplus Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
hostfs hostfs: support splice_write 2021-08-26 22:28:02 +02:00
hpfs treewide: Replace open-coded flex arrays in unions 2021-10-18 12:28:53 -07:00
hugetlbfs mm,hugetlb: remove mlock ulimit for SHM_HUGETLB 2021-11-09 10:02:48 -08:00
iomap iomap: iomap_read_inline_data cleanup 2021-11-24 10:15:47 -08:00
isofs isofs: Fix out of bound access for corrupted isofs image 2021-10-19 12:51:02 +02:00
jbd2 jbd2: add sparse annotations for add_transaction_credits() 2021-08-30 23:36:50 -04:00
jffs2 jffs2: GC deadlock reading a page that is used in jffs2_write_begin() 2021-12-23 22:33:41 +01:00
jfs Just one JFS patch 2021-11-03 09:23:25 -07:00
kernfs Merge 5.15-rc6 into driver-core-next 2021-10-18 09:43:37 +02:00
ksmbd ksmbd: fix memleak in get_file_stream_info() 2021-11-25 00:09:26 -06:00
lockd A slow cycle for nfsd: mainly cleanup, including Neil's patch dropping 2021-11-10 16:45:54 -08:00
minix mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
netfs netfs: fix parameter of cleanup() 2021-12-07 15:47:09 +00:00
nfs NFS client bugfixes for Linux 5.16 2021-11-27 10:33:55 -08:00
nfs_common nfs: Fix kerneldoc warning shown up by W=1 2021-10-04 22:02:17 +01:00
nfsd nfsd: fix use-after-free due to delegation race 2021-12-10 11:55:15 -05:00
nilfs2 Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
nls
notify fanotify: Allow users to request FAN_FS_ERROR events 2021-10-27 12:53:45 +02:00
ntfs fs: ntfs: Limit NTFS_RW to page sizes smaller than 64k 2021-11-27 14:34:41 -08:00
ntfs3 gfs2: Fix mmap + page fault deadlocks 2021-11-02 12:25:03 -07:00
ocfs2 Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-11-10 16:15:54 -08:00
omfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
openpromfs openpromfs: don't do unlock_new_inode() until the new inode is set up 2021-03-12 22:15:22 -05:00
orangefs orangefs: three fixes from other folks... 2021-11-09 10:34:06 -08:00
overlayfs overlayfs update for 5.16 2021-11-09 10:51:12 -08:00
proc proc/vmcore: fix clearing user buffer by properly using clear_user() 2021-11-20 10:35:55 -08:00
pstore pstore/blk: Use "%lu" to format unsigned long 2021-11-21 09:44:19 -08:00
qnx4 qnx4: work around gcc false positive warning bug 2021-09-21 08:36:48 -07:00
qnx6
quota \n 2021-11-06 16:40:48 -07:00
ramfs Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
reiserfs \n 2021-11-06 16:40:48 -07:00
romfs
smbfs_common cifs: Fix crash on unload of cifs_arc4.ko 2021-12-07 22:38:03 -06:00
squashfs lib: zstd: Add kernel-specific API 2021-11-08 16:55:21 -08:00
sysfs fs/sysfs/dir.c: replace S_IRWXU|S_IRUGO|S_IXUGO with 0755 sysfs_create_dir_ns() 2021-10-05 16:35:05 +02:00
sysv sysv: use BUILD_BUG_ON instead of runtime check 2021-11-09 10:02:52 -08:00
tracefs tracefs: Set all files to the same group ownership as the mount option 2021-12-08 08:06:40 -05:00
ubifs ubifs: Fix to add refcount once page is set private 2022-01-10 23:08:14 +01:00
udf udf: Fix crash after seekdir 2021-11-09 12:53:58 +01:00
ufs isystem: ship and use stdarg.h 2021-08-19 09:02:55 +09:00
unicode .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
vboxsf vboxfs: fix broken legacy mount signature checking 2021-09-27 11:26:21 -07:00
verity fs-verity: fix signed integer overflow with i_size near S64_MAX 2021-09-22 10:56:34 -07:00
xfs xfs: remove all COW fork extents when remounting readonly 2021-12-07 10:17:29 -08:00
zonefs zonefs: add MODULE_ALIAS_FS 2021-12-17 16:56:35 +09:00
aio.c aio: Fix incorrect usage of eventfd_signal_allowed() 2021-12-09 10:52:55 -08:00
anon_inodes.c fs: add anon_inode_getfile_secure() similar to anon_inode_getfd_secure() 2021-09-19 22:35:37 -04:00
attr.c fs: handle circular mappings correctly 2021-11-17 09:26:09 +01:00
bad_inode.c vfs: add rcu argument to ->get_acl() callback 2021-08-18 22:08:24 +02:00
binfmt_aout.c binfmt: a.out: Fix bogus semicolon 2021-09-05 10:15:05 -07:00
binfmt_elf.c Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
binfmt_elf_fdpic.c coredump: Limit coredumps to a single thread group 2021-10-08 12:06:02 -05:00
binfmt_flat.c binfmt: remove in-tree usage of MAP_EXECUTABLE 2021-06-29 10:53:50 -07:00
binfmt_misc.c binfmt_misc: fix possible deadlock in bm_register_write 2021-03-13 11:27:30 -08:00
binfmt_script.c
buffer.c fs: simplify init_page_buffers 2021-10-18 14:43:22 -06:00
char_dev.c
compat_binfmt_elf.c
coredump.c coredump: Limit coredumps to a single thread group 2021-10-08 12:06:02 -05:00
d_path.c d_path: fix Kernel doc validator complaining 2021-11-06 13:30:32 -07:00
dax.c New code for 5.15: 2021-08-31 11:13:35 -07:00
dcache.c useful constants: struct qstr for ".." 2021-04-15 22:36:45 -04:00
direct-io.c fs: get rid of the res2 iocb->ki_complete argument 2021-10-25 10:36:24 -06:00
drop_caches.c fs: drop_caches: fix skipping over shadow cache inodes 2021-09-03 09:58:10 -07:00
eventfd.c eventfd: Export eventfd_wake_count to modules 2021-09-06 07:20:56 -04:00
eventpoll.c ARM development updates for 5.15: 2021-09-09 13:25:49 -07:00
exec.c Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-11-10 16:15:54 -08:00
fcntl.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
fhandle.c switch file_open_root() to struct path 2021-04-07 13:56:43 -04:00
file.c fget: clarify and improve __fget_files() implementation 2021-12-13 10:55:30 -08:00
file_table.c
filesystems.c fs: simplify get_filesystem_list / get_all_fs_names 2021-08-23 01:25:40 -04:00
fs-writeback.c Various hardening fixes and cleanups for 5.16-rc1 2021-11-01 17:29:10 -07:00
fs_context.c memcg: charge fs_context and legacy_fs_context 2021-09-03 09:58:12 -07:00
fs_parser.c namei: Standardize callers of filename_lookup() 2021-09-07 16:07:47 -04:00
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c
inode.c fs: Remove FS_THP_SUPPORT 2021-11-17 10:36:35 -05:00
internal.h Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
io-wq.c io-wq: drop wqe lock before creating new worker 2021-12-13 09:04:01 -07:00
io-wq.h io_uring: optimise INIT_WQ_LIST 2021-10-19 05:49:54 -06:00
io_uring.c io_uring: ensure task_work gets run as part of cancelations 2021-12-10 13:56:28 -07:00
ioctl.c New code for 5.15: 2021-08-31 11:06:32 -07:00
Kconfig 4 cifs/smb3 fixes, one for DFS reconnect, and one to begin creating common headers for server and client and the other two to rename the cifs_common directory to smbfs_common to be more consistent ie change use of the name cifs to smb which is more accurate 2021-09-12 10:10:21 -07:00
Kconfig.binfmt binfmt: remove support for em86 (alpha only) 2021-07-25 22:33:03 -07:00
kernel_read_file.c vfs: check fd has read access in kernel_read_file_from_fd() 2021-10-18 20:22:03 -10:00
libfs.c libfs: Support RENAME_EXCHANGE in simple_rename() 2021-11-03 15:43:08 +01:00
locks.c locks: remove changelog comments 2021-10-19 14:11:39 -04:00
Makefile 4 cifs/smb3 fixes, one for DFS reconnect, and one to begin creating common headers for server and client and the other two to rename the cifs_common directory to smbfs_common to be more consistent ie change use of the name cifs to smb which is more accurate 2021-09-12 10:10:21 -07:00
mbcache.c
mount.h
mpage.c block: rename BIO_MAX_PAGES to BIO_MAX_VECS 2021-03-11 07:47:48 -07:00
namei.c File locking changes for v5.16 2021-11-01 09:06:53 -07:00
namespace.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
no-block.c
nsfs.c
open.c Merge branch 'akpm' (patches from Andrew) 2021-11-06 14:08:17 -07:00
pipe.c Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly" 2021-09-07 11:03:45 -07:00
pnode.c
pnode.h mount: fix mounting of detached mounts onto targets that reside on shared mounts 2021-03-08 15:18:43 +01:00
posix_acl.c fs/posix_acl.c: avoid -Wempty-body warning 2021-11-06 13:30:32 -07:00
proc_namespace.c
read_write.c fs: remove leftover comments from mandatory locking removal 2021-10-26 12:20:50 -04:00
readdir.c readdir: make sure to verify directory entry for legacy interfaces too 2021-04-17 11:39:49 -07:00
remap_range.c fs: remove mandatory file locking support 2021-08-23 06:15:36 -04:00
select.c Revert "memcg: enable accounting for pollfd and select bits arrays" 2021-09-07 11:26:23 -07:00
seq_file.c seq_file: move seq_escape() to a header 2021-11-09 10:02:52 -08:00
signalfd.c signalfd: use wake_up_pollfree() 2021-12-09 10:49:56 -08:00
splice.c
stack.c
stat.c fs: add generic helper for filling statx attribute flags 2021-08-17 11:47:43 +02:00
statfs.c
super.c fs: explicitly unregister per-superblock BDIs 2021-11-06 13:30:34 -07:00
sync.c block: simplify the block device syncing code 2021-10-22 08:36:55 -06:00
timerfd.c timerfd: Provide timerfd_resume() 2021-08-10 17:57:22 +02:00
userfaultfd.c userfaultfd: fix a race between writeprotect and exit_mmap() 2021-10-18 20:22:02 -10:00
utimes.c
xattr.c xattr: fix kernel-doc for mnt_userns and vfs xattr helpers 2021-03-23 11:20:26 +01:00