1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

56429 commits

Author SHA1 Message Date
David S. Miller
89b1698c93 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net
The BTF conflicts were simple overlapping changes.

The virtio_net conflict was an overlap of a fix of statistics counter,
happening alongisde a move over to a bonafide statistics structure
rather than counting value on the stack.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-02 10:55:32 -07:00
Phillip Lougher
a3f94cb99a Squashfs: Compute expected length from inode size rather than block length
Previously in squashfs_readpage() when copying data into the page
cache, it used the length of the datablock read from the filesystem
(after decompression).  However, if the filesystem has been corrupted
this data block may be short, which will leave pages unfilled.

The fix for this is to compute the expected number of bytes to copy
from the inode size, and use this to detect if the block is short.

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Tested-by: Willy Tarreau <w@1wt.eu>
Cc: Анатолий Тросиненко <anatoly.trosinenko@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-02 09:34:02 -07:00
Linus Torvalds
71755ee535 squashfs: more metadata hardening
The squashfs fragment reading code doesn't actually verify that the
fragment is inside the fragment table.  The end result _is_ verified to
be inside the image when actually reading the fragment data, but before
that is done, we may end up taking a page fault because the fragment
table itself might not even exist.

Another report from Anatoly and his endless squashfs image fuzzing.

Reported-by: Анатолий Тросиненко <anatoly.trosinenko@gmail.com>
Acked-by:: Phillip Lougher <phillip.lougher@gmail.com>,
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-02 09:32:23 -07:00
Liu Song
bc71652346 ext4: improve code readability in ext4_iget()
Merge the duplicated complex conditions to improve code readability.

Signed-off-by: Liu Song <liu.song11@zte.com.cn>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
2018-08-02 00:11:16 -04:00
Jeremy Cline
1a5d5e5d51 ext4: fix spectre gadget in ext4_mb_regular_allocator()
'ac->ac_g_ex.fe_len' is a user-controlled value which is used in the
derivation of 'ac->ac_2order'. 'ac->ac_2order', in turn, is used to
index arrays which makes it a potential spectre gadget. Fix this by
sanitizing the value assigned to 'ac->ac2_order'.  This covers the
following accesses found with the help of smatch:

* fs/ext4/mballoc.c:1896 ext4_mb_simple_scan_group() warn: potential
  spectre issue 'grp->bb_counters' [w] (local cap)

* fs/ext4/mballoc.c:445 mb_find_buddy() warn: potential spectre issue
  'EXT4_SB(e4b->bd_sb)->s_mb_offsets' [r] (local cap)

* fs/ext4/mballoc.c:446 mb_find_buddy() warn: potential spectre issue
  'EXT4_SB(e4b->bd_sb)->s_mb_maxs' [r] (local cap)

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2018-08-02 00:03:40 -04:00
Al Viro
c971e6a006 kill d_instantiate_no_diralias()
The only user is fuse_create_new_entry(), and there it's used to
mitigate the same mkdir/open-by-handle race as in nfs_mkdir().
The same solution applies - unhash the mkdir argument, then
call d_splice_alias() and if that returns a reference to preexisting
alias, dput() and report success.  ->mkdir() argument left unhashed
negative with the preexisting alias moved in the right place is just
fine from the ->mkdir() callers point of view.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-01 23:18:53 -04:00
Trond Myklebust
6ea76bf513 NFSv4: Fix _nfs4_do_setlk()
The patch to fix the case where a lock request was interrupted ended up
changing default handling of errors such as NFS4ERR_DENIED and caused the
client to immediately resend the lock request. Let's do a partial revert
of that request so that the default is now to exit, but change the way
we handle resends to take into account the fact that the user may have
interrupted the request.

Reported-by: Kenneth Johansson <ken@kenjo.org>
Fixes: a3cf9bca2a ("NFSv4: Don't add a new lock on an interrupted wait..")
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2018-08-01 23:17:06 -04:00
Christoph Hellwig
006477f40d kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt
No need to have this in the top-level Kconfig.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-08-02 08:06:55 +09:00
Chao Yu
82cf4f132e f2fs: fix to active page in lru list for read path
If config CONFIG_F2FS_FAULT_INJECTION is on, for both read or write path
we will call find_lock_page() to get the page, but for read path, it
missed to passing FGP_ACCESSED to allocator to active the page in LRU
list, result in being reclaimed in advance incorrectly, fix it.

Reported-by: Xianrong Zhou <zhouxianrong@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Chao Yu
18767e6263 f2fs: don't keep meta pages used for block migration
For migration of encrypted inode's block, we load data of encrypted block
into meta inode's page cache, after checkpoint, those all intermediate
pages should be clean, and no one will read them again, so let's just
release them for more memory.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Chao Yu
4ddc1b28aa f2fs: fix to restrict mount condition when without CONFIG_QUOTA
Like quota_ino feature, we need to reject mounting RDWR with image
which enables project_quota feature when there is no CONFIG_QUOTA
be set in kernel.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Sheng Yong
00960c2cd8 f2fs: quota: do not mount as RDWR without QUOTA if quota feature enabled
If quota feature is enabled, quota is on by default. However, if
CONFIG_QUOTA is not built in kernel, dquot entries will not get updated,
which leads to quota inconsistency.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Sheng Yong
76cf05d79c f2fs: quota: fix incorrect comments
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Sheng Yong
955ac6e523 f2fs: quota: decrease the lock granularity of statfs_project
According to fs/quota/dquot.c, `dq_data_lock' protects mem_dqinfo
structures and modifications of dquot pointers in the inode, and
`dquot->dq_dqb_lock' protects data from dq_dqb.

We should use dquot->dq_dqb_lock in statfs_project instead of
dq_dat_lock.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Yunlong Song
970e348d98 f2fs: add proc entry to show victim_secmap bitmap
This patch adds a new proc entry to show victim_secmap information in
more detail, which is very helpful to know the get_victim candidate
status clearly, and helpful to debug problems (e.g., some sections can
not gc all of its blocks, since some blocks belong to atomic file,
leaving victim_secmap with section bit setting, in extrem case, this
will lead all bytes of victim_secmap setting with 0xff).

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Chao Yu
fd8c8caf7e f2fs: let checkpoint flush dnode page of regular
Fsyncer will wait on all dnode pages of regular writeback before flushing,
if there are async dnode pages blocked by IO scheduler, it may decrease
fsync's performance.

In this patch, we choose to let f2fs_balance_fs_bg() to trigger checkpoint
to flush these dnode pages of regular, so async IO of dnode page can be
elimitnated, making fsyncer only need to wait for sync IO.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Yunlong Song
ad6672bbc5 f2fs: issue discard align to section in LFS mode
For the case when sbi->segs_per_sec > 1 with lfs mode, take
section:segment = 5 for example, if the section prefree_map is
...previous section | current section (1 1 0 1 1) | next section...,
then the start = x, end = x + 1, after start = start_segno +
sbi->segs_per_sec, start = x + 5, then it will skip x + 3 and x + 4, but
their bitmap is still set, which will cause duplicated
f2fs_issue_discard of this same section in the next write_checkpoint:

round 1: section bitmap : 1 1 1 1 1, all valid, prefree_map: 0 0 0 0 0
then rm data block NO.2, block NO.2 becomes invalid, prefree_map: 0 0 1 0 0
write_checkpoint: section bitmap: 1 1 0 1 1, prefree_map: 0 0 0 0 0,
prefree of NO.2 is cleared, and no discard issued

round 2: rm data block NO.0, NO.1, NO.3, NO.4
all invalid, but prefree bit of NO.2 is set and cleared in round 1, then
prefree_map: 1 1 0 1 1
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1, no
valid blocks of this section, so discard issued, but this time prefree
bit of NO.3 and NO.4 is skipped due to start = start_segno + sbi->segs_per_sec;

round 3:
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1 ->
0 0 0 0 0, no valid blocks of this section, so discard issued,
this time prefree bit of NO.3 and NO.4 is cleared, but the discard of
this section is sent again...

To fix this problem, we can align the start and end value to section
boundary for fstrim and real-time discard operation, and decide to issue
discard only when the whole section is invalid, which can issue discard
aligned to section size as much as possible and avoid redundant discard.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Jaegeuk Kim
455e3a5887 f2fs: don't allow any writes on aborted atomic writes
In order to prevent abusing atomic writes by abnormal users, we've added a
threshold, 20% over memory footprint, which disallows further atomic writes.
Previously, however, SQLite doesn't know the files became normal, so that
it could write stale data and commit on revoked normal database file.

Once f2fs detects such the abnormal behavior, this patch tries to avoid further
writes in write_begin().

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Chao Yu
797c1cb56b f2fs: restrict setting up inode.i_advise
In order to give advise to f2fs to recognize hot/cold file, it is possible
that we can set specific bit in inode.i_advise through setxattr(), but
there are several bits which are used internally, such as encrypt_bit,
keep_size_bit, they should never be changed through setxattr().

So that this patch 1) adds FADVISE_MODIFIABLE_BITS to filter modifiable
bits user given, 2) supports to clear {hot,cold}_file bits.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Yunlei He
e6b0b159cf f2fs: fix wrong kernel message when recover fsync data on ro fs
This patch fix wrong message info for recover fsync data
on readonly fs.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Chao Yu
059c0648c6 f2fs: clean up ioctl interface naming
Romve redundant prefix 'f2fs_' in the middle of f2fs_ioc_f2fs_write_checkpoint().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Chao Yu
2079f115e7 f2fs: clean up with f2fs_is_{atomic,volatile}_file()
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Chao Yu
5b72d5e0df f2fs: clean up with f2fs_encrypted_inode()
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Chao Yu
80551d1773 f2fs: clean up with get_current_nat_page
Just cleanup, no logic change.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Chao Yu
6122003a1a f2fs: kill EXT_TREE_VEC_SIZE
Since commit 201ef5e080 ("f2fs: improve shrink performance of extent nodes"),
there is no user of EXT_TREE_VEC_SIZE, just kill it for cleanup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Hyunchul Lee
5d3ce4f701 f2fs: avoid duplicated permission check for "trusted." xattrs
Because xattr_permission already checks CAP_SYS_ADMIN
capability, we don't need to check it.

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Chao Yu
7735730d39 f2fs: fix to propagate error from __get_meta_page()
If caller of __get_meta_page() can handle error, let's propagate error
from __get_meta_page().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Chao Yu
18dd6470c2 f2fs: fix to do sanity check with i_extra_isize
If inode.i_extra_isize was fuzzed to an abnormal value, when
calculating inline data size, the result will overflow, result
in accessing invalid memory area when operating inline data.

Let's do sanity check with i_extra_isize during inode loading
for fixing.

https://bugzilla.kernel.org/show_bug.cgi?id=200421

- Reproduce

- POC (poc.c)
    #define _GNU_SOURCE
    #include <sys/types.h>
    #include <sys/mount.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <sys/xattr.h>

    #include <dirent.h>
    #include <errno.h>
    #include <error.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #include <linux/falloc.h>
    #include <linux/loop.h>

    static void activity(char *mpoint) {

      char *foo_bar_baz;
      char *foo_baz;
      char *xattr;
      int err;

      err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint);
      err = asprintf(&foo_baz, "%s/foo/baz", mpoint);
      err = asprintf(&xattr, "%s/foo/bar/xattr", mpoint);

      rename(foo_bar_baz, foo_baz);

      char buf2[113];
      memset(buf2, 0, sizeof(buf2));
      listxattr(xattr, buf2, sizeof(buf2));
      removexattr(xattr, "user.mime_type");

    }

    int main(int argc, char *argv[]) {
      activity(argv[1]);
      return 0;
    }

- Kernel message
Umount the image will leave the following message
[ 2910.995489] F2FS-fs (loop0): Mounted with checkpoint version = 2
[ 2918.416465] ==================================================================
[ 2918.416807] BUG: KASAN: slab-out-of-bounds in f2fs_iget+0xcb9/0x1a80
[ 2918.417009] Read of size 4 at addr ffff88018efc2068 by task a.out/1229

[ 2918.417311] CPU: 1 PID: 1229 Comm: a.out Not tainted 4.17.0+ #1
[ 2918.417314] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 2918.417323] Call Trace:
[ 2918.417366]  dump_stack+0x71/0xab
[ 2918.417401]  print_address_description+0x6b/0x290
[ 2918.417407]  kasan_report+0x28e/0x390
[ 2918.417411]  ? f2fs_iget+0xcb9/0x1a80
[ 2918.417415]  f2fs_iget+0xcb9/0x1a80
[ 2918.417422]  ? f2fs_lookup+0x2e7/0x580
[ 2918.417425]  f2fs_lookup+0x2e7/0x580
[ 2918.417433]  ? __recover_dot_dentries+0x400/0x400
[ 2918.417447]  ? legitimize_path.isra.29+0x5a/0xa0
[ 2918.417453]  __lookup_slow+0x11c/0x220
[ 2918.417457]  ? may_delete+0x2a0/0x2a0
[ 2918.417475]  ? deref_stack_reg+0xe0/0xe0
[ 2918.417479]  ? __lookup_hash+0xb0/0xb0
[ 2918.417483]  lookup_slow+0x3e/0x60
[ 2918.417488]  walk_component+0x3ac/0x990
[ 2918.417492]  ? generic_permission+0x51/0x1e0
[ 2918.417495]  ? inode_permission+0x51/0x1d0
[ 2918.417499]  ? pick_link+0x3e0/0x3e0
[ 2918.417502]  ? link_path_walk+0x4b1/0x770
[ 2918.417513]  ? _raw_spin_lock_irqsave+0x25/0x50
[ 2918.417518]  ? walk_component+0x990/0x990
[ 2918.417522]  ? path_init+0x2e6/0x580
[ 2918.417526]  path_lookupat+0x13f/0x430
[ 2918.417531]  ? trailing_symlink+0x3a0/0x3a0
[ 2918.417534]  ? do_renameat2+0x270/0x7b0
[ 2918.417538]  ? __kasan_slab_free+0x14c/0x190
[ 2918.417541]  ? do_renameat2+0x270/0x7b0
[ 2918.417553]  ? kmem_cache_free+0x85/0x1e0
[ 2918.417558]  ? do_renameat2+0x270/0x7b0
[ 2918.417563]  filename_lookup+0x13c/0x280
[ 2918.417567]  ? filename_parentat+0x2b0/0x2b0
[ 2918.417572]  ? kasan_unpoison_shadow+0x31/0x40
[ 2918.417575]  ? kasan_kmalloc+0xa6/0xd0
[ 2918.417593]  ? strncpy_from_user+0xaa/0x1c0
[ 2918.417598]  ? getname_flags+0x101/0x2b0
[ 2918.417614]  ? path_listxattr+0x87/0x110
[ 2918.417619]  path_listxattr+0x87/0x110
[ 2918.417623]  ? listxattr+0xc0/0xc0
[ 2918.417637]  ? mm_fault_error+0x1b0/0x1b0
[ 2918.417654]  do_syscall_64+0x73/0x160
[ 2918.417660]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2918.417676] RIP: 0033:0x7f2f3a3480d7
[ 2918.417677] Code: f0 ff ff 73 01 c3 48 8b 0d be dd 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 c2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 dd 2b 00 f7 d8 64 89 01 48
[ 2918.417732] RSP: 002b:00007fff4095b7d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000c2
[ 2918.417744] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2f3a3480d7
[ 2918.417746] RDX: 0000000000000071 RSI: 00007fff4095b810 RDI: 000000000126a0c0
[ 2918.417749] RBP: 00007fff4095b890 R08: 000000000126a010 R09: 0000000000000000
[ 2918.417751] R10: 00000000000001ab R11: 0000000000000206 R12: 00000000004005e0
[ 2918.417753] R13: 00007fff4095b990 R14: 0000000000000000 R15: 0000000000000000

[ 2918.417853] Allocated by task 329:
[ 2918.418002]  kasan_kmalloc+0xa6/0xd0
[ 2918.418007]  kmem_cache_alloc+0xc8/0x1e0
[ 2918.418023]  mempool_init_node+0x194/0x230
[ 2918.418027]  mempool_init+0x12/0x20
[ 2918.418042]  bioset_init+0x2bd/0x380
[ 2918.418052]  blk_alloc_queue_node+0xe9/0x540
[ 2918.418075]  dm_create+0x2c0/0x800
[ 2918.418080]  dev_create+0xd2/0x530
[ 2918.418083]  ctl_ioctl+0x2a3/0x5b0
[ 2918.418087]  dm_ctl_ioctl+0xa/0x10
[ 2918.418092]  do_vfs_ioctl+0x13e/0x8c0
[ 2918.418095]  ksys_ioctl+0x66/0x70
[ 2918.418098]  __x64_sys_ioctl+0x3d/0x50
[ 2918.418102]  do_syscall_64+0x73/0x160
[ 2918.418106]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[ 2918.418204] Freed by task 0:
[ 2918.418301] (stack is not available)

[ 2918.418521] The buggy address belongs to the object at ffff88018efc0000
                which belongs to the cache biovec-max of size 8192
[ 2918.418894] The buggy address is located 104 bytes to the right of
                8192-byte region [ffff88018efc0000, ffff88018efc2000)
[ 2918.419257] The buggy address belongs to the page:
[ 2918.419431] page:ffffea00063bf000 count:1 mapcount:0 mapping:ffff8801f2242540 index:0x0 compound_mapcount: 0
[ 2918.419702] flags: 0x17fff8000008100(slab|head)
[ 2918.419879] raw: 017fff8000008100 dead000000000100 dead000000000200 ffff8801f2242540
[ 2918.420101] raw: 0000000000000000 0000000000030003 00000001ffffffff 0000000000000000
[ 2918.420322] page dumped because: kasan: bad access detected

[ 2918.420599] Memory state around the buggy address:
[ 2918.420764]  ffff88018efc1f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2918.420975]  ffff88018efc1f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2918.421194] >ffff88018efc2000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 2918.421406]                                                           ^
[ 2918.421627]  ffff88018efc2080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 2918.421838]  ffff88018efc2100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2918.422046] ==================================================================
[ 2918.422264] Disabling lock debugging due to kernel taint
[ 2923.901641] BUG: unable to handle kernel paging request at ffff88018f0db000
[ 2923.901884] PGD 22226a067 P4D 22226a067 PUD 222273067 PMD 18e642063 PTE 800000018f0db061
[ 2923.902120] Oops: 0003 [#1] SMP KASAN PTI
[ 2923.902274] CPU: 1 PID: 1231 Comm: umount Tainted: G    B             4.17.0+ #1
[ 2923.902490] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 2923.902761] RIP: 0010:__memset+0x24/0x30
[ 2923.902906] Code: 90 90 90 90 90 90 66 66 90 66 90 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
[ 2923.903446] RSP: 0018:ffff88018ddf7ae0 EFLAGS: 00010206
[ 2923.903622] RAX: 0000000000000000 RBX: ffff8801d549d888 RCX: 1ffffffffffdaffb
[ 2923.903833] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88018f0daffc
[ 2923.904062] RBP: ffff88018efc206c R08: 1ffff10031df840d R09: ffff88018efc206c
[ 2923.904273] R10: ffffffffffffe1ee R11: ffffed0031df65fa R12: 0000000000000000
[ 2923.904485] R13: ffff8801d549dc98 R14: 00000000ffffc3db R15: ffffea00063bec80
[ 2923.904693] FS:  00007fa8b2f8a840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000
[ 2923.904937] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2923.910080] CR2: ffff88018f0db000 CR3: 000000018f892000 CR4: 00000000000006e0
[ 2923.914930] Call Trace:
[ 2923.919724]  f2fs_truncate_inline_inode+0x114/0x170
[ 2923.924487]  f2fs_truncate_blocks+0x11b/0x7c0
[ 2923.929178]  ? f2fs_truncate_data_blocks+0x10/0x10
[ 2923.933834]  ? dqget+0x670/0x670
[ 2923.938437]  ? f2fs_destroy_extent_tree+0xd6/0x270
[ 2923.943107]  ? __radix_tree_lookup+0x2f/0x150
[ 2923.947772]  f2fs_truncate+0xd4/0x1a0
[ 2923.952491]  f2fs_evict_inode+0x5ab/0x610
[ 2923.957204]  evict+0x15f/0x280
[ 2923.961898]  __dentry_kill+0x161/0x250
[ 2923.966634]  shrink_dentry_list+0xf3/0x250
[ 2923.971897]  shrink_dcache_parent+0xa9/0x100
[ 2923.976561]  ? shrink_dcache_sb+0x1f0/0x1f0
[ 2923.981177]  ? wait_for_completion+0x8a/0x210
[ 2923.985781]  ? migrate_swap_stop+0x2d0/0x2d0
[ 2923.990332]  do_one_tree+0xe/0x40
[ 2923.994735]  shrink_dcache_for_umount+0x3a/0xa0
[ 2923.999077]  generic_shutdown_super+0x3e/0x1c0
[ 2924.003350]  kill_block_super+0x4b/0x70
[ 2924.007619]  deactivate_locked_super+0x65/0x90
[ 2924.011812]  cleanup_mnt+0x5c/0xa0
[ 2924.015995]  task_work_run+0xce/0xf0
[ 2924.020174]  exit_to_usermode_loop+0x115/0x120
[ 2924.024293]  do_syscall_64+0x12f/0x160
[ 2924.028479]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2924.032709] RIP: 0033:0x7fa8b2868487
[ 2924.036888] Code: 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 c9 2b 00 f7 d8 64 89 01 48
[ 2924.045750] RSP: 002b:00007ffc39824d58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 2924.050190] RAX: 0000000000000000 RBX: 00000000008ea030 RCX: 00007fa8b2868487
[ 2924.054604] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000008f4360
[ 2924.058940] RBP: 00000000008f4360 R08: 0000000000000000 R09: 0000000000000014
[ 2924.063186] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007fa8b2d7183c
[ 2924.067418] R13: 0000000000000000 R14: 00000000008ea210 R15: 00007ffc39824fe0
[ 2924.071534] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer joydev input_leds serio_raw snd soundcore mac_hid i2c_piix4 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 8139too qxl ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel psmouse aes_x86_64 8139cp crypto_simd cryptd mii glue_helper pata_acpi floppy
[ 2924.098044] CR2: ffff88018f0db000
[ 2924.102520] ---[ end trace a8e0d899985faf31 ]---
[ 2924.107012] RIP: 0010:__memset+0x24/0x30
[ 2924.111448] Code: 90 90 90 90 90 90 66 66 90 66 90 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
[ 2924.120724] RSP: 0018:ffff88018ddf7ae0 EFLAGS: 00010206
[ 2924.125312] RAX: 0000000000000000 RBX: ffff8801d549d888 RCX: 1ffffffffffdaffb
[ 2924.129931] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88018f0daffc
[ 2924.134537] RBP: ffff88018efc206c R08: 1ffff10031df840d R09: ffff88018efc206c
[ 2924.139175] R10: ffffffffffffe1ee R11: ffffed0031df65fa R12: 0000000000000000
[ 2924.143825] R13: ffff8801d549dc98 R14: 00000000ffffc3db R15: ffffea00063bec80
[ 2924.148500] FS:  00007fa8b2f8a840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000
[ 2924.153247] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2924.158003] CR2: ffff88018f0db000 CR3: 000000018f892000 CR4: 00000000000006e0
[ 2924.164641] BUG: Bad rss-counter state mm:00000000fa04621e idx:0 val:4
[ 2924.170007] BUG: Bad rss-counter
tate mm:00000000fa04621e idx:1 val:2

- Location
https://elixir.bootlin.com/linux/v4.18-rc3/source/fs/f2fs/inline.c#L78
	memset(addr + from, 0, MAX_INLINE_DATA(inode) - from);
Here the length can be negative.

Reported-by Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Yunlong Song
66415cee3d f2fs: blk_finish_plug of submit_bio in lfs mode
Expand the blk_finish_plug action from blkzoned to normal lfs mode,
since plug will cause the out-of-order IO submission, which is not
friendly to flash in lfs mode.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Yunlong Song
3611ce9911 f2fs: do not set free of current section
For the case when sbi->segs_per_sec > 1, take section:segment = 5 for
example, if segment 1 is just used and allocate new segment 2, and the
blocks of segment 1 is invalidated, at this time, the previous code will
use __set_test_and_free to free the free_secmap and free_sections++,
this is not correct since it is still a current section, so fix it.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Daniel Rosenberg
36b877af79 f2fs: Keep alloc_valid_block_count in sync
If we attempt to request more blocks than we have room for, we try to
instead request as much as we can, however, alloc_valid_block_count
is not decremented to match the new value, allowing it to drift higher
until the next checkpoint. This always decrements it when the requested
amount cannot be fulfilled.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Chao Yu
20ee438232 f2fs: issue small discard by LBA order
For small granularity discard which size is smaller than 64KB, if we
issue those kind of discards orderly by size, their IOs will be spread
into entire logical address, so that in FTL, L2P table will be updated
randomly, result bad wear rate in the table.

In this patch, we choose to issue small discard by LBA order, by this
way, we can expect that L2P table updates from adjacent discard IOs can
be merged in the cache, so it can reduce lifetime wearing of flash.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Chao Yu
522d1711d6 f2fs: stop issuing discard immediately if there is queued IO
For background discard policy, even if there is queued user IO, still
we will check max_requests times for next discard entry, it is unneeded,
let's just stop this round submission immediately.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Chao Yu
4c6b56c002 f2fs: clean up with IS_INODE()
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Chao Yu
2482c4325d f2fs: detect bug_on in f2fs_wait_discard_bios
Add bug_on to detect potential non-empty discard wait list.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Randy Dunlap
cb15d1e43d f2fs: fix defined but not used build warnings
Fix build warnings in f2fs when CONFIG_PROC_FS is not enabled
by marking the unused functions as __maybe_unused.

../fs/f2fs/sysfs.c:519:12: warning: 'segment_info_seq_show' defined but not used [-Wunused-function]
../fs/f2fs/sysfs.c:546:12: warning: 'segment_bits_seq_show' defined but not used [-Wunused-function]
../fs/f2fs/sysfs.c:570:12: warning: 'iostat_info_seq_show' defined but not used [-Wunused-function]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Chao Yu
a39e536583 f2fs: enable real-time discard by default
f2fs is focused on flash based storage, so let's enable real-time
discard by default, if user don't want to enable it, 'nodiscard'
mount option should be used on mount.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Chao Yu
82902c06bd f2fs: fix to detect looped node chain correctly
Below dmesg was printed when testing generic/388 of fstest:

F2FS-fs (zram1): find_fsync_dnodes: detect looped node chain, blkaddr:526615, next:526616
F2FS-fs (zram1): Cannot recover all fsync data errno=-22
F2FS-fs (zram1): Mounted with checkpoint version = 22300d0e
F2FS-fs (zram1): find_fsync_dnodes: detect looped node chain, blkaddr:526615, next:526616
F2FS-fs (zram1): Cannot recover all fsync data errno=-22

The reason is that we initialize free_blocks with free blocks of
filesystem, so if filesystem is full, free_blocks can be zero,
below condition will be true, so that, it will fail recovery.

if (++loop_cnt >= free_blocks ||
	blkaddr == next_blkaddr_of_node(page))

To fix this issue, initialize free_blocks with correct value which
includes over-privision blocks.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Chao Yu
c9b60788fc f2fs: fix to do sanity check with block address in main area
This patch add to do sanity check with below field:
- cp_pack_total_block_count
- blkaddr of data/node
- extent info

- Overview
BUG() in verify_block_addr() when writing to a corrupted f2fs image

- Reproduce (4.18 upstream kernel)

- POC (poc.c)

static void activity(char *mpoint) {

  char *foo_bar_baz;
  int err;

  static int buf[8192];
  memset(buf, 0, sizeof(buf));

  err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint);

  int fd = open(foo_bar_baz, O_RDWR | O_TRUNC, 0777);
  if (fd >= 0) {
    write(fd, (char *)buf, sizeof(buf));
    fdatasync(fd);
    close(fd);
  }
}

int main(int argc, char *argv[]) {
  activity(argv[1]);
  return 0;
}

- Kernel message
[  689.349473] F2FS-fs (loop0): Mounted with checkpoint version = 3
[  699.728662] WARNING: CPU: 0 PID: 1309 at fs/f2fs/segment.c:2860 f2fs_inplace_write_data+0x232/0x240
[  699.728670] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
[  699.729056] CPU: 0 PID: 1309 Comm: a.out Not tainted 4.18.0-rc1+ #4
[  699.729064] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  699.729074] RIP: 0010:f2fs_inplace_write_data+0x232/0x240
[  699.729076] Code: ff e9 cf fe ff ff 49 8d 7d 10 e8 39 45 ad ff 4d 8b 7d 10 be 04 00 00 00 49 8d 7f 48 e8 07 49 ad ff 45 8b 7f 48 e9 fb fe ff ff <0f> 0b f0 41 80 4d 48 04 e9 65 fe ff ff 90 66 66 66 66 90 55 48 8d
[  699.729130] RSP: 0018:ffff8801f43af568 EFLAGS: 00010202
[  699.729139] RAX: 000000000000003f RBX: ffff8801f43af7b8 RCX: ffffffffb88c9113
[  699.729142] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff8802024e5540
[  699.729144] RBP: ffff8801f43af590 R08: 0000000000000009 R09: ffffffffffffffe8
[  699.729147] R10: 0000000000000001 R11: ffffed0039b0596a R12: ffff8802024e5540
[  699.729149] R13: ffff8801f0335500 R14: ffff8801e3e7a700 R15: ffff8801e1ee4450
[  699.729154] FS:  00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
[  699.729156] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  699.729159] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
[  699.729171] Call Trace:
[  699.729192]  f2fs_do_write_data_page+0x2e2/0xe00
[  699.729203]  ? f2fs_should_update_outplace+0xd0/0xd0
[  699.729238]  ? memcg_drain_all_list_lrus+0x280/0x280
[  699.729269]  ? __radix_tree_replace+0xa3/0x120
[  699.729276]  __write_data_page+0x5c7/0xe30
[  699.729291]  ? kasan_check_read+0x11/0x20
[  699.729310]  ? page_mapped+0x8a/0x110
[  699.729321]  ? page_mkclean+0xe9/0x160
[  699.729327]  ? f2fs_do_write_data_page+0xe00/0xe00
[  699.729331]  ? invalid_page_referenced_vma+0x130/0x130
[  699.729345]  ? clear_page_dirty_for_io+0x332/0x450
[  699.729351]  f2fs_write_cache_pages+0x4ca/0x860
[  699.729358]  ? __write_data_page+0xe30/0xe30
[  699.729374]  ? percpu_counter_add_batch+0x22/0xa0
[  699.729380]  ? kasan_check_write+0x14/0x20
[  699.729391]  ? _raw_spin_lock+0x17/0x40
[  699.729403]  ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30
[  699.729413]  ? iov_iter_advance+0x113/0x640
[  699.729418]  ? f2fs_write_end+0x133/0x2e0
[  699.729423]  ? balance_dirty_pages_ratelimited+0x239/0x640
[  699.729428]  f2fs_write_data_pages+0x329/0x520
[  699.729433]  ? generic_perform_write+0x250/0x320
[  699.729438]  ? f2fs_write_cache_pages+0x860/0x860
[  699.729454]  ? current_time+0x110/0x110
[  699.729459]  ? f2fs_preallocate_blocks+0x1ef/0x370
[  699.729464]  do_writepages+0x37/0xb0
[  699.729468]  ? f2fs_write_cache_pages+0x860/0x860
[  699.729472]  ? do_writepages+0x37/0xb0
[  699.729478]  __filemap_fdatawrite_range+0x19a/0x1f0
[  699.729483]  ? delete_from_page_cache_batch+0x4e0/0x4e0
[  699.729496]  ? __vfs_write+0x2b2/0x410
[  699.729501]  file_write_and_wait_range+0x66/0xb0
[  699.729506]  f2fs_do_sync_file+0x1f9/0xd90
[  699.729511]  ? truncate_partial_data_page+0x290/0x290
[  699.729521]  ? __sb_end_write+0x30/0x50
[  699.729526]  ? vfs_write+0x20f/0x260
[  699.729530]  f2fs_sync_file+0x9a/0xb0
[  699.729534]  ? f2fs_do_sync_file+0xd90/0xd90
[  699.729548]  vfs_fsync_range+0x68/0x100
[  699.729554]  ? __fget_light+0xc9/0xe0
[  699.729558]  do_fsync+0x3d/0x70
[  699.729562]  __x64_sys_fdatasync+0x24/0x30
[  699.729585]  do_syscall_64+0x78/0x170
[  699.729595]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  699.729613] RIP: 0033:0x7f9bf930d800
[  699.729615] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24
[  699.729668] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[  699.729673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
[  699.729675] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
[  699.729678] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
[  699.729680] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
[  699.729683] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
[  699.729687] ---[ end trace 4ce02f25ff7d3df5 ]---
[  699.729782] ------------[ cut here ]------------
[  699.729785] kernel BUG at fs/f2fs/segment.h:654!
[  699.731055] invalid opcode: 0000 [#1] SMP KASAN PTI
[  699.732104] CPU: 0 PID: 1309 Comm: a.out Tainted: G        W         4.18.0-rc1+ #4
[  699.733684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  699.735611] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730
[  699.736649] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff <0f> 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0
[  699.740524] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283
[  699.741573] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef
[  699.743006] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c
[  699.744426] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55
[  699.745833] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940
[  699.747256] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001
[  699.748683] FS:  00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
[  699.750293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  699.751462] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
[  699.752874] Call Trace:
[  699.753386]  ? f2fs_inplace_write_data+0x93/0x240
[  699.754341]  f2fs_inplace_write_data+0xd2/0x240
[  699.755271]  f2fs_do_write_data_page+0x2e2/0xe00
[  699.756214]  ? f2fs_should_update_outplace+0xd0/0xd0
[  699.757215]  ? memcg_drain_all_list_lrus+0x280/0x280
[  699.758209]  ? __radix_tree_replace+0xa3/0x120
[  699.759164]  __write_data_page+0x5c7/0xe30
[  699.760002]  ? kasan_check_read+0x11/0x20
[  699.760823]  ? page_mapped+0x8a/0x110
[  699.761573]  ? page_mkclean+0xe9/0x160
[  699.762345]  ? f2fs_do_write_data_page+0xe00/0xe00
[  699.763332]  ? invalid_page_referenced_vma+0x130/0x130
[  699.764374]  ? clear_page_dirty_for_io+0x332/0x450
[  699.765347]  f2fs_write_cache_pages+0x4ca/0x860
[  699.766276]  ? __write_data_page+0xe30/0xe30
[  699.767161]  ? percpu_counter_add_batch+0x22/0xa0
[  699.768112]  ? kasan_check_write+0x14/0x20
[  699.768951]  ? _raw_spin_lock+0x17/0x40
[  699.769739]  ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30
[  699.770885]  ? iov_iter_advance+0x113/0x640
[  699.771743]  ? f2fs_write_end+0x133/0x2e0
[  699.772569]  ? balance_dirty_pages_ratelimited+0x239/0x640
[  699.773680]  f2fs_write_data_pages+0x329/0x520
[  699.774603]  ? generic_perform_write+0x250/0x320
[  699.775544]  ? f2fs_write_cache_pages+0x860/0x860
[  699.776510]  ? current_time+0x110/0x110
[  699.777299]  ? f2fs_preallocate_blocks+0x1ef/0x370
[  699.778279]  do_writepages+0x37/0xb0
[  699.779026]  ? f2fs_write_cache_pages+0x860/0x860
[  699.779978]  ? do_writepages+0x37/0xb0
[  699.780755]  __filemap_fdatawrite_range+0x19a/0x1f0
[  699.781746]  ? delete_from_page_cache_batch+0x4e0/0x4e0
[  699.782820]  ? __vfs_write+0x2b2/0x410
[  699.783597]  file_write_and_wait_range+0x66/0xb0
[  699.784540]  f2fs_do_sync_file+0x1f9/0xd90
[  699.785381]  ? truncate_partial_data_page+0x290/0x290
[  699.786415]  ? __sb_end_write+0x30/0x50
[  699.787204]  ? vfs_write+0x20f/0x260
[  699.787941]  f2fs_sync_file+0x9a/0xb0
[  699.788694]  ? f2fs_do_sync_file+0xd90/0xd90
[  699.789572]  vfs_fsync_range+0x68/0x100
[  699.790360]  ? __fget_light+0xc9/0xe0
[  699.791128]  do_fsync+0x3d/0x70
[  699.791779]  __x64_sys_fdatasync+0x24/0x30
[  699.792614]  do_syscall_64+0x78/0x170
[  699.793371]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  699.794406] RIP: 0033:0x7f9bf930d800
[  699.795134] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24
[  699.798960] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[  699.800483] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
[  699.801923] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
[  699.803373] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
[  699.804798] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
[  699.806233] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
[  699.807667] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
[  699.817079] ---[ end trace 4ce02f25ff7d3df6 ]---
[  699.818068] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730
[  699.819114] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff <0f> 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0
[  699.822919] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283
[  699.823977] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef
[  699.825436] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c
[  699.826881] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55
[  699.828292] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940
[  699.829750] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001
[  699.831192] FS:  00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
[  699.832793] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  699.833981] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
[  699.835556] ==================================================================
[  699.837029] BUG: KASAN: stack-out-of-bounds in update_stack_state+0x38c/0x3e0
[  699.838462] Read of size 8 at addr ffff8801f43af970 by task a.out/1309

[  699.840086] CPU: 0 PID: 1309 Comm: a.out Tainted: G      D W         4.18.0-rc1+ #4
[  699.841603] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  699.843475] Call Trace:
[  699.843982]  dump_stack+0x7b/0xb5
[  699.844661]  print_address_description+0x70/0x290
[  699.845607]  kasan_report+0x291/0x390
[  699.846351]  ? update_stack_state+0x38c/0x3e0
[  699.853831]  __asan_load8+0x54/0x90
[  699.854569]  update_stack_state+0x38c/0x3e0
[  699.855428]  ? __read_once_size_nocheck.constprop.7+0x20/0x20
[  699.856601]  ? __save_stack_trace+0x5e/0x100
[  699.857476]  unwind_next_frame.part.5+0x18e/0x490
[  699.858448]  ? unwind_dump+0x290/0x290
[  699.859217]  ? clear_page_dirty_for_io+0x332/0x450
[  699.860185]  __unwind_start+0x106/0x190
[  699.860974]  __save_stack_trace+0x5e/0x100
[  699.861808]  ? __save_stack_trace+0x5e/0x100
[  699.862691]  ? unlink_anon_vmas+0xba/0x2c0
[  699.863525]  save_stack_trace+0x1f/0x30
[  699.864312]  save_stack+0x46/0xd0
[  699.864993]  ? __alloc_pages_slowpath+0x1420/0x1420
[  699.865990]  ? flush_tlb_mm_range+0x15e/0x220
[  699.866889]  ? kasan_check_write+0x14/0x20
[  699.867724]  ? __dec_node_state+0x92/0xb0
[  699.868543]  ? lock_page_memcg+0x85/0xf0
[  699.869350]  ? unlock_page_memcg+0x16/0x80
[  699.870185]  ? page_remove_rmap+0x198/0x520
[  699.871048]  ? mark_page_accessed+0x133/0x200
[  699.871930]  ? _cond_resched+0x1a/0x50
[  699.872700]  ? unmap_page_range+0xcd4/0xe50
[  699.873551]  ? rb_next+0x58/0x80
[  699.874217]  ? rb_next+0x58/0x80
[  699.874895]  __kasan_slab_free+0x13c/0x1a0
[  699.875734]  ? unlink_anon_vmas+0xba/0x2c0
[  699.876563]  kasan_slab_free+0xe/0x10
[  699.877315]  kmem_cache_free+0x89/0x1e0
[  699.878095]  unlink_anon_vmas+0xba/0x2c0
[  699.878913]  free_pgtables+0x101/0x1b0
[  699.879677]  exit_mmap+0x146/0x2a0
[  699.880378]  ? __ia32_sys_munmap+0x50/0x50
[  699.881214]  ? kasan_check_read+0x11/0x20
[  699.882052]  ? mm_update_next_owner+0x322/0x380
[  699.882985]  mmput+0x8b/0x1d0
[  699.883602]  do_exit+0x43a/0x1390
[  699.884288]  ? mm_update_next_owner+0x380/0x380
[  699.885212]  ? f2fs_sync_file+0x9a/0xb0
[  699.885995]  ? f2fs_do_sync_file+0xd90/0xd90
[  699.886877]  ? vfs_fsync_range+0x68/0x100
[  699.887694]  ? __fget_light+0xc9/0xe0
[  699.888442]  ? do_fsync+0x3d/0x70
[  699.889118]  ? __x64_sys_fdatasync+0x24/0x30
[  699.889996]  rewind_stack_do_exit+0x17/0x20
[  699.890860] RIP: 0033:0x7f9bf930d800
[  699.891585] Code: Bad RIP value.
[  699.892268] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[  699.893781] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
[  699.895220] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
[  699.896643] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
[  699.898069] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
[  699.899505] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000

[  699.901241] The buggy address belongs to the page:
[  699.902215] page:ffffea0007d0ebc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[  699.903811] flags: 0x2ffff0000000000()
[  699.904585] raw: 02ffff0000000000 0000000000000000 ffffffff07d00101 0000000000000000
[  699.906125] raw: 0000000000000000 0000000000240000 00000000ffffffff 0000000000000000
[  699.907673] page dumped because: kasan: bad access detected

[  699.909108] Memory state around the buggy address:
[  699.910077]  ffff8801f43af800: 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00
[  699.911528]  ffff8801f43af880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  699.912953] >ffff8801f43af900: 00 00 00 00 00 00 00 00 f1 01 f4 f4 f4 f2 f2 f2
[  699.914392]                                                              ^
[  699.915758]  ffff8801f43af980: f2 00 f4 f4 00 00 00 00 f2 00 00 00 00 00 00 00
[  699.917193]  ffff8801f43afa00: 00 00 00 00 00 00 00 00 00 f3 f3 f3 00 00 00 00
[  699.918634] ==================================================================

- Location
https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.h#L644

Reported-by Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:32 -07:00
Linus Torvalds
cdbb65c4c7 squashfs metadata 2: electric boogaloo
Anatoly continues to find issues with fuzzed squashfs images.

This time, corrupt, missing, or undersized data for the page filling
wasn't checked for, because the squashfs_{copy,read}_cache() functions
did the squashfs_copy_data() call without checking the resulting data
size.

Which could result in the page cache pages being incompletely filled in,
and no error indication to the user space reading garbage data.

So make a helper function for the "fill in pages" case, because the
exact same incomplete sequence existed in two places.

[ I should have made a squashfs branch for these things, but I didn't
  intend to start doing them in the first place.

  My historical connection through cramfs is why I got into looking at
  these issues at all, and every time I (continue to) think it's a
  one-off.

  Because _this_ time is always the last time. Right?   - Linus ]

Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-01 10:38:43 -07:00
Theodore Ts'o
7d95178c77 ext4: check for NUL characters in extended attribute's name
Extended attribute names are defined to be NUL-terminated, so the name
must not contain a NUL character.  This is important because there are
places when remove extended attribute, the code uses strlen to
determine the length of the entry.  That should probably be fixed at
some point, but code is currently really messy, so the simplest fix
for now is to simply validate that the extended attributes are sane.

https://bugzilla.kernel.org/show_bug.cgi?id=200401

Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2018-08-01 12:36:52 -04:00
Wang Shilong
5ef2a69993 ext4: use ext4_warning() for sb_getblk failure
Out of memory should not be considered as critical errors; so replace
ext4_error() with ext4_warnig().

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2018-08-01 12:02:31 -04:00
Darrick J. Wong
56830d6cc1 xfs: check da node magic in _node_lookup_int
Before we start processing what we /think/ is a da3 node block, actually
check the magic to make sure that we're looking at a node block.  This
way we won't blow the asserts in _node_hdr_from_disk on corrupted
metadata.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2018-08-01 07:42:43 -07:00
Darrick J. Wong
611995db2c xfs: use a local variable for magic number in xfs_da3_node_lookup_int
Use a local variable for the block magic number checks instead of
abusing blk->magic.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2018-08-01 07:42:18 -07:00
Darrick J. Wong
0c60d3aa0e xfs: refactor log recovery check
Add a predicate to decide if the log is actively in recovery and use
that instead of open-coding a pagf_init check in the attr leaf verifier.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2018-08-01 07:40:48 -07:00
Darrick J. Wong
ff23f4af7e xfs: move extent busy tree initialization to xfs_initialize_perag
Move the per-AG busy extent tree initialization to the per-ag structure
initialization since we don't want online repair to leak the old tree.
We only deconstruct the tree at unmount time, so this should be safe.
This also enables us to eliminate the commented out initialization in
the xfsprogs libxfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-07-31 13:18:09 -07:00
Christoph Hellwig
e666aa37f4 xfs: avoid COW fork extent lookups in writeback if the fork didn't change
Used the per-fork sequence counter to avoid lookups in the writeback code
unless the COW fork actually changed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-31 13:18:09 -07:00
Christoph Hellwig
745b3f76d1 xfs: maintain a sequence count for inode fork manipulations
Add a simple 32-bit unsigned integer as the sequence count for
modifications to the extent list in the inode fork.  This will be
used to optimize away extent list lookups in the writeback code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-31 13:18:09 -07:00
Darrick J. Wong
9e037cb797 xfs: check for unknown v5 feature bits in superblock write verifier
Make sure we never try to write the superblock with unknown feature bits
set.  We checked those at mount time, so if they're set now then memory
is corrupt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-07-31 13:18:09 -07:00
Darrick J. Wong
69775fd15d xfs: verify icount in superblock write
Add a helper predicate to check the inode count for sanity, then use it
in the superblock write verifier to inspect sb_icount.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-07-31 13:18:09 -07:00