1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

712 commits

Author SHA1 Message Date
Daeho Jeong
4931e0c93e f2fs: turn back remapped address in compressed page endio
Turned back the remmaped sector address to the address in the partition,
when ending io, for compress cache to work properly.

Fixes: 6ce19aff0b ("f2fs: compress: add compress_inode to cache
compressed blocks")
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Hyeong Jun Kim <hj514.kim@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-02 11:24:25 -07:00
Daeho Jeong
093f0bac32 f2fs: change fiemap way in printing compression chunk
When we print out a discontinuous compression chunk, it shows like a
continuous chunk now. To show it more correctly, I've changed the way of
printing fiemap info like below. Plus, eliminated NEW_ADDR(-1) in fiemap
info, since it is not in fiemap user api manual.

Let's assume 16KB compression cluster.

<before>
   Logical          Physical         Length           Flags
0:  0000000000000000 00000002c091f000 0000000000004000 1008
1:  0000000000004000 00000002c0920000 0000000000004000 1008
  ...
9:  0000000000034000 0000000f8c623000 0000000000004000 1008
10: 0000000000038000 000000101a6eb000 0000000000004000 1008

<after>
0:  0000000000000000 00000002c091f000 0000000000004000 1008
1:  0000000000004000 00000002c0920000 0000000000004000 1008
  ...
9:  0000000000034000 0000000f8c623000 0000000000001000 1008
10: 0000000000035000 000000101a6ea000 0000000000003000 1008
11: 0000000000038000 000000101a6eb000 0000000000002000 1008
12: 000000000003a000 00000002c3544000 0000000000002000 1008

Flags
0x1000 => FIEMAP_EXTENT_MERGED
0x0008 => FIEMAP_EXTENT_ENCODED

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Tested-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-02 11:24:25 -07:00
Fengnan Chang
7eab7a6968 f2fs: compress: remove unneeded read when rewrite whole cluster
when we overwrite the whole page in cluster, we don't need read original
data before write, because after write_end(), writepages() can help to
load left data in that cluster.

Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-02 11:24:25 -07:00
Eric Biggers
6de8687ccd f2fs: remove allow_outplace_dio()
We can just check f2fs_lfs_mode() directly.  The block_unaligned_IO()
check is redundant because in LFS mode, f2fs doesn't do direct I/O
writes that aren't block-aligned (due to f2fs_force_buffered_io()
returning true in this case, triggering the fallback to buffered I/O).

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-07-25 08:42:37 -07:00
Eric Biggers
3e679dc78c f2fs: make f2fs_write_failed() take struct inode
Make f2fs_write_failed() take a 'struct inode' directly rather than a
'struct address_space', as this simplifies it slightly.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-07-25 08:42:31 -07:00
Jaegeuk Kim
1ffc8f5f77 f2fs: let's keep writing IOs on SBI_NEED_FSCK
SBI_NEED_FSCK is an indicator that fsck.f2fs needs to be triggered, so it
is not fully critical to stop any IO writes. So, let's allow to write data
instead of reporting EIO forever given SBI_NEED_FSCK, but do keep OPU.

Fixes: 9557727876 ("f2fs: drop inplace IO if fs status is abnormal")
Cc: <stable@kernel.org> # v5.13+
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-07-19 18:16:40 -07:00
Jan Kara
edc6d01bad f2fs: Convert to using invalidate_lock
Use invalidate_lock instead of f2fs' private i_mmap_sem. The intended
purpose is exactly the same. By this conversion we fix a long standing
race between hole punching and read(2) / readahead(2) paths that can
lead to stale page cache contents.

CC: Jaegeuk Kim <jaegeuk@kernel.org>
CC: Chao Yu <yuchao0@huawei.com>
CC: linux-f2fs-devel@lists.sourceforge.net
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-07-13 14:29:01 +02:00
Jaegeuk Kim
c9ebd3df43 f2fs: initialize page->private when using for our internal use
We need to guarantee it's initially zero. Otherwise, it'll hurt entire flag
operations.

Fixes: b763f3bedc ("f2fs: restructure f2fs page.private layout")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-07-05 17:17:30 -07:00
Chao Yu
859fca6b70 f2fs: swap: support migrating swapfile in aligned write mode
This patch supports to migrate swapfile in aligned write mode during
swapon in order to keep swapfile being aligned to section as much as
possible, then pinned swapfile will locates fully filled section which
may not affected by GC.

However, for the case that swapfile's size is not aligned to section
size, it will still leave last extent in file's tail as unaligned due
to its size is smaller than section size, like case #2.

case #1
xfs_io -f /mnt/f2fs/file -c "pwrite 0 4M" -c "fsync"

Before swapon:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..3047]:       1123352..1126399  3048 0x1000
   1: [3048..7143]:    237568..241663    4096 0x1000
   2: [7144..8191]:    245760..246807    1048 0x1001
After swapon:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..8191]:       249856..258047    8192 0x1001
Kmsg:
F2FS-fs (zram0): Swapfile (2) is not align to section:
1) creat(), 2) ioctl(F2FS_IOC_SET_PIN_FILE), 3) fallocate(2097152 * n)

case #2
xfs_io -f /mnt/f2fs/file -c "pwrite 0 3M" -c "fsync"

Before swapon:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..3047]:       246808..249855    3048 0x1000
   1: [3048..6143]:    237568..240663    3096 0x1001
After swapon:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..4095]:       258048..262143    4096 0x1000
   1: [4096..6143]:    238616..240663    2048 0x1001
Kmsg:
F2FS-fs (zram0): Swapfile: last extent is not aligned to section
F2FS-fs (zram0): Swapfile (2) is not align to section:
1) creat(), 2) ioctl(F2FS_IOC_SET_PIN_FILE), 3) fallocate(2097152 * n)

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-23 01:09:35 -07:00
Chao Yu
0b8fc00601 f2fs: swap: remove dead codes
After commit af4b6b8edf ("f2fs: introduce check_swap_activate_fast()"),
we will never run into original logic of check_swap_activate() before
f2fs supports non 4k-sized page, so let's delete those dead codes.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-23 01:09:35 -07:00
Chao Yu
6ce19aff0b f2fs: compress: add compress_inode to cache compressed blocks
Support to use address space of inner inode to cache compressed block,
in order to improve cache hit ratio of random read.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-23 01:09:35 -07:00
Joe Perches
833dcd3545 f2fs: logging neatening
Update the logging uses that have unnecessary newlines as the f2fs_printk
function and so its f2fs_<level> macro callers already adds one.

This allows searching single line logging entries with an easier grep and
also avoids unnecessary blank lines in the logging.

Miscellanea:

o Coalesce formats
o Align to open parenthesis

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-23 01:09:34 -07:00
Shin'ichiro Kawasaki
d927ccfccb f2fs: Prevent swap file in LFS mode
The kernel writes to swap files on f2fs directly without the assistance
of the filesystem. This direct write by kernel can be non-sequential
even when the f2fs is in LFS mode. Such non-sequential write conflicts
with the LFS semantics. Especially when f2fs is set up on zoned block
devices, the non-sequential write causes unaligned write command errors.

To avoid the non-sequential writes to swap files, prevent swap file
activation when the filesystem is in LFS mode.

Fixes: 4969c06a0d ("f2fs: support swap file w/ DIO")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: stable@vger.kernel.org # v5.10+
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-14 11:22:08 -07:00
Chao Yu
b763f3bedc f2fs: restructure f2fs page.private layout
Restruct f2fs page private layout for below reasons:

There are some cases that f2fs wants to set a flag in a page to
indicate a specified status of page:
a) page is in transaction list for atomic write
b) page contains dummy data for aligned write
c) page is migrating for GC
d) page contains inline data for inline inode flush
e) page belongs to merkle tree, and is verified for fsverity
f) page is dirty and has filesystem/inode reference count for writeback
g) page is temporary and has decompress io context reference for compression

There are existed places in page structure we can use to store
f2fs private status/data:
- page.flags: PG_checked, PG_private
- page.private

However it was a mess when we using them, which may cause potential
confliction:
		page.private	PG_private	PG_checked	page._refcount (+1 at most)
a)		-1		set				+1
b)		-2		set
c), d), e)					set
f)		0		set				+1
g)		pointer		set

The other problem is page.flags has no free slot, if we can avoid set
zero to page.private and set PG_private flag, then we use non-zero value
to indicate PG_private status, so that we may have chance to reclaim
PG_private slot for other usage. [1]

The other concern is f2fs has bad scalability in aspect of indicating
more page status.

So in this patch, let's restructure f2fs' page.private as below to
solve above issues:

Layout A: lowest bit should be 1
| bit0 = 1 | bit1 | bit2 | ... | bit MAX | private data .... |
 bit 0	PAGE_PRIVATE_NOT_POINTER
 bit 1	PAGE_PRIVATE_ATOMIC_WRITE
 bit 2	PAGE_PRIVATE_DUMMY_WRITE
 bit 3	PAGE_PRIVATE_ONGOING_MIGRATION
 bit 4	PAGE_PRIVATE_INLINE_INODE
 bit 5	PAGE_PRIVATE_REF_RESOURCE
 bit 6-	f2fs private data

Layout B: lowest bit should be 0
 page.private is a wrapped pointer.

After the change:
		page.private	PG_private	PG_checked	page._refcount (+1 at most)
a)		11		set				+1
b)		101		set				+1
c)		1001		set				+1
d)		10001		set				+1
e)						set
f)		100001		set				+1
g)		pointer		set				+1

[1] https://lore.kernel.org/linux-f2fs-devel/20210422154705.GO3596236@casper.infradead.org/T/#u

Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-14 11:22:08 -07:00
Jaegeuk Kim
f395183f95 f2fs: return EINVAL for hole cases in swap file
This tries to fix xfstests/generic/495.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-12 07:38:00 -07:00
Jaegeuk Kim
ca298241bc f2fs: avoid swapon failure by giving a warning first
The final solution can be migrating blocks to form a section-aligned file
internally. Meanwhile, let's ask users to do that when preparing the swap
file initially like:
1) create()
2) ioctl(F2FS_IOC_SET_PIN_FILE)
3) fallocate()

Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes: 36e4d95891 ("f2fs: check if swapfile is section-alligned")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-11 20:51:53 -07:00
Chao Yu
8bfbfb0ddd f2fs: compress: fix to assign cc.cluster_idx correctly
In f2fs_destroy_compress_ctx(), after f2fs_destroy_compress_ctx(),
cc.cluster_idx will be cleared w/ NULL_CLUSTER, f2fs_cluster_blocks()
may check wrong cluster metadata, fix it.

Fixes: 4c8ff7095b ("f2fs: support data compression")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-11 14:48:12 -07:00
Linus Torvalds
d0195c7d7a f2fs-for-5.13-rc1
In this round, we added a new mount option, "checkpoint_merge", which introduces
 a kernel thread dealing with the f2fs checkpoints. Once we start to manage the
 IO priority along with blk-cgroup, the checkpoint operation can be processed in
 a lower priority under the process context. Since the checkpoint holds all the
 filesystem operations, we give a higher priority to the checkpoint thread all
 the time.
 
 Enhancement:
 - introduce gc_merge mount option to introduce a checkpoint thread
 - improve to run discard thread efficiently
 - allow modular compression algorithms
 - expose # of overprivision segments to sysfs
 - expose runtime compression stat to sysfs
 
 Bug fix:
 - fix OOB memory access by the node id lookup
 - avoid touching checkpointed data in the checkpoint-disabled mode
 - fix the resizing flow to avoid kernel panic and race conditions
 - fix block allocation issues on pinned files
 - address some swapfile issues
 - fix hugtask problem and kernel panic during atomic write operations
 - don't start checkpoint thread in RO
 
 And, we've cleaned up some kernel coding style and build warnings. In addition,
 we fixed some minor race conditions and error handling routines.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmCQhVIACgkQQBSofoJI
 UNIggA/8DZINzFLMCj6+6P5wNAWj3nYtx/FnwZ7C31f8qkiZjgA4LfONUnDvV7sU
 GS8MuLQz4eTYfqU2rVgGiSm+aCkEOovnk7C7Huo7pezgqYb+5J6ACXsqdU3dcD5M
 kShJMqLKcTKqtOMbnJrdGvmw1/ysuAi7UhSSgVV+9NQxlhxADnagOGbQ7lXNSV3R
 spGMWazGY2uA5DFCCa4lMX79lyFATCzEKB3SKAW5r+8QSmxJY8ViK2Er7AnvwRJz
 XJ/QJ8ALNb/GGyHzBWFv3P6Yxo/G3FkUvTIc5Rhi9P2lUgjI2NALokj7AOnfNh4a
 uSXHVlNrrfH+gpx9xr5z8MUmroCYCCOJ6EhnVweqViRmekY8jSb2HxmUtDTIf19U
 LWl3gtD2GDQx6CY0a0K58Oa2Lp0Bp9MWUdPA/4P21EymZwXum7aCkhV+DnigcoCj
 yCmKlI8nIpCS97dIO/7MsnG6Tu/7c+Prytd2ezUo+6hlkXPZk8shs+elnjlWu/6V
 3ZpSWKzQsPJL8U7eB9H04AxEokrrXm3fRhR86C7JdkEe0gyGFf3dB/G+jqzcNi5m
 ZpxiCeZ8RbNmPpnH8NWeYHk9uDKMOXuUPFYDoaOwImNWfqj0jfhiTxHo4MyBLAuk
 MT632ICcuJLvwgnSbMAI3U7v6+dZXKH4y6U7IHFjxhI7beMzxmI=
 =P5uk
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we added a new mount option, "checkpoint_merge", which
  introduces a kernel thread dealing with the f2fs checkpoints. Once we
  start to manage the IO priority along with blk-cgroup, the checkpoint
  operation can be processed in a lower priority under the process
  context. Since the checkpoint holds all the filesystem operations, we
  give a higher priority to the checkpoint thread all the time.

  Enhancements:
   - introduce gc_merge mount option to introduce a checkpoint thread
   - improve to run discard thread efficiently
   - allow modular compression algorithms
   - expose # of overprivision segments to sysfs
   - expose runtime compression stat to sysfs

  Bug fixes:
   - fix OOB memory access by the node id lookup
   - avoid touching checkpointed data in the checkpoint-disabled mode
   - fix the resizing flow to avoid kernel panic and race conditions
   - fix block allocation issues on pinned files
   - address some swapfile issues
   - fix hugtask problem and kernel panic during atomic write operations
   - don't start checkpoint thread in RO

  And, we've cleaned up some kernel coding style and build warnings. In
  addition, we fixed some minor race conditions and error handling
  routines"

* tag 'f2fs-for-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (48 commits)
  f2fs: drop inplace IO if fs status is abnormal
  f2fs: compress: remove unneed check condition
  f2fs: clean up left deprecated IO trace codes
  f2fs: avoid using native allocate_segment_by_default()
  f2fs: remove unnecessary struct declaration
  f2fs: fix to avoid NULL pointer dereference
  f2fs: avoid duplicated codes for cleanup
  f2fs: document: add description about compressed space handling
  f2fs: clean up build warnings
  f2fs: fix the periodic wakeups of discard thread
  f2fs: fix to avoid accessing invalid fio in f2fs_allocate_data_block()
  f2fs: fix to avoid GC/mmap race with f2fs_truncate()
  f2fs: set checkpoint_merge by default
  f2fs: Fix a hungtask problem in atomic write
  f2fs: fix to restrict mount condition on readonly block device
  f2fs: introduce gc_merge mount option
  f2fs: fix to cover __allocate_new_section() with curseg_lock
  f2fs: fix wrong alloc_type in f2fs_do_replace_block
  f2fs: delete empty compress.h
  f2fs: fix a typo in inode.c
  ...
2021-05-04 18:03:38 -07:00
Yi Zhuang
5f029c045c f2fs: clean up build warnings
This patch combined the below three clean-up patches.

- modify open brace '{' following function definitions
- ERROR: spaces required around that ':'
- ERROR: spaces required before the open parenthesis '('
- ERROR: spaces prohibited before that ','
- Made suggested modifications from checkpatch in reference to WARNING:
 Missing a blank line after declarations

Signed-off-by: Yi Zhuang <zhuangyi1@huawei.com>
Signed-off-by: Jia Yang <jiayang5@huawei.com>
Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-04-10 10:36:39 -07:00
Chengguang Xu
0bb2045ce5 f2fs: fix to use per-inode maxbytes in f2fs_fiemap
F2FS inode may have different max size,
so change to use per-inode maxbytes.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-03-25 18:20:50 -07:00
huangjianan@oppo.com
36e4d95891 f2fs: check if swapfile is section-alligned
If the swapfile isn't created by pin and fallocate, it can't be
guaranteed section-aligned, so it may be selected by f2fs gc. When
gc_pin_file_threshold is reached, the address of swapfile may change,
but won't be synchronized to swap_extent, so swap will write to wrong
address, which will cause data corruption.

Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Guo Weichao <guoweichao@oppo.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-03-12 13:16:43 -08:00
huangjianan@oppo.com
1da6610383 f2fs: fix last_lblock check in check_swap_activate_fast
Because page_no < sis->max guarantees that the while loop break out
normally, the wrong check contidion here doesn't cause a problem.

Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Guo Weichao <guoweichao@oppo.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-03-12 13:16:43 -08:00
huangjianan@oppo.com
ebc29b62a1 f2fs: remove unnecessary IS_SWAPFILE check
Now swapfile in f2fs directly submit IO to blockdev according to
swapfile extents reported by f2fs when swapon, therefore there is
no need to check IS_SWAPFILE when exec filesystem operation.

Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Guo Weichao <guoweichao@oppo.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-03-12 13:16:43 -08:00
Christoph Hellwig
a8affc03a9 block: rename BIO_MAX_PAGES to BIO_MAX_VECS
Ever since the addition of multipage bio_vecs BIO_MAX_PAGES has been
horribly confusingly misnamed.  Rename it to BIO_MAX_VECS to stop
confusing users of the bio API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20210311110137.1132391-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-11 07:47:48 -07:00
Matthew Wilcox (Oracle)
5f7136db82 block: Add bio_max_segs
It's often inconvenient to use BIO_MAX_PAGES due to min() requiring the
sign to be the same.  Introduce bio_max_segs() and change BIO_MAX_PAGES to
be unsigned to make it easier for the users.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-26 15:49:51 -07:00
Linus Torvalds
582cd91f69 for-5.12/block-2021-02-17
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmAtmIwQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgplzLEAC5O+3rBM8QuiJdo39Yppmuw4hDJ6hOKynP
 EJQLKQQi0VfXgU+MprGvcbpFYmNbgICvUICQkEzJuk++kPCu/BJtJz0yErQeLgS+
 RdXiPV6enbF7iRML5TVRTr1q/z7sJMXcIIJ8Pz/rU/JNfGYExVd0WfnEY9mp1jOt
 Bl9V+qyTazdP+Ma4+uEPatSayqcdi1rxB5I+7v/sLiOvKZZWkaRZjUZ/mxAjUfvK
 dBOOPjMygEo3tCLkIyyA6lpLvr1r+SUZhLuebRLEKa3To3TW6RtoG0qwpKmI2iKw
 ylLeVLB60nM9RUxjflVOfBsHxz1bDg5Ve86y5nCjQd4Jo8x1c4DnecyGE5/Tu8Rg
 rgbsfD6nFWzhDCvcZT0XrfQ4ZAjIL2IfT+ypQiQ6UlRd3hvIKRmzWMkjuH2svr0u
 ey9Kq+lYerI4cM0F3W73gzUKdIQOuCzBCYxQuSQQomscBa7FCInyU192dAI9Aj6l
 Yd06mgKu6qCx6zLv6JfpBqaBHZMwyGE4dmZgPQFuuwO+b4N+Ck3Jm5fzEzw/xIxQ
 wdo/DlsAl60BXentB6FByGBJaCjVdSymRqN/xNCAbFKCjmr6TLBuXPfg1gYYO7xC
 VOcVjWe8iN3wWHZab3t2mxMKH9B9B/KKzIhu6TNHSmgtQ5paZPRCBx995pDyRw26
 WC22RGC2MA==
 =os1E
 -----END PGP SIGNATURE-----

Merge tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block

Pull core block updates from Jens Axboe:
 "Another nice round of removing more code than what is added, mostly
  due to Christoph's relentless pursuit of tech debt removal/cleanups.
  This pull request contains:

   - Two series of BFQ improvements (Paolo, Jan, Jia)

   - Block iov_iter improvements (Pavel)

   - bsg error path fix (Pan)

   - blk-mq scheduler improvements (Jan)

   - -EBUSY discard fix (Jan)

   - bvec allocation improvements (Ming, Christoph)

   - bio allocation and init improvements (Christoph)

   - Store bdev pointer in bio instead of gendisk + partno (Christoph)

   - Block trace point cleanups (Christoph)

   - hard read-only vs read-only split (Christoph)

   - Block based swap cleanups (Christoph)

   - Zoned write granularity support (Damien)

   - Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)"

* tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits)
  mm: simplify swapdev_block
  sd_zbc: clear zone resources for non-zoned case
  block: introduce blk_queue_clear_zone_settings()
  zonefs: use zone write granularity as block size
  block: introduce zone_write_granularity limit
  block: use blk_queue_set_zoned in add_partition()
  nullb: use blk_queue_set_zoned() to setup zoned devices
  nvme: cleanup zone information initialization
  block: document zone_append_max_bytes attribute
  block: use bi_max_vecs to find the bvec pool
  md/raid10: remove dead code in reshape_request
  block: mark the bio as cloned in bio_iov_bvec_set
  block: set BIO_NO_PAGE_REF in bio_iov_bvec_set
  block: remove a layer of indentation in bio_iov_iter_get_pages
  block: turn the nr_iovecs argument to bio_alloc* into an unsigned short
  block: remove the 1 and 4 vec bvec_slabs entries
  block: streamline bvec_alloc
  block: factor out a bvec_alloc_gfp helper
  block: move struct biovec_slab to bio.c
  block: reuse BIO_INLINE_VECS for integrity bvecs
  ...
2021-02-21 11:02:48 -08:00
Dehe Gu
39f71b7e40 f2fs: fix a wrong condition in __submit_bio
We should use !F2FS_IO_ALIGNED() to check and submit_io directly.

Fixes: 8223ecc456 ("f2fs: fix to add missing F2FS_IO_ALIGNED() condition")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Dehe Gu <gudehe@huawei.com>
Signed-off-by: Ge Qiu <qiuge@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-02-02 09:14:27 -08:00
Jaegeuk Kim
d5f7bc0064 f2fs: deprecate f2fs_trace_io
This patch deprecates f2fs_trace_io, since f2fs uses page->private more broadly,
resulting in more buggy cases.

Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27 15:20:07 -08:00
Matthew Wilcox (Oracle)
12699fb781 f2fs: Remove readahead collision detection
With the new ->readahead operation, locked pages are added to the page
cache, preventing two threads from racing with each other to read the
same chunk of file, so this is dead code.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27 15:20:07 -08:00
Chengguang Xu
6d1451bf7f f2fs: fix to use per-inode maxbytes
F2FS inode may have different max size, e.g. compressed file have
less blkaddr entries in all its direct-node blocks, result in being
with less max filesize. So change to use per-inode maxbytes.

Suggested-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27 15:20:06 -08:00
Chao Yu
3afae09ffe f2fs: compress: fix potential deadlock
generic/269 reports a hangtask issue, the root cause is ABBA deadlock
described as below:

Thread A			Thread B
- down_write(&sbi->gc_lock) -- A
				- f2fs_write_data_pages
				 - lock all pages in cluster -- B
				 - f2fs_write_multi_pages
				  - f2fs_write_raw_pages
				   - f2fs_write_single_data_page
				    - f2fs_balance_fs
				     - down_write(&sbi->gc_lock) -- A
- f2fs_gc
 - do_garbage_collect
  - ra_data_block
   - pagecache_get_page -- B

To fix this, it needs to avoid calling f2fs_balance_fs() if there is
still cluster pages been locked in context of cluster writeback, so
instead, let's call f2fs_balance_fs() in the end of
f2fs_write_raw_pages() when all cluster pages were unlocked.

Fixes: 4c8ff7095b ("f2fs: support data compression")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27 15:20:05 -08:00
Eric Biggers
7f59b277f7 f2fs: clean up post-read processing
Rework the post-read processing logic to be much easier to understand.

At least one bug is fixed by this: if an I/O error occurred when reading
from disk, decryption and verity would be performed on the uninitialized
data, causing misleading messages in the kernel log.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27 15:20:04 -08:00
Chao Yu
0953fe864c f2fs: fix to tag FIEMAP_EXTENT_MERGED in f2fs_fiemap()
f2fs does not natively support extents in metadata, 'extent' in f2fs
is used as a virtual concept, so in f2fs_fiemap() interface, it needs
to tag FIEMAP_EXTENT_MERGED flag to indicated the extent status is a
result of merging.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27 15:20:03 -08:00
Chao Yu
0b979f1bde f2fs: relocate f2fs_precache_extents()
Relocate f2fs_precache_extents() in prior to check_swap_activate(),
then extent cache can be enabled before its use.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-01-27 15:20:01 -08:00
Christoph Hellwig
67883ade7a f2fs: remove FAULT_ALLOC_BIO
Sleeping bio allocations do not fail, which means that injecting an error
into sleeping bio allocations is a little silly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
25ac84262c f2fs: use blkdev_issue_flush in __submit_flush_wait
Use the blkdev_issue_flush helper instead of duplicating it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
309dca309f block: store a block_device pointer in struct bio
Replace the gendisk pointer in struct bio with a pointer to the newly
improved struct block device.  From that the gendisk can be trivially
accessed with an extra indirection, but it also allows to directly
look up all information related to partition remapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-24 18:17:20 -07:00
Daeho Jeong
6422a71ef4 f2fs: fix race of pending_pages in decompression
I found out f2fs_free_dic() is invoked in a wrong timing, but
f2fs_verify_bio() still needed the dic info and it triggered the
below kernel panic. It has been caused by the race condition of
pending_pages value between decompression and verity logic, when
the same compression cluster had been split in different bios.
By split bios, f2fs_verify_bio() ended up with decreasing
pending_pages value before it is reset to nr_cpages by
f2fs_decompress_pages() and caused the kernel panic.

[ 4416.564763] Unable to handle kernel NULL pointer dereference
               at virtual address 0000000000000000
...
[ 4416.896016] Workqueue: fsverity_read_queue f2fs_verity_work
[ 4416.908515] pc : fsverity_verify_page+0x20/0x78
[ 4416.913721] lr : f2fs_verify_bio+0x11c/0x29c
[ 4416.913722] sp : ffffffc019533cd0
[ 4416.913723] x29: ffffffc019533cd0 x28: 0000000000000402
[ 4416.913724] x27: 0000000000000001 x26: 0000000000000100
[ 4416.913726] x25: 0000000000000001 x24: 0000000000000004
[ 4416.913727] x23: 0000000000001000 x22: 0000000000000000
[ 4416.913728] x21: 0000000000000000 x20: ffffffff2076f9c0
[ 4416.913729] x19: ffffffff2076f9c0 x18: ffffff8a32380c30
[ 4416.913731] x17: ffffffc01f966d97 x16: 0000000000000298
[ 4416.913732] x15: 0000000000000000 x14: 0000000000000000
[ 4416.913733] x13: f074faec89ffffff x12: 0000000000000000
[ 4416.913734] x11: 0000000000001000 x10: 0000000000001000
[ 4416.929176] x9 : ffffffff20d1f5c7 x8 : 0000000000000000
[ 4416.929178] x7 : 626d7464ff286b6b x6 : ffffffc019533ade
[ 4416.929179] x5 : 000000008049000e x4 : ffffffff2793e9e0
[ 4416.929180] x3 : 000000008049000e x2 : ffffff89ecfa74d0
[ 4416.929181] x1 : 0000000000000c40 x0 : ffffffff2076f9c0
[ 4416.929184] Call trace:
[ 4416.929187]  fsverity_verify_page+0x20/0x78
[ 4416.929189]  f2fs_verify_bio+0x11c/0x29c
[ 4416.929192]  f2fs_verity_work+0x58/0x84
[ 4417.050667]  process_one_work+0x270/0x47c
[ 4417.055354]  worker_thread+0x27c/0x4d8
[ 4417.059784]  kthread+0x13c/0x320
[ 4417.063693]  ret_from_fork+0x10/0x18

Chao pointed this can happen by the below race condition.

Thread A        f2fs_post_read_wq          fsverity_wq
- f2fs_read_multi_pages()
  - f2fs_alloc_dic
   - dic->pending_pages = 2
   - submit_bio()
   - submit_bio()
               - f2fs_post_read_work() handle first bio
                - f2fs_decompress_work()
                 - __read_end_io()
                  - f2fs_decompress_pages()
                   - dic->pending_pages--
                - enqueue f2fs_verity_work()
                                           - f2fs_verity_work() handle first bio
                                            - f2fs_verify_bio()
                                             - dic->pending_pages--
               - f2fs_post_read_work() handle second bio
                - f2fs_decompress_work()
                - enqueue f2fs_verity_work()
                                            - f2fs_verify_pages()
                                            - f2fs_free_dic()

                                          - f2fs_verity_work() handle second bio
                                           - f2fs_verfy_bio()
                                                 - use-after-free on dic

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-08 15:39:14 -08:00
Jaegeuk Kim
10208567f1 f2fs: introduce max_io_bytes, a sysfs entry, to limit bio size
This patch adds max_io_bytes to limit bio size when f2fs tries to merge
consecutive IOs. This can give a testing point to split out bios and check
end_io handles those bios correctly. This is used to capture a recent bug
on the decompression and fsverity flow.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-07 08:41:25 -08:00
Daeho Jeong
602a16d58e f2fs: add compress_mode mount option
We will add a new "compress_mode" mount option to control file
compression mode. This supports "fs" and "user". In "fs" mode (default),
f2fs does automatic compression on the compression enabled files.
In "user" mode, f2fs disables the automaic compression and gives the
user discretion of choosing the target file and the timing. It means
the user can do manual compression/decompression on the compression
enabled files using ioctls.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-03 00:11:57 -08:00
Jaegeuk Kim
b876f4c94c f2fs: remove buffer_head which has 32bits limit
This patch removes buffer_head dependency when getting block addresses.
Light reported there's a 32bit issue in f2fs_fiemap where map_bh.b_size is
32bits while len is 64bits given by user. This will give wrong length to
f2fs_map_block.

Reported-by: Light Hsieh <Light.Hsieh@mediatek.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-02 22:00:23 -08:00
Jaegeuk Kim
963ba7f983 f2fs: fix wrong block count instead of bytes
We should convert cur_lblock, a block count, to bytes for len.

Fixes: af4b6b8edf ("f2fs: introduce check_swap_activate_fast()")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-02 22:00:22 -08:00
Jaegeuk Kim
43b9d4b4d9 f2fs: use new conversion functions between blks and bytes
This patch cleans up blks and bytes conversions.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-02 22:00:22 -08:00
Jaegeuk Kim
6cbfcab5ff f2fs: rename logical_to_blk and blk_to_logical
This patch renames two functions like below having u64.
 - logical_to_blk to bytes_to_blks
 - blk_to_logical to blks_to_bytes

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-02 22:00:22 -08:00
Chao Yu
af4b6b8edf f2fs: introduce check_swap_activate_fast()
check_swap_activate() will lookup block mapping via bmap() one by one, so
its performance is very bad, this patch introduces check_swap_activate_fast()
to use f2fs_fiemap() to boost this process, since f2fs_fiemap() will lookup
block mappings in batch, therefore, it can improve swapon()'s performance
significantly.

Note that this enhancement only works when page size is equal to f2fs' block
size.

Testcase: (backend device: zram)
- touch file
- pin & fallocate file to 8GB
- mkswap file
- swapon file

Before:
real	0m2.999s
user	0m0.000s
sys	0m2.980s

After:
real	0m0.081s
user	0m0.000s
sys	0m0.064s

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-10-13 23:23:34 -07:00
Jaegeuk Kim
adfc694330 f2fs: fix slab leak of rpages pointer
This fixes the below mem leak.

[  130.157600] =============================================================================
[  130.159662] BUG f2fs_page_array_entry-252:16 (Tainted: G        W  O     ): Objects remaining in f2fs_page_array_entry-252:16 on __kmem_cache_shutdown()
[  130.162742] -----------------------------------------------------------------------------
[  130.162742]
[  130.164979] Disabling lock debugging due to kernel taint
[  130.166188] INFO: Slab 0x000000009f5a52d2 objects=22 used=4 fp=0x00000000ba72c3e9 flags=0xfffffc0010200
[  130.168269] CPU: 7 PID: 3560 Comm: umount Tainted: G    B   W  O      5.9.0-rc4+ #35
[  130.170019] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[  130.171941] Call Trace:
[  130.172528]  dump_stack+0x74/0x9a
[  130.173298]  slab_err+0xb7/0xdc
[  130.174044]  ? kernel_poison_pages+0xc0/0xc0
[  130.175065]  ? on_each_cpu_cond_mask+0x48/0x90
[  130.176096]  __kmem_cache_shutdown.cold+0x34/0x141
[  130.177190]  kmem_cache_destroy+0x59/0x100
[  130.178223]  f2fs_destroy_page_array_cache+0x15/0x20 [f2fs]
[  130.179527]  f2fs_put_super+0x1bc/0x380 [f2fs]
[  130.180538]  generic_shutdown_super+0x72/0x110
[  130.181547]  kill_block_super+0x27/0x50
[  130.182438]  kill_f2fs_super+0x76/0xe0 [f2fs]
[  130.183448]  deactivate_locked_super+0x3b/0x80
[  130.184456]  deactivate_super+0x3e/0x50
[  130.185363]  cleanup_mnt+0x109/0x160
[  130.186179]  __cleanup_mnt+0x12/0x20
[  130.187003]  task_work_run+0x70/0xb0
[  130.187841]  exit_to_user_mode_prepare+0x18f/0x1b0
[  130.188917]  syscall_exit_to_user_mode+0x31/0x170
[  130.189989]  do_syscall_64+0x45/0x90
[  130.190828]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  130.191986] RIP: 0033:0x7faf868ea2eb
[  130.192815] Code: 7b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 7b 0c 00 f7 d8 64 89 01
[  130.196872] RSP: 002b:00007fffb7edb478 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[  130.198494] RAX: 0000000000000000 RBX: 00007faf86a18204 RCX: 00007faf868ea2eb
[  130.201021] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055971df71c50
[  130.203415] RBP: 000055971df71a40 R08: 0000000000000000 R09: 00007fffb7eda1f0
[  130.205772] R10: 00007faf86a04339 R11: 0000000000000246 R12: 000055971df71c50
[  130.208150] R13: 0000000000000000 R14: 000055971df71b38 R15: 0000000000000000
[  130.210515] INFO: Object 0x00000000a980843a @offset=744
[  130.212476] INFO: Allocated in page_array_alloc+0x3d/0xe0 [f2fs] age=1572 cpu=0 pid=3297
[  130.215030] 	__slab_alloc+0x20/0x40
[  130.216566] 	kmem_cache_alloc+0x2a0/0x2e0
[  130.218217] 	page_array_alloc+0x3d/0xe0 [f2fs]
[  130.219940] 	f2fs_init_compress_ctx+0x1f/0x40 [f2fs]
[  130.221736] 	f2fs_write_cache_pages+0x3db/0x860 [f2fs]
[  130.223591] 	f2fs_write_data_pages+0x2c9/0x300 [f2fs]
[  130.225414] 	do_writepages+0x43/0xd0
[  130.226907] 	__filemap_fdatawrite_range+0xd5/0x110
[  130.228632] 	filemap_write_and_wait_range+0x48/0xb0
[  130.230336] 	__generic_file_write_iter+0x18a/0x1d0
[  130.232035] 	f2fs_file_write_iter+0x226/0x550 [f2fs]
[  130.233737] 	new_sync_write+0x113/0x1a0
[  130.235204] 	vfs_write+0x1a6/0x200
[  130.236579] 	ksys_write+0x67/0xe0
[  130.237898] 	__x64_sys_write+0x1a/0x20
[  130.239309] 	do_syscall_64+0x38/0x90

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-29 09:16:36 -07:00
Chao Yu
c8eb702484 f2fs: clean up kvfree
After commit 0b6d4ca04a ("f2fs: don't return vmalloc() memory from
f2fs_kmalloc()"), f2fs_k{m,z}alloc() will not return vmalloc()'ed
memory, so clean up to use kfree() instead of kvfree() to free
vmalloc()'ed memory.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-14 11:15:37 -07:00
Daeho Jeong
78134d0351 f2fs: change return value of f2fs_disable_compressed_file to bool
The returned integer is not required anywhere. So we need to change
the return value to bool type.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-11 11:11:26 -07:00
Daeho Jeong
4eda1682cd f2fs: add block address limit check to compressed file
Need to add block address range check to compressed file case and
avoid calling get_data_block_bmap() for compressed file.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-11 11:11:25 -07:00
Jack Qiu
335cac8b25 f2fs: correct statistic of APP_DIRECT_IO/APP_DIRECT_READ_IO
Miss to update APP_DIRECT_IO/APP_DIRECT_READ_IO when receiving async DIO.
For example: fio -filename=/data/test.0 -bs=1m -ioengine=libaio -direct=1
		-name=fill -size=10m -numjobs=1 -iodepth=32 -rw=write

Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-11 11:11:24 -07:00