1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

56429 commits

Author SHA1 Message Date
Maciej W. Rozycki
2f819db565
binfmt_elf: Respect error return from `regset->active'
The regset API documented in <linux/regset.h> defines -ENODEV as the
result of the `->active' handler to be used where the feature requested
is not available on the hardware found.  However code handling core file
note generation in `fill_thread_core_info' interpretes any non-zero
result from the `->active' handler as the regset requested being active.
Consequently processing continues (and hopefully gracefully fails later
on) rather than being abandoned right away for the regset requested.

Fix the problem then by making the code proceed only if a positive
result is returned from the `->active' handler.

Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 4206d3aa19 ("elf core dump: notes user_regset")
Patchwork: https://patchwork.linux-mips.org/patch/19332/
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
2018-07-19 13:46:34 -07:00
Filipe Manana
bd3599a0e1 Btrfs: fix file data corruption after cloning a range and fsync
When we clone a range into a file we can end up dropping existing
extent maps (or trimming them) and replacing them with new ones if the
range to be cloned overlaps with a range in the destination inode.
When that happens we add the new extent maps to the list of modified
extents in the inode's extent map tree, so that a "fast" fsync (the flag
BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps
and log corresponding extent items. However, at the end of range cloning
operation we do truncate all the pages in the affected range (in order to
ensure future reads will not get stale data). Sometimes this truncation
will release the corresponding extent maps besides the pages from the page
cache. If this happens, then a "fast" fsync operation will miss logging
some extent items, because it relies exclusively on the extent maps being
present in the inode's extent tree, leading to data loss/corruption if
the fsync ends up using the same transaction used by the clone operation
(that transaction was not committed in the meanwhile). An extent map is
released through the callback btrfs_invalidatepage(), which gets called by
truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The
later ends up calling try_release_extent_mapping() which will release the
extent map if some conditions are met, like the file size being greater
than 16Mb, gfp flags allow blocking and the range not being locked (which
is the case during the clone operation) nor being the extent map flagged
as pinned (also the case for cloning).

The following example, turned into a test for fstests, reproduces the
issue:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo
  $ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar

  $ xfs_io -c "fsync" /mnt/bar
  # reflink destination offset corresponds to the size of file bar,
  # 2728Kb minus 4Kb.
  $ xfs_io -c ""reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" /mnt/bar
  $ xfs_io -c "fsync" /mnt/bar

  $ md5sum /mnt/bar
  95a95813a8c2abc9aa75a6c2914a077e  /mnt/bar

  <power fail>

  $ mount /dev/sdb /mnt
  $ md5sum /mnt/bar
  207fd8d0b161be8a84b945f0df8d5f8d  /mnt/bar
  # digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the
  # power failure

In the above example, the destination offset of the clone operation
corresponds to the size of the "bar" file minus 4Kb. So during the clone
operation, the extent map covering the range from 2572Kb to 2728Kb gets
trimmed so that it ends at offset 2724Kb, and a new extent map covering
the range from 2724Kb to 11724Kb is created. So at the end of the clone
operation when we ask to truncate the pages in the range from 2724Kb to
2724Kb + 15908Kb, the page invalidation callback ends up removing the new
extent map (through try_release_extent_mapping()) when the page at offset
2724Kb is passed to that callback.

Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent
map is removed at try_release_extent_mapping(), forcing the next fsync to
search for modified extents in the fs/subvolume tree instead of relying on
the presence of extent maps in memory. This way we can continue doing a
"fast" fsync if the destination range of a clone operation does not
overlap with an existing range or if any of the criteria necessary to
remove an extent map at try_release_extent_mapping() is not met (file
size not bigger then 16Mb or gfp flags do not allow blocking).

CC: stable@vger.kernel.org # 3.16+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-07-19 15:36:31 +02:00
Linus Torvalds
04a1320651 for-4.18-rc5-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAltPMI0ACgkQxWXV+ddt
 WDvmXw//fyV+2hoARngzjd+4o32YHfxdf+Xv4XCnMsZVKOHKkqX8qrmMNyX0sd4w
 5NwUZpv/mZ4LHnm4M+EMGJWjXL/oXkLGrDzndninNC+u7GlFVieZ/aF5D96z6rOm
 p45wGETYvAbZI7XZ3dLebpIDqr+eXOhx3lpJTAKY5sfTIwzwJ+KC5vFYdt+Rz4cr
 cbjwHhRUsRfu1I0SSjUVFIC5frtegIzbDgjWNiLLO44ozbDAH3j1SufOgNLb5GFM
 n+eh0xIHDNLOrH3aVKO19zk9NigVBu96/FJnIz0+Jzs67hifksfZWVDV5vKetUxA
 M46aqtTrSVb/NJ/RHkQkyWiJjZqioXXx+KsZjdU63fyv4iu0+o2HV0uY/Pifm+X/
 fCS7xbQOhWJySQ+6mAjxXB9eo0RqO+RIGGIV9gJWZKt3S3DvAUmvd980jeHUtXRB
 VwMwmnvqvYaGWLWmaTRm1mjdmhCX2JdNN2RMmVN36tGfed0uopIFeax2rtWJ4153
 V+8eZWaLkvvT3iGu+XLUhEfv3UCUy7N1LDk8toe7Xp+qIMvWus3GIsKAUCmJJ3b+
 sGmbYSgn5v8TR65m5QO4/ZWmt4/bi/2Usd6Cq3vd0Op08kTWBTxjdelAVm+dlEYb
 sZLIMrxPg8ogEw8qX4GxROa8/1z9F/62RSmHfk4W7InY2AMJJAg=
 =Ga4m
 -----END PGP SIGNATURE-----

Merge tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "Three regression fixes. They're few-liners and fixing some corner
  cases missed in the origial patches"

* tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block()
  btrfs: fix use-after-free of cmp workspace pages
  btrfs: restore uuid_mutex in btrfs_open_devices
2018-07-18 11:13:25 -07:00
Michael Callahan
dbae2c5513 block: Define and use STAT_READ and STAT_WRITE
Add defines for STAT_READ and STAT_WRITE for indexing the partition
stat entries. This clarifies some fs/ code which has hardcoded 1 for
STAT_WRITE and will make it easier to extend the stats with additional
fields.

tj: Refreshed on top of v4.17.

Signed-off-by: Michael Callahan <michaelcallahan@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-18 08:44:18 -06:00
Tejun Heo
3f289dcb4b block: make bdev_ops->rw_page() take a REQ_OP instead of bool
c11f0c0b5b ("block/mm: make bdev_ops->rw_page() take a bool for
read/write") replaced @op with boolean @is_write, which limited the
amount of information going into ->rw_page() and more importantly
page_endio(), which removed the need to expose block internals to mm.

Unfortunately, we want to track discards separately and @is_write
isn't enough information.  This patch updates bdev_ops->rw_page() to
take REQ_OP instead but leaves page_endio() to take bool @is_write.
This allows the block part of operations to have enough information
while not leaking it to mm.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-18 08:44:14 -06:00
Arnd Bergmann
5f7a01e222 jffs2: use unsigned 32-bit timstamps consistently
Most users of jffs2 are 32-bit systems that traditionally only support
timestamps using a 32-bit signed time_t, in the range from years 1902 to
2038. On 64-bit systems, jffs2 however interpreted the same timestamps
as unsigned values, reading back negative times (before 1970) as times
between 2038 and 2106.

Now that Linux supports 64-bit inode timestamps even on 32-bit systems,
let's use the second interpretation everywhere to allow jffs2 to be
used on 32-bit systems beyond 2038 without a fundamental change to the
inode format.

This has a slight risk of regressions, when existing files with timestamps
before 1970 are present in file system images and are now interpreted
as future time stamps. I considered moving the wraparound point a bit,
e.g. to 1960, in order to deal with timestamps that ended up on Dec 31,
1969 due to incorrect timezone handling. However, this would complicate
the implementation unnecessarily, so I went with the simplest possible
method of extending the timestamps.

Writing files with timestamps before 1970 or after 2106 now results
in those times being clamped in the file system.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-07-18 16:44:01 +02:00
Arnd Bergmann
c4592b9c37 jffs2: use 64-bit intermediate timestamps
The VFS now uses timespec64 timestamps consistently, but jffs2 still
converts them to 32-bit numbers on the storage medium. As the helper
functions for the conversion (get_seconds() and timespec_to_timespec64())
are now deprecated, let's change them over to the more modern
replacements.

This keeps the traditional interpretation of those values, where
the on-disk 32-bit numbers are taken to be negative numbers, i.e.
dates before 1970, on 32-bit machines, but future numbers past 2038
on 64-bit machines.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-07-18 16:43:58 +02:00
Miklos Szeredi
670c23248e ovl: obsolete "check_copy_up" module option
This was provided for debugging the ro/rw inconsistecy.  The inconsitency
is now gone so this option is obsolete.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:44 +02:00
Miklos Szeredi
fb16043b46 vfs: remove open_flags from d_real()
Opening regular files on overlayfs is now handled via ovl_open().  Remove
the now unused "open_flags" argument from d_op->d_real() and the d_real()
helper.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:44 +02:00
Miklos Szeredi
de2a4a501e Partially revert "locks: fix file locking on overlayfs"
This partially reverts commit c568d68341.

Overlayfs files will now automatically get the correct locks, no need to
hack overlay support in VFS.

It is a partial revert, because it leaves the locks_inode() calls in place
and defines locks_inode() to file_inode().  We could revert those as well,
but it would be unnecessary code churn and it makes sense to document that
we are getting the inode for locking purposes.

Don't revert MS_NOREMOTELOCK yet since that has been part of the userspace
API for some time (though not in a useful way).  Will try to remove
internal flags later when the dust around the new mount API settles.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
2018-07-18 15:44:43 +02:00
Miklos Szeredi
8cf9ee5061 Revert "vfs: do get_write_access() on upper layer of overlayfs"
This reverts commit 4d0c5ba2ff.

We now get write access on both overlay and underlying layers so this patch
is no longer needed for correct operation.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:43 +02:00
Miklos Szeredi
4ab30319fd Revert "vfs: add flags to d_real()"
This reverts commit 495e642939.

No user of "flags" argument of d_real() remain.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:43 +02:00
Miklos Szeredi
c671854346 Revert "vfs: update ovl inode before relatime check"
This reverts commit 598e3c8f72.

Overlayfs no longer relies on the vfs correct atime handling.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:43 +02:00
Miklos Szeredi
88059de155 Revert "ovl: fix relatime for directories"
This reverts commit cd91304e71.

Overlayfs no longer relies on the vfs correct atime handling.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:43 +02:00
Miklos Szeredi
a6795a5859 vfs: fix freeze protection in mnt_want_write_file() for overlayfs
The underlying real file used by overlayfs still contains the overlay path.
This results in mnt_want_write_file() calls by the filesystem getting
freeze protection on the wrong inode (the overlayfs one instead of the real
one).

Fix by using file_inode(file)->i_sb instead of file->f_path.mnt->mnt_sb.

Reported-by: Amir Goldstein <amir73il@gmail.com> 
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-07-18 15:44:43 +02:00
Miklos Szeredi
6742cee043 Revert "ovl: don't allow writing ioctl on lower layer"
This reverts commit 7c6893e3c9.

Overlayfs no longer relies on the vfs for checking writability of files.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:43 +02:00
Miklos Szeredi
d561f21856 Revert "ovl: fix may_write_real() for overlayfs directories"
This reverts commit 954c736f86.

Overlayfs no longer relies on the vfs for checking writability of files.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:43 +02:00
Miklos Szeredi
a6518f73e6 vfs: don't open real
Let overlayfs do its thing when opening a file.

This enables stacking and fixes the corner case when a file is opened for
read, modified through a writable open, and data is read from the read-only
file.  After this patch the read-only open will not return stale data even
in this case.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-18 15:44:42 +02:00
Miklos Szeredi
8ede205541 ovl: add reflink/copyfile/dedup support
Since set of arguments are so similar, handle in a common helper.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:42 +02:00
Miklos Szeredi
f7c72396d0 ovl: add O_DIRECT support
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:42 +02:00
Miklos Szeredi
9e142c4102 ovl: add ovl_fiemap()
Implement stacked fiemap().

Need to split inode operations for regular file (which has fiemap) and
special file (which doesn't have fiemap).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:42 +02:00
Miklos Szeredi
dab5ca8fd9 ovl: add lsattr/chattr support
Implement FS_IOC_GETFLAGS and FS_IOC_SETFLAGS.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:42 +02:00
Miklos Szeredi
aab8848cee ovl: add ovl_fallocate()
Implement stacked fallocate.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:42 +02:00
Miklos Szeredi
2f502839e8 ovl: add ovl_mmap()
Implement stacked mmap.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:42 +02:00
Miklos Szeredi
de30dfd629 ovl: add ovl_fsync()
Implement stacked fsync().

Don't sync if lower (noticed by Amir Goldstein).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:42 +02:00
Miklos Szeredi
2a92e07edc ovl: add ovl_write_iter()
Implement stacked writes.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:41 +02:00
Miklos Szeredi
16914e6fc7 ovl: add ovl_read_iter()
Implement stacked reading.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:41 +02:00
Miklos Szeredi
2ef66b8a03 ovl: add helper to return real file
In the common case we can just use the real file cached in
file->private_data.  There are two exceptions:

1) File has been copied up since open: in this unlikely corner case just
use a throwaway real file for the operation.  If ever this becomes a
perfomance problem (very unlikely, since overlayfs has been doing most fine
without correctly handling this case at all), then we can deal with that by
updating the cached real file.

2) File's f_flags have changed since open: no need to reopen the cached
real file, we can just change the flags there as well.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:41 +02:00
Miklos Szeredi
d1d04ef857 ovl: stack file ops
Implement file operations on a regular overlay file.  The underlying file
is opened separately and cached in ->private_data.

It might be worth making an exception for such files when accounting in
nr_file to confirm to userspace expectations.  We are only adding a small
overhead (248bytes for the struct file) since the real inode and dentry are
pinned by overlayfs anyway.

This patch doesn't have any effect, since the vfs will use d_real() to find
the real underlying file to open.  The patch at the end of the series will
actually enable this functionality.

AV: make it use open_with_fake_path(), don't mess with override_creds

SzM: still need to mess with override_creds() until no fs uses
current_cred() in their open method.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-18 15:44:41 +02:00
Miklos Szeredi
e8c985bace ovl: deal with overlay files in ovl_d_real()
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:41 +02:00
Miklos Szeredi
46e5d0a390 ovl: copy up file size as well
Copy i_size of the underlying inode to the overlay inode in ovl_copyattr().

This is in preparation for stacking I/O operations on overlay files.

This patch shouldn't have any observable effect.

Remove stale comment from ovl_setattr() [spotted by Vivek Goyal].

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:41 +02:00
Miklos Szeredi
5812160eb5 Revert "Revert "ovl: get_write_access() in truncate""
This reverts commit 31c3a70695.

Re-add functionality dealing with i_writecount on truncate to overlayfs.
This patch shouldn't have any observable effects, since we just re-assert
the writecout that vfs_truncate() already got for us.

This is in preparation for moving overlay functionality out of the VFS.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:41 +02:00
Miklos Szeredi
4f3572954a ovl: copy up inode flags
On inode creation copy certain inode flags from the underlying real inode
to the overlay inode.

This is in preparation for moving overlay functionality out of the VFS.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:41 +02:00
Miklos Szeredi
d9854c87f0 ovl: copy up times
Copy up mtime and ctime to overlay inode after times in real object are
modified.  Be careful not to dirty cachelines when not necessary.

This is in preparation for moving overlay functionality out of the VFS.

This patch shouldn't have any observable effect.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:40 +02:00
Miklos Szeredi
f182536684 vfs: export vfs_dedupe_file_range_one() to modules
This is needed by the stacked dedupe implementation in overlayfs.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:40 +02:00
Miklos Szeredi
9df6702ad0 vfs: export vfs_ioctl() to modules
This is needed by the stacked ioctl implementation in overlayfs.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:40 +02:00
Miklos Szeredi
d3b1084dfd vfs: make open_with_fake_path() not contribute to nr_files
Stacking file operations in overlay will store an extra open file for each
overlay file opened.

The overhead is just that of "struct file" which is about 256bytes, because
overlay already pins an extra dentry and inode when the file is open, which
add up to a much larger overhead.

For fear of breaking working setups, don't start accounting the extra file.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:40 +02:00
Miklos Szeredi
51e6ce820b Merge branch 'dedupe-cleanup' into overlayfs-next
Following series for stacking overlay files depends on this mini series.
2018-07-18 15:39:29 +02:00
Miklos Szeredi
9951934d76 Merge branch 'for-ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into overlayfs-next
This gives us the open_with_fake_path() helper that is needed for stacked
open files in overlay and mmap in particular.
2018-07-18 10:46:05 +02:00
Christoph Hellwig
9ba546c019 aio: don't expose __aio_sigset in uapi
glibc uses a different defintion of sigset_t than the kernel does,
and the current version would pull in both.  To fix this just do not
expose the type at all - this somewhat mirrors pselect() where we
do not even have a type for the magic sigmask argument, but just
use pointer arithmetics.

Fixes: 7a074e96 ("aio: implement io_pgetevents")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Adrian Reber <adrian@lisas.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-17 23:26:58 -04:00
Carlos Maiolino
5089eafffb libxfs: Fix a couple of sparse complaintis
No significant changes, just silence a couple of sparse errors.

Using cpu_to_be32(NULLAGINO), the NULLAGINO constant will be encoded in
BE as a constant, avoiding a BE -> CPU conversion every iteraction of
the loop, if be32_to_cpu(agi->agi_unlinked[i]) was used instead.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-17 14:25:58 -07:00
Gustavo A. R. Silva
e4e542a683 xfs: use swap macro in xfs_dir2_leafn_rebalance
Make use of the swap macro and remove unnecessary variable *tmp*. This
makes the code easier to read and maintain. Also, slightly refactor some
code.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-17 14:25:57 -07:00
Gustavo A. R. Silva
897992b7e3 xfs_bmap_util: use swap macro
Make use of the swap macro and remove some unnecessary variables.
This makes the code easier to read and maintain. Also, reduces the
stack usage.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-17 14:25:57 -07:00
Gustavo A. R. Silva
1d5bebbafc xfs_attr_leaf: use swap macro in xfs_attr3_leaf_rebalance
Make use of the swap macro and remove some unnecessary variables.
This makes the code easier to read and maintain. Also, reduces the
stack usage.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-17 14:25:57 -07:00
Darrick J. Wong
fa248de98a xfs: don't assume a left rmap when allocating a new rmap
The original rmap code assumed that there would always be at least one
rmap in the rmapbt (the AG sb/agf/agi) and so errored out if it didn't
find one.  This assumption isn't true for the rmapbt repair function
(and it won't be true for realtime rmap either), so remove the check and
just deal with the situation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2018-07-17 14:25:57 -07:00
Amir Goldstein
6781069307 ovl: fix wrong use of impure dir cache in ovl_iterate()
Only upper dir can be impure, but if we are in the middle of
iterating a lower real dir, dir could be copied up and marked
impure. We only want the impure cache if we started iterating
a real upper dir to begin with.

Aditya Kali reported that the following reproducer hits the
WARN_ON(!cache->refcount) in ovl_get_cache():

 docker run --rm drupal:8.5.4-fpm-alpine \
    sh -c 'cd /var/www/html/vendor/symfony && \
           chown -R www-data:www-data . && ls -l .'

Reported-by: Aditya Kali <adityakali@google.com>
Tested-by: Aditya Kali <adityakali@google.com>
Fixes: 4edb83bb10 ('ovl: constant d_ino for non-merge dirs')
Cc: <stable@vger.kernel.org> # v4.14
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-17 16:04:34 +02:00
Mike Christie
cc57c07343 configfs: fix registered group removal
This patch fixes a bug where configfs_register_group had added
a group in a tree, and userspace has done a rmdir on a dir somewhere
above that group and we hit a kernel crash. The problem is configfs_rmdir
will detach everything under it and unlink groups on the default_groups
list. It will not unlink groups added with configfs_register_group so when
configfs_unregister_group is called to drop its references to the group/items
we crash when we try to access the freed dentrys.

The patch just adds a check for if a rmdir has been done above
us and if so just does the unlink part of unregistration.

Sorry if you are getting this multiple times. I thouhgt I sent
this to some of you and lkml, but I do not see it.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-17 06:14:07 -07:00
Qu Wenruo
665d4953cd btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block()
In commit ac0b4145d6 ("btrfs: scrub: Don't use inode pages for device
replace") we removed the branch of copy_nocow_pages() to avoid
corruption for compressed nodatasum extents.

However above commit only solves the problem in scrub_extent(), if
during scrub_pages() we failed to read some pages,
sctx->no_io_error_seen will be non-zero and we go to fixup function
scrub_handle_errored_block().

In scrub_handle_errored_block(), for sctx without csum (no matter if
we're doing replace or scrub) we go to scrub_fixup_nodatasum() routine,
which does the similar thing with copy_nocow_pages(), but does it
without the extra check in copy_nocow_pages() routine.

So for test cases like btrfs/100, where we emulate read errors during
replace/scrub, we could corrupt compressed extent data again.

This patch will fix it just by avoiding any "optimization" for
nodatasum, just falls back to the normal fixup routine by try read from
any good copy.

This also solves WARN_ON() or dead lock caused by lame backref iteration
in scrub_fixup_nodatasum() routine.

The deadlock or WARN_ON() won't be triggered before commit ac0b4145d6
("btrfs: scrub: Don't use inode pages for device replace") since
copy_nocow_pages() have better locking and extra check for data extent,
and it's already doing the fixup work by try to read data from any good
copy, so it won't go scrub_fixup_nodatasum() anyway.

This patch disables the faulty code and will be removed completely in a
followup patch.

Fixes: ac0b4145d6 ("btrfs: scrub: Don't use inode pages for device replace")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-07-17 13:56:30 +02:00
Ingo Molnar
52b544bd38 Linux 4.18-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltLpVUeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGWisH/ikONMwV7OrSk36Y
 5rxzTFUoBk0Qffct88gtSNuRVCxaVb1ofCndvFJE6A6HfJkWpbBzH6eq90aakmJi
 f7uFcu4YmsQpeQaf9lpftWmY2vDf2fIadVTV0RnSMXks57wMax1cpBe7LJGpz13e
 f+g5XRVs1MdlZVtr6tG2SU3Y5AqVVVsYe/0DBPonEqeh9/JJbPFCuNkFOxxzAqPu
 VTnjyoOqG8qtZzjklNtR5rZn0Gv592tWX36eiWTQdThNmVFkGEAJwsHCQlY4OQYK
 61QN4UhOHiu8e1ZuGDNEDhNVRnKtaaYUPFeWL1wLRW73ul4P3ZkpvpS8QTMwcFJI
 JjzNOkI=
 =ckcO
 -----END PGP SIGNATURE-----

Merge tag 'v4.18-rc5' into locking/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-17 09:27:43 +02:00
Jaegeuk Kim
1cb50f87e1 f2fs: do checkpoint in kill_sb
When unmounting f2fs in force mode, we can get it stuck by io_schedule()
by some pending IOs in meta_inode.

io_schedule+0xd/0x30
wait_on_page_bit_common+0xc6/0x130
__filemap_fdatawait_range+0xbd/0x100
filemap_fdatawait_keep_errors+0x15/0x40
sync_inodes_sb+0x1cf/0x240
sync_filesystem+0x52/0x90
generic_shutdown_super+0x1d/0x110
kill_f2fs_super+0x28/0x80 [f2fs]
deactivate_locked_super+0x35/0x60
cleanup_mnt+0x36/0x70
task_work_run+0x79/0xa0
exit_to_usermode_loop+0x62/0x70
do_syscall_64+0xdb/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
0xffffffffffffffff

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-15 12:14:04 +09:00