1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

56429 commits

Author SHA1 Message Date
Jon Derrick
a17712c8e4 ext4: check superblock mapped prior to committing
This patch attempts to close a hole leading to a BUG seen with hot
removals during writes [1].

A block device (NVME namespace in this test case) is formatted to EXT4
without partitions. It's mounted and write I/O is run to a file, then
the device is hot removed from the slot. The superblock attempts to be
written to the drive which is no longer present.

The typical chain of events leading to the BUG:
ext4_commit_super()
  __sync_dirty_buffer()
    submit_bh()
      submit_bh_wbc()
        BUG_ON(!buffer_mapped(bh));

This fix checks for the superblock's buffer head being mapped prior to
syncing.

[1] https://www.spinics.net/lists/linux-ext4/msg56527.html

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-07-02 18:45:18 -04:00
Andreas Gruenbacher
025d0e7f73 gfs2: Remove gfs2_write_{begin,end}
Now that generic_file_write_iter is no longer used, there are no
remaining users of these address space operations.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2018-07-02 16:27:38 +01:00
Andreas Gruenbacher
967bcc91b0 gfs2: iomap direct I/O support
The page unmapping previously done in gfs2_direct_IO is now done
generically in iomap_dio_rw.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2018-07-02 16:27:32 +01:00
Andreas Gruenbacher
bcfe94139a gfs2: gfs2_extent_length cleanup
Now that gfs2_extent_length is no longer used for determining the size
of a hole and always with an upper size limit, the function can be
simplified.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2018-07-02 16:27:24 +01:00
Andreas Gruenbacher
64bc06bb32 gfs2: iomap buffered write support
With the traditional page-based writes, blocks are allocated separately
for each page written to.  With iomap writes, we can allocate a lot more
blocks at once, with a fraction of the allocation overhead for each
page.

Split calculating the number of blocks that can be allocated at a given
position (gfs2_alloc_size) off from gfs2_iomap_alloc: that size
determines the number of blocks to allocate and reserve in the journal.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2018-07-02 16:27:17 +01:00
Andreas Gruenbacher
d505a96a3b gfs2: Further iomap cleanups
In gfs2_iomap_alloc, set the type of newly allocated extents to
IOMAP_MAPPED so that iomap_to_bh will set the bh states correctly:
otherwise, the bhs would not be marked as mapped, confusing
__mpage_writepage.  This means that we need to check for the IOMAP_F_NEW
flag in fallocate_chunk now.

Further clean up gfs2_iomap_get and implement gfs2_stuffed_iomap here
directly.  For reads beyond the end of the file, return holes instead of
failing with -ENOENT so that we can get rid of that special case in
gfs2_block_map.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2018-07-02 16:26:01 +01:00
Guenter Roeck
1823342a1f configfs: replace strncpy with memcpy
gcc 8.1.0 complains:

fs/configfs/symlink.c:67:3: warning:
	'strncpy' output truncated before terminating nul copying as many
	bytes from a string as its length
fs/configfs/symlink.c: In function 'configfs_get_link':
fs/configfs/symlink.c:63:13: note: length computed here

Using strncpy() is indeed less than perfect since the length of data to
be copied has already been determined with strlen(). Replace strncpy()
with memcpy() to address the warning and optimize the code a little.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-02 07:12:55 -06:00
Linus Torvalds
d3bc0e67f8 for-4.18-rc2-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAls4zz0ACgkQxWXV+ddt
 WDs0ZhAAplEAcN1BP986BS7GpjrG20vQtP9AnHlnSEJJJmsnykpspBylOcLRkKjF
 LKBfBPCKqIo7kn5ebKT1Kk7zJPkOOEfmxGW7hffVN/oa/oMtmgJbbHDUgl2TDgdu
 rky1O+Bj+S37s5rhiXAJ4oU9ekdpWIlN30GczfynjiPqGigKh/cINsEEhQIIAiJG
 PRDQfSIJeh67x1AP0KE8sJAYSsaeFxT+kHrT/NPs1NFDSzrQSa/QWPFVjGVVuI/Y
 w84Mo0EqdRV7tap7D3QyWyYea6zdP00PG8TyLl0Kba+LckFbzpNN5hP3SUxleBzL
 0ZBJi7/tOqnrMV3YaGm40dLfgD4B+CFt8zDyg2JvWUxxEzfQfYif7KIT2IV8fSqS
 QrVw2NrzQC7EZ4Zu98wCN7dyyOE8yhqbq805YdG3Nj+zT6DqRu01TBo4Yr/Ek8ux
 +ITAtQVbaOZmTIt/qh/Oxc5jRsurAno1FP3XRH+1hfSlS7xc3LfI1CUbX3jAKzXN
 edxdM4/h+d4nekvROnKBH4EheS6+ZVfgzYlYUW9c2rjcJ1RHhDElbh14+IoM6LKJ
 nJ+Cp+744F6W5jaG3oWElJrdhlY31mWUjiZaj2CHl16EcH3MToytrxKMX+OWo95W
 gChnKicrtpO6+9nbED3Tdhp7SkbDysun6jvEpSgdlm8+2H5Kwrw=
 =KWL7
 -----END PGP SIGNATURE-----

Merge tag 'for-4.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "We have a few regression fixes for qgroup rescan status tracking and
  the vm_fault_t conversion that mixed up the error values"

* tag 'for-4.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix mount failure when qgroup rescan is in progress
  Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion
  btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf
2018-07-01 12:38:16 -07:00
Linus Torvalds
4a770e638f Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fix from Al Viro:
 "Followup to procfs-seq_file series this window"

This fixes a memory leak by making sure that proc seq files release any
private data on close.  The 'proc_seq_open' has to be properly paired
with 'proc_seq_release' that releases the extra private data.

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  proc: add proc_seq_release
2018-07-01 12:32:19 -07:00
Al Viro
e876c445df hpfs: fix an inode leak in lookup, switch to d_splice_alias()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-30 09:43:45 -04:00
Linus Torvalds
7886953859 A trivial dentry leak fix from Zheng.
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAls2PxETHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi51OB/4yN29wPAiCFDeN87gTnkO2flNbATzH
 PCBMydy4nUSfUmnfaMZyMLo0xM6EXIxLte9kGt0VEQwJrw7cwK+IhkhhOQtbH2ER
 /oxul0nAfgEOp0kFzHY7iQzd1qppaxAYtfXLy/BdOsTbntHcHPvp619+JKiuJoXM
 XA2ABV/bdw21FMPq+kP4uyzQsPw/yTXAw47JlVe1PqsCE17Eyj3X8GaMXeTZo52z
 bbOzFJBxlOOHwJ2w6TM4uK6Pr/BgIF4QHDkcNfNzajtdzW9y9wAmVb2GrmkLnwMw
 rou6MgmbB76nUL82SWMMqSeqV1hHGW4iPqu8kpWekIbGOSZJ8a7FIbqo
 =qSQ6
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.18-rc3' of git://github.com/ceph/ceph-client

Pull ceph fix from Ilya Dryomov:
 "A trivial dentry leak fix from Zheng"

* tag 'ceph-for-4.18-rc3' of git://github.com/ceph/ceph-client:
  ceph: fix dentry leak in splice_dentry()
2018-06-29 12:19:47 -07:00
Al Viro
b0c6108ecf nfs_instantiate(): prevent multiple aliases for directory inode
Since NFS allows open-by-fhandle, we have to cope with the possibility
of mkdir vs. open-by-guessed-handle races.  A local filesystem could
decide what the inumber of the new object will be and insert a locked
inode with that inumber into icache _before_ the on-disk data structures
begin to look good and unlock it only once it has a dentry alias, so
that open-by-handle coming first would quietly fail and mkdir coming
first would have open-by-handle grab its dentry.

For NFS it's a non-starter - the icache key is server-supplied fhandle
and we do not get that until the object has been fully created on server.
We really have to deal with the possibility that open-by-handle gets
the in-core inode and attaches a dentry to it before mkdir does.

Solution: let nfs_mkdir() use d_splice_alias() to catch those.  We can
	* get an error.  Just return it to our caller.
	* get NULL - no preexisting dentry aliases, we'd just done what
d_add() would've done.  Success.
	* get a reference to preexisting alias.  In that case the alias
had been moved in place of nfs_mkdir() argument (and hashed there), while
nfs_mkdir() argument is left unhashed negative.  Which is just fine for
->mkdir() callers, all we need is to release the reference we'd got from
d_splice_alias() and report success.

Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-28 15:28:48 -04:00
Linus Torvalds
a11e1d432b Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL
The poll() changes were not well thought out, and completely
unexplained.  They also caused a huge performance regression, because
"->poll()" was no longer a trivial file operation that just called down
to the underlying file operations, but instead did at least two indirect
calls.

Indirect calls are sadly slow now with the Spectre mitigation, but the
performance problem could at least be largely mitigated by changing the
"->get_poll_head()" operation to just have a per-file-descriptor pointer
to the poll head instead.  That gets rid of one of the new indirections.

But that doesn't fix the new complexity that is completely unwarranted
for the regular case.  The (undocumented) reason for the poll() changes
was some alleged AIO poll race fixing, but we don't make the common case
slower and more complex for some uncommon special case, so this all
really needs way more explanations and most likely a fundamental
redesign.

[ This revert is a revert of about 30 different commits, not reverted
  individually because that would just be unnecessarily messy  - Linus ]

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-28 10:40:47 -07:00
Carlos Maiolino
9991274fdd xfs: Initialize variables in xfs_alloc_get_rec before using them
Make sure we initialize *bno and *len, before jumping to out_bad_rec
label, and risk calling xfs_warn() with uninitialized variables.

Coverity: 100898
Coverity: 1437081
Coverity: 1437129
Coverity: 1437191
Coverity: 1437201
Coverity: 1437212
Coverity: 1437341
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-28 06:56:23 -07:00
Filipe Manana
e4e7ede739 Btrfs: fix mount failure when qgroup rescan is in progress
If a power failure happens while the qgroup rescan kthread is running,
the next mount operation will always fail. This is because of a recent
regression that makes qgroup_rescan_init() incorrectly return -EINVAL
when we are mounting the filesystem (through btrfs_read_qgroup_config()).
This causes the -EINVAL error to be returned regardless of any qgroup
flags being set instead of returning the error only when neither of
the flags BTRFS_QGROUP_STATUS_FLAG_RESCAN nor BTRFS_QGROUP_STATUS_FLAG_ON
are set.

A test case for fstests follows up soon.

Fixes: 9593bf4967 ("btrfs: qgroup: show more meaningful qgroup_rescan_init error message")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-06-28 11:30:57 +02:00
Chris Mason
717beb96d9 Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion
The vm_fault_t conversion commit introduced a ret2 variable for tracking
the integer return values from internal btrfs functions.  It was
sometimes returning VM_FAULT_LOCKED for pages that were actually invalid
and had been removed from the radix.  Something like this:

    ret2 = btrfs_delalloc_reserve_space() // returns zero on success

    lock_page(page)
    if (page->mapping != inode->i_mapping)
	goto out_unlock;

...

out_unlock:
    if (!ret2) {
	    ...
	    return VM_FAULT_LOCKED;
    }

This ends up triggering this WARNING in btrfs_destroy_inode()
    WARN_ON(BTRFS_I(inode)->block_rsv.size);

xfstests generic/095 was able to reliably reproduce the errors.

Since out_unlock: is only used for errors, this fix moves it below the
if (!ret2) check we use to return VM_FAULT_LOCKED for success.

Fixes: a528a24150 (btrfs: change return type of btrfs_page_mkwrite to vm_fault_t)
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-06-28 11:30:50 +02:00
Qu Wenruo
6f7de19ed3 btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf
Commit ff3d27a048 ("btrfs: qgroup: Finish rescan when hit the last leaf
of extent tree") added a new exit for rescan finish.

However after finishing quota rescan, we set
fs_info->qgroup_rescan_progress to (u64)-1 before we exit through the
original exit path.
While we missed that assignment of (u64)-1 in the new exit path.

The end result is, the quota status item doesn't have the same value.
(-1 vs the last bytenr + 1)
Although it doesn't affect quota accounting, it's still better to keep
the original behavior.

Reported-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Fixes: ff3d27a048 ("btrfs: qgroup: Finish rescan when hit the last leaf of extent tree")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-06-28 11:30:48 +02:00
Chunyu Hu
877f919e19 proc: add proc_seq_release
kmemleak reported some memory leak on reading proc files. After adding
some debug lines, find that proc_seq_fops is using seq_release as
release handler, which won't handle the free of 'private' field of
seq_file, while in fact the open handler proc_seq_open could create
the private data with __seq_open_private when state_size is greater
than zero. So after reading files created with proc_create_seq_private,
such as /proc/timer_list and /proc/vmallocinfo, the private mem of a
seq_file is not freed. Fix it by adding the paired proc_seq_release
as the default release handler of proc_seq_ops instead of seq_release.

Fixes: 44414d82cf ("proc: introduce proc_create_seq_private")
Reviewed-by: Christoph Hellwig <hch@lst.de>
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-27 20:44:38 -04:00
Linus Torvalds
f57494321c Changes since last update:
- More metadata validation strengthening to prevent crashes.
 - Fix extent offset overflow problem when insert_range on a 512b block fs
 - Fix some off-by-one errors in the realtime fsmap code
 - Fix some math errors in the default resblks calculation when free space
   is low
 - Fix a problem where stale page contents are exposed via mmap read
   after a zero_range at eof
 - Fix accounting problems with per-ag reservations causing statfs
   reports to vary incorrectly
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAlsv6mQACgkQ+H93GTRK
 tOv3ZA//W9QVf6aGv3fefg7ZkleRpKPhpvrdgy+LnnhGdkToKGsOIJGavFjBYfBh
 cFSOXrKamNiSZiJzzQ8NiFaZ8wrD2NAhOybatp7jAGKGb+a3VTtoz7v+4VwJ5Psw
 gPdZ+zADqsbbUg4Xx08TDqdQDHlIXz6kTvrM+MGBq5TynICAIUgS2o9fJT1HKf12
 xfbC2dh2cK/7TnN0MNCLzekrPoiLfuEPsMkrnWv/PBre0AhfcCJLbNhQi5+mCWcs
 e71vttXAc3kkSly6G4D9SSb+5Mn1Q1SkBsORn82fJdI59eHjFn1ujLvkfPM1w1Iu
 oDjJw8PCyrG7NFZiTh46MuJhvH/Tc6ZZvop/YJyJJApHOYYx6K64enpJrWcg+CKp
 Rcl5gQocvlIfMyzBvyozpHctlIuOjkQsSmyEXXx8PAsr7zU+FehXS5ENzCOtdg5U
 LivcsXmh0bvSU1PdzA3oiANcIPAfVX6+RoM4rQpDjT+IS7Ud9F2hELD0Yi/pmk4l
 oC91+e4oPJ4HsoCDmxePLl29Gqm4n1lY9i3l1PsZe8yWeYxs991R/hJuxKvRQLul
 sxkgKk5nvdG2PjSmAKx9XYRrHEf/3uK+TEtrQtrm06i1QEI881qqSEMHpaoRj0qo
 awESWimVodllaag+ugFLi+u+UAh8LA0jTAmijBqczLtspU1Cg0c=
 =Aw39
 -----END PGP SIGNATURE-----

Merge tag 'xfs-4.18-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
 "Here are some patches for 4.18 to fix regressions, accounting
  problems, overflow problems, and to strengthen metadata validation to
  prevent corruption.

  This series has been run through a full xfstests run over the weekend
  and through a quick xfstests run against this morning's master, with
  no major failures reported.

  Changes since last update:

   - more metadata validation strengthening to prevent crashes.

   - fix extent offset overflow problem when insert_range on a 512b
     block fs

   - fix some off-by-one errors in the realtime fsmap code

   - fix some math errors in the default resblks calculation when free
     space is low

   - fix a problem where stale page contents are exposed via mmap read
     after a zero_range at eof

   - fix accounting problems with per-ag reservations causing statfs
     reports to vary incorrectly"

* tag 'xfs-4.18-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix fdblocks accounting w/ RMAPBT per-AG reservation
  xfs: ensure post-EOF zeroing happens after zeroing part of a file
  xfs: fix off-by-one error in xfs_rtalloc_query_range
  xfs: fix uninitialized field in rtbitmap fsmap backend
  xfs: recheck reflink state after grabbing ILOCK_SHARED for a write
  xfs: don't allow insert-range to shift extents past the maximum offset
  xfs: don't trip over negative free space in xfs_reserve_blocks
  xfs: allow empty transactions while frozen
  xfs: xfs_iflush_abort() can be called twice on cluster writeback failure
  xfs: More robust inode extent count validation
  xfs: simplify xfs_bmap_punch_delalloc_range
2018-06-27 12:21:06 -07:00
Henry Wilson
4d97f7d53d inotify: Add flag IN_MASK_CREATE for inotify_add_watch()
The flag IN_MASK_CREATE is introduced as a flag for inotiy_add_watch()
which prevents inotify from modifying any existing watches when invoked.
If the pathname specified in the call has a watched inode associated
with it and IN_MASK_CREATE is specified, fail with an errno of EEXIST.

Use of IN_MASK_CREATE with IN_MASK_ADD is reserved for future use and
will return EINVAL.

RATIONALE

In the current implementation, there is no way to prevent
inotify_add_watch() from modifying existing watch descriptors. Even if
the caller keeps a record of all watch descriptors collected, this is
only sufficient to detect that an existing watch descriptor may have
been modified.

The assumption that a particular path will map to the same inode over
multiple calls to inotify_add_watch() cannot be made as files can be
renamed or deleted.  It is also not possible to assume that two distinct
paths do no map to the same inode, due to hard-links or a dereferenced
symbolic link. Further uses of inotify_add_watch() to revert the change
may cause other watch descriptors to be modified or created, merely
compunding the problem. There is currently no system call such as
inotify_modify_watch() to explicity modify a watch descriptor, which
would be able to revert unwanted changes. Thus the caller cannot
guarantee to be able to revert any changes to existing watch decriptors.

Additionally the caller cannot assume that the events that are
associated with a watch descriptor are within the set requested, as any
future calls to inotify_add_watch() may unintentionally modify a watch
descriptor's mask. Thus it cannot currently be guaranteed that a watch
descriptor will only generate events which have been requested. The
program must filter events which come through its watch descriptor to
within its expected range.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Henry Wilson <henry.wilson@acentic.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-27 19:21:25 +02:00
Arnd Bergmann
fe2c32545b ext2: use ktime_get_real_seconds for timestamps
get_seconds() is deprecated because of the y2038 overflow, so users
should migrate to 64-bit timestamps using ktime_get_real_seconds().
In ext2, the timestamps in the superblock and in the inode are all
limited to 32-bit, and this won't get fixed, so let's just stop
using the deprecated interface and keep truncating.

All users of ext2 should migrate to ext4 before 2038 to prevent this
from causing problems.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-27 13:59:18 +02:00
Arnd Bergmann
c3b9cecd89 udf: convert inode stamps to timespec64
The VFS structures are finally converted to always use 64-bit timestamps,
and this file system can represent a long range of on-disk timestamps
already, so now let's fit in the missing bits for udf.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-27 13:58:00 +02:00
Amir Goldstein
eaa2c6b0c9 fanotify: factor out helpers to add/remove mark
Factor out helpers fanotify_add_mark() and fanotify_remove_mark()
to reduce duplicated code.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-27 13:45:12 +02:00
Amir Goldstein
3ac70bfcde fsnotify: add helper to get mask from connector
Use a helper to get the mask from the object (i.e. i_fsnotify_mask)
to generalize code of add/remove inode/vfsmount mark.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-27 13:45:07 +02:00
Amir Goldstein
36f10f55ff fsnotify: let connector point to an abstract object
Make the code to attach/detach a connector to object more generic
by letting the fsnotify connector point to an abstract fsnotify_connp_t.
Code that needs to dereference an inode or mount object now uses the
helpers fsnotify_conn_{inode,mount}.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-27 13:45:05 +02:00
Amir Goldstein
b812a9f589 fsnotify: pass connp and object type to fsnotify_add_mark()
Instead of passing inode and vfsmount arguments to fsnotify_add_mark()
and its _locked variant, pass an abstract object pointer and the object
type.

The helpers fsnotify_obj_{inode,mount} are added to get the concrete
object pointer from abstract object pointer.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-27 13:45:03 +02:00
Amir Goldstein
9b6e543450 fsnotify: use typedef fsnotify_connp_t for brevity
The object marks manipulation functions fsnotify_destroy_marks()
fsnotify_find_mark() and their helpers take an argument of type
struct fsnotify_mark_connector __rcu ** to dereference the connector
pointer. use a typedef to describe this type for brevity.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-27 13:44:59 +02:00
Yan, Zheng
8b8f53af1e ceph: fix dentry leak in splice_dentry()
In any case, d_splice_alias() does not drop reference of original
dentry.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-26 18:42:44 +02:00
Linus Torvalds
84bfed40fc for-4.18-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlsvtAIACgkQxWXV+ddt
 WDvPJQ//eEr+ACNxG5n42e63TaNpLOeoGAsUChwekP6DAIc2n/r78SHX+T/woDqd
 7/dc2eqlYF5Fmn6MPQ1ufL2xlfw0t3OnemK8T5+4sxZDdzeAH9V+kHaqoaLUPExL
 4r6lK5Ywwa2cWghC7WvQg3+bWLX18OEExG63SlEhLo3YJM5uUqVhGVPi6ARrbxNM
 GJvXcQsxjXLqukm4gYvHC6Zra9Awv65uiAU+VCm2y96j1fEJ0yjK/pC1RtoFGCqS
 48Jjuzfq/V3nxy0Wjr3DvpQVEQcKyGha6i/eazZISdRhGSjrYdvIwpUn7gZeoo2Q
 hT8VVergLbVYgIeaOwgwubQzNaG2C/ZTsEjPQrNrA7a/AGsh5C/ommYE9MSyS6L0
 PG0NLUNDXFmEj8WdI97So+1Sm2OCb04DPbPhHbbkhw5L/MPE1TaLN5aUWguj8laB
 NnyPRdVP9jCJAI0OhJY7nPDmsPKe2jogVVsRheTcob+V5G+vIgzDXlGfsW/88Seg
 dHubMaC0nz6u8Cj4dIviitiLXuustyz0JkVdljTLawWWJ/Hs7NlsSf3Q3nj2Kvia
 e8QMID0vLphQyL1hqC0n7M0g2ohq9NUGT1nhLTPSpFl0l8bIA9PQehOx9Q5vx8yp
 tJF+d0qiNfgvadA+KhyvQ8puQb2+zZQ8+Pqfjwd4ySD7SI5dcA4=
 =fCdm
 -----END PGP SIGNATURE-----

Merge tag 'for-4.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "Two regression fixes and an incorrect error value propagation fix from
  'rename exchange'"

* tag 'for-4.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix return value on rename exchange failure
  btrfs: fix invalid-free in btrfs_extent_same
  Btrfs: fix physical offset reported by fiemap for inline extents
2018-06-26 08:41:54 -07:00
Darrick J. Wong
d8cb5e4237 xfs: fix fdblocks accounting w/ RMAPBT per-AG reservation
In __xfs_ag_resv_init we incorrectly calculate the amount by which to
decrease fdblocks when reserving blocks for the rmapbt.  Because rmapbt
allocations do not decrease fdblocks, we must decrease fdblocks by the
entire size of the requested reservation in order to achieve our goal of
always having enough free blocks to satisfy an rmapbt expansion.

This is in contrast to the refcountbt/finobt, which /do/ subtract from
fdblocks whenever they allocate a block.  For this allocation type we
preserve the existing behavior where we decrease fdblocks only by the
requested reservation minus the size of the existing tree.

This fixes the problem where the available block counts reported by
statfs change across a remount if there had been an rmapbt size change
since mount time.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2018-06-24 12:00:12 -07:00
Darrick J. Wong
e53c4b5983 xfs: ensure post-EOF zeroing happens after zeroing part of a file
If a user asks us to zero_range part of a file, the end of the range is
EOF, and not aligned to a page boundary, invoke writeback of the EOF
page to ensure that the post-EOF part of the page is zeroed.  This
ensures that we don't expose stale memory contents via mmap, if in a
clumsy manner.

Found by running generic/127 when it runs zero_range and mapread at EOF
one after the other.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2018-06-24 11:56:36 -07:00
Darrick J. Wong
a3a374bf18 xfs: fix off-by-one error in xfs_rtalloc_query_range
In commit 8ad560d256 ("xfs: strengthen rtalloc query range checks")
we strengthened the input parameter checks in the rtbitmap range query
function, but introduced an off-by-one error in the process.  The call
to xfs_rtfind_forw deals with the high key being rextents, but we clamp
the high key to rextents - 1.  This causes the returned results to stop
one block short of the end of the rtdev, which is incorrect.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-06-24 11:56:36 -07:00
Darrick J. Wong
232d0a24b0 xfs: fix uninitialized field in rtbitmap fsmap backend
Initialize the extent count field of the high key so that when we use
the high key to synthesize an 'unknown owner' record (i.e. used space
record) at the end of the queried range we have a field with which to
compute rm_blockcount.  This is not strictly necessary because the
synthesizer never uses the rm_blockcount field, but we can shut up the
static code analysis anyway.

Coverity-id: 1437358
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-06-24 11:56:36 -07:00
Darrick J. Wong
5bd88d1539 xfs: recheck reflink state after grabbing ILOCK_SHARED for a write
The reflink iflag could have changed since the earlier unlocked check,
so if we got ILOCK_SHARED for a write and but we're now a reflink inode
we have to switch to ILOCK_EXCL and relock.

This helps us avoid blowing lock assertions in things like generic/166:

XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/xfs_reflink.c, line: 383
WARNING: CPU: 1 PID: 24707 at fs/xfs/xfs_message.c:104 assfail+0x25/0x30 [xfs]
Modules linked in: deadline_iosched dm_snapshot dm_bufio ext4 mbcache jbd2 dm_flakey xfs libcrc32c dax_pmem device_dax nd_pmem sch_fq_codel af_packet [last unloaded: scsi_debug]
CPU: 1 PID: 24707 Comm: xfs_io Not tainted 4.18.0-rc1-djw #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:assfail+0x25/0x30 [xfs]
Code: ff 0f 0b c3 90 66 66 66 66 90 48 89 f1 41 89 d0 48 c7 c6 e8 ef 1b a0 48 89 fa 31 ff e8 54 f9 ff ff 80 3d fd ba 0f 00 00 75 03 <0f> 0b c3 0f 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 63 f6 49 89 f9
RSP: 0018:ffffc90006423ad8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880030b65e80 RCX: 0000000000000000
RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa01b0447
RBP: ffffc90006423c10 R08: 0000000000000000 R09: 0000000000000000
R10: ffff88003d43fc30 R11: f000000000000000 R12: ffff880077cda000
R13: 0000000000000000 R14: ffffc90006423c30 R15: ffffc90006423bf9
FS:  00007feba8986800(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000138ab58 CR3: 000000003d40a000 CR4: 00000000000006a0
Call Trace:
 xfs_reflink_allocate_cow+0x24c/0x3d0 [xfs]
 xfs_file_iomap_begin+0x6d2/0xeb0 [xfs]
 ? iomap_to_fiemap+0x80/0x80
 iomap_apply+0x5e/0x130
 iomap_dio_rw+0x2e0/0x400
 ? iomap_to_fiemap+0x80/0x80
 ? xfs_file_dio_aio_write+0x133/0x4a0 [xfs]
 xfs_file_dio_aio_write+0x133/0x4a0 [xfs]
 xfs_file_write_iter+0x7b/0xb0 [xfs]
 __vfs_write+0x16f/0x1f0
 vfs_write+0xc8/0x1c0
 ksys_pwrite64+0x74/0x90
 do_syscall_64+0x56/0x180
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-06-24 11:56:36 -07:00
Darrick J. Wong
f62cb48e43 xfs: don't allow insert-range to shift extents past the maximum offset
Zorro Lang reports that generic/485 blows an assert on a filesystem with
512 byte blocks.  The test tries to fallocate a post-eof extent at the
maximum file size and calls insert range to shift the extents right by
two blocks.  On a 512b block filesystem this causes startoff to overflow
the 54-bit startoff field, leading to the assert.

Therefore, always check the rightmost extent to see if it would overflow
prior to invoking the insert range machinery.

Reported-by: zlang@redhat.com
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200137
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-06-24 11:56:36 -07:00
Darrick J. Wong
aafe12cee0 xfs: don't trip over negative free space in xfs_reserve_blocks
If we somehow end up with a filesystem that has fewer free blocks than
the blocks set aside to avoid ENOSPC deadlocks, it's possible that the
free space calculation in xfs_reserve_blocks will spit out a negative
number (because percpu_counter_sum returns s64).  We fail to notice
this negative number and set fdblks_delta to it.  Now we increment
fdblocks(!) and the unsigned type of m_resblks means that we end up
setting a ridiculously huge m_resblks reservation.

Avoid this comedy of errors by detecting the negative free space and
returning -ENOSPC.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-06-24 11:56:36 -07:00
Darrick J. Wong
10ee25268e xfs: allow empty transactions while frozen
In commit e89c041338 ("xfs: implement the GETFSMAP ioctl") we
created the ability to obtain empty transactions.  These transactions
have no log or block reservations and therefore can't modify anything.
Since they're also NO_WRITECOUNT they can run while the fs is frozen,
so we don't need to WARN_ON about that usage.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-06-24 11:56:35 -07:00
Deepa Dinamani
6ff8473507 time: Change types to new y2038 safe __kernel_itimerspec
timer_set/gettime and timerfd_set/get apis use struct itimerspec at the
user interface layer.  struct itimerspec is not y2038-safe.  Change these
interfaces to use y2038-safe struct __kernel_itimerspec instead.  This will
help define new syscalls when 32bit architectures select CONFIG_64BIT_TIME.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: viro@zeniv.linux.org.uk
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-api@vger.kernel.org
Cc: y2038@lists.linaro.org
Link: https://lkml.kernel.org/r/20180617051144.29756-4-deepa.kernel@gmail.com
2018-06-24 14:39:47 +02:00
Al Viro
50f3074011 hostfs_lookup: switch to d_splice_alias()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-23 20:27:29 -04:00
Al Viro
63a67a926e kill dentry_update_name_case()
the last user is gone

Spotted-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-23 17:16:44 -04:00
Filipe Manana
c5b4a50b74 Btrfs: fix return value on rename exchange failure
If we failed during a rename exchange operation after starting/joining a
transaction, we would end up replacing the return value, stored in the
local 'ret' variable, with the return value from btrfs_end_transaction().
So this could end up returning 0 (success) to user space despite the
operation having failed and aborted the transaction, because if there are
multiple tasks having a reference on the transaction at the time
btrfs_end_transaction() is called by the rename exchange, that function
returns 0 (otherwise it returns -EIO and not the original error value).
So fix this by not overwriting the return value on error after getting
a transaction handle.

Fixes: cdd1fedf82 ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT")
CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-06-22 12:59:08 +02:00
Linus Torvalds
894b8c000a \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAlssqQcACgkQnJ2qBz9k
 QNl2XwgAhtb2LvBhWKh3OkBVczICgc9eQho03KiHJkEcR1OCkgjOJhVBDrQmJbBa
 tAuMmUcnTpLJG8CI5TyR0G4hNbttE2LuTu+6RV67hXOjhiYAQ5P9wYLyUqfZf/01
 myrPEewr9qqx3h2htqufiLQIyO1M4FeM37VqdH7vZhQOb+B+FUw7JB9a/HbCpNh/
 7NUDf2GbLtLjK+2Xh0ttXvfWjgbLjC4wMmaPaa5+Nabn+URtvX9aHgQ/dYrOjQ16
 7oD5K5x3k3sOT6Ix7xRGLHgE6Xl6MlTtxpyt5ldb96RwWO/GAuMhSCBd14nS2hmx
 fEx5N2vbs3v8Ux+ZdMauzAKZI6Fz2g==
 =oFo+
 -----END PGP SIGNATURE-----

Merge tag 'for_v4.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull udf, quota, ext2 fixes from Jan Kara:
 "UDF:
   - fix an oops due to corrupted disk image
   - two small cleanups

  quota:
   - a fixfor lru handling
   - cleanup

  ext2:
   - a warning about a deprecated mount option"

* tag 'for_v4.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Drop unused arguments of udf_delete_aext()
  udf: Provide function for calculating dir entry length
  udf: Detect incorrect directory size
  ext2: add warning when specifying nocheck option
  quota: Cleanup list iteration in dqcache_shrink_scan()
  quota: reclaim least recently used dquots
2018-06-22 18:04:56 +09:00
Dave Chinner
e53946dbd3 xfs: xfs_iflush_abort() can be called twice on cluster writeback failure
When a corrupt inode is detected during xfs_iflush_cluster, we can
get a shutdown ASSERT failure like this:

XFS (pmem1): Metadata corruption detected at xfs_symlink_shortform_verify+0x5c/0xa0, inode 0x86627 data fork
XFS (pmem1): Unmount and run xfs_repair
XFS (pmem1): xfs_do_force_shutdown(0x8) called from line 3372 of file fs/xfs/xfs_inode.c.  Return address = ffffffff814f4116
XFS (pmem1): Corruption of in-memory data detected.  Shutting down filesystem
XFS (pmem1): xfs_do_force_shutdown(0x1) called from line 222 of file fs/xfs/libxfs/xfs_defer.c.  Return address = ffffffff814a8a88
XFS (pmem1): xfs_do_force_shutdown(0x1) called from line 222 of file fs/xfs/libxfs/xfs_defer.c.  Return address = ffffffff814a8ef9
XFS (pmem1): Please umount the filesystem and rectify the problem(s)
XFS: Assertion failed: xfs_isiflocked(ip), file: fs/xfs/xfs_inode.h, line: 258
.....
Call Trace:
 xfs_iflush_abort+0x10a/0x110
 xfs_iflush+0xf3/0x390
 xfs_inode_item_push+0x126/0x1e0
 xfsaild+0x2c5/0x890
 kthread+0x11c/0x140
 ret_from_fork+0x24/0x30

Essentially, xfs_iflush_abort() has been called twice on the
original inode that that was flushed. This happens because the
inode has been flushed to teh buffer successfully via
xfs_iflush_int(), and so when another inode is detected as corrupt
in xfs_iflush_cluster, the buffer is marked stale and EIO, and
iodone callbacks are run on it.

Running the iodone callbacks walks across the original inode and
calls xfs_iflush_abort() on it. When xfs_iflush_cluster() returns
to xfs_iflush(), it runs the error path for that function, and that
calls xfs_iflush_abort() on the inode a second time, leading to the
above assert failure as the inode is not flush locked anymore.

This bug has been there a long time.

The simple fix would be to just avoid calling xfs_iflush_abort() in
xfs_iflush() if we've got a failure from xfs_iflush_cluster().
However, xfs_iflush_cluster() has magic delwri buffer handling that
means it may or may not have run IO completion on the buffer, and
hence sometimes we have to call xfs_iflush_abort() from
xfs_iflush(), and sometimes we shouldn't.

After reading through all the error paths and the delwri buffer
code, it's clear that the error handling in xfs_iflush_cluster() is
unnecessary. If the buffer is delwri, it leaves it on the delwri
list so that when the delwri list is submitted it sees a shutdown
fliesystem in xfs_buf_submit() and that marks the buffer stale, EIO
and runs IO completion. i.e. exactly what xfs+iflush_cluster() does
when it's not a delwri buffer. Further, marking a buffer stale
clears the _XBF_DELWRI_Q flag on the buffer, which means when
submission of the buffer occurs, it just skips over it and releases
it.

IOWs, the error handling in xfs_iflush_cluster doesn't need to care
if the buffer is already on a the delwri queue or not - it just
needs to mark the buffer stale, EIO and run completions. That means
we can just use the easy fix for xfs_iflush() to avoid the double
abort.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-21 23:31:38 -07:00
Dave Chinner
23fcb3340d xfs: More robust inode extent count validation
When the inode is in extent format, it can't have more extents that
fit in the inode fork. We don't currenty check this, and so this
corruption goes unnoticed by the inode verifiers. This can lead to
crashes operating on invalid in-memory structures.

Attempts to access such a inode will now error out in the verifier
rather than allowing modification operations to proceed.

Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix a typedef, add some braces and breaks to shut up compiler warnings]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-21 23:25:57 -07:00
Christoph Hellwig
e2ac836307 xfs: simplify xfs_bmap_punch_delalloc_range
Instead of using xfs_bmapi_read to find delalloc extents and then punch
them out using xfs_bunmapi, opencode the loop to iterate over the extents
and call xfs_bmap_del_extent_delay directly.  This both simplifies the
code and reduces the number of extent tree lookups required.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-21 23:24:38 -07:00
Linus Torvalds
27db64f65f NFS client bugfixes for Linux 4.18
Hightlights include:
 
 Bugfixes:
 - Fix an rcu deadlock in nfs_delegation_find_inode()
 - Fix NFSv4 deadlocks due to not freeing the session slot in layoutget
 - Don't send layoutreturn if the layout is already invalid
 - Prevent duplicate XID allocation
 - flexfiles: Don't tie up all the rpciod threads in resends
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbK9avAAoJEA4mA3inWBJclq0P/1VCigyDlsbtdby3z2leV84k
 l0asrGjOQndljJ7I21awAgEo8KvXOd66cMv6YT3+UqEW18aNblH4/ngjyId6hPVb
 RZDX7tsG16ZHEqfe9f9irNZo90mdvuSC4ChJ/CbbPesaK9pblE1d76b/qUVr4FUX
 Gj7JPAC5ckoiZXPFfRWfc+o7JnGvs5wkEuDTy+ig6v7BRdL64hdPG3veRNpmLIAZ
 uS/NCyRpO+nFN/ukmvuoI2ZQ3qfHubHBD+rHxr1UKT/ad7dywLmL2UBaYQ0Tl3bq
 /iSQHutgJYj/80VaRTqdlLt/m4ebUZg+9BEZgM5MvqBWkXcpXND51zxExVJN4cGW
 BOytqjLz0gP1OGb8w+Oow58K8l4XyEgHe2CtZ6Yz8Vwof7nchkpv7RSX50hJFIcA
 YlikeDyDzfOmTT6ove5kF31WQSa3Bk6OMEei0of6hWU3UVHyEdr9az73pm/CLSHE
 /R7w0osU3B9tmQD4btQeJ2DxP+syQwhelOYodyVTwOlkmmGg7DSV7fehnGyH8t8f
 I4Yp8f0raiYGbwonYVE2+zDO140VRETEfTE4XQZnn41fZUfB74oIqk77JtgvGMk2
 /+XFNCYBGadHdSBdxyJmhSjhoAWrhgChEIz1G12SiHrNvqIRY/uHhdCX1Ut5vlPf
 5aqyn/yXm6rUH7aNh/Gd
 =tz0M
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Hightlights include:

   - fix an rcu deadlock in nfs_delegation_find_inode()

   - fix NFSv4 deadlocks due to not freeing the session slot in
     layoutget

   - don't send layoutreturn if the layout is already invalid

   - prevent duplicate XID allocation

   - flexfiles: Don't tie up all the rpciod threads in resends"

* tag 'nfs-for-4.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  pNFS/flexfiles: Process writeback resends from nfsiod context as well
  pNFS/flexfiles: Don't tie up all the rpciod threads in resends
  sunrpc: Prevent duplicate XID allocation
  pNFS: Don't send layoutreturn if the layout is already invalid
  pNFS: Always free the session slot on error in nfs4_layoutget_handle_exception
  NFS: Fix an rcu deadlock in nfs_delegation_find_inode()
2018-06-22 06:21:34 +09:00
Lu Fengqi
22883ddc66 btrfs: fix invalid-free in btrfs_extent_same
If this condition ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
		   (BTRFS_I(dst)->flags & BTRFS_INODE_NODATASUM))
is hit, we will go to free the uninitialized cmp.src_pages and
cmp.dst_pages.

Fixes: 67b07bd4be ("Btrfs: reuse cmp workspace in EXTENT_SAME ioctl")
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-06-21 19:21:13 +02:00
Filipe Manana
f098631848 Btrfs: fix physical offset reported by fiemap for inline extents
Commit 9d311e11fc ("Btrfs: fiemap: pass correct bytenr when
fm_extent_count is zero") introduced a regression where we no longer
report 0 as the physical offset for inline extents (and other extents
with a special block_start value). This is because it always sets the
variable used to report the physical offset ("disko") as em->block_start
plus some offset, and em->block_start has the value 18446744073709551614
((u64) -2) for inline extents.

This made the btrfs test 004 (from fstests) often fail, for example, for
a file with an inline extent we have the following items in the subvolume
tree:

    item 101 key (418 INODE_ITEM 0) itemoff 11029 itemsize 160
           generation 25 transid 38 size 1525 nbytes 1525
           block group 0 mode 100666 links 1 uid 0 gid 0 rdev 0
           sequence 0 flags 0x2(none)
           atime 1529342058.461891730 (2018-06-18 18:14:18)
           ctime 1529342058.461891730 (2018-06-18 18:14:18)
           mtime 1529342058.461891730 (2018-06-18 18:14:18)
           otime 1529342055.869892885 (2018-06-18 18:14:15)
    item 102 key (418 INODE_REF 264) itemoff 11016 itemsize 13
           index 25 namelen 3 name: fc7
    item 103 key (418 EXTENT_DATA 0) itemoff 9470 itemsize 1546
           generation 38 type 0 (inline)
           inline extent data size 1525 ram_bytes 1525 compression 0 (none)

Then when test 004 invoked fiemap against the file it got a non-zero
physical offset:

 $ filefrag -v /mnt/p0/d4/d7/fc7
 Filesystem type is: 9123683e
 File size of /mnt/p0/d4/d7/fc7 is 1525 (1 block of 4096 bytes)
  ext:     logical_offset:        physical_offset: length:   expected: flags:
    0:        0..    4095: 18446744073709551614..      4093:   4096:             last,not_aligned,inline,eof
 /mnt/p0/d4/d7/fc7: 1 extent found

This resulted in the test failing like this:

btrfs/004 49s ... [failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad)
    --- tests/btrfs/004.out	2016-08-23 10:17:35.027012095 +0100
    +++ /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad	2018-06-18 18:15:02.385872155 +0100
    @@ -1,3 +1,10 @@
     QA output created by 004
     *** test backref walking
    -*** done
    +./tests/btrfs/004: line 227: [: 7.55578637259143e+22: integer expression expected
    +ERROR: 7.55578637259143e+22 is not a valid numeric value.
    +unexpected output from
    +	/home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal logical-resolve -s 65536 -P 7.55578637259143e+22 /home/fdmanana/btrfs-tests/scratch_1
    ...
    (Run 'diff -u tests/btrfs/004.out /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad'  to see the entire diff)
Ran: btrfs/004

The large number in scientific notation reported as an invalid numeric
value is the result from the filter passed to perl which multiplies the
physical offset by the block size reported by fiemap.

So fix this by ensuring the physical offset is always set to 0 when we
are processing an extent with a special block_start value.

Fixes: 9d311e11fc ("Btrfs: fiemap: pass correct bytenr when fm_extent_count is zero")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-06-21 19:21:13 +02:00
Arnd Bergmann
ee9c7f9ae3 gfs2: call ktime_get_coarse_real_ts64() directly
current_kernel_time64() is now just a deprecated wrapper around
ktime_get_coarse_real_ts64(), so let's just call that directly.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-06-21 07:40:23 -05:00
Andreas Gruenbacher
00251a16d7 gfs2: Minor clarification to __gfs2_punch_hole
Rename end_off to end_len to make the code less confusing.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-06-21 07:40:00 -05:00