linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Dave Chinner	5e96fa8d2b	xfs: factor iclog state processing out of xlog_state_do_callback() The iclog IO completion state processing is somewhat complex, and because it's inside two nested loops it is highly indented and very hard to read. Factor it out, flatten the logic flow and clean up the comments so that it is much easier to see what the code is doing both in processing the individual iclogs and in the overall xlog_state_do_callback() operation. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-09-05 21:36:12 -07:00
Dave Chinner	6546818c85	xfs: factor callbacks out of xlog_state_do_callback() Simplify the code flow by lifting the iclog callback work out of the main iclog iteration loop. This isolates the log juggling and callbacks from the iclog state change logic in the loop. Note that the loopdidcallbacks variable is not actually tracking whether callbacks are actually run - it is tracking whether the icloglock was dropped during the loop and so determines if we completed the entire iclog scan loop atomically. Hence we know for certain there are either no more ordered completions to run or that the next completion will run the remaining ordered iclog completions. Hence rename that variable appropriately for its function. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-09-05 21:36:12 -07:00
Dave Chinner	6769aa2a4f	xfs: factor debug code out of xlog_state_do_callback() Start making this function readable by lifting the debug code into a conditional function. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-09-05 21:36:12 -07:00
Dave Chinner	8ab39f11d9	xfs: prevent CIL push holdoff in log recovery generic/530 on a machine with enough ram and a non-preemptible kernel can run the AGI processing phase of log recovery entirely out of cache. This means it never blocks on locks, never waits for IO and runs entirely through the unlinked lists until it either completes or blocks and hangs because it has run out of log space. It runs out of log space because the background CIL push is scheduled but never runs. queue_work() queues the CIL work on the current CPU that is busy, and the workqueue code will not run it on any other CPU. Hence if the unlinked list processing never yields the CPU voluntarily, the push work is delayed indefinitely. This results in the CIL aggregating changes until all the log space is consumed. When the log recovery processing eventually blocks, the CIL flushes but because the last iclog isn't submitted for IO because it isn't full, the CIL flush never completes and nothing ever moves the log head forwards, or indeed inserts anything into the tail of the log, and hence nothing is able to get the log moving again and recovery hangs. There are several problems here, but the two obvious ones from the trace are that: a) log recovery does not yield the CPU for over 4 seconds, b) binding CIL pushes to a single CPU is a really bad idea. This patch addresses just these two aspects of the problem, and is suitable for backporting to work around any issues in older kernels. The more fundamental problem of preventing the CIL from consuming more than 50% of the log without committing will take more invasive and complex work, so will be done as followup work. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-09-05 21:36:12 -07:00
Rik van Riel	cdea5459ce	xfs: fix missed wakeup on l_flush_wait The code in xlog_wait uses the spinlock to make adding the task to the wait queue, and setting the task state to UNINTERRUPTIBLE atomic with respect to the waker. Doing the wakeup after releasing the spinlock opens up the following race condition: Task 1 task 2 add task to wait queue wake up task set task state to UNINTERRUPTIBLE This issue was found through code inspection as a result of kworkers being observed stuck in UNINTERRUPTIBLE state with an empty wait queue. It is rare and largely unreproducible. Simply moving the spin_unlock to after the wake_up_all results in the waker not being able to see a task on the waitqueue before it has set its state to UNINTERRUPTIBLE. This bug dates back to the conversion of this code to generic waitqueue infrastructure from a counting semaphore back in 2008 which didn't place the wakeups consistently w.r.t. the relevant spin locks. [dchinner: Also fix a similar issue in the shutdown path on xc_commit_wait. Update commit log with more details of the issue.] Fixes: `d748c62367` ("[XFS] Convert l_flushsema to a sv_t") Reported-by: Chris Mason <clm@fb.com> Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-09-05 21:36:12 -07:00
Dave Chinner	7c107afb87	xfs: push the AIL in xlog_grant_head_wake In the situation where the log is full and the CIL has not recently flushed, the AIL push threshold is throttled back to where the last write of the head of the log was completed. This is stored in log->l_last_sync_lsn. Hence if the CIL holds > 25% of the log space pinned by flushes and/or aggregation in progress, we can get the situation where the head of the log lags a long way behind the reservation grant head. When this happens, the AIL push target is trimmed back from where the reservation grant head wants to push the log tail to, back to where the head of the log currently is. This means the push target doesn't reach far enough into the log to actually move the tail before the transaction reservation goes to sleep. When the CIL push completes, it moves the log head forward such that the AIL push target can now be moved, but that has no mechanism for pushing the log tail. Further, if the next tail movement of the log is not large enough to wake the waiter (i.e. still not enough space for it to have a reservation granted), we don't wake anything up, and hence we do not update the AIL push target to take into account the head of the log moving and allowing the push target to be moved forwards. To avoid this particular condition, if we fail to wake the first waiter on the grant head because we don't have enough space, push on the AIL again. This will pick up any movement of the log head and allow the push target to move forward due to completion of CIL pushing. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-09-05 21:36:12 -07:00
Austin Kim	eb2e99943c	xfs: Use WARN_ON_ONCE for bailout mount-operation If the CONFIG_BUG is enabled, BUG is executed and then system is crashed. However, the bailout for mount is no longer proceeding. Using WARN_ON_ONCE rather than BUG can prevent this situation. Signed-off-by: Austin Kim <austindh.kim@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-09-05 21:36:12 -07:00
Al Viro	df02450217	make ramfs_fill_super() static all users should just call ramfs_mount() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-09-05 14:34:27 -04:00
David Howells	5a2be1288b	vfs: Convert squashfs to use the new mount API Convert the squashfs filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: David Howells <dhowells@redhat.com> cc: Phillip Lougher <phillip@squashfs.org.uk> cc: squashfs-devel@lists.sourceforge.net Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-09-05 14:34:26 -04:00
David Howells	ec10a24f10	vfs: Convert jffs2 to use the new mount API Convert the jffs2 filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: David Howells <dhowells@redhat.com> cc: David Woodhouse <dwmw2@infradead.org> cc: linux-mtd@lists.infradead.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-09-05 14:34:25 -04:00
David Howells	74f78fc5ef	vfs: Convert cramfs to use the new mount API Convert the cramfs filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Nicolas Pitre <nico@fluxnic.net> Acked-by: Nicolas Pitre <nico@fluxnic.net> cc: linux-mtd@lists.infradead.org cc: linux-block@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-09-05 14:34:24 -04:00
David Howells	b941759985	vfs: Convert romfs to use the new mount API Convert the romfs filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-mtd@lists.infradead.org cc: linux-block@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-09-05 14:34:24 -04:00
David Howells	43ce4c1fea	vfs: Add a single-or-reconfig keying to vfs_get_super() Add an additional keying mode to vfs_get_super() to indicate that only a single superblock should exist in the system, and that, if it does, further mounts should invoke reconfiguration upon it. This allows mount_single() to be replaced. [Fix by Eric Biggers folded in] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-09-05 14:34:23 -04:00
David Howells	fe62c3a4e1	vfs: Create fs_context-aware mount_bdev() replacement Create a function, get_tree_bdev(), that is fs_context-aware and a ->get_tree() counterpart of mount_bdev(). It caches the block device pointer in the fs_context struct so that this information can be passed into sget_fc()'s test and set functions. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: linux-block@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-09-05 14:34:22 -04:00
Al Viro	533770cc0a	new helper: get_tree_keyed() For vfs_get_keyed_super users. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-09-05 14:34:22 -04:00
Eric Biggers	1dd9bc08cf	vfs: set fs_context::user_ns for reconfigure fs_context::user_ns is used by fuse_parse_param(), even during remount, so it needs to be set to the existing value for reconfigure. Reproducer: #include <fcntl.h> #include <sys/mount.h> int main() { char opts[128]; int fd = open("/dev/fuse", O_RDWR); sprintf(opts, "fd=%d,rootmode=040000,user_id=0,group_id=0", fd); mkdir("mnt", 0777); mount("foo", "mnt", "fuse.foo", 0, opts); mount("foo", "mnt", "fuse.foo", MS_REMOUNT, opts); } Crash: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 0 PID: 129 Comm: syz_make_kuid Not tainted 5.3.0-rc5-next-20190821 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014 RIP: 0010:map_id_range_down+0xb/0xc0 kernel/user_namespace.c:291 [...] Call Trace: map_id_down kernel/user_namespace.c:312 [inline] make_kuid+0xe/0x10 kernel/user_namespace.c:389 fuse_parse_param+0x116/0x210 fs/fuse/inode.c:523 vfs_parse_fs_param+0xdb/0x1b0 fs/fs_context.c:145 vfs_parse_fs_string+0x6a/0xa0 fs/fs_context.c:188 generic_parse_monolithic+0x85/0xc0 fs/fs_context.c:228 parse_monolithic_mount_data+0x1b/0x20 fs/fs_context.c:708 do_remount fs/namespace.c:2525 [inline] do_mount+0x39a/0xa60 fs/namespace.c:3107 ksys_mount+0x7d/0xd0 fs/namespace.c:3325 __do_sys_mount fs/namespace.c:3339 [inline] __se_sys_mount fs/namespace.c:3336 [inline] __x64_sys_mount+0x20/0x30 fs/namespace.c:3336 do_syscall_64+0x4a/0x1a0 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Reported-by: syzbot+7d6a57304857423318a5@syzkaller.appspotmail.com Fixes: 408cbe695350 ("vfs: Convert fuse to use the new mount API") Cc: David Howells <dhowells@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-09-05 14:33:45 -04:00
Gao Xiang	618f40ea02	erofs: use read_cache_page_gfp for erofs_get_meta_page As Christoph said [1], "I'd much prefer to just use read_cache_page_gfp, and live with the fact that this allocates bufferheads behind you for now. I'll try to speed up my attempts to get rid of the buffer heads on the block device mapping instead. " This simplifies the code a lot and a minor thing is "no REQ_META (e.g. for blktrace) on metadata at all..." [1] https://lore.kernel.org/r/20190903153704.GA2201@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-26-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:09 +02:00
Gao Xiang	4231138fe0	erofs: always use iget5_locked As Christoph said [1] [2], "Just use the slightly more complicated 32-bit version everywhere so that you have a single actually tested code path. And then remove this helper. " [1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/ [2] https://lore.kernel.org/r/20190902125320.GA16726@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-25-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:09 +02:00
Gao Xiang	fe7c242357	erofs: use read_mapping_page instead of sb_bread As Christoph said [1], "This seems to be your only direct use of buffer heads, which while not deprecated are a bit of an ugly step child. So if you can easily avoid creating a buffer_head dependency in a new filesystem I think you should avoid it. " [1] https://lore.kernel.org/r/20190902125109.GA9826@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-24-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:09 +02:00
Gao Xiang	4f761fa253	erofs: rename errln/infoln/debugln to erofs_{err, info, dbg} Add prefix "erofs_" to these functions and print sb->s_id as a prefix to erofs_{err, info} so that the user knows which file system is affected. Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-23-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:09 +02:00
Gao Xiang	84947eb603	erofs: save one level of indentation As Christoph said [1], ".. and save one level of indentation." [1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-22-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:09 +02:00
Gao Xiang	73d03931be	erofs: kill use_vmap module parameter As Christoph said [1], "vm_map_ram is supposed to generally behave better. So if it doesn't please report that that to the arch maintainer and linux-mm so that they can look into the issue. Having user make choices of deep down kernel internals is just a horrible interface. Please talk to maintainers of other bits of the kernel if you see issues and / or need enhancements. " Let's redo the previous conclusion and kill the vmap approach. [1] https://lore.kernel.org/r/20190830165533.GA10909@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-21-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:09 +02:00
Gao Xiang	e2c71e74b2	erofs: kill all erofs specific fault injection As Christoph suggested [1], "Please just use plain kmalloc everywhere and let the normal kernel error injection code take care of injeting any errors." [1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-20-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:08 +02:00
Gao Xiang	99634bf388	erofs: add "erofs_" prefix for common and short functions Add erofs_ prefix to free_inode, alloc_inode, ... Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-19-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:08 +02:00
Gao Xiang	94e4e153b1	erofs: kill __submit_bio() As Christoph pointed out [1], " Why is there __submit_bio which really just obsfucates what is going on? Also why is __submit_bio using bio_set_op_attrs instead of opencode it as the comment right next to it asks you to? " Let's use submit_bio directly instead. [1] https://lore.kernel.org/r/20190830162812.GA10694@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-18-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:08 +02:00
Gao Xiang	e655b5b3a2	erofs: kill prio and nofail of erofs_get_meta_page() As Christoph pointed out [1], "Why is there __erofs_get_meta_page with the two weird booleans instead of a single erofs_get_meta_page that gets and gfp_t for additional flags and an unsigned int for additional bio op flags." And since all callers can handle errors, let's kill prio and nofail and erofs_get_inline_page() now. [1] https://lore.kernel.org/r/20190830162812.GA10694@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-17-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:08 +02:00
Gao Xiang	a5c0b7802c	erofs: localize erofs_grab_bio() As Christoph pointed out [1], "erofs_grab_bio tries to handle a bio_alloc failure, except that the function will not actually fail due the mempool backing it." Sorry about useless code, fix it now and localize erofs_grab_bio [2]. [1] https://lore.kernel.org/r/20190830162812.GA10694@infradead.org/ [2] https://lore.kernel.org/r/20190902122016.GL15931@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-16-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:08 +02:00
Gao Xiang	688a5f2ed4	erofs: kill verbose debug info in erofs_fill_super As Christoph said [1], "That is some very verbose debug info. We usually don't add that and let people trace the function instead. " [1] https://lore.kernel.org/r/20190829101545.GC20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-15-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:08 +02:00
Gao Xiang	0259f20948	erofs: use dsb instead of layout for ondisk super_block As Christoph pointed out [1], "Why is the variable name for the on-disk subperblock layout? We usually still calls this something with sb in the name, e.g. dsb. for disksuper block. " Let's fix it. [1] https://lore.kernel.org/r/20190829101545.GC20598@infradead.org/ Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-14-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:08 +02:00
Gao Xiang	a2c75c8143	erofs: better erofs symlink stuffs Fix as Christoph suggested [1] [2], "remove is_inode_fast_symlink and just opencode it in the few places using it" and "Please just set the ops directly instead of obsfucating that in a single caller, single line inline function. And please set it instead of the normal symlink iops in the same place where you also set those." [1] https://lore.kernel.org/r/20190830163910.GB29603@infradead.org/ [2] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-13-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:07 +02:00
Gao Xiang	2d78c209b9	erofs: update comments in inode.c As Christoph suggested [1], update them all. [1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-12-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:07 +02:00
Gao Xiang	ea559e7b84	erofs: update erofs_fs.h comments As Christoph said [1] [2], update it now. [1] https://lore.kernel.org/r/20190902124521.GA22153@infradead.org/ [2] https://lore.kernel.org/r/20190902120548.GB15931@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-11-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:07 +02:00
Gao Xiang	a5876e24f1	erofs: use erofs_inode naming As Christoph suggested [1], "Why is this called vnode instead of inode? That seems like a rather odd naming for a Linux file system." [1] https://lore.kernel.org/r/20190829101545.GC20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-10-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:07 +02:00
Gao Xiang	1c2dfbf9c2	erofs: kill erofs_{init,exit}_inode_cache As Christoph said [1] "having this function seems entirely pointless", let's kill those. filesystem function name ext2,f2fs,ext4,isofs,squashfs,cifs,... init_inodecache In addition, add a necessary "rcu_barrier()" on exit_fs(); [1] https://lore.kernel.org/r/20190829101545.GC20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-9-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:07 +02:00
Gao Xiang	8a76568225	erofs: better naming for erofs inode related stuffs updates inode naming - kill is_inode_layout_compression [1] - kill magic underscores [2] [3] - better naming for datamode & data_mapping_mode [3] - better naming erofs_inode_{compact, extended} [4] [1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/ [2] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/ [3] https://lore.kernel.org/r/20190902122627.GN15931@infradead.org/ [4] https://lore.kernel.org/r/20190902125438.GA17750@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-8-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:07 +02:00
Gao Xiang	426a930891	erofs: use feature_incompat rather than requirements As Christoph said [1], "This is only cosmetic, why not stick to feature_compat and feature_incompat?" In my thought, requirements means "incompatible" instead of "feature" though. [1] https://lore.kernel.org/r/20190902125109.GA9826@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-7-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:07 +02:00
Gao Xiang	c39747f770	erofs: update erofs_inode_is_data_compressed helper As Christoph said, "This looks like a really obsfucated way to write: return datamode == EROFS_INODE_FLAT_COMPRESSION \|\| datamode == EROFS_INODE_FLAT_COMPRESSION_LEGACY; " Although I had my own consideration, it's the right way for now. [1] https://lore.kernel.org/r/20190829095954.GB20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-6-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:07 +02:00
Gao Xiang	ed34aa4a8a	erofs: kill __packed for on-disk structures As Christoph suggested "Please don't add __packed" [1], remove all __packed except struct erofs_dirent here. Note that all on-disk fields except struct erofs_dirent (12 bytes with a 8-byte nid) in EROFS are naturally aligned. [1] https://lore.kernel.org/r/20190829095954.GB20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-5-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:07 +02:00
Gao Xiang	b6796abd3c	erofs: some macros are much more readable as a function As Christoph suggested [1], these macros are much more readable as a function. [1] https://lore.kernel.org/r/20190829095954.GB20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-4-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:07 +02:00
Gao Xiang	60a49ba8fe	erofs: on-disk format should have explicitly assigned numbers As Christoph suggested [1], on-disk format should have explicitly assigned numbers. [1] https://lore.kernel.org/r/20190829095954.GB20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-3-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:06 +02:00
Gao Xiang	4b66eb51d2	erofs: remove all the byte offset comments As Christoph suggested [1], "Please remove all the byte offset comments. that is something that can easily be checked with gdb or pahole." [1] https://lore.kernel.org/r/20190829095954.GB20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-2-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-09-05 20:10:06 +02:00
Deepa Dinamani	cba465b4f9	ext4: Reduce ext4 timestamp warnings When ext4 file systems were created intentionally with 128 byte inodes, the rate-limited warning of eventual possible timestamp overflow are still emitted rather frequently. Remove the warning for now. Discussion for whether any warning is needed, and where it should be emitted, can be found at https://lore.kernel.org/lkml/1567523922.5576.57.camel@lca.pw/. I can post a separate follow-up patch after the conclusion. Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2019-09-04 22:54:53 +02:00
Al Viro	b0841eefd9	configfs: provide exclusion between IO and removals Make sure that attribute methods are not called after the item has been removed from the tree. To do so, we * at the point of no return in removals, grab ->frag_sem exclusive and mark the fragment dead. * call the methods of attributes with ->frag_sem taken shared and only after having verified that the fragment is still alive. The main benefit is for method instances - they are guaranteed that the objects they are accessing and all ancestors are still there. Another win is that we don't need to bother with extra refcount on config_item when opening a file - the item will be alive for as long as it stays in the tree, and we won't touch it/attributes/any associated data after it's been removed from the tree. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-09-04 22:33:51 +02:00
Bob Peterson	ad26967b9a	gfs2: Use async glocks for rename Because s_vfs_rename_mutex is not cluster-wide, multiple nodes can reverse the roles of which directories are "old" and which are "new" for the purposes of rename. This can cause deadlocks where two nodes end up waiting for each other. There can be several layers of directory dependencies across many nodes. This patch fixes the problem by acquiring all gfs2_rename's inode glocks asynchronously and waiting for all glocks to be acquired. That way all inodes are locked regardless of the order. The timeout value for multiple asynchronous glocks is calculated to be the total of the individual wait times for each glock times two. Since gfs2_exchange is very similar to gfs2_rename, both functions are patched in the same way. A new async glock wait queue, sd_async_glock_wait, keeps a list of waiters for these events. If gfs2's holder_wake function detects an async holder, it wakes up any waiters for the event. The waiter only tests whether any of its requests are still pending. Since the glocks are sent to dlm asynchronously, the wait function needs to check to see which glocks, if any, were granted. If a glock is granted by dlm (and therefore held), its minimum hold time is checked and adjusted as necessary, as other glock grants do. If the event times out, all glocks held thus far must be dequeued to resolve any existing deadlocks. Then, if there are any outstanding locking requests, we need to loop around and wait for dlm to respond to those requests too. After we release all requests, we return -ESTALE to the caller (vfs rename) which loops around and retries the request. Node1 Node2 --------- --------- 1. Enqueue A Enqueue B 2. Enqueue B Enqueue A 3. A granted 6. B granted 7. Wait for B 8. Wait for A 9. A times out (since Node 1 holds A) 10. Dequeue B (since it was granted) 11. Wait for all requests from DLM 12. B Granted (since Node2 released it in step 10) 13. Rename 14. Dequeue A 15. DLM Grants A 16. Dequeue A (due to the timeout and since we no longer have B held for our task). 17. Dequeue B 18. Return -ESTALE to vfs 19. VFS retries the operation, goto step 1. This release-all-locks / acquire-all-locks may slow rename / exchange down as both nodes struggle in the same way and do the same thing. However, this will only happen when there is contention for the same inodes, which ought to be rare. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2019-09-04 20:22:17 +02:00
Andreas Gruenbacher	01123cf17c	gfs2: create function gfs2_glock_update_hold_time This patch moves the code that updates glock minimum hold time to a separate function. This will be called by a future patch. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2019-09-04 20:22:17 +02:00
Bob Peterson	bc74aaefdd	gfs2: separate holder for rgrps in gfs2_rename Before this patch, gfs2_rename added a holder for the rgrp glock to its array of holders, ghs. There's nothing wrong with that, but this patch separates it into a separate holder. This is done to ensure it's always locked last as per the proper glock lock ordering, and also to pave the way for a future patch in which we will lock the non-rgrp glocks asynchronously. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2019-09-04 20:22:17 +02:00
Markus Elfring	bccaef9073	gfs2: Delete an unnecessary check before brelse() The brelse() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. [The same applies to brelse() in gfs2_dir_no_add (which Coccinelle apparently missed), so fix that as well.] Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2019-09-04 20:22:17 +02:00
Andreas Gruenbacher	45eb05042d	gfs2: Minor PAGE_SIZE arithmetic cleanups Replace divisions by PAGE_SIZE with shifts by PAGE_SHIFT and similar. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2019-09-04 20:22:06 +02:00
Markus Elfring	4eb09e1112	fs-udf: Delete an unnecessary check before brelse() The brelse() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Link: https://lore.kernel.org/r/a254c1d1-0109-ab51-c67a-edc5c1c4b4cd@web.de Signed-off-by: Jan Kara <jack@suse.cz>	2019-09-04 18:19:43 +02:00
Markus Elfring	18c2433cb8	ext2: Delete an unnecessary check before brelse() The brelse() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Link: https://lore.kernel.org/r/51dea296-2207-ebc0-bac3-13f3e5c3b235@web.de Signed-off-by: Jan Kara <jack@suse.cz>	2019-09-04 18:19:43 +02:00
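
The three brelse() cleanups listed above (gfs2, fs-udf, ext2) all rest on the same point from their commit messages: brelse() already returns immediately when handed NULL, so a caller-side NULL test is redundant. The following is a minimal userspace sketch of that pattern, not kernel code. `struct mock_bh`, `mock_brelse()`, the two `release_*` wrappers and `mock_brelse_demo()` are hypothetical stand-ins invented for illustration; only the NULL-tolerant contract is taken from the commit text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct buffer_head (assumption). */
struct mock_bh {
	int refcount;
};

/* Mimics the contract described in the commits: return at once on NULL. */
void mock_brelse(struct mock_bh *bh)
{
	if (!bh)
		return;
	bh->refcount--;
}

/* Before the cleanup: the caller repeats the NULL test. */
void release_with_check(struct mock_bh *bh)
{
	if (bh)
		mock_brelse(bh);
}

/* After the cleanup: call unconditionally, the callee handles NULL. */
void release_without_check(struct mock_bh *bh)
{
	mock_brelse(bh);
}

/* Exercises both forms; returns the final reference count. */
int mock_brelse_demo(void)
{
	struct mock_bh bh = { .refcount = 2 };

	release_with_check(&bh);     /* drops one reference */
	release_without_check(&bh);  /* drops the other */

	/* Neither form touches anything when given NULL. */
	release_with_check(NULL);
	release_without_check(NULL);
	assert(bh.refcount == 0);

	return bh.refcount;
}
```

Both wrappers behave identically for NULL and non-NULL arguments in this sketch, which is why dropping the outer check is a pure simplification.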


60722 commits