linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Jeff Layton	2d4a532d38	nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock Currently, both destroy_revoked_delegation and revoke_delegation manipulate the cl_revoked list without any locking aside from the client_mutex. Ensure that the clp->cl_lock is held when manipulating it, except for the list walking in destroy_client. At that point, the client should no longer be in use, and so it should be safe to walk the list without any locking. That also means that we don't need to do the list_splice_init there either. Also, the fact that revoke_delegation deletes dl_recall_lru list_head without any locking makes it difficult to know whether it's doing so safely in all cases. Move the list_del_init calls into the callers, and add a WARN_ON in the event that t's passed a delegation that has a non-empty list_head. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:53 -04:00
Jeff Layton	4269067696	nfsd: fully unhash delegations when revoking them Ensure that the delegations cannot be found by the laundromat etc once we add them to the various 'revoke' lists. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:52 -04:00
Trond Myklebust	f83388341b	nfsd: simplify stateid allocation and file handling Don't allow stateids to clear the open file pointer until they are being destroyed. In a later patches we'll want to rely on the fact that we have a valid file pointer when dealing with the stateid and this will save us from having to do a lot of NULL pointer checks before doing so. Also, move to allocating stateids with kzalloc and get rid of the explicit zeroing of fields. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:51 -04:00
David Howells	0ef1351523	AFS: Correctly assemble the client UUID Correctly assemble the client UUID by OR'ing in the flags rather than assigning them over the other components. Reported-by: Himangi Saraogi <himangi774@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-07-29 10:14:36 -07:00
Jaegeuk Kim	fff04f90c1	f2fs: add info of appended or updated data writes This patch introduces a inode number list in which represents inodes having appended data writes or updated data writes after last checkpoint. This will be used at fsync to determine whether the recovery information should be written or not. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:46:11 -07:00
Jaegeuk Kim	39efac41fb	f2fs: use radix_tree for ino management For better ino management, this patch replaces the data structure from list to radix tree. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:46:11 -07:00
Jaegeuk Kim	6451e041c8	f2fs: add infra for ino management This patch changes the naming of orphan-related data structures to use as inode numbers managed globally. Later, we can use this facility for managing any inode number lists. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:45:54 -07:00
Namjae Jeon	ee98fa3a8b	ext4: fix COLLAPSE RANGE test for bigalloc file systems Blocks in collapse range should be collapsed per cluster unit when bigalloc is enable. If bigalloc is not enable, EXT4_CLUSTER_SIZE will be same with EXT4_BLOCK_SIZE. With this bug fixed, patch enables COLLAPSE_RANGE for bigalloc, which fixes a large number of xfstest failures which use fsx. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-07-29 10:45:31 -04:00
Jaegeuk Kim	953e6cc6bc	f2fs: punch the core function for inode management This patch punches out the core functions to manage the inode numbers. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:40:25 -07:00
Jaegeuk Kim	0f7b2abd18	f2fs: add nobarrier mount option This patch adds a mount option, nobarrier, in f2fs. The assumption in here is that file system keeps the IO ordering, but doesn't care about cache flushes inside the storages. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 05:27:48 -07:00
David S. Miller	3fd0202a0d	Merge tag 'master-2014-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-07-25 Please pull this batch of updates intended for the 3.17 stream! For the mac80211 bits, Johannes says: "We have a lot of TDLS patches, among them a fix that should make hwsim tests happy again. The rest, this time, is mostly small fixes." For the Bluetooth bits, Gustavo says: "Some more patches for 3.17. The most important change here is the move of the 6lowpan code to net/6lowpan. It has been agreed with Davem that this change will go through the bluetooth tree. The rest are mostly clean up and fixes." and, "Here follows some more patches for 3.17. These are mostly fixes to what we've sent to you before for next merge window." For the iwlwifi bits, Emmanuel says: "I have the usual amount of BT Coex stuff. Arik continues to work on TDLS and Ariej contributes a few things for HS2.0. I added a few more things to the firmware debugging infrastructure. Eran fixes a small bug - pretty normal content." And for the Atheros bits, Kalle says: "For ath6kl me and Jessica added support for ar6004 hw3.0, our latest version of ar6004. For ath10k Janusz added a printout so that it's easier to check what ath10k kconfig options are enabled. He also added a debugfs file to configure maximum amsdu and ampdu values. Also we had few fixes as usual." On top of that is the usual large batch of various driver updates -- brcmfmac, mwifiex, the TI drivers, and wil6210 all get some action. Rafał has also been very busy with b43 and related updates. Also, I pulled the wireless tree into this in order to resolve a merge conflict... P.S. The change to fs/compat_ioctl.c reflects a name change in a Bluetooth header file... ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-28 17:36:25 -07:00
Darrick J. Wong	40b163f1c4	ext4: check inline directory before converting Before converting an inline directory to a regular directory, check the directory entries to make sure they're not obviously broken. This helps us to avoid a BUG_ON if one of the dirents is trashed. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>	2014-07-28 13:06:26 -04:00
Artem Bityutskiy	6390e99177	Revert "UBIFS: add a log overlap assertion" This reverts commit `545f7fdf6d`. Hujianyang's testing revealed that the patch is bogus. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-28 19:15:19 +03:00
Yan, Zheng	06fee30f6a	ceph: fix append mode write generic_write_checks() may update 'pos', so we need to pass 'pos' to ceph_sync_write() and ceph_sync_direct_write(); Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-07-28 13:29:33 +04:00
Ilya Dryomov	7e8a295295	ceph: fix sizeof(struct tYpO *) typo struct ceph_xattr -> struct ceph_inode_xattr Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-28 13:29:27 +04:00
Ilya Dryomov	1a295bd8c8	ceph: remove redundant memset(0) xattrs array of pointers is allocated with kcalloc() - no need to memset() it to 0 right after that. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-28 13:28:33 +04:00
Ingo Molnar	ca5bc6cd5d	Merge branch 'sched/urgent' into sched/core, to merge fixes before applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-07-28 10:03:00 +02:00
Dmitry Monakhov	6e2631463f	ext4: fix incorrect locking in move_extent_per_page If we have to copy data we must drop i_data_sem because of get_blocks() will be called inside mext_page_mkuptodate(), but later we must reacquire it again because we are about to change extent's tree Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-07-27 22:32:27 -04:00
Dmitry Monakhov	29faed1638	ext4: use correct depth value Inode's depth can be changed from here: ext4_ext_try_to_merge() ->ext4_ext_try_to_merge_up() We must use correct value. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-07-27 22:30:29 -04:00
Dmitry Monakhov	4b1f166071	ext4: add i_data_sem sanity check Each caller of ext4_ext_dirty must hold i_data_sem, The only exception is migration code, let's make it convenient. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-07-27 22:28:15 -04:00
Xiaoguang Wang	b27b1535ac	ext4: fix wrong size computation in ext4_mb_normalize_request() As the member fe_len defined in struct ext4_free_extent is expressed as number of clusters, the variable "size" computation is wrong, we need to first translate fe_len to block number, then to bytes. Signed-off-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com>	2014-07-27 22:26:36 -04:00
Linus Torvalds	fbf08efa04	Merge branch 'vfs-for-3.16' of git://git.infradead.org/users/hch/vfs Pull vfs fixes from Christoph Hellwig: "A vfsmount leak fix, and a compile warning fix" * 'vfs-for-3.16' of git://git.infradead.org/users/hch/vfs: fs: umount on symlink leaks mnt count direct-io: fix uninitialized warning in do_direct_IO()	2014-07-27 09:43:41 -07:00
Linus Torvalds	0246544fc9	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "These two pathes fix issues with the kernel-userspace protocol changes in v3.15" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: add FUSE_NO_OPEN_SUPPORT flag to INIT fuse: s_time_gran fix	2014-07-25 16:16:34 -07:00
Chao Yu	9d84795077	f2fs: fix to put root inode in error path of fill_super We should put root inode correctly in error path of fill_super, otherwise we may encounter a leak case of inode resource. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-25 08:19:57 -07:00
Chao Yu	dbf20cb259	f2fs: avoid use invalid mapping of node_inode when evict meta inode Andrey Tsyvarev reported: "Using memory error detector reveals the following use-after-free error in 3.15.0: AddressSanitizer: heap-use-after-free in f2fs_evict_inode Read of size 8 by thread T22279: [<ffffffffa02d8702>] f2fs_evict_inode+0x102/0x2e0 [f2fs] [<ffffffff812359af>] evict+0x15f/0x290 [< inlined >] iput+0x196/0x280 iput_final [<ffffffff812369a6>] iput+0x196/0x280 [<ffffffffa02dc416>] f2fs_put_super+0xd6/0x170 [f2fs] [<ffffffff81210095>] generic_shutdown_super+0xc5/0x1b0 [<ffffffff812105fd>] kill_block_super+0x4d/0xb0 [<ffffffff81210a86>] deactivate_locked_super+0x66/0x80 [<ffffffff81211c98>] deactivate_super+0x68/0x80 [<ffffffff8123cc88>] mntput_no_expire+0x198/0x250 [< inlined >] SyS_umount+0xe9/0x1a0 SYSC_umount [<ffffffff8123f1c9>] SyS_umount+0xe9/0x1a0 [<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b Freed by thread T3: [<ffffffffa02dc337>] f2fs_i_callback+0x27/0x30 [f2fs] [< inlined >] rcu_process_callbacks+0x2d6/0x930 __rcu_reclaim [< inlined >] rcu_process_callbacks+0x2d6/0x930 rcu_do_batch [< inlined >] rcu_process_callbacks+0x2d6/0x930 invoke_rcu_callbacks [< inlined >] rcu_process_callbacks+0x2d6/0x930 __rcu_process_callbacks [<ffffffff810fd266>] rcu_process_callbacks+0x2d6/0x930 [<ffffffff8107cce2>] __do_softirq+0x142/0x380 [<ffffffff8107cf50>] run_ksoftirqd+0x30/0x50 [<ffffffff810b2a87>] smpboot_thread_fn+0x197/0x280 [<ffffffff810a8238>] kthread+0x148/0x160 [<ffffffff81cc8d4c>] ret_from_fork+0x7c/0xb0 Allocated by thread T22276: [<ffffffffa02dc7dd>] f2fs_alloc_inode+0x2d/0x170 [f2fs] [<ffffffff81235e2a>] iget_locked+0x10a/0x230 [<ffffffffa02d7495>] f2fs_iget+0x35/0xa80 [f2fs] [<ffffffffa02e2393>] f2fs_fill_super+0xb53/0xff0 [f2fs] [<ffffffff81211bce>] mount_bdev+0x1de/0x240 [<ffffffffa02dbce0>] f2fs_mount+0x10/0x20 [f2fs] [<ffffffff81212a85>] mount_fs+0x55/0x220 [<ffffffff8123c026>] vfs_kern_mount+0x66/0x200 [< inlined >] do_mount+0x2b4/0x1120 do_new_mount [<ffffffff812400d4>] do_mount+0x2b4/0x1120 [< inlined >] SyS_mount+0xb2/0x110 SYSC_mount [<ffffffff812414a2>] SyS_mount+0xb2/0x110 [<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b The buggy address ffff8800587866c8 is located 48 bytes inside of 680-byte region [ffff880058786698, ffff880058786940) Memory state around the buggy address: ffff880058786100: ffffffff ffffffff ffffffff ffffffff ffff880058786200: ffffffff ffffffff ffffffrr rrrrrrrr ffff880058786300: rrrrrrrr rrffffff ffffffff ffffffff ffff880058786400: ffffffff ffffffff ffffffff ffffffff ffff880058786500: ffffffff ffffffff ffffffff fffffffr >ffff880058786600: rrrrrrrr rrrrrrrr rrrfffff ffffffff ^ ffff880058786700: ffffffff ffffffff ffffffff ffffffff ffff880058786800: ffffffff ffffffff ffffffff ffffffff ffff880058786900: ffffffff rrrrrrrr rrrrrrrr rrrr.... ffff880058786a00: ........ ........ ........ ........ ffff880058786b00: ........ ........ ........ ........ Legend: f - 8 freed bytes r - 8 redzone bytes . - 8 allocated bytes x=1..7 - x allocated bytes + (8-x) redzone bytes Investigation shows, that f2fs_evict_inode, when called for 'meta_inode', uses invalidate_mapping_pages() for 'node_inode'. But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via iput(). It seems that in common usage scenario this use-after-free is benign, because 'node_inode' remains partially valid data even after kmem_cache_free(). But things may change if, while 'meta_inode' is evicted in one f2fs filesystem, another (mounted) f2fs filesystem requests inode from cache, and formely 'node_inode' of the first filesystem is returned." Nids for both meta_inode and node_inode are reservation, so it's not necessary for us to invalidate pages which will never be allocated. To fix this issue, let's skipping needlessly invalidating pages for {meta,node}_inode in f2fs_evict_inode. Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru> Tested-by: Andrey Tsyvarev <tsyvarev@ispras.ru> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-25 08:19:55 -07:00
Chao Yu	32f9bc25cb	f2fs: support ->rename2() Now new interface ->rename2() is added to VFS, here are related description: https://lkml.org/lkml/2014/2/7/873 https://lkml.org/lkml/2014/2/7/758 This patch adds function f2fs_rename2() to support ->rename2() including handling both RENAME_EXCHANGE and RENAME_NOREPLACE flag. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-25 08:14:08 -07:00
Huang Ying	79e35dc3c2	f2fs: add f2fs_balance_fs for direct IO Otherwise, if a large amount of direct IO writes were done, the segment allocation may be failed because no enough segments are gced. Changes: v2: add f2fs_balance_fs into __get_data_block instead of f2fs_direct_IO. Signed-off-by: Huang, Ying <ying.huang@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-25 08:13:58 -07:00
Gu Zheng	00fefb9cf2	aio: use iovec array rather than the single one Previously, we only offer a single iovec to handle all the read/write cases, so the PREADV/PWRITEV request always need to alloc more iovec buffer when copying user vectors. If we use a tmp iovec array rather than the single one, some small PREADV/PWRITEV workloads(vector size small than the tmp buffer) will not need to alloc more iovec buffer when copying user vectors. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>	2014-07-24 10:59:40 -04:00
Gu Zheng	2be4e7deec	aio: fix some comments The function comments of aio_run_iocb and aio_read_events are out of date, so fix them here. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>	2014-07-24 10:59:40 -04:00
Gu Zheng	8dc4379e17	aio: use the macro rather than the inline magic number Replace the inline magic number with the ready-made macro(AIO_RING_MAGIC), just clean up. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>	2014-07-24 10:59:40 -04:00
Gu Zheng	b53f1f82fb	aio: remove the needless registration of ring file's private_data Remove the registration of ring file's private_data, we do not use it. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>	2014-07-24 10:59:40 -04:00
Eric Paris	7d8b6c6375	CAPABILITIES: remove undefined caps from all processes This is effectively a revert of `7b9a7ec565` plus fixing it a different way... We found, when trying to run an application from an application which had dropped privs that the kernel does security checks on undefined capability bits. This was ESPECIALLY difficult to debug as those undefined bits are hidden from /proc/$PID/status. Consider a root application which drops all capabilities from ALL 4 capability sets. We assume, since the application is going to set eff/perm/inh from an array that it will clear not only the defined caps less than CAP_LAST_CAP, but also the higher 28ish bits which are undefined future capabilities. The BSET gets cleared differently. Instead it is cleared one bit at a time. The problem here is that in security/commoncap.c::cap_task_prctl() we actually check the validity of a capability being read. So any task which attempts to 'read all things set in bset' followed by 'unset all things set in bset' will not even attempt to unset the undefined bits higher than CAP_LAST_CAP. So the 'parent' will look something like: CapInh: 0000000000000000 CapPrm: 0000000000000000 CapEff: 0000000000000000 CapBnd: ffffffc000000000 All of this 'should' be fine. Given that these are undefined bits that aren't supposed to have anything to do with permissions. But they do... So lets now consider a task which cleared the eff/perm/inh completely and cleared all of the valid caps in the bset (but not the invalid caps it couldn't read out of the kernel). We know that this is exactly what the libcap-ng library does and what the go capabilities library does. They both leave you in that above situation if you try to clear all of you capapabilities from all 4 sets. If that root task calls execve() the child task will pick up all caps not blocked by the bset. The bset however does not block bits higher than CAP_LAST_CAP. So now the child task has bits in eff which are not in the parent. These are 'meaningless' undefined bits, but still bits which the parent doesn't have. The problem is now in cred_cap_issubset() (or any operation which does a subset test) as the child, while a subset for valid cap bits, is not a subset for invalid cap bits! So now we set durring commit creds that the child is not dumpable. Given it is 'more priv' than its parent. It also means the parent cannot ptrace the child and other stupidity. The solution here: 1) stop hiding capability bits in status This makes debugging easier! 2) stop giving any task undefined capability bits. it's simple, it you don't put those invalid bits in CAP_FULL_SET you won't get them in init and you won't get them in any other task either. This fixes the cap_issubset() tests and resulting fallout (which made the init task in a docker container untraceable among other things) 3) mask out undefined bits when sys_capset() is called as it might use ~0, ~0 to denote 'all capabilities' for backward/forward compatibility. This lets 'capsh --caps="all=eip" -- -c /bin/bash' run. 4) mask out undefined bit when we read a file capability off of disk as again likely all bits are set in the xattr for forward/backward compatibility. This lets 'setcap all+pe /bin/bash; /bin/bash' run Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Andrew G. Morgan <morgan@kernel.org> Cc: Serge E. Hallyn <serge.hallyn@canonical.com> Cc: Kees Cook <keescook@chromium.org> Cc: Steve Grubb <sgrubb@redhat.com> Cc: Dan Walsh <dwalsh@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: James Morris <james.l.morris@oracle.com>	2014-07-24 21:53:47 +10:00
James Morris	4ca332e11d	Merge tag 'keys-next-20140722' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into next	2014-07-24 21:36:19 +10:00
Jie Liu	74dc93a908	xfs: fix uflags detection at xfs_fs_rm_xquota We are intended to check up uflags against FS_PROJ_QUOTA rather than FS_USER_UQUOTA once more, it looks to me like a typo, but might cause the project quota metadata space can not be removed. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 21:27:17 +10:00
Jie Liu	7b409a7d6f	xfs: remove XFS_IS_OQUOTA_ON macros Remove the XFS_IS_OQUOTA_ON macros as it is obsoleted. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 21:27:16 +10:00
Eric Sandeen	54aa61f82d	xfs: tidy up xfs_set_inode32 xfs_set_inode32() caught my eye because it had weird spacing around the "-1's". In cleaning that up, I realized that the assignment in the declaration of "ino" is never used; it's rewritten before it gets read. Drop the ino initializer from its declaration since it's not used, and move the agino initialization into the body of the function, mostly so that we can have pretty whitespace and not exceed 80 columns. :) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 20:53:10 +10:00
Eric Sandeen	9de67c3ba9	xfs: allow inode allocations in post-growfs disk space Today, if we perform an xfs_growfs which adds allocation groups, mp->m_maxagi is not properly updated when the growfs is complete. Therefore inodes will continue to be allocated only in the AGs which existed prior to the growfs, and the new space won't be utilized. This is because of this path in xfs_growfs_data_private(): xfs_growfs_data_private xfs_initialize_perag(mp, nagcount, &nagimax); if (mp->m_flags & XFS_MOUNT_32BITINODES) index = xfs_set_inode32(mp); else index = xfs_set_inode64(mp); if (maxagi) maxagi = index; where xfs_set_inode iterates over the (old) agcount in mp->m_sb.sb_agblocks, which has not yet been updated in the growfs path. So "index" will be returned based on the old agcount, not the new one, and new AGs are not available for inode allocation. Fix this by explicitly passing the proper AG count (which xfs_initialize_perag() already has) down another level, so that xfs_set_inode* can make the proper decision about acceptable AGs for inode allocation in the potentially newly-added AGs. This has been broken since 3.7, when these two xfs_set_inode* functions were added in commit `2d2194f`. Prior to that, we looped over "agcount" not sb_agblocks in these calculations. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 20:51:54 +10:00
Jie Liu	eb866bbf09	xfs: mark xfs_qm_quotacheck as static xfs_qm_quotacheck() is not used outside of xfs_qm.c. Mark it static and move it around in the file to avoid a forward declaration. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 20:49:57 +10:00
Mark Tinguely	5c18717ea2	xfs: fix cil push sequence after log recovery When the CIL checkpoint is fully written to the log, the LSN of the checkpoint commit record is written into the CIL context structure. This allows log force waiters to correctly detect when the checkpoint they are waiting on have been fully written into the log buffers. However, the initial context after mount is initialised with a non-zero commit LSN, so appears to waiters as though it is complete even though it may not have even been pushed, let alone written to the log buffers. Hence a log force immediately after a filesystem is mounted may not behave correctly, nor does commit record ordering if multiple CIL pushes interleave immediately after mount. To fix this, make sure the initial context commit LSN is not touched until the first checkpointis actually pushed. [dchinner: rewrite commit message] Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 20:49:40 +10:00
Vasily Averin	295dc39d94	fs: umount on symlink leaks mnt count Currently umount on symlink blocks following umount: /vz is separate mount # ls /vz/ -al \| grep test drwxr-xr-x. 2 root root 4096 Jul 19 01:14 testdir lrwxrwxrwx. 1 root root 11 Jul 19 01:16 testlink -> /vz/testdir # umount -l /vz/testlink umount: /vz/testlink: not mounted (expected) # lsof /vz # umount /vz umount: /vz: device is busy. (unexpected) In this case mountpoint_last() gets an extra refcount on path->mnt Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Ian Kent <raven@themaw.net> Acked-by: Jeff Layton <jlayton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-07-24 06:18:12 -04:00
Boaz Harrosh	6fcc5420bf	direct-io: fix uninitialized warning in do_direct_IO() The following warnings: fs/direct-io.c: In function ‘__blockdev_direct_IO’: fs/direct-io.c:1011:12: warning: ‘to’ may be used uninitialized in this function [-Wmaybe-uninitialized] fs/direct-io.c:913:16: note: ‘to’ was declared here fs/direct-io.c:1011:12: warning: ‘from’ may be used uninitialized in this function [-Wmaybe-uninitialized] fs/direct-io.c:913:10: note: ‘from’ was declared here are false positive because dio_get_page() either fails, or sets both 'from' and 'to'. Paul Bolle said ... Maybe it's better to move initializing "to" and "from" out of dio_get_page(). That _might_ make it easier for both the the reader and the compiler to understand what's going on. Something like this: Christoph Hellwig said ... The fix of moving the code definitively looks nicer, while I think uninitialized_var is horrible wart that won't get anywhere near my code. Boaz Harrosh: I agree with Christoph and Paul Signed-off-by: Boaz Harrosh <boaz@plexistor.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-07-24 06:17:07 -04:00
Brian Foster	f074051ff5	xfs: squash prealloc while over quota free space as well From: Brian Foster <bfoster@redhat.com> Commit `4d559a3b` introduced heavy prealloc. squashing to catch the case of requesting too large a prealloc on smaller filesystems, leading to repeated flush and retry cycles that occur on ENOSPC. Now that we issue eofblocks scans on EDQUOT/ENOSPC, squash the prealloc against the minimum available free space across all applicable quotas as well to avoid a similar problem of repeated eofblocks scans. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 19:56:08 +10:00
Brian Foster	dc06f398f0	xfs: run an eofblocks scan on ENOSPC/EDQUOT From: Brian Foster <bfoster@redhat.com> Speculative preallocation and and the associated throttling metrics assume we're working with large files on large filesystems. Users have reported inefficiencies in these mechanisms when we happen to be dealing with large files on smaller filesystems. This can occur because while prealloc throttling is aggressive under low free space conditions, it is not active until we reach 5% free space or less. For example, a 40GB filesystem has enough space for several files large enough to have multi-GB preallocations at any given time. If those files are slow growing, they might reserve preallocation for long periods of time as well as avoid the background scanner due to frequent modification. If a new file is written under these conditions, said file has no access to this already reserved space and premature ENOSPC is imminent. To handle this scenario, modify the buffered write ENOSPC handling and retry sequence to invoke an eofblocks scan. In the smaller filesystem scenario, the eofblocks scan resets the usage of preallocation such that when the 5% free space threshold is met, throttling effectively takes over to provide fair and efficient preallocation until legitimate ENOSPC. The eofblocks scan is selective based on the nature of the failure. For example, an EDQUOT failure in a particular quota will use a filtered scan for that quota. Because we don't know which quota might have caused an allocation failure at any given time, we include each applicable quota determined to be under low free space conditions in the scan. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 19:49:28 +10:00
Brian Foster	f452639792	xfs: support a union-based filter for eofblocks scans From: Brian Foster <bfoster@redhat.com> The eofblocks scan inode filter uses intersection logic by default. E.g., specifying both user and group quota ids filters out inodes that are not covered by both the specified user and group quotas. This is suitable for behavior exposed to userspace. Scans that are initiated from within the kernel might require more broad semantics, such as scanning all inodes under each quota associated with an inode to alleviate low free space conditions in each. Create the XFS_EOF_FLAGS_UNION flag to support a conditional union-based filtering algorithm for eofblocks scans. This flag is intentionally left out of the valid mask as it is not supported for scans initiated from userspace. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 19:44:28 +10:00
Brian Foster	5400da7dc0	xfs: add scan owner field to xfs_eofblocks From: Brian Foster <bfoster@redhat.com> The scan owner field represents an optional inode number that is responsible for the current scan. The purpose is to identify that an inode is under iolock and as such, the iolock shouldn't be attempted when trimming eofblocks. This is an internal only field. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 19:40:22 +10:00
Jie Liu	f3d1e58743	xfs: introduce xfs_bulkstat_grab_ichunk From: Jie Liu <jeff.liu@oracle.com> Introduce xfs_bulkstat_grab_ichunk() to look up an inode chunk in where the given inode resides, then grab the record. Update the data for the pointed-to record if the inode was not the last in the chunk and there are some left allocated, return the grabbed inode count on success. Refactor xfs_bulkstat() with it. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 18:42:21 +10:00
Jie Liu	4b8fdfecd8	xfs: introduce xfs_bulkstat_ichunk_ra From: Jie Liu <jeff.liu@oracle.com> Introduce xfs_bulkstat_ichunk_ra() to loop over all clusters in the next inode chunk, then performs readahead if there are any allocated inodes in that cluster. Refactor xfs_bulkstat() with it. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 18:41:18 +10:00
Jie Liu	d4c2734875	xfs: fix error handling at xfs_bulkstat From: Jie Liu <jeff.liu@oracle.com> We should not ignore the btree operation errors at xfs_bulkstat() but to propagate them if any. This patch fix two places in this function and the remaining things will be fixed with code refactoring thereafter. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 18:40:43 +10:00
Jie Liu	296dfd7fdb	xfs: remove redundant user buffer count checks at xfs_bulkstat From: Jie Liu <jeff.liu@oracle.com> Remove the redundant user buffer and count checks as it has already been validated at xfs_ioc_bulkstat(). Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 18:40:26 +10:00
Himangi Saraogi	08a0f24e4c	ceph: replace comma with a semicolon Replace a comma between expression statements by a semicolon. This changes the semantics of the code, but given the current indentation appears to be what is intended. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: // <smpl> @r@ expression e1,e2; @@ e1 -, +; e2; // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2014-07-24 12:04:46 +04:00

... 9 10 11 12 13 ...

37611 commits