linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Jeff Layton	b9382e29ca	nfsd: validate the nfsd_serv pointer before calling svc_wake_up nfsd_file_dispose_list_delayed can be called from the filecache laundrette, which is shut down after the nfsd threads are shut down and the nfsd_serv pointer is cleared. If nn->nfsd_serv is NULL then there are no threads to wake. Ensure that the nn->nfsd_serv pointer is non-NULL before calling svc_wake_up in nfsd_file_dispose_list_delayed. This is safe since the svc_serv is not freed until after the filecache laundrette is cancelled. Reported-by: Salvatore Bonaccorso <carnil@debian.org> Closes: https://bugs.debian.org/1093734 Fixes: `ffb4025961` ("nfsd: Don't leave work of closing files to a work queue") Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-02-02 11:31:58 -05:00
Li Lingfeng	7faf14a7b0	nfsd: clear acl_access/acl_default after releasing them If getting acl_default fails, acl_access and acl_default will be released simultaneously. However, acl_access will still retain a pointer pointing to the released posix_acl, which will trigger a WARNING in nfs3svc_release_getacl like this: ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 26 PID: 3199 at lib/refcount.c:28 refcount_warn_saturate+0xb5/0x170 Modules linked in: CPU: 26 UID: 0 PID: 3199 Comm: nfsd Not tainted 6.12.0-rc6-00079-g04ae226af01f-dirty #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 RIP: 0010:refcount_warn_saturate+0xb5/0x170 Code: cc cc 0f b6 1d b3 20 a5 03 80 fb 01 0f 87 65 48 d8 00 83 e3 01 75 e4 48 c7 c7 c0 3b 9b 85 c6 05 97 20 a5 03 01 e8 fb 3e 30 ff <0f> 0b eb cd 0f b6 1d 8a3 RSP: 0018:ffffc90008637cd8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff83904fde RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88871ed36380 RBP: ffff888158beeb40 R08: 0000000000000001 R09: fffff520010c6f56 R10: ffffc90008637ab7 R11: 0000000000000001 R12: 0000000000000001 R13: ffff888140e77400 R14: ffff888140e77408 R15: ffffffff858b42c0 FS: 0000000000000000(0000) GS:ffff88871ed00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000562384d32158 CR3: 000000055cc6a000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? refcount_warn_saturate+0xb5/0x170 ? __warn+0xa5/0x140 ? refcount_warn_saturate+0xb5/0x170 ? report_bug+0x1b1/0x1e0 ? handle_bug+0x53/0xa0 ? exc_invalid_op+0x17/0x40 ? asm_exc_invalid_op+0x1a/0x20 ? tick_nohz_tick_stopped+0x1e/0x40 ? refcount_warn_saturate+0xb5/0x170 ? refcount_warn_saturate+0xb5/0x170 nfs3svc_release_getacl+0xc9/0xe0 svc_process_common+0x5db/0xb60 ? __pfx_svc_process_common+0x10/0x10 ? __rcu_read_unlock+0x69/0xa0 ? __pfx_nfsd_dispatch+0x10/0x10 ? svc_xprt_received+0xa1/0x120 ? xdr_init_decode+0x11d/0x190 svc_process+0x2a7/0x330 svc_handle_xprt+0x69d/0x940 svc_recv+0x180/0x2d0 nfsd+0x168/0x200 ? __pfx_nfsd+0x10/0x10 kthread+0x1a2/0x1e0 ? kthread+0xf4/0x1e0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Kernel panic - not syncing: kernel: panic_on_warn set ... Clear acl_access/acl_default after posix_acl_release is called to prevent UAF from being triggered. Fixes: `a257cdd0e2` ("[PATCH] NFSD: Add server support for NFSv3 ACLs.") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241107014705.2509463-1-lilingfeng@huaweicloud.com/ Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Rick Macklem <rmacklem@uoguelph.ca> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-02-02 11:31:45 -05:00
Dr. David Alan Gilbert	c92066e786	sunrpc: Remove gss_{de,en}crypt_xdr_buf deadcode Commit `ec596aaf9b` ("SUNRPC: Remove code behind CONFIG_RPCSEC_GSS_KRB5_SIMPLIFIED") was the last user of the gss_decrypt_xdr_buf() and gss_encrypt_xdr_buf() functions. Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-21 15:30:01 -05:00
Dr. David Alan Gilbert	afc52b1eeb	sunrpc: Remove gss_generic_token deadcode Commit `ec596aaf9b` ("SUNRPC: Remove code behind CONFIG_RPCSEC_GSS_KRB5_SIMPLIFIED") was the last user of the routines in gss_generic_token.c. Remove the routines and associated header. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-21 15:30:01 -05:00
Dr. David Alan Gilbert	ee0d90d4b9	sunrpc: Remove unused xprt_iter_get_xprt xprt_iter_get_xprt() was added by commit `80b14d5e61` ("SUNRPC: Add a structure to track multiple transports") but is unused. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Acked-by: Anna Schumaker <anna.schumaker@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-21 15:30:01 -05:00
Chuck Lever	966a675da8	Revert "SUNRPC: Reduce thread wake-up rate when receiving large RPC messages" I noticed that a handful of NFSv3 fstests were taking an unexpectedly long time to run. Troubleshooting showed that the server's TCP window closed and never re-opened, which caused the client to trigger an RPC retransmit timeout after 180 seconds. The client's recovery action was to establish a fresh connection and retransmit the timed-out requests. This worked, but it adds a long delay. I tracked the problem to the commit that attempted to reduce the rate at which the network layer delivers TCP socket data_ready callbacks. Under most circumstances this change worked as expected, but for NFSv3, which has no session or other type of throttling, it can overwhelm the receiver on occasion. I'm sure I could tweak the lowat settings, but the small benefit doesn't seem worth the bother. Just revert it. Fixes: `2b877fc53e` ("SUNRPC: Reduce thread wake-up rate when receiving large RPC messages") Cc: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-21 15:30:01 -05:00
Jeff Layton	d3edfd9ed1	nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION Allow clients to request getting a delegation xor an open stateid if a delegation isn't available. This allows the client to avoid sending a final CLOSE for the (useless) open stateid, when it is granted a delegation. If this flag is requested by the client and there isn't already a new open stateid, discard the new open stateid before replying. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-21 15:30:01 -05:00
Jeff Layton	7e13f4f8d2	nfsd: handle delegated timestamps in SETATTR Allow SETATTR to handle delegated timestamps. This patch assumes that only the delegation holder has the ability to set the timestamps in this way, so we allow this only if the SETATTR stateid refers to a *_ATTRS_DELEG delegation. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-21 15:30:01 -05:00
Jeff Layton	6ae30d6eb2	nfsd: add support for delegated timestamps Add support for the delegated timestamps on write delegations. This allows the server to proxy timestamps from the delegation holder to other clients that are doing GETATTRs vs. the same inode. When OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS bit is set in the OPEN call, set the dl_type to the *_ATTRS_DELEG flavor of delegation. Add timespec64 fields to nfs4_cb_fattr and decode the timestamps into those. Vet those timestamps according to the delstid spec and update the inode attrs if necessary. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-21 15:30:01 -05:00
Jeff Layton	cee9b4ef42	nfsd: rework NFS4_SHARE_WANT_* flag handling The delstid draft adds new NFS4_SHARE_WANT_TYPE_MASK values that don't fit neatly into the existing WANT_MASK or WHEN_MASK. Add a new NFS4_SHARE_WANT_MOD_MASK value and redefine NFS4_SHARE_WANT_MASK to include it. Also fix the checks in nfsd4_deleg_xgrade_none_ext() to check for the flags instead of equality, since there may be modifier flags in the value. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-21 15:30:01 -05:00
Jeff Layton	51c0d4f7e3	nfsd: add support for FATTR4_OPEN_ARGUMENTS Add support for FATTR4_OPEN_ARGUMENTS. This a new mechanism for the client to discover what OPEN features the server supports. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-21 15:30:01 -05:00
Jeff Layton	fbd5573d0d	nfsd: prepare delegation code for handing out _ATTRS_DELEG delegations Add some preparatory code to various functions that handle delegation types to allow them to handle the OPEN_DELEGATE__ATTRS_DELEG constants. Add helpers for detecting whether it's a read or write deleg, and whether the attributes are delegated. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-21 15:30:01 -05:00
Jeff Layton	c9c99a33e2	nfsd: rename NFS4_SHARE_WANT_* constants to OPEN4_SHARE_ACCESS_WANT_* Add the OPEN4_SHARE_ACCESS_WANT constants from the nfs4.1 and delstid draft into the nfs4_1.x file, and regenerate the headers and source files. Do a mass renaming of NFS4_SHARE_WANT_* to OPEN4_SHARE_ACCESS_WANT_* in the nfsd directory. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-21 15:30:01 -05:00
Jeff Layton	8dfbea8bde	nfsd: switch to autogenerated definitions for open_delegation_type4 Rename the enum with the same name in include/linux/nfs4.h, add the proper enum to nfs4_1.x and regenerate the headers and source files. Do a mass rename of all NFS4_OPEN_DELEGATE_* to OPEN_DELEGATE_* in the nfsd directory. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-21 15:30:01 -05:00
Jeff Layton	8e1d32273a	nfs_common: make include/linux/nfs4.h include generated nfs4_1.h In the long run, the NFS development community intends to autogenerate a lot of the XDR handling code. Both the NFS client and server include "include/linux/nfs4.hi". That file was hand-rolled, and some of the symbols in it conflict with the autogenerated symbols. Add a small nfs4_1.x to Documentation that currently just has the necessary definitions for the delstid draft, and generate the relevant header and source files. Make include/linux/nfs4.h include the generated include/linux/sunrpc/xdrgen/nfs4_1.h and remove the conflicting definitions from it and nfs_xdr.h. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-21 15:30:00 -05:00
Jeff Layton	531503054e	nfsd: fix handling of delegated change attr in CB_GETATTR RFC8881, section 10.4.3 has some specific guidance as to how the delegated change attribute should be handled. We currently don't follow that guidance properly. In particular, when the file is modified, the server always reports the initial change attribute + 1. Section 10.4.3 however indicates that it should be incremented on every GETATTR request from other clients. Only request the change attribute until the file has been modified. If there is an outstanding delegation, then increment the cached change attribute on every GETATTR. Fixes: `6487a13b5c` ("NFSD: add support for CB_GETATTR callback") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-21 15:29:40 -05:00
Chuck Lever	1196bdce3d	SUNRPC: Document validity guarantees of the pointer returned by reserve_space A subtlety of this API is that if the @nbytes region traverses a page boundary, the next __xdr_commit_encode will shift the data item in the XDR encode buffer. This makes the returned pointer point to something else, leading to unexpected behavior. There are a few cases where the caller saves the returned pointer and then later uses it to insert a computed value into an earlier part of the stream. This can be safe only if either: - the data item is guaranteed to be in the XDR buffer's head, and thus is not ever going to be near a page boundary, or - the data item is no larger than 4 octets, since XDR alignment rules require all data items to start on 4-octet boundaries But that safety is only an artifact of the current implementation. It would be less brittle if these "safe" uses were eventually replaced. Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-10 23:43:44 -05:00
Chuck Lever	4163ee711c	NFSD: Insulate nfsd4_encode_fattr4() from page boundaries in the encode buffer Commit `ab04de60ae` ("NFSD: Optimize nfsd4_encode_fattr()") replaced the use of write_bytes_to_xdr_buf() because it's expensive and the data items to be encoded are already properly aligned. However, there's no guarantee that the pointer returned from xdr_reserve_space() will still point to the correct reserved space in the encode buffer after one or more intervening calls to xdr_reserve_space(). It just happens to work with the current implementation of xdr_reserve_space(). This commit effectively reverts the optimization. Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-10 23:43:25 -05:00
Chuck Lever	b786caa65d	NFSD: Insulate nfsd4_encode_secinfo() from page boundaries in the encode buffer There's no guarantee that the pointer returned from xdr_reserve_space() will still point to the correct reserved space in the encode buffer after one or more intervening calls to xdr_reserve_space(). It just happens to work with the current implementation of xdr_reserve_space(). Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-10 23:42:20 -05:00
Chuck Lever	825562bc7d	NFSD: Refactor nfsd4_do_encode_secinfo() again Extract the code that encodes the secinfo4 union data type to clarify the logic. The removed warning is pretty well obscured and thus probably not terribly useful. Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-10 23:41:49 -05:00
Chuck Lever	201cb2048a	NFSD: Insulate nfsd4_encode_readlink() from page boundaries in the encode buffer There's no guarantee that the pointer returned from xdr_reserve_space() will still point to the correct reserved space in the encode buffer after one or more intervening calls to xdr_reserve_space(). It just happens to work with the current implementation of xdr_reserve_space(). Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-10 23:41:42 -05:00
Chuck Lever	26ea81638f	NFSD: Insulate nfsd4_encode_read_plus_data() from page boundaries in the encode buffer Commit `eeadcb7579` ("NFSD: Simplify READ_PLUS") replaced the use of write_bytes_to_xdr_buf(), copying what was in nfsd4_encode_read() at the time. However, the current code will corrupt the encoded data if the XDR data items that are reserved early and then poked into the XDR buffer later happen to fall on a page boundary in the XDR encoding buffer. __xdr_commit_encode can shift encoded data items in the encoding buffer so that pointers returned from xdr_reserve_space() no longer address the same part of the encoding stream. Fixes: `eeadcb7579` ("NFSD: Simplify READ_PLUS") Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-10 23:41:10 -05:00
Chuck Lever	c9fc7772ba	NFSD: Insulate nfsd4_encode_read_plus() from page boundaries in the encode buffer Commit `eeadcb7579` ("NFSD: Simplify READ_PLUS") replaced the use of write_bytes_to_xdr_buf(), copying what was in nfsd4_encode_read() at the time. However, the current code will corrupt the encoded data if the XDR data items that are reserved early and then poked into the XDR buffer later happen to fall on a page boundary in the XDR encoding buffer. __xdr_commit_encode can shift encoded data items in the encoding buffer so that pointers returned from xdr_reserve_space() no longer address the same part of the encoding stream. Fixes: `eeadcb7579` ("NFSD: Simplify READ_PLUS") Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-10 23:40:38 -05:00
Chuck Lever	1a861150bd	NFSD: Insulate nfsd4_encode_read() from page boundaries in the encode buffer Commit `28d5bc468e` ("NFSD: Optimize nfsd4_encode_readv()") replaced the use of write_bytes_to_xdr_buf() because it's expensive and the data items to be encoded are already properly aligned. However, the current code will corrupt the encoded data if the XDR data items that are reserved early and then poked into the XDR buffer later happen to fall on a page boundary in the XDR encoding buffer. __xdr_commit_encode can shift encoded data items in the encoding buffer so that pointers returned from xdr_reserve_space() no longer address the same part of the encoding stream. This isn't an issue for splice reads because the reserved encode buffer areas must fall in the XDR buffers header for the splice to work without error. For vectored reads, however, there is a possibility of send buffer corruption in rare cases. Fixes: `28d5bc468e` ("NFSD: Optimize nfsd4_encode_readv()") Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-10 23:39:46 -05:00
Chuck Lever	ef3675b45b	NFSD: Encode COMPOUND operation status on page boundaries J. David reports an odd corruption of a READDIR reply sent to a FreeBSD client. xdr_reserve_space() has to do a special trick when the @nbytes value requests more space than there is in the current page of the XDR buffer. In that case, xdr_reserve_space() returns a pointer to the start of the next page, and then the next call to xdr_reserve_space() invokes __xdr_commit_encode() to copy enough of the data item back into the previous page to make that data item contiguous across the page boundary. But we need to be careful in the case where buffer space is reserved early for a data item whose value will be inserted into the buffer later. One such caller, nfsd4_encode_operation(), reserves 8 bytes in the encoding buffer for each COMPOUND operation. However, a READDIR result can sometimes encode file names so that there are only 4 bytes left at the end of the current XDR buffer page (though plenty of pages are left to handle the remaining encoding tasks). If a COMPOUND operation follows the READDIR result (say, a GETATTR), then nfsd4_encode_operation() will reserve 8 bytes for the op number (9) and the op status (usually NFS4_OK). In this weird case, xdr_reserve_space() returns a pointer to byte zero of the next buffer page, as it assumes the data item will be copied back into place (in the previous page) on the next call to xdr_reserve_space(). nfsd4_encode_operation() writes the op num into the buffer, then saves the next 4-byte location for the op's status code. The next xdr_reserve_space() call is part of GETATTR encoding, so the op num gets copied back into the previous page, but the saved location for the op status continues to point to the wrong spot in the current XDR buffer page because __xdr_commit_encode() moved that data item. After GETATTR encoding is complete, nfsd4_encode_operation() writes the op status over the first XDR data item in the GETATTR result. The NFS4_OK status code (0) makes it look like there are zero items in the GETATTR's attribute bitmask. The patch description of commit `2825a7f907` ("nfsd4: allow encoding across page boundaries") [2014] remarks that NFSD "can't handle a new operation starting close to the end of a page." This bug appears to be one reason for that remark. Reported-by: J David <j.david.lists@gmail.com> Closes: https://lore.kernel.org/linux-nfs/3998d739-c042-46b4-8166-dbd6c5f0e804@oracle.com/T/#t Tested-by: Rick Macklem <rmacklem@uoguelph.ca> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-10 23:37:41 -05:00
Yang Erkun	2530766492	nfsd: fix UAF when access ex_uuid or ex_stats We can access exp->ex_stats or exp->ex_uuid in rcu context(c_show and e_show). All these resources should be released using kfree_rcu. Fix this by using call_rcu, clean them all after a rcu grace period. ================================================================== BUG: KASAN: slab-use-after-free in svc_export_show+0x362/0x430 [nfsd] Read of size 1 at addr ff11000010fdc120 by task cat/870 CPU: 1 UID: 0 PID: 870 Comm: cat Not tainted 6.12.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x53/0x70 print_address_description.constprop.0+0x2c/0x3a0 print_report+0xb9/0x280 kasan_report+0xae/0xe0 svc_export_show+0x362/0x430 [nfsd] c_show+0x161/0x390 [sunrpc] seq_read_iter+0x589/0x770 seq_read+0x1e5/0x270 proc_reg_read+0xe1/0x140 vfs_read+0x125/0x530 ksys_read+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Allocated by task 830: kasan_save_stack+0x20/0x40 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 __kmalloc_node_track_caller_noprof+0x1bc/0x400 kmemdup_noprof+0x22/0x50 svc_export_parse+0x8a9/0xb80 [nfsd] cache_do_downcall+0x71/0xa0 [sunrpc] cache_write_procfs+0x8e/0xd0 [sunrpc] proc_reg_write+0xe1/0x140 vfs_write+0x1a5/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Freed by task 868: kasan_save_stack+0x20/0x40 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x37/0x50 kfree+0xf3/0x3e0 svc_export_put+0x87/0xb0 [nfsd] cache_purge+0x17f/0x1f0 [sunrpc] nfsd_destroy_serv+0x226/0x2d0 [nfsd] nfsd_svc+0x125/0x1e0 [nfsd] write_threads+0x16a/0x2a0 [nfsd] nfsctl_transaction_write+0x74/0xa0 [nfsd] vfs_write+0x1a5/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: `ae74136b4b` ("SUNRPC: Allow cache lookups to use RCU protection rather than the r/w spinlock") Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:42 -05:00
Yang Erkun	1b10f0b603	SUNRPC: no need get cache ref when protected by rcu rcu_read_lock/rcu_read_unlock has already provide protection for the pointer we will reference when we call c_show. Therefore, there is no need to obtain a cache reference to help protect cache_head. Additionally, the .put such as expkey_put/svc_export_put will invoke dput, which can sleep and break rcu. Stop get cache reference to fix them all. Fixes: `ae74136b4b` ("SUNRPC: Allow cache lookups to use RCU protection rather than the r/w spinlock") Suggested-by: NeilBrown <neilb@suse.de> Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:41 -05:00
Yang Erkun	c224edca7a	nfsd: no need get cache ref when protected by rcu rcu_read_lock/rcu_read_unlock has already provide protection for the pointer we will reference when we call e_show. Therefore, there is no need to obtain a cache reference to help protect cache_head. Additionally, the .put such as expkey_put/svc_export_put will invoke dput, which can sleep and break rcu. Stop get cache reference to fix them all. Fixes: `ae74136b4b` ("SUNRPC: Allow cache lookups to use RCU protection rather than the r/w spinlock") Suggested-by: NeilBrown <neilb@suse.de> Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:41 -05:00
Yang Erkun	2f55dbe4e2	SUNRPC: introduce cache_check_rcu to help check in rcu context This is a prepare patch to add cache_check_rcu, will use it with follow patch. Suggested-by: NeilBrown <neilb@suse.de> Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:40 -05:00
Olga Kornievskaia	cb80ecf75a	NFSD: add cb opcode to WARN_ONCE on failed callback It helps to know what kind of callback happened that triggered the WARN_ONCE in nfsd4_cb_done() function in diagnosing what can set an uncommon state where both cb_status and tk_status are set at the same time. Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:40 -05:00
Olga Kornievskaia	1b3e26a5cc	NFSD: fix decoding in nfs4_xdr_dec_cb_getattr If a client were to send an error to a CB_GETATTR call, the code erronously continues to try decode past the error code. It ends up returning BAD_XDR error to the rpc layer and then in turn trigger a WARN_ONCE in nfsd4_cb_done() function. Fixes: `6487a13b5c` ("NFSD: add support for CB_GETATTR callback") Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:39 -05:00
NeilBrown	35e34642b5	nfsd: add shrinker to reduce number of slots allocated per session Add a shrinker which frees unused slots and may ask the clients to use fewer slots on each session. We keep a global count of the number of freeable slots, which is the sum of one less than the current "target" slots in all sessions in all clients in all net-namespaces. This number is reported by the shrinker. When the shrinker is asked to free some, we call xxx on each session in a round-robin asking each to reduce the slot count by 1. This will reduce the "target" so the number reported by the shrinker will reduce immediately. The memory will only be freed later when the client confirmed that it is no longer needed. We use a global list of sessions and move the "head" to after the last session that we asked to reduce, so the next callback from the shrinker will move on to the next session. This pressure should be applied "evenly" across all sessions over time. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:39 -05:00
NeilBrown	fc8738c68d	nfsd: add support for freeing unused session-DRC slots Reducing the number of slots in the session slot table requires confirmation from the client. This patch adds reduce_session_slots() which starts the process of getting confirmation, but never calls it. That will come in a later patch. Before we can free a slot we need to confirm that the client won't try to use it again. This involves returning a lower cr_maxrequests in a SEQUENCE reply and then seeing a ca_maxrequests on the same slot which is not larger than we limit we are trying to impose. So for each slot we need to remember that we have sent a reduced cr_maxrequests. To achieve this we introduce a concept of request "generations". Each time we decide to reduce cr_maxrequests we increment the generation number, and record this when we return the lower cr_maxrequests to the client. When a slot with the current generation reports a low ca_maxrequests, we commit to that level and free extra slots. We use an 16 bit generation number (64 seems wasteful) and if it cycles we iterate all slots and reset the generation number to avoid false matches. When we free a slot we store the seqid in the slot pointer so that it can be restored when we reactivate the slot. The RFC can be read as suggesting that the slot number could restart from one after a slot is retired and reactivated, but also suggests that retiring slots is not required. So when we reactive a slot we accept with the next seqid in sequence, or 1. When decoding sa_highest_slotid into maxslots we need to add 1 - this matches how it is encoded for the reply. se_dead is moved in struct nfsd4_session to remove a hole. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:38 -05:00
NeilBrown	60aa656431	nfsd: allocate new session-based DRC slots on demand. If a client ever uses the highest available slot for a given session, attempt to allocate more slots so there is room for the client to use them if wanted. GFP_NOWAIT is used so if there is not plenty of free memory, failure is expected - which is what we want. It also allows the allocation while holding a spinlock. Each time we increase the number of slots by 20% (rounded up). This allows fairly quick growth while avoiding excessive over-shoot. We would expect to stablise with around 10% more slots available than the client actually uses. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:38 -05:00
NeilBrown	601c8cb349	nfsd: add session slot count to /proc/fs/nfsd/clients/*/info Each client now reports the number of slots allocated in each session. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:38 -05:00
NeilBrown	b5fba969a2	nfsd: remove artificial limits on the session-based DRC Rather than guessing how much space it might be safe to use for the DRC, simply try allocating slots and be prepared to accept failure. The first slot for each session is allocated with GFP_KERNEL which is unlikely to fail. Subsequent slots are allocated with the addition of __GFP_NORETRY which is expected to fail if there isn't much free memory. This is probably too aggressive but clears the way for adding a shrinker interface to free extra slots when memory is tight. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:37 -05:00
NeilBrown	0b6e142426	nfsd: use an xarray to store v4.1 session slots Using an xarray to store session slots will make it easier to change the number of active slots based on demand, and removes an unnecessary limit. To achieve good throughput with a high-latency server it can be helpful to have hundreds of concurrent writes, which means hundreds of slots. So increase the limit to 2048 (twice what the Linux client will currently use). This limit is only a sanity check, not a hard limit. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:37 -05:00
NeilBrown	a4b853f183	sunrpc: remove all connection limit configuration Now that the connection limit only apply to unconfirmed connections, there is no need to configure it. So remove all the configuration and fix the number of unconfirmed connections as always 64 - which is now given a name: XPT_MAX_TMP_CONN Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:36 -05:00
NeilBrown	eccbbc7c00	nfsd: don't use sv_nrthreads in connection limiting calculations. The heuristic for limiting the number of incoming connections to nfsd currently uses sv_nrthreads - allowing more connections if more threads were configured. A future patch will allow number of threads to grow dynamically so that there will be no need to configure sv_nrthreads. So we need a different solution for limiting connections. It isn't clear what problem is solved by limiting connections (as mentioned in a code comment) but the most likely problem is a connection storm - many connections that are not doing productive work. These will be closed after about 6 minutes already but it might help to slow down a storm. This patch adds a per-connection flag XPT_PEER_VALID which indicates that the peer has presented a filehandle for which it has some sort of access. i.e the peer is known to be trusted in some way. We now only count connections which have NOT been determined to be valid. There should be relative few of these at any given time. If the number of non-validated peer exceed a limit - currently 64 - we close the oldest non-validated peer to avoid having too many of these useless connections. Note that this patch significantly changes the meaning of the various configuration parameters for "max connections". The next patch will remove all of these. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:36 -05:00
Scott Mayhew	de71d4e211	nfsd: fix legacy client tracking initialization Get rid of the nfsd4_legacy_tracking_ops->init() call in check_for_legacy_methods(). That will be handled in the caller (nfsd4_client_tracking_init()). Otherwise, we'll wind up calling nfsd4_legacy_tracking_ops->init() twice, and the second time we'll trigger the BUG_ON() in nfsd4_init_recdir(). Fixes: `74fd48739d` ("nfsd: new Kconfig option for legacy client tracking") Reported-by: Jur van der Burg <jur@avtware.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219580 Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:35 -05:00
Chuck Lever	6f035c99ac	NFSD: Clean up unused variable @sb should have been removed by commit `7e64c5bc49` ("NLM/NFSD: Fix lock notifications for async-capable filesystems"). Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:35 -05:00
NeilBrown	6e1d75f778	sunrpc/svc: use store_release_wake_up() svc_thread_init_status() contains an open-coded store_release_wake_up(). It is cleaner to use that function directly rather than needing to remember the barrier. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:34 -05:00
NeilBrown	935fee5d5b	nfsd: use new wake_up_var interfaces. The wake_up_var interface is fragile as barriers are sometimes needed. There are now new interfaces so that most wake-ups can use an interface that is guaranteed to have all barriers needed. This patch changes the wake up on cl_cb_inflight to use atomic_dec_and_wake_up(). It also changes the wake up on rp_locked to use store_release_wake_up(). This involves changing rp_locked from atomic_t to int. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:34 -05:00
Chen Hanxiao	19d97ac5aa	nfsd: trace: remove redundant stateid even deleg_recall Since commit `e56dc9e294` ("nfsd: remove fault injection code") remove all nfsd_recall_delegations codes, we don't need trace_nfsd_deleg_recall any more. Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-01-06 09:37:34 -05:00
Linus Torvalds	9d89551994	Linux 6.13-rc6	2025-01-05 14:13:40 -08:00
Linus Torvalds	9244696b34	Kbuild fixes for v6.13 (3rd) - Fix escaping of '$' in scripts/mksysmap - Fix a modpost crash observed with the latest binutils - Fix 'provides' in the linux-api-headers pacman package -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmd6lzsVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsGkXoQAJCH0b0gfzuRMtgKI+HKj/eow4ZI +uArtK/cu2fJ6Vxe71GTBJGEca8DNslOOlyMvWf6jhRcBdd5jOlTciUojk389jRb 6JR9fvEfS3RevkbkzVe0Uq6jSL02eLrF/z/Q3xmVMi68npBlCg40G2EDIIS8iWpu jRxnhSUBKU+ooKahA2KwRVexUdkE8BbdsNLUf5NaTSOACI0A4sn6Om4uYdCQdrRC oNptpzJdVoG40cKlzSyzNoCVxxY77b9PKdkuR28SMpj55ZjcAQRV3ZuHphegH1JQ fpg6JNu0sNHrM4OkKzrpA6f8d3s1yooILgrCMQhFo0gUHLFVLIuos1t8uT4urDr+ RJ7bUWnLCyqu6wEtzGRaiRPXDolKJrXYYTO+XwLkz5iLkNLO1uGXvm7Hbl4GG2iX PKKZ5gT02kQmijvDRc496Q7z4LYJqAqUIUu0NSR1lP+qK0qKCIHJyi3kAAvvLZ/Y Eia8LtUXcjjbbMSSk3vCXC8qvRj4NE1HxqmbBDVTpdnzxNfxeQisHiD0Rz9RzTyo FhvWGrDNx2vTcShrBiZvww5wfUkSSueXqG89d95U3AxlhJ58dk4w08whu03fkhNB P/NM1e6ShOo682LVrBgFHmhgxc+2iYb81ZqzfSJVHLOV8cpShC0KjONFRgdCW4K4 O8iD6fUXYDlLIJjg =v0Mh -----END PGP SIGNATURE----- Merge tag 'kbuild-fixes-v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix escaping of '$' in scripts/mksysmap - Fix a modpost crash observed with the latest binutils - Fix 'provides' in the linux-api-headers pacman package * tag 'kbuild-fixes-v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: pacman-pkg: provide versioned linux-api-headers package modpost: work around unaligned data access error modpost: refactor do_vmbus_entry() modpost: fix the missed iteration for the max bit in do_input() scripts/mksysmap: Fix escape chars '$'	2025-01-05 10:52:47 -08:00
Linus Torvalds	5635d8bad2	25 hotfixes. 16 are cc:stable. 18 are MM and 7 are non-MM. The usual bunch of singletons and two doubletons - please see the relevant changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ3noXwAKCRDdBJ7gKXxA jkzRAP9Ejb8kbgCrA3cptnzlVkDCDUm0TmleepT3bx6B2rH0BgEAzSiTXf4ioZPg 4pOHnKIGOWEVPcVwBrdA0irWG+QPYAQ= =nEIZ -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2025-01-04-18-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "25 hotfixes. 16 are cc:stable. 18 are MM and 7 are non-MM. The usual bunch of singletons and two doubletons - please see the relevant changelogs for details" * tag 'mm-hotfixes-stable-2025-01-04-18-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (25 commits) MAINTAINERS: change Arınç _NAL's name and email address scripts/sorttable: fix orc_sort_cmp() to maintain symmetry and transitivity mm/util: make memdup_user_nul() similar to memdup_user() mm, madvise: fix potential workingset node list_lru leaks mm/damon/core: fix ignored quota goals and filters of newly committed schemes mm/damon/core: fix new damon_target objects leaks on damon_commit_targets() mm/list_lru: fix false warning of negative counter vmstat: disable vmstat_work on vmstat_cpu_down_prep() mm: shmem: fix the update of 'shmem_falloc->nr_unswapped' mm: shmem: fix incorrect index alignment for within_size policy percpu: remove intermediate variable in PERCPU_PTR() mm: zswap: fix race between [de]compression and CPU hotunplug ocfs2: fix slab-use-after-free due to dangling pointer dqi_priv fs/proc/task_mmu: fix pagemap flags with PMD THP entries on 32bit kcov: mark in_softirq_really() as __always_inline docs: mm: fix the incorrect 'FileHugeMapped' field mailmap: modify the entry for Mathieu Othacehe mm/kmemleak: fix sleeping function called from invalid context at print message mm: hugetlb: independent PMD page table shared count maple_tree: reload mas before the second call for mas_empty_area ...	2025-01-05 10:37:45 -08:00
Linus Torvalds	7a5b6fc8bd	A randconfig build fix and a performance fix: - Fix the CONFIG_RESET_CONTROLLER=n path signature of clk_imx8mp_audiomix_reset_controller_register() to appease randconfig - Speed up the sdhci clk on TH1520 by a factor of 4 by adding a fixed factor clk -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAmd5nXkRHHNib3lkQGtl cm5lbC5vcmcACgkQrQKIl8bklSXRpRAAsPGcr0feyS+TxKSp3qGdmUwNdiwYzoH/ NvyzMO++Lw3VICY+NfatlfhHlVT9mwVbIB/1xvMbL/3p7AbYR2e4/wRzOycE6+mB leYUrEmvWeRXCEgs6zymMm8i7qezTfRPvpTf6dIp04DxbIEXcy/B9FTdhJlYQv8T CfOXv6BVZbWDgkNDZBiu/a93sMk3ivQUYcgdPYZD6CT6IcEeTFYRAoZqIrywO14P xl70TIaWC+PDeGjsGlH5nFXq4cCkRuqSIZx50hsgT5tvM+CV105AGdMjlNRijwzM L4G3u1faRzGZI+4U4ccjZuNQWBhPZxdb5Ho31nDHYSaSxW/f7eBIJfZYVTggryC9 DcgQSlPdfiVa5fwyj/ZSenOr7S0Km0onnHxrbqNd5bCEuhWrpRPCwLJ2XtqOOi1l 3jezZ2RvvtrYqO0naPrcgpJVPU98J+CfNveGQUSXH9g/b2RVBgUsOV4NLWb6YoH0 f11RbfZjBx1maEjUrv1fRIl8dvW4CySQXQpMYG/lGD52bs3EzOre3GCiyAtM9liH tpAvclQB0lVEpr0YIwXPrmK5dFIL4mZQNIaaCqsO8ApkkQMHfPXxwGxX1E9wPIeg E+cBXGsHzQf5fZERNc9BFRtelN5RHGQCb7mgqv9nNkFXCB09jWCNSjyVtOibU8FJ KCeYxUbyVyk= =30Sl -----END PGP SIGNATURE----- Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A randconfig build fix and a performance fix: - Fix the CONFIG_RESET_CONTROLLER=n path signature of clk_imx8mp_audiomix_reset_controller_register() to appease randconfig - Speed up the sdhci clk on TH1520 by a factor of 4 by adding a fixed factor clk" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: clk-imx8mp-audiomix: fix function signature clk: thead: Fix TH1520 emmc and shdci clock rate	2025-01-05 10:28:34 -08:00
Thomas Weißschuh	385443057f	kbuild: pacman-pkg: provide versioned linux-api-headers package The Arch Linux glibc package contains a versioned dependency on "linux-api-headers". If the linux-api-headers package provided by pacman-pkg does not specify an explicit version this dependency is not satisfied. Fix the dependency by providing an explicit version. Fixes: `c8578539de` ("kbuild: add script and target to generate pacman package") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2025-01-05 23:19:17 +09:00
Linus Torvalds	ab75170520	linux-watchdog 6.13-rc6 tag -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iEYEABECAAYFAmd5de4ACgkQ+iyteGJfRsrcWACgoOs/mw7ReFgz4BCkk8BkBfcv y2kAn3uyUMwb/mU1m8qLWpY8RuhNC4Ge =oL1Y -----END PGP SIGNATURE----- Merge tag 'linux-watchdog-6.13-rc6' of git://www.linux-watchdog.org/linux-watchdog Pull watchdog fix from Wim Van Sebroeck: - fix error message during stm32 driver probe * tag 'linux-watchdog-6.13-rc6' of git://www.linux-watchdog.org/linux-watchdog: watchdog: stm32_iwdg: fix error message during driver probe	2025-01-04 10:59:10 -08:00

1 2 3 4 5 ...

1324886 commits