nfsd_file_dispose_list_delayed can be called from the filecache
laundrette, which is shut down after the nfsd threads are shut down and
the nfsd_serv pointer is cleared. If nn->nfsd_serv is NULL then there
are no threads to wake.
Ensure that the nn->nfsd_serv pointer is non-NULL before calling
svc_wake_up in nfsd_file_dispose_list_delayed. This is safe since the
svc_serv is not freed until after the filecache laundrette is cancelled.
Reported-by: Salvatore Bonaccorso <carnil@debian.org>
Closes: https://bugs.debian.org/1093734
Fixes: ffb4025961 ("nfsd: Don't leave work of closing files to a work queue")
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Commit ec596aaf9b ("SUNRPC: Remove code behind
CONFIG_RPCSEC_GSS_KRB5_SIMPLIFIED") was the last user of the
gss_decrypt_xdr_buf() and gss_encrypt_xdr_buf() functions.
Remove them.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Commit ec596aaf9b ("SUNRPC: Remove code behind
CONFIG_RPCSEC_GSS_KRB5_SIMPLIFIED") was the last user of the routines
in gss_generic_token.c.
Remove the routines and associated header.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
xprt_iter_get_xprt() was added by
commit 80b14d5e61 ("SUNRPC: Add a structure to track multiple
transports") but is unused.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
I noticed that a handful of NFSv3 fstests were taking an
unexpectedly long time to run. Troubleshooting showed that the
server's TCP window closed and never re-opened, which caused the
client to trigger an RPC retransmit timeout after 180 seconds.
The client's recovery action was to establish a fresh connection
and retransmit the timed-out requests. This worked, but it adds a
long delay.
I tracked the problem to the commit that attempted to reduce the
rate at which the network layer delivers TCP socket data_ready
callbacks. Under most circumstances this change worked as expected,
but for NFSv3, which has no session or other type of throttling, it
can overwhelm the receiver on occasion.
I'm sure I could tweak the lowat settings, but the small benefit
doesn't seem worth the bother. Just revert it.
Fixes: 2b877fc53e ("SUNRPC: Reduce thread wake-up rate when receiving large RPC messages")
Cc: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Allow clients to request getting a delegation xor an open stateid if a
delegation isn't available. This allows the client to avoid sending a
final CLOSE for the (useless) open stateid, when it is granted a
delegation.
If this flag is requested by the client and there isn't already a new
open stateid, discard the new open stateid before replying.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Allow SETATTR to handle delegated timestamps. This patch assumes that
only the delegation holder has the ability to set the timestamps in this
way, so we allow this only if the SETATTR stateid refers to a
*_ATTRS_DELEG delegation.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Add support for the delegated timestamps on write delegations. This
allows the server to proxy timestamps from the delegation holder to
other clients that are doing GETATTRs vs. the same inode.
When OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS bit is set in the OPEN
call, set the dl_type to the *_ATTRS_DELEG flavor of delegation.
Add timespec64 fields to nfs4_cb_fattr and decode the timestamps into
those. Vet those timestamps according to the delstid spec and update
the inode attrs if necessary.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The delstid draft adds new NFS4_SHARE_WANT_TYPE_MASK values that don't
fit neatly into the existing WANT_MASK or WHEN_MASK. Add a new
NFS4_SHARE_WANT_MOD_MASK value and redefine NFS4_SHARE_WANT_MASK to
include it.
Also fix the checks in nfsd4_deleg_xgrade_none_ext() to check for the
flags instead of equality, since there may be modifier flags in the
value.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Add support for FATTR4_OPEN_ARGUMENTS. This a new mechanism for the
client to discover what OPEN features the server supports.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Add some preparatory code to various functions that handle delegation
types to allow them to handle the OPEN_DELEGATE_*_ATTRS_DELEG constants.
Add helpers for detecting whether it's a read or write deleg, and
whether the attributes are delegated.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Add the OPEN4_SHARE_ACCESS_WANT constants from the nfs4.1 and delstid
draft into the nfs4_1.x file, and regenerate the headers and source
files. Do a mass renaming of NFS4_SHARE_WANT_* to
OPEN4_SHARE_ACCESS_WANT_* in the nfsd directory.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Rename the enum with the same name in include/linux/nfs4.h, add the
proper enum to nfs4_1.x and regenerate the headers and source files. Do
a mass rename of all NFS4_OPEN_DELEGATE_* to OPEN_DELEGATE_* in the nfsd
directory.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
In the long run, the NFS development community intends to autogenerate a
lot of the XDR handling code. Both the NFS client and server include
"include/linux/nfs4.hi". That file was hand-rolled, and some of the symbols
in it conflict with the autogenerated symbols.
Add a small nfs4_1.x to Documentation that currently just has the
necessary definitions for the delstid draft, and generate the relevant
header and source files. Make include/linux/nfs4.h include the generated
include/linux/sunrpc/xdrgen/nfs4_1.h and remove the conflicting
definitions from it and nfs_xdr.h.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
RFC8881, section 10.4.3 has some specific guidance as to how the
delegated change attribute should be handled. We currently don't follow
that guidance properly.
In particular, when the file is modified, the server always reports the
initial change attribute + 1. Section 10.4.3 however indicates that it
should be incremented on every GETATTR request from other clients.
Only request the change attribute until the file has been modified. If
there is an outstanding delegation, then increment the cached change
attribute on every GETATTR.
Fixes: 6487a13b5c ("NFSD: add support for CB_GETATTR callback")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
A subtlety of this API is that if the @nbytes region traverses a
page boundary, the next __xdr_commit_encode will shift the data item
in the XDR encode buffer. This makes the returned pointer point to
something else, leading to unexpected behavior.
There are a few cases where the caller saves the returned pointer
and then later uses it to insert a computed value into an earlier
part of the stream. This can be safe only if either:
- the data item is guaranteed to be in the XDR buffer's head, and
thus is not ever going to be near a page boundary, or
- the data item is no larger than 4 octets, since XDR alignment
rules require all data items to start on 4-octet boundaries
But that safety is only an artifact of the current implementation.
It would be less brittle if these "safe" uses were eventually
replaced.
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Commit ab04de60ae ("NFSD: Optimize nfsd4_encode_fattr()") replaced
the use of write_bytes_to_xdr_buf() because it's expensive and the
data items to be encoded are already properly aligned.
However, there's no guarantee that the pointer returned from
xdr_reserve_space() will still point to the correct reserved space
in the encode buffer after one or more intervening calls to
xdr_reserve_space(). It just happens to work with the current
implementation of xdr_reserve_space().
This commit effectively reverts the optimization.
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
There's no guarantee that the pointer returned from
xdr_reserve_space() will still point to the correct reserved space
in the encode buffer after one or more intervening calls to
xdr_reserve_space(). It just happens to work with the current
implementation of xdr_reserve_space().
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Extract the code that encodes the secinfo4 union data type to
clarify the logic. The removed warning is pretty well obscured and
thus probably not terribly useful.
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
There's no guarantee that the pointer returned from
xdr_reserve_space() will still point to the correct reserved space
in the encode buffer after one or more intervening calls to
xdr_reserve_space(). It just happens to work with the current
implementation of xdr_reserve_space().
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Commit eeadcb7579 ("NFSD: Simplify READ_PLUS") replaced the use of
write_bytes_to_xdr_buf(), copying what was in nfsd4_encode_read()
at the time.
However, the current code will corrupt the encoded data if the XDR
data items that are reserved early and then poked into the XDR
buffer later happen to fall on a page boundary in the XDR encoding
buffer.
__xdr_commit_encode can shift encoded data items in the encoding
buffer so that pointers returned from xdr_reserve_space() no longer
address the same part of the encoding stream.
Fixes: eeadcb7579 ("NFSD: Simplify READ_PLUS")
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Commit eeadcb7579 ("NFSD: Simplify READ_PLUS") replaced the use of
write_bytes_to_xdr_buf(), copying what was in nfsd4_encode_read()
at the time.
However, the current code will corrupt the encoded data if the XDR
data items that are reserved early and then poked into the XDR
buffer later happen to fall on a page boundary in the XDR encoding
buffer.
__xdr_commit_encode can shift encoded data items in the encoding
buffer so that pointers returned from xdr_reserve_space() no longer
address the same part of the encoding stream.
Fixes: eeadcb7579 ("NFSD: Simplify READ_PLUS")
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Commit 28d5bc468e ("NFSD: Optimize nfsd4_encode_readv()") replaced
the use of write_bytes_to_xdr_buf() because it's expensive and the
data items to be encoded are already properly aligned.
However, the current code will corrupt the encoded data if the XDR
data items that are reserved early and then poked into the XDR
buffer later happen to fall on a page boundary in the XDR encoding
buffer.
__xdr_commit_encode can shift encoded data items in the encoding
buffer so that pointers returned from xdr_reserve_space() no longer
address the same part of the encoding stream.
This isn't an issue for splice reads because the reserved encode
buffer areas must fall in the XDR buffers header for the splice to
work without error. For vectored reads, however, there is a
possibility of send buffer corruption in rare cases.
Fixes: 28d5bc468e ("NFSD: Optimize nfsd4_encode_readv()")
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
J. David reports an odd corruption of a READDIR reply sent to a
FreeBSD client.
xdr_reserve_space() has to do a special trick when the @nbytes value
requests more space than there is in the current page of the XDR
buffer.
In that case, xdr_reserve_space() returns a pointer to the start of
the next page, and then the next call to xdr_reserve_space() invokes
__xdr_commit_encode() to copy enough of the data item back into the
previous page to make that data item contiguous across the page
boundary.
But we need to be careful in the case where buffer space is reserved
early for a data item whose value will be inserted into the buffer
later.
One such caller, nfsd4_encode_operation(), reserves 8 bytes in the
encoding buffer for each COMPOUND operation. However, a READDIR
result can sometimes encode file names so that there are only 4
bytes left at the end of the current XDR buffer page (though plenty
of pages are left to handle the remaining encoding tasks).
If a COMPOUND operation follows the READDIR result (say, a GETATTR),
then nfsd4_encode_operation() will reserve 8 bytes for the op number
(9) and the op status (usually NFS4_OK). In this weird case,
xdr_reserve_space() returns a pointer to byte zero of the next buffer
page, as it assumes the data item will be copied back into place (in
the previous page) on the next call to xdr_reserve_space().
nfsd4_encode_operation() writes the op num into the buffer, then
saves the next 4-byte location for the op's status code. The next
xdr_reserve_space() call is part of GETATTR encoding, so the op num
gets copied back into the previous page, but the saved location for
the op status continues to point to the wrong spot in the current
XDR buffer page because __xdr_commit_encode() moved that data item.
After GETATTR encoding is complete, nfsd4_encode_operation() writes
the op status over the first XDR data item in the GETATTR result.
The NFS4_OK status code (0) makes it look like there are zero items
in the GETATTR's attribute bitmask.
The patch description of commit 2825a7f907 ("nfsd4: allow encoding
across page boundaries") [2014] remarks that NFSD "can't handle a
new operation starting close to the end of a page." This bug appears
to be one reason for that remark.
Reported-by: J David <j.david.lists@gmail.com>
Closes: https://lore.kernel.org/linux-nfs/3998d739-c042-46b4-8166-dbd6c5f0e804@oracle.com/T/#t
Tested-by: Rick Macklem <rmacklem@uoguelph.ca>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
We can access exp->ex_stats or exp->ex_uuid in rcu context(c_show and
e_show). All these resources should be released using kfree_rcu. Fix this
by using call_rcu, clean them all after a rcu grace period.
==================================================================
BUG: KASAN: slab-use-after-free in svc_export_show+0x362/0x430 [nfsd]
Read of size 1 at addr ff11000010fdc120 by task cat/870
CPU: 1 UID: 0 PID: 870 Comm: cat Not tainted 6.12.0-rc3+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.16.1-2.fc37 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x53/0x70
print_address_description.constprop.0+0x2c/0x3a0
print_report+0xb9/0x280
kasan_report+0xae/0xe0
svc_export_show+0x362/0x430 [nfsd]
c_show+0x161/0x390 [sunrpc]
seq_read_iter+0x589/0x770
seq_read+0x1e5/0x270
proc_reg_read+0xe1/0x140
vfs_read+0x125/0x530
ksys_read+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Allocated by task 830:
kasan_save_stack+0x20/0x40
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x8f/0xa0
__kmalloc_node_track_caller_noprof+0x1bc/0x400
kmemdup_noprof+0x22/0x50
svc_export_parse+0x8a9/0xb80 [nfsd]
cache_do_downcall+0x71/0xa0 [sunrpc]
cache_write_procfs+0x8e/0xd0 [sunrpc]
proc_reg_write+0xe1/0x140
vfs_write+0x1a5/0x6d0
ksys_write+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Freed by task 868:
kasan_save_stack+0x20/0x40
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x37/0x50
kfree+0xf3/0x3e0
svc_export_put+0x87/0xb0 [nfsd]
cache_purge+0x17f/0x1f0 [sunrpc]
nfsd_destroy_serv+0x226/0x2d0 [nfsd]
nfsd_svc+0x125/0x1e0 [nfsd]
write_threads+0x16a/0x2a0 [nfsd]
nfsctl_transaction_write+0x74/0xa0 [nfsd]
vfs_write+0x1a5/0x6d0
ksys_write+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: ae74136b4b ("SUNRPC: Allow cache lookups to use RCU protection rather than the r/w spinlock")
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
rcu_read_lock/rcu_read_unlock has already provide protection for the
pointer we will reference when we call c_show. Therefore, there is no
need to obtain a cache reference to help protect cache_head.
Additionally, the .put such as expkey_put/svc_export_put will invoke
dput, which can sleep and break rcu. Stop get cache reference to fix
them all.
Fixes: ae74136b4b ("SUNRPC: Allow cache lookups to use RCU protection rather than the r/w spinlock")
Suggested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
rcu_read_lock/rcu_read_unlock has already provide protection for the
pointer we will reference when we call e_show. Therefore, there is no
need to obtain a cache reference to help protect cache_head.
Additionally, the .put such as expkey_put/svc_export_put will invoke
dput, which can sleep and break rcu. Stop get cache reference to fix
them all.
Fixes: ae74136b4b ("SUNRPC: Allow cache lookups to use RCU protection rather than the r/w spinlock")
Suggested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This is a prepare patch to add cache_check_rcu, will use it with follow
patch.
Suggested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
It helps to know what kind of callback happened that triggered the
WARN_ONCE in nfsd4_cb_done() function in diagnosing what can set
an uncommon state where both cb_status and tk_status are set at
the same time.
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
If a client were to send an error to a CB_GETATTR call, the code
erronously continues to try decode past the error code. It ends
up returning BAD_XDR error to the rpc layer and then in turn
trigger a WARN_ONCE in nfsd4_cb_done() function.
Fixes: 6487a13b5c ("NFSD: add support for CB_GETATTR callback")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Add a shrinker which frees unused slots and may ask the clients to use
fewer slots on each session.
We keep a global count of the number of freeable slots, which is the sum
of one less than the current "target" slots in all sessions in all
clients in all net-namespaces. This number is reported by the shrinker.
When the shrinker is asked to free some, we call xxx on each session in
a round-robin asking each to reduce the slot count by 1. This will
reduce the "target" so the number reported by the shrinker will reduce
immediately. The memory will only be freed later when the client
confirmed that it is no longer needed.
We use a global list of sessions and move the "head" to after the last
session that we asked to reduce, so the next callback from the shrinker
will move on to the next session. This pressure should be applied
"evenly" across all sessions over time.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reducing the number of slots in the session slot table requires
confirmation from the client. This patch adds reduce_session_slots()
which starts the process of getting confirmation, but never calls it.
That will come in a later patch.
Before we can free a slot we need to confirm that the client won't try
to use it again. This involves returning a lower cr_maxrequests in a
SEQUENCE reply and then seeing a ca_maxrequests on the same slot which
is not larger than we limit we are trying to impose. So for each slot
we need to remember that we have sent a reduced cr_maxrequests.
To achieve this we introduce a concept of request "generations". Each
time we decide to reduce cr_maxrequests we increment the generation
number, and record this when we return the lower cr_maxrequests to the
client. When a slot with the current generation reports a low
ca_maxrequests, we commit to that level and free extra slots.
We use an 16 bit generation number (64 seems wasteful) and if it cycles
we iterate all slots and reset the generation number to avoid false matches.
When we free a slot we store the seqid in the slot pointer so that it can
be restored when we reactivate the slot. The RFC can be read as
suggesting that the slot number could restart from one after a slot is
retired and reactivated, but also suggests that retiring slots is not
required. So when we reactive a slot we accept with the next seqid in
sequence, or 1.
When decoding sa_highest_slotid into maxslots we need to add 1 - this
matches how it is encoded for the reply.
se_dead is moved in struct nfsd4_session to remove a hole.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
If a client ever uses the highest available slot for a given session,
attempt to allocate more slots so there is room for the client to use
them if wanted. GFP_NOWAIT is used so if there is not plenty of
free memory, failure is expected - which is what we want. It also
allows the allocation while holding a spinlock.
Each time we increase the number of slots by 20% (rounded up). This
allows fairly quick growth while avoiding excessive over-shoot.
We would expect to stablise with around 10% more slots available than
the client actually uses.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Each client now reports the number of slots allocated in each session.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Rather than guessing how much space it might be safe to use for the DRC,
simply try allocating slots and be prepared to accept failure.
The first slot for each session is allocated with GFP_KERNEL which is
unlikely to fail. Subsequent slots are allocated with the addition of
__GFP_NORETRY which is expected to fail if there isn't much free memory.
This is probably too aggressive but clears the way for adding a
shrinker interface to free extra slots when memory is tight.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Using an xarray to store session slots will make it easier to change the
number of active slots based on demand, and removes an unnecessary
limit.
To achieve good throughput with a high-latency server it can be helpful
to have hundreds of concurrent writes, which means hundreds of slots.
So increase the limit to 2048 (twice what the Linux client will
currently use). This limit is only a sanity check, not a hard limit.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now that the connection limit only apply to unconfirmed connections,
there is no need to configure it. So remove all the configuration and
fix the number of unconfirmed connections as always 64 - which is
now given a name: XPT_MAX_TMP_CONN
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The heuristic for limiting the number of incoming connections to nfsd
currently uses sv_nrthreads - allowing more connections if more threads
were configured.
A future patch will allow number of threads to grow dynamically so that
there will be no need to configure sv_nrthreads. So we need a different
solution for limiting connections.
It isn't clear what problem is solved by limiting connections (as
mentioned in a code comment) but the most likely problem is a connection
storm - many connections that are not doing productive work. These will
be closed after about 6 minutes already but it might help to slow down a
storm.
This patch adds a per-connection flag XPT_PEER_VALID which indicates
that the peer has presented a filehandle for which it has some sort of
access. i.e the peer is known to be trusted in some way. We now only
count connections which have NOT been determined to be valid. There
should be relative few of these at any given time.
If the number of non-validated peer exceed a limit - currently 64 - we
close the oldest non-validated peer to avoid having too many of these
useless connections.
Note that this patch significantly changes the meaning of the various
configuration parameters for "max connections". The next patch will
remove all of these.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Get rid of the nfsd4_legacy_tracking_ops->init() call in
check_for_legacy_methods(). That will be handled in the caller
(nfsd4_client_tracking_init()). Otherwise, we'll wind up calling
nfsd4_legacy_tracking_ops->init() twice, and the second time we'll
trigger the BUG_ON() in nfsd4_init_recdir().
Fixes: 74fd48739d ("nfsd: new Kconfig option for legacy client tracking")
Reported-by: Jur van der Burg <jur@avtware.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219580
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
@sb should have been removed by commit 7e64c5bc49 ("NLM/NFSD: Fix
lock notifications for async-capable filesystems").
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
svc_thread_init_status() contains an open-coded
store_release_wake_up(). It is cleaner to use that function directly
rather than needing to remember the barrier.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The wake_up_var interface is fragile as barriers are sometimes needed.
There are now new interfaces so that most wake-ups can use an interface
that is guaranteed to have all barriers needed.
This patch changes the wake up on cl_cb_inflight to use
atomic_dec_and_wake_up().
It also changes the wake up on rp_locked to use store_release_wake_up().
This involves changing rp_locked from atomic_t to int.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Since commit e56dc9e294 ("nfsd: remove fault injection code") remove
all nfsd_recall_delegations codes,
we don't need trace_nfsd_deleg_recall any more.
Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
- Fix escaping of '$' in scripts/mksysmap
- Fix a modpost crash observed with the latest binutils
- Fix 'provides' in the linux-api-headers pacman package
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmd6lzsVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGkXoQAJCH0b0gfzuRMtgKI+HKj/eow4ZI
+uArtK/cu2fJ6Vxe71GTBJGEca8DNslOOlyMvWf6jhRcBdd5jOlTciUojk389jRb
6JR9fvEfS3RevkbkzVe0Uq6jSL02eLrF/z/Q3xmVMi68npBlCg40G2EDIIS8iWpu
jRxnhSUBKU+ooKahA2KwRVexUdkE8BbdsNLUf5NaTSOACI0A4sn6Om4uYdCQdrRC
oNptpzJdVoG40cKlzSyzNoCVxxY77b9PKdkuR28SMpj55ZjcAQRV3ZuHphegH1JQ
fpg6JNu0sNHrM4OkKzrpA6f8d3s1yooILgrCMQhFo0gUHLFVLIuos1t8uT4urDr+
RJ7bUWnLCyqu6wEtzGRaiRPXDolKJrXYYTO+XwLkz5iLkNLO1uGXvm7Hbl4GG2iX
PKKZ5gT02kQmijvDRc496Q7z4LYJqAqUIUu0NSR1lP+qK0qKCIHJyi3kAAvvLZ/Y
Eia8LtUXcjjbbMSSk3vCXC8qvRj4NE1HxqmbBDVTpdnzxNfxeQisHiD0Rz9RzTyo
FhvWGrDNx2vTcShrBiZvww5wfUkSSueXqG89d95U3AxlhJ58dk4w08whu03fkhNB
P/NM1e6ShOo682LVrBgFHmhgxc+2iYb81ZqzfSJVHLOV8cpShC0KjONFRgdCW4K4
O8iD6fUXYDlLIJjg
=v0Mh
-----END PGP SIGNATURE-----
Merge tag 'kbuild-fixes-v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix escaping of '$' in scripts/mksysmap
- Fix a modpost crash observed with the latest binutils
- Fix 'provides' in the linux-api-headers pacman package
* tag 'kbuild-fixes-v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: pacman-pkg: provide versioned linux-api-headers package
modpost: work around unaligned data access error
modpost: refactor do_vmbus_entry()
modpost: fix the missed iteration for the max bit in do_input()
scripts/mksysmap: Fix escape chars '$'
The usual bunch of singletons and two doubletons - please see the relevant
changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ3noXwAKCRDdBJ7gKXxA
jkzRAP9Ejb8kbgCrA3cptnzlVkDCDUm0TmleepT3bx6B2rH0BgEAzSiTXf4ioZPg
4pOHnKIGOWEVPcVwBrdA0irWG+QPYAQ=
=nEIZ
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2025-01-04-18-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"25 hotfixes. 16 are cc:stable. 18 are MM and 7 are non-MM.
The usual bunch of singletons and two doubletons - please see the
relevant changelogs for details"
* tag 'mm-hotfixes-stable-2025-01-04-18-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (25 commits)
MAINTAINERS: change Arınç _NAL's name and email address
scripts/sorttable: fix orc_sort_cmp() to maintain symmetry and transitivity
mm/util: make memdup_user_nul() similar to memdup_user()
mm, madvise: fix potential workingset node list_lru leaks
mm/damon/core: fix ignored quota goals and filters of newly committed schemes
mm/damon/core: fix new damon_target objects leaks on damon_commit_targets()
mm/list_lru: fix false warning of negative counter
vmstat: disable vmstat_work on vmstat_cpu_down_prep()
mm: shmem: fix the update of 'shmem_falloc->nr_unswapped'
mm: shmem: fix incorrect index alignment for within_size policy
percpu: remove intermediate variable in PERCPU_PTR()
mm: zswap: fix race between [de]compression and CPU hotunplug
ocfs2: fix slab-use-after-free due to dangling pointer dqi_priv
fs/proc/task_mmu: fix pagemap flags with PMD THP entries on 32bit
kcov: mark in_softirq_really() as __always_inline
docs: mm: fix the incorrect 'FileHugeMapped' field
mailmap: modify the entry for Mathieu Othacehe
mm/kmemleak: fix sleeping function called from invalid context at print message
mm: hugetlb: independent PMD page table shared count
maple_tree: reload mas before the second call for mas_empty_area
...
- Fix the CONFIG_RESET_CONTROLLER=n path signature of
clk_imx8mp_audiomix_reset_controller_register() to appease randconfig
- Speed up the sdhci clk on TH1520 by a factor of 4 by adding a fixed
factor clk
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAmd5nXkRHHNib3lkQGtl
cm5lbC5vcmcACgkQrQKIl8bklSXRpRAAsPGcr0feyS+TxKSp3qGdmUwNdiwYzoH/
NvyzMO++Lw3VICY+NfatlfhHlVT9mwVbIB/1xvMbL/3p7AbYR2e4/wRzOycE6+mB
leYUrEmvWeRXCEgs6zymMm8i7qezTfRPvpTf6dIp04DxbIEXcy/B9FTdhJlYQv8T
CfOXv6BVZbWDgkNDZBiu/a93sMk3ivQUYcgdPYZD6CT6IcEeTFYRAoZqIrywO14P
xl70TIaWC+PDeGjsGlH5nFXq4cCkRuqSIZx50hsgT5tvM+CV105AGdMjlNRijwzM
L4G3u1faRzGZI+4U4ccjZuNQWBhPZxdb5Ho31nDHYSaSxW/f7eBIJfZYVTggryC9
DcgQSlPdfiVa5fwyj/ZSenOr7S0Km0onnHxrbqNd5bCEuhWrpRPCwLJ2XtqOOi1l
3jezZ2RvvtrYqO0naPrcgpJVPU98J+CfNveGQUSXH9g/b2RVBgUsOV4NLWb6YoH0
f11RbfZjBx1maEjUrv1fRIl8dvW4CySQXQpMYG/lGD52bs3EzOre3GCiyAtM9liH
tpAvclQB0lVEpr0YIwXPrmK5dFIL4mZQNIaaCqsO8ApkkQMHfPXxwGxX1E9wPIeg
E+cBXGsHzQf5fZERNc9BFRtelN5RHGQCb7mgqv9nNkFXCB09jWCNSjyVtOibU8FJ
KCeYxUbyVyk=
=30Sl
-----END PGP SIGNATURE-----
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A randconfig build fix and a performance fix:
- Fix the CONFIG_RESET_CONTROLLER=n path signature of
clk_imx8mp_audiomix_reset_controller_register() to appease
randconfig
- Speed up the sdhci clk on TH1520 by a factor of 4 by adding
a fixed factor clk"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: clk-imx8mp-audiomix: fix function signature
clk: thead: Fix TH1520 emmc and shdci clock rate
The Arch Linux glibc package contains a versioned dependency on
"linux-api-headers". If the linux-api-headers package provided by
pacman-pkg does not specify an explicit version this dependency is not
satisfied.
Fix the dependency by providing an explicit version.
Fixes: c8578539de ("kbuild: add script and target to generate pacman package")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>