If the server returns NFS4ERR_OLD_STATEID, then just skip retrying the
GETATTR when replaying the delegreturn compound. We know nothing will
have changed on the server.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the server returns NFS4ERR_OLD_STATEID in response to our delegreturn,
we want to sync to the most recent seqid for the delegation stateid. However
if we are already at the most recent, we have two possibilities:
- an OPEN reply is still outstanding and will return a new seqid
- an earlier OPEN reply was dropped on the floor due to a timeout.
In the latter case, we may end up unable to complete the delegreturn,
so we want to bump the seqid to a value greater than the cached value.
While this may cause us to lose the delegation in the former case,
it should now be safe to assume that the client will replay the OPEN
if necessary in order to get a new valid stateid.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the server returns the same delegation in an open that we just used
in a delegreturn, we need to ensure we don't apply that stateid if
the delegreturn has freed it on the server.
To do so, we ensure that we do not free the storage for the delegation
until either it is replaced by a new one, or we throw the inode out of
cache.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
In nfs_inode_find_state_and_recover() we want to mark for recovery
only those stateids that match or are older than the supplied
stateid parameter.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Fix the checks in nfs4_inode_make_writeable() to ignore the case where
we hold no delegations. Currently, in such a case, we automatically
flush writes.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Ensure that we check that the delegation is valid in
nfs4_return_incompatible_delegation() before we try to return it.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the delegation was revoked, or is already being returned, just
clear the NFS_DELEGATION_RETURN and NFS_DELEGATION_RETURN_IF_CLOSED
flags and keep going.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If we revoke a delegation, but the stateid's seqid is newer, then
ensure we update the seqid when marking the delegation as revoked.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the server sent us a new delegation stateid that is more recent than
the one that got revoked, then clear the NFS_DELEGATION_REVOKED flag.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Add a check to ensure that we haven't already removed the delegation
from the inode after we take all the relevant locks.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Rename nfs_inode_return_delegation_noreclaim() to
nfs_inode_evict_delegation(), which better describes what it
does.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If we're processsing a delegation recall, ignore the delegations that
have already been revoked or returned.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the delegation is marked as being revoked, then don't use it in
the open state structure.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
NFSv2, v3 and NFSv4 servers often have duplicate replay caches that look
at the source port when deciding whether or not an RPC call is a replay
of a previous call. This requires clients to perform strange TCP gymnastics
in order to ensure that when they reconnect to the server, they bind
to the same source port.
NFSv4.1 and NFSv4.2 have sessions that provide proper replay semantics,
that do not look at the source port of the connection. This patch therefore
ensures they can ignore the rebind requirement.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If a NFSv3 server is being used as both a DS and as a regular NFSv3 server,
we may want to keep the IO traffic on a separate TCP connection, since
it will typically have very different timeout characteristics.
This patch therefore sets up a flag to separate the two modes of operation
for the nfs_client.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Connecting to the DS is a non-interactive, asynchronous task, so there is
no reason to fire up an extra RPC null ping in order to ensure that the
server is up.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Add a flag to tell the nfs_client it should set RPC_CLNT_CREATE_NOPING when
creating the rpc client.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Simplify the struct iattr timestamp encoding by skipping the step of
an intermediate struct timespec.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Simplify the struct iattr timestamp encoding by skipping the step of
an intermediate struct timespec.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Encode the mtime correctly.
Fixes: 95582b0083 ("vfs: change inode times to use struct timespec64")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Convert the NFSv4 callbacks to use struct timestamp64, rather than
truncating times to 32-bit values.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
NFSv4 supports 64-bit timestamps, so there is no point in converting
the struct iattr timestamps to 32-bits before encoding.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
NFSv4 supports 64-bit times, so we should switch to using struct
timespec64 when decoding attributes.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If we set nfs_mountpoint_expiry_timeout to a negative value, then
allow that to imply that we do not expire NFSv4 submounts.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
A typo in nfs4_refresh_delegation_stateid() means we're leaking an
RCU lock, and always returning a value of 'false'. As the function
description states, we were always supposed to return 'true' if a
matching delegation was found.
Fixes: 12f275cdd1 ("NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID.")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If the delegation is marked as being revoked, we must not use it
for cached opens.
Fixes: 869f9dfa4d ("NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Add a check to disallow cross file systems copy offload, both
files are expected to be of NFS4.2+ type.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
For small file sizes, it make sense to issue a synchronous copy (and
save an RPC callback operation). Also, for the inter copy offload,
copy len must be larger than the cost of doing a mount between the
destination and source server (14RPCs are sent during 4.x mount).
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
When the source server reboots after a server-to-server copy was
issued, we need to retry the copy from COPY_NOTIFY. We need to
detect that the source server rebooted and there is a copy waiting
on a destination server and wake it up.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
When a destination server sends a READ to the source server it can
get a NFS4ERR_PARTNER_NO_AUTH, which means a copy needs to be
restarted.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
If server sends ERR_OFFLOAD_DENIED error, the client must fall
back on doing copy the normal way. Return ENOTSUPP to the vfs and
fallback to regular copy.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
If the client sends an "inter" copy to the destination server but
it only supports "intra" copy, it can return ESTALE (since it
doesn't know anything about the file handle from the other server
and does not recognize the special case of "inter" copy). Translate
this error as ENOTSUPP and also send OFFLOAD_CANCEL to the source
server so that it can clean up state.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Mark the open created for the source file on the destination
server. Then if this open is going thru a recovery, then fail
the recovery as we don't need to be recoving a "fake" open.
We need to fail the ongoing READs and vfs_copy_file_range().
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
NFSv4.2 inter server to server copy requires the destination server to
READ the data from the source server using the provided stateid and
file handle.
Given an NFSv4 stateid and filehandle from the COPY operaion, provide the
destination server with an NFS client function to create a struct file
suitable for the destiniation server to READ the data to be copied.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Support only one source server address: the same address that
the client and source server use.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Try using the delegation stateid, then the open stateid.
Only NL4_NETATTR, No support for NL4_NAME and NL4_URL.
Allow only one source server address to be returned for now.
To distinguish between same server copy offload ("intra") and
a copy between different server ("inter"), do a check of server
owner identity and also make sure server is capable of doing
a copy offload.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
We no longer need the extra mirror length tracking in the O_DIRECT code,
as we are able to track the maximum contiguous length in dreq->max_count.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
When a series of O_DIRECT reads or writes are truncated, either due to
eof or due to an error, then we should return the number of contiguous
bytes that were received/sent starting at the offset specified by the
application.
Currently, we are failing to correctly check contiguity, and so we're
failing the generic/465 in xfstests when the race between the read
and write RPCs causes the file to get extended while the 2 reads are
outstanding. If the first read RPC call wins the race and returns with
eof set, we should treat the second read RPC as being truncated.
Reported-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
Fixes: 1ccbad9f9f ("nfs: fix DIO good bytes calculation")
Cc: stable@vger.kernel.org # 4.1+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Stable bugfixes:
- Dequeue the request from the receive queue while we're re-encoding # v4.20+
- Fix buffer handling of GSS MIC without slack # 5.1
Features:
- Increase xprtrdma maximum transport header and slot table sizes
- Add support for nfs4_call_sync() calls using a custom rpc_task_struct
- Optimize the default readahead size
- Enable pNFS filelayout LAYOUTGET on OPEN
Other bugfixes and cleanups:
- Fix possible null-pointer dereferences and memory leaks
- Various NFS over RDMA cleanups
- Various NFS over RDMA comment updates
- Don't receive TCP data into a reset request buffer
- Don't try to parse incomplete RPC messages
- Fix congestion window race with disconnect
- Clean up pNFS return-on-close error handling
- Fixes for NFS4ERR_OLD_STATEID handling
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl2NC04ACgkQ18tUv7Cl
QOs4Tg//bAlGs+dIKixAmeMKmTd6I34laUnuyV/12yPQDgo6bryLrTngfe2BYvmG
2l+8H7yHfR4/gQE4vhR0c15xFgu6pvjBGR0/nNRaXienIPXO4xsQkcaxVA7SFRY2
HjffZwyoBfjyRps0jL+2sTsKbRtSkf9Dn+BONRgesg51jK1jyWkXqXpmgi4uMO4i
ojpTrW81dwo7Yhv08U2A/Q1ifMJ8F9dVYuL5sm+fEbVI/Nxoz766qyB8rs8+b4Xj
3gkfyh/Y1zoMmu6c+r2Q67rhj9WYbDKpa6HH9yX1zM/RLTiU7czMX+kjuQuOHWxY
YiEk73NjJ48WJEep3odess1q/6WiAXX7UiJM1SnDFgAa9NZMdfhqMm6XduNO1m60
sy0i8AdxdQciWYexOXMsBuDUCzlcoj4WYs1QGpY3uqO1MznQS/QUfu65fx8CzaT5
snm6ki5ivqXH/js/0Z4MX2n/sd1PGJ5ynMkekxJ8G3gw+GC/oeSeGNawfedifLKK
OdzyDdeiel5Me1p4I28j1WYVLHvtFmEWEU9oytdG0D/rjC/pgYgW/NYvAao8lQ4Z
06wdcyAM66ViAPrbYeE7Bx4jy8zYRkiw6Y3kIbLgrlMugu3BhIW5Mi3BsgL4f4am
KsqkzUqPZMCOVwDuUILSuPp4uHaR+JTJttywiLniTL6reF5kTiA=
=4Ey6
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"Stable bugfixes:
- Dequeue the request from the receive queue while we're re-encoding
# v4.20+
- Fix buffer handling of GSS MIC without slack # 5.1
Features:
- Increase xprtrdma maximum transport header and slot table sizes
- Add support for nfs4_call_sync() calls using a custom
rpc_task_struct
- Optimize the default readahead size
- Enable pNFS filelayout LAYOUTGET on OPEN
Other bugfixes and cleanups:
- Fix possible null-pointer dereferences and memory leaks
- Various NFS over RDMA cleanups
- Various NFS over RDMA comment updates
- Don't receive TCP data into a reset request buffer
- Don't try to parse incomplete RPC messages
- Fix congestion window race with disconnect
- Clean up pNFS return-on-close error handling
- Fixes for NFS4ERR_OLD_STATEID handling"
* tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (53 commits)
pNFS/filelayout: enable LAYOUTGET on OPEN
NFS: Optimise the default readahead size
NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
NFSv4: Fix OPEN_DOWNGRADE error handling
pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid
NFSv4: Add a helper to increment stateid seqids
NFSv4: Handle RPC level errors in LAYOUTRETURN
NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
NFSv4: Clean up pNFS return-on-close error handling
pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
NFS: remove unused check for negative dentry
NFSv3: use nfs_add_or_obtain() to create and reference inodes
NFS: Refactor nfs_instantiate() for dentry referencing callers
SUNRPC: Fix congestion window race with disconnect
SUNRPC: Don't try to parse incomplete RPC messages
SUNRPC: Rename xdr_buf_read_netobj to xdr_buf_read_mic
SUNRPC: Fix buffer handling of GSS MIC without slack
SUNRPC: RPC level errors should always set task->tk_rpc_status
SUNRPC: Don't receive TCP data into a request buffer that has been reset
...
Add the flag to the filelayout driver to add LAYOUTGET to
the OPEN compound.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
In the years since the max readahead size was fixed in NFS, a number of
things have happened:
- Users can now set the value directly using /sys/class/bdi
- NFS max supported block sizes have increased by several orders of
magnitude from 64K to 1MB.
- Disk access latencies are orders of magnitude faster due to SSD + NVME.
In particular note that if the server is advertising 1MB as the optimal
read size, as that will set the readahead size to 15MB.
Let's therefore adjust down, and try to default to VM_READAHEAD_PAGES.
However let's inform the VM about our preferred block size so that it
can choose to round up in cases where that makes sense.
Reported-by: Alkis Georgopoulos <alkisg@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>