Commit graph

700 commits

Author SHA1 Message Date
Chuck Lever
48aab1606f NFSD: Remove the cap on number of operations per NFSv4 COMPOUND
This limit has always been a sanity check; in nearly all cases a
large COMPOUND is a sign of a malfunctioning client. The only real
limit on COMPOUND size and complexity is the size of NFSD's send
and receive buffers.

However, there are a few cases where a large COMPOUND is sane. For
example, when a client implementation wants to walk down a long file
pathname in a single round trip.

A small risk is that now a client can construct a COMPOUND request
that can keep a single nfsd thread busy for quite some time.

Suggested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14 12:46:41 -04:00
Linus Torvalds
2c26b68cd5 NFSD 6.16 Release Notes
The marquee feature for this release is that the limit on the
 maximum rsize and wsize has been raised to 4MB. The default remains
 at 1MB, but risk-seeking administrators now have the ability to try
 larger I/O sizes with NFS clients that support them. Eventually the
 default setting will be increased when we have confidence that this
 change will not have negative impact.
 
 With v6.16, NFSD now has its own debugfs file system where we can
 add experimental features and make them available outside of our
 development community without impacting production deployments. The
 first experimental setting added is one that makes all NFS READ
 operations use vfs_iter_read() instead of the NFSD splice actor. The
 plan is to eventually retire the splice actor, as that will enable a
 number of new capabilities such as the use of struct bio_vec from the
 top to the bottom of the NFSD stack.
 
 Jeff Layton contributed a number of observability improvements. The
 use of dprintk() in a number of high-traffic code paths has been
 replaced with static trace points.
 
 This release sees the continuation of efforts to harden the NFSv4.2
 COPY operation. Soon, the restriction on async COPY operations can
 be lifted.
 
 Many thanks to the contributors, reviewers, testers, and bug
 reporters who participated during the v6.16 development cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmg1yGUACgkQM2qzM29m
 f5frMA//TJTbSWiM7qBX1GhVMNr1lxQcjU4BPKo0qZfEtwV06F2BB9mWgDU+BIQh
 AcGfMZUmNWAnhOTOYvwqyW6dnX+1yt8sBsCZ/1ctY30A4JH4AgG5sdZS7BUrlEEr
 bGDMUCaPnvQ3maeDjMlefe7Xv/rUhj9TVhXmDkt4vf/jCde2JODTB/z8n7WeAxYJ
 eOvmr/n5z6VI5Q67M7b5/xqofBEaEoq9P5UEgn61ThfeR0bMlrklm/avDCbbNIH8
 6n7Z3tjzllK1CAjEmwHalq4LRbMX5FHWzNkyJw+wtviXS18J5vCAvRe+JDoykusu
 L2bgXT8bBUqy46eO4WKEOJtEqVQhIsRFx/8ku1iTLrpDWlwrR4mHVyObEDkkdlMX
 EyBQ4svg2OxCXSyy5O8oggzU0TWVJStIjbIEHbJYusWLU7HxxFveBwqwzYHXLtip
 WKm6N2ANqQi1du+Pc6xmgXo9svA5Vk+DQjljm1Y5up9dhi2K9cvCIHjwFsZ+E0VL
 XqXJ2YgIQb3oXK7FttzLOiDrpX1OX82sTIbgdcPcfT7lP+ej7uiHMBPmdPwgaZIU
 EbIp0ThoTkh8/VRMDcWIt+B6SEhmb5vY3Zgz9Lcf2J0PM1fuYJ67L7xGTviFX7Ci
 DpohiCgceb6PHYeIuarayF86tPJGF8Vb7XvQZej2Ybv8QdxLFg8=
 =FbeG
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "The marquee feature for this release is that the limit on the maximum
  rsize and wsize has been raised to 4MB. The default remains at 1MB,
  but risk-seeking administrators now have the ability to try larger I/O
  sizes with NFS clients that support them. Eventually the default
  setting will be increased when we have confidence that this change
  will not have negative impact.

  With v6.16, NFSD now has its own debugfs file system where we can add
  experimental features and make them available outside of our
  development community without impacting production deployments. The
  first experimental setting added is one that makes all NFS READ
  operations use vfs_iter_read() instead of the NFSD splice actor. The
  plan is to eventually retire the splice actor, as that will enable a
  number of new capabilities such as the use of struct bio_vec from the
  top to the bottom of the NFSD stack.

  Jeff Layton contributed a number of observability improvements. The
  use of dprintk() in a number of high-traffic code paths has been
  replaced with static trace points.

  This release sees the continuation of efforts to harden the NFSv4.2
  COPY operation. Soon, the restriction on async COPY operations can be
  lifted.

  Many thanks to the contributors, reviewers, testers, and bug reporters
  who participated during the v6.16 development cycle"

* tag 'nfsd-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (60 commits)
  xdrgen: Fix code generated for counted arrays
  SUNRPC: Bump the maximum payload size for the server
  NFSD: Add a "default" block size
  NFSD: Remove NFSSVC_MAXBLKSIZE_V2 macro
  NFSD: Remove NFSD_BUFSIZE
  sunrpc: Remove the RPCSVC_MAXPAGES macro
  svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages
  svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages
  sunrpc: Adjust size of socket's receive page array dynamically
  SUNRPC: Remove svc_rqst :: rq_vec
  SUNRPC: Remove svc_fill_write_vector()
  NFSD: Use rqstp->rq_bvec in nfsd_iter_write()
  SUNRPC: Export xdr_buf_to_bvec()
  NFSD: De-duplicate the svc_fill_write_vector() call sites
  NFSD: Use rqstp->rq_bvec in nfsd_iter_read()
  sunrpc: Replace the rq_bvec array with dynamically-allocated memory
  sunrpc: Replace the rq_pages array with dynamically-allocated memory
  sunrpc: Remove backchannel check in svc_init_buffer()
  sunrpc: Add a helper to derive maxpages from sv_max_mesg
  svcrdma: Reduce the number of rdma_rw contexts per-QP
  ...
2025-05-28 12:21:12 -07:00
Chuck Lever
58d721684d NFSD: Remove NFSD_BUFSIZE
Clean up: The documenting comment for NFSD_BUFSIZE is quite stale.
NFSD_BUFSIZE is used only for NFSv4 Reply these days; never for
NFSv2 or v3, and never for RPC Calls. Even so, the byte count
estimate does not include the size of the NFSv4 COMPOUND Reply
HEADER or the RPC auth flavor.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15 16:16:27 -04:00
Chuck Lever
d6ca7d2643 NFSD: Implement FATTR4_CLONE_BLKSIZE attribute
RFC 7862 states that if an NFS server implements a CLONE operation,
it MUST also implement FATTR4_CLONE_BLKSIZE. NFSD implements CLONE,
but does not implement FATTR4_CLONE_BLKSIZE.

Note that in Section 12.2, RFC 7862 claims that
FATTR4_CLONE_BLKSIZE is RECOMMENDED, not REQUIRED. Likely this is
because a minor version is not permitted to add a REQUIRED
attribute. Confusing.

We assume this attribute reports a block size as a count of bytes,
as RFC 7862 does not specify a unit.

Reported-by: Roland Mainz <roland.mainz@nrubsig.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Cc: stable@vger.kernel.org # v6.7+
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11 19:48:29 -04:00
NeilBrown
8ad9248471
nfsd: Use lookup_one() rather than lookup_one_len()
nfsd uses some VFS interfaces (such as vfs_mkdir) which take an explicit
mnt_idmap, and it passes &nop_mnt_idmap as nfsd doesn't yet support
idmapped mounts.

It also uses the lookup_one_len() family of functions which implicitly
use &nop_mnt_idmap.  This mixture of implicit and explicit could be
confusing.  When we eventually update nfsd to support idmap mounts it
would be best if all places which need an idmap determined from the
mount point were similar and easily found.

So this patch changes nfsd to use lookup_one(), lookup_one_unlocked(),
and lookup_one_positive_unlocked(), passing &nop_mnt_idmap.

This has the benefit of removing some uses of the lookup_one_len
functions where permission checking is actually needed.  Many callers
don't care about permission checking and using these function only where
permission checking is needed is a valuable simplification.

This change requires passing the name in a qstr.  Currently this is a
little clumsy, but if nfsd is changed to use qstr more broadly it will
result in a net improvement.

Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/r/20250319031545.2999807-3-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07 09:25:32 +02:00
Linus Torvalds
f34b580514 NFSD 6.14 Release Notes
Jeff Layton contributed an implementation of NFSv4.2+ attribute
 delegation, as described here:
 
 https://www.ietf.org/archive/id/draft-ietf-nfsv4-delstid-08.html
 
 This interoperates with similar functionality introduced into the
 Linux NFS client in v6.11. An attribute delegation permits an NFS
 client to manage a file's mtime, rather than flushing dirty data to
 the NFS server so that the file's mtime reflects the last write,
 which is considerably slower.
 
 Neil Brown contributed dynamic NFSv4.1 session slot table resizing.
 This facility enables NFSD to increase or decrease the number of
 slots per NFS session depending on server memory availability. More
 session slots means greater parallelism.
 
 Chuck Lever fixed a long-standing latent bug where NFSv4 COMPOUND
 encoding screws up when crossing a page boundary in the encoding
 buffer. This is a zero-day bug, but hitting it is rare and depends
 on the NFS client implementation. The Linux NFS client does not
 happen to trigger this issue.
 
 A variety of bug fixes and other incremental improvements fill out
 the list of commits in this release. Great thanks to all
 contributors, reviewers, testers, and bug reporters who participated
 during this development cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmeVPBUACgkQM2qzM29m
 f5dClxAAmW4O2bOJaR8neJ54fzeFYXtFYEXRF/XbIh2KdnCy2LoywT9ux8ndzE0k
 1tsjtv0g4y84IcYfrxXPRTuhh2GO2pHw5L1kGVRezJg2ODSFbpdzGtcVK7SIrs2S
 eijTViqdi9xRgfj1jPqRrvxC99RL3/fmCztqorAPLDFsqYAd/6ZRxZ7+IcZ2h4+J
 cJ0Z6Wx6eh10roacZPXweH13XJ7xWO/ublYZvQFQpK2BAKyO98aXGgLDraNt8k60
 X3DZLSKkGB/eBlNeAlTtcrXec/ot6XGJPKr3b/7zhwfMi8B13RGdSmCR8SxMdRQM
 vCQO4G2YadU4YFS6FFIw9Wc1XDYUuYh2YgcveafjzjXbgi7NnY7rYOxtnTgBi0Xv
 XGjtGqpvD676gPm+8b3DcwqmWI3c/WUdtQIZ1uYRCFZdFqkVP91bySPT2aHtx2GO
 4j3uEyTlypC00kyvu1oL3+tVUG/EFlJCpYvIbOwqDG2m7KWPStzpfJTD5Q9cdlEl
 fdgs6l82EVqe1YyjLTqajDuOcRrLYK2hlR/5STc03hQV+GpKSo5UypRejzE9WtRV
 zT/tyelqhj4+0EZJz4ay/8q9s2Jp+5JGVxoVvjujSuH7+Ulb3T+IDkldtMamO8Fm
 www2y0/fLfU2xIapMJdCoJ+ZKgel2i8RZMPIc0cIfO5ITXm+dOs=
 =hwXx
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Jeff Layton contributed an implementation of NFSv4.2+ attribute
  delegation, as described here:

    https://www.ietf.org/archive/id/draft-ietf-nfsv4-delstid-08.html

  This interoperates with similar functionality introduced into the
  Linux NFS client in v6.11. An attribute delegation permits an NFS
  client to manage a file's mtime, rather than flushing dirty data to
  the NFS server so that the file's mtime reflects the last write, which
  is considerably slower.

  Neil Brown contributed dynamic NFSv4.1 session slot table resizing.
  This facility enables NFSD to increase or decrease the number of slots
  per NFS session depending on server memory availability. More session
  slots means greater parallelism.

  Chuck Lever fixed a long-standing latent bug where NFSv4 COMPOUND
  encoding screws up when crossing a page boundary in the encoding
  buffer. This is a zero-day bug, but hitting it is rare and depends on
  the NFS client implementation. The Linux NFS client does not happen to
  trigger this issue.

  A variety of bug fixes and other incremental improvements fill out the
  list of commits in this release. Great thanks to all contributors,
  reviewers, testers, and bug reporters who participated during this
  development cycle"

* tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits)
  sunrpc: Remove gss_{de,en}crypt_xdr_buf deadcode
  sunrpc: Remove gss_generic_token deadcode
  sunrpc: Remove unused xprt_iter_get_xprt
  Revert "SUNRPC: Reduce thread wake-up rate when receiving large RPC messages"
  nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION
  nfsd: handle delegated timestamps in SETATTR
  nfsd: add support for delegated timestamps
  nfsd: rework NFS4_SHARE_WANT_* flag handling
  nfsd: add support for FATTR4_OPEN_ARGUMENTS
  nfsd: prepare delegation code for handing out *_ATTRS_DELEG delegations
  nfsd: rename NFS4_SHARE_WANT_* constants to OPEN4_SHARE_ACCESS_WANT_*
  nfsd: switch to autogenerated definitions for open_delegation_type4
  nfs_common: make include/linux/nfs4.h include generated nfs4_1.h
  nfsd: fix handling of delegated change attr in CB_GETATTR
  SUNRPC: Document validity guarantees of the pointer returned by reserve_space
  NFSD: Insulate nfsd4_encode_fattr4() from page boundaries in the encode buffer
  NFSD: Insulate nfsd4_encode_secinfo() from page boundaries in the encode buffer
  NFSD: Refactor nfsd4_do_encode_secinfo() again
  NFSD: Insulate nfsd4_encode_readlink() from page boundaries in the encode buffer
  NFSD: Insulate nfsd4_encode_read_plus_data() from page boundaries in the encode buffer
  ...
2025-01-27 17:27:24 -08:00
Jeff Layton
d3edfd9ed1 nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION
Allow clients to request getting a delegation xor an open stateid if a
delegation isn't available. This allows the client to avoid sending a
final CLOSE for the (useless) open stateid, when it is granted a
delegation.

If this flag is requested by the client and there isn't already a new
open stateid, discard the new open stateid before replying.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:30:01 -05:00
Jeff Layton
7e13f4f8d2 nfsd: handle delegated timestamps in SETATTR
Allow SETATTR to handle delegated timestamps. This patch assumes that
only the delegation holder has the ability to set the timestamps in this
way, so we allow this only if the SETATTR stateid refers to a
*_ATTRS_DELEG delegation.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:30:01 -05:00
Jeff Layton
6ae30d6eb2 nfsd: add support for delegated timestamps
Add support for the delegated timestamps on write delegations. This
allows the server to proxy timestamps from the delegation holder to
other clients that are doing GETATTRs vs. the same inode.

When OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS bit is set in the OPEN
call, set the dl_type to the *_ATTRS_DELEG flavor of delegation.

Add timespec64 fields to nfs4_cb_fattr and decode the timestamps into
those. Vet those timestamps according to the delstid spec and update
the inode attrs if necessary.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:30:01 -05:00
Jeff Layton
cee9b4ef42 nfsd: rework NFS4_SHARE_WANT_* flag handling
The delstid draft adds new NFS4_SHARE_WANT_TYPE_MASK values that don't
fit neatly into the existing WANT_MASK or WHEN_MASK. Add a new
NFS4_SHARE_WANT_MOD_MASK value and redefine NFS4_SHARE_WANT_MASK to
include it.

Also fix the checks in nfsd4_deleg_xgrade_none_ext() to check for the
flags instead of equality, since there may be modifier flags in the
value.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:30:01 -05:00
Jeff Layton
51c0d4f7e3 nfsd: add support for FATTR4_OPEN_ARGUMENTS
Add support for FATTR4_OPEN_ARGUMENTS. This a new mechanism for the
client to discover what OPEN features the server supports.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:30:01 -05:00
Jeff Layton
fbd5573d0d nfsd: prepare delegation code for handing out *_ATTRS_DELEG delegations
Add some preparatory code to various functions that handle delegation
types to allow them to handle the OPEN_DELEGATE_*_ATTRS_DELEG constants.
Add helpers for detecting whether it's a read or write deleg, and
whether the attributes are delegated.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:30:01 -05:00
Jeff Layton
c9c99a33e2 nfsd: rename NFS4_SHARE_WANT_* constants to OPEN4_SHARE_ACCESS_WANT_*
Add the OPEN4_SHARE_ACCESS_WANT constants from the nfs4.1 and delstid
draft into the nfs4_1.x file, and regenerate the headers and source
files. Do a mass renaming of NFS4_SHARE_WANT_* to
OPEN4_SHARE_ACCESS_WANT_* in the nfsd directory.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:30:01 -05:00
Jeff Layton
8dfbea8bde nfsd: switch to autogenerated definitions for open_delegation_type4
Rename the enum with the same name in include/linux/nfs4.h, add the
proper enum to nfs4_1.x and regenerate the headers and source files.  Do
a mass rename of all NFS4_OPEN_DELEGATE_* to OPEN_DELEGATE_* in the nfsd
directory.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:30:01 -05:00
Jeff Layton
531503054e nfsd: fix handling of delegated change attr in CB_GETATTR
RFC8881, section 10.4.3 has some specific guidance as to how the
delegated change attribute should be handled. We currently don't follow
that guidance properly.

In particular, when the file is modified, the server always reports the
initial change attribute + 1. Section 10.4.3 however indicates that it
should be incremented on every GETATTR request from other clients.

Only request the change attribute until the file has been modified. If
there is an outstanding delegation, then increment the cached change
attribute on every GETATTR.

Fixes: 6487a13b5c ("NFSD: add support for CB_GETATTR callback")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21 15:29:40 -05:00
Chuck Lever
4163ee711c NFSD: Insulate nfsd4_encode_fattr4() from page boundaries in the encode buffer
Commit ab04de60ae ("NFSD: Optimize nfsd4_encode_fattr()") replaced
the use of write_bytes_to_xdr_buf() because it's expensive and the
data items to be encoded are already properly aligned.

However, there's no guarantee that the pointer returned from
xdr_reserve_space() will still point to the correct reserved space
in the encode buffer after one or more intervening calls to
xdr_reserve_space(). It just happens to work with the current
implementation of xdr_reserve_space().

This commit effectively reverts the optimization.

Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-10 23:43:25 -05:00
Chuck Lever
b786caa65d NFSD: Insulate nfsd4_encode_secinfo() from page boundaries in the encode buffer
There's no guarantee that the pointer returned from
xdr_reserve_space() will still point to the correct reserved space
in the encode buffer after one or more intervening calls to
xdr_reserve_space(). It just happens to work with the current
implementation of xdr_reserve_space().

Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-10 23:42:20 -05:00
Chuck Lever
825562bc7d NFSD: Refactor nfsd4_do_encode_secinfo() again
Extract the code that encodes the secinfo4 union data type to
clarify the logic. The removed warning is pretty well obscured and
thus probably not terribly useful.

Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-10 23:41:49 -05:00
Chuck Lever
201cb2048a NFSD: Insulate nfsd4_encode_readlink() from page boundaries in the encode buffer
There's no guarantee that the pointer returned from
xdr_reserve_space() will still point to the correct reserved space
in the encode buffer after one or more intervening calls to
xdr_reserve_space(). It just happens to work with the current
implementation of xdr_reserve_space().

Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-10 23:41:42 -05:00
Chuck Lever
26ea81638f NFSD: Insulate nfsd4_encode_read_plus_data() from page boundaries in the encode buffer
Commit eeadcb7579 ("NFSD: Simplify READ_PLUS") replaced the use of
write_bytes_to_xdr_buf(), copying what was in nfsd4_encode_read()
at the time.

However, the current code will corrupt the encoded data if the XDR
data items that are reserved early and then poked into the XDR
buffer later happen to fall on a page boundary in the XDR encoding
buffer.

__xdr_commit_encode can shift encoded data items in the encoding
buffer so that pointers returned from xdr_reserve_space() no longer
address the same part of the encoding stream.

Fixes: eeadcb7579 ("NFSD: Simplify READ_PLUS")
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-10 23:41:10 -05:00
Chuck Lever
c9fc7772ba NFSD: Insulate nfsd4_encode_read_plus() from page boundaries in the encode buffer
Commit eeadcb7579 ("NFSD: Simplify READ_PLUS") replaced the use of
write_bytes_to_xdr_buf(), copying what was in nfsd4_encode_read()
at the time.

However, the current code will corrupt the encoded data if the XDR
data items that are reserved early and then poked into the XDR
buffer later happen to fall on a page boundary in the XDR encoding
buffer.

__xdr_commit_encode can shift encoded data items in the encoding
buffer so that pointers returned from xdr_reserve_space() no longer
address the same part of the encoding stream.

Fixes: eeadcb7579 ("NFSD: Simplify READ_PLUS")
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-10 23:40:38 -05:00
Chuck Lever
1a861150bd NFSD: Insulate nfsd4_encode_read() from page boundaries in the encode buffer
Commit 28d5bc468e ("NFSD: Optimize nfsd4_encode_readv()") replaced
the use of write_bytes_to_xdr_buf() because it's expensive and the
data items to be encoded are already properly aligned.

However, the current code will corrupt the encoded data if the XDR
data items that are reserved early and then poked into the XDR
buffer later happen to fall on a page boundary in the XDR encoding
buffer.

__xdr_commit_encode can shift encoded data items in the encoding
buffer so that pointers returned from xdr_reserve_space() no longer
address the same part of the encoding stream.

This isn't an issue for splice reads because the reserved encode
buffer areas must fall in the XDR buffers header for the splice to
work without error. For vectored reads, however, there is a
possibility of send buffer corruption in rare cases.

Fixes: 28d5bc468e ("NFSD: Optimize nfsd4_encode_readv()")
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-10 23:39:46 -05:00
Chuck Lever
ef3675b45b NFSD: Encode COMPOUND operation status on page boundaries
J. David reports an odd corruption of a READDIR reply sent to a
FreeBSD client.

xdr_reserve_space() has to do a special trick when the @nbytes value
requests more space than there is in the current page of the XDR
buffer.

In that case, xdr_reserve_space() returns a pointer to the start of
the next page, and then the next call to xdr_reserve_space() invokes
__xdr_commit_encode() to copy enough of the data item back into the
previous page to make that data item contiguous across the page
boundary.

But we need to be careful in the case where buffer space is reserved
early for a data item whose value will be inserted into the buffer
later.

One such caller, nfsd4_encode_operation(), reserves 8 bytes in the
encoding buffer for each COMPOUND operation. However, a READDIR
result can sometimes encode file names so that there are only 4
bytes left at the end of the current XDR buffer page (though plenty
of pages are left to handle the remaining encoding tasks).

If a COMPOUND operation follows the READDIR result (say, a GETATTR),
then nfsd4_encode_operation() will reserve 8 bytes for the op number
(9) and the op status (usually NFS4_OK). In this weird case,
xdr_reserve_space() returns a pointer to byte zero of the next buffer
page, as it assumes the data item will be copied back into place (in
the previous page) on the next call to xdr_reserve_space().

nfsd4_encode_operation() writes the op num into the buffer, then
saves the next 4-byte location for the op's status code. The next
xdr_reserve_space() call is part of GETATTR encoding, so the op num
gets copied back into the previous page, but the saved location for
the op status continues to point to the wrong spot in the current
XDR buffer page because __xdr_commit_encode() moved that data item.

After GETATTR encoding is complete, nfsd4_encode_operation() writes
the op status over the first XDR data item in the GETATTR result.
The NFS4_OK status code (0) makes it look like there are zero items
in the GETATTR's attribute bitmask.

The patch description of commit 2825a7f907 ("nfsd4: allow encoding
across page boundaries") [2014] remarks that NFSD "can't handle a
new operation starting close to the end of a page." This bug appears
to be one reason for that remark.

Reported-by: J David <j.david.lists@gmail.com>
Closes: https://lore.kernel.org/linux-nfs/3998d739-c042-46b4-8166-dbd6c5f0e804@oracle.com/T/#t
Tested-by: Rick Macklem <rmacklem@uoguelph.ca>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-10 23:37:41 -05:00
NeilBrown
fc8738c68d nfsd: add support for freeing unused session-DRC slots
Reducing the number of slots in the session slot table requires
confirmation from the client.  This patch adds reduce_session_slots()
which starts the process of getting confirmation, but never calls it.
That will come in a later patch.

Before we can free a slot we need to confirm that the client won't try
to use it again.  This involves returning a lower cr_maxrequests in a
SEQUENCE reply and then seeing a ca_maxrequests on the same slot which
is not larger than we limit we are trying to impose.  So for each slot
we need to remember that we have sent a reduced cr_maxrequests.

To achieve this we introduce a concept of request "generations".  Each
time we decide to reduce cr_maxrequests we increment the generation
number, and record this when we return the lower cr_maxrequests to the
client.  When a slot with the current generation reports a low
ca_maxrequests, we commit to that level and free extra slots.

We use an 16 bit generation number (64 seems wasteful) and if it cycles
we iterate all slots and reset the generation number to avoid false matches.

When we free a slot we store the seqid in the slot pointer so that it can
be restored when we reactivate the slot.  The RFC can be read as
suggesting that the slot number could restart from one after a slot is
retired and reactivated, but also suggests that retiring slots is not
required.  So when we reactive a slot we accept with the next seqid in
sequence, or 1.

When decoding sa_highest_slotid into maxslots we need to add 1 - this
matches how it is encoded for the reply.

se_dead is moved in struct nfsd4_session to remove a hole.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06 09:37:38 -05:00
Casey Schaufler
76ecf306ae lsm: use lsm_context in security_inode_getsecctx
Change the security_inode_getsecctx() interface to fill a lsm_context
structure instead of data and length pointers.  This provides
the information about which LSM created the context so that
security_release_secctx() can use the correct hook.

Cc: linux-nfs@vger.kernel.org
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-12-04 14:58:09 -05:00
Casey Schaufler
6fba89813c lsm: ensure the correct LSM context releaser
Add a new lsm_context data structure to hold all the information about a
"security context", including the string, its size and which LSM allocated
the string. The allocation information is necessary because LSMs have
different policies regarding the lifecycle of these strings. SELinux
allocates and destroys them on each use, whereas Smack provides a pointer
to an entry in a list that never goes away.

Update security_release_secctx() to use the lsm_context instead of a
(char *, len) pair. Change its callers to do likewise.  The LSMs
supporting this hook have had comments added to remind the developer
that there is more work to be done.

The BPF security module provides all LSM hooks. While there has yet to
be a known instance of a BPF configuration that uses security contexts,
the possibility is real. In the existing implementation there is
potential for multiple frees in that case.

Cc: linux-integrity@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: audit@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: linux-nfs@vger.kernel.org
Cc: Todd Kjos <tkjos@google.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-12-04 10:46:26 -05:00
Chuck Lever
30c1d2411a NFSD: Remove unused values from nfsd4_encode_components_esc()
Clean up. The computed value of @p is saved each time through the
loop but is never used.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18 20:23:02 -05:00
Chuck Lever
6b9c1080a6 NFSD: Remove unused results in nfsd4_encode_pathname4()
Clean up. The result of "*p++" is saved, but is not used before it
is overwritten. The result of xdr_encode_opaque() is saved each
time through the loop but is never used.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18 20:23:02 -05:00
Pali Rohár
bb4f07f240 nfsd: Fix NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT
Currently NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT do not bypass
only GSS, but bypass any method. This is a problem specially for NFS3
AUTH_NULL-only exports.

The purpose of NFSD_MAY_BYPASS_GSS_ON_ROOT is described in RFC 2623,
section 2.3.2, to allow mounting NFS2/3 GSS-only export without
authentication. So few procedures which do not expose security risk used
during mount time can be called also with AUTH_NONE or AUTH_SYS, to allow
client mount operation to finish successfully.

The problem with current implementation is that for AUTH_NULL-only exports,
the NFSD_MAY_BYPASS_GSS_ON_ROOT is active also for NFS3 AUTH_UNIX mount
attempts which confuse NFS3 clients, and make them think that AUTH_UNIX is
enabled and is working. Linux NFS3 client never switches from AUTH_UNIX to
AUTH_NONE on active mount, which makes the mount inaccessible.

Fix the NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT implementation
and really allow to bypass only exports which have enabled some real
authentication (GSS, TLS, or any other).

The result would be: For AUTH_NULL-only export if client attempts to do
mount with AUTH_UNIX flavor then it will receive access errors, which
instruct client that AUTH_UNIX flavor is not usable and will either try
other auth flavor (AUTH_NULL if enabled) or fails mount procedure.
Similarly if client attempt to do mount with AUTH_NULL flavor and only
AUTH_UNIX flavor is enabled then the client will receive access error.

This should fix problems with AUTH_NULL-only or AUTH_UNIX-only exports if
client attempts to mount it with other auth flavor (e.g. with AUTH_NULL for
AUTH_UNIX-only export, or with AUTH_UNIX for AUTH_NULL-only export).

Signed-off-by: Pali Rohár <pali@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18 20:22:59 -05:00
Pali Rohár
600020927b nfsd: Fill NFSv4.1 server implementation fields in OP_EXCHANGE_ID response
NFSv4.1 OP_EXCHANGE_ID response from server may contain server
implementation details (domain, name and build time) in optional
nfs_impl_id4 field. Currently nfsd does not fill this field.

Send these information in NFSv4.1 OP_EXCHANGE_ID response. Fill them with
the same values as what is Linux NFSv4.1 client doing. Domain is hardcoded
to "kernel.org", name is composed in the same way as "uname -srvm" output
and build time is hardcoded to zeros.

NFSv4.1 client and server implementation fields are useful for statistic
purposes or for identifying type of clients and servers.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18 20:22:58 -05:00
Jeff Layton
f6259e2e4f nfsd: have nfsd4_deleg_getattr_conflict pass back write deleg pointer
Currently we pass back the size and whether it has been modified, but
those just mirror values tracked inside the delegation. In a later
patch, we'll need to get at the timestamps in the delegation too, so
just pass back a reference to the write delegation, and use that to
properly override values in the iattr.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-11 13:42:07 -05:00
Jeff Layton
3a405432e7 nfsd: drop the nfsd4_fattr_args "size" field
We already have a slot for this in the kstat structure. Just overwrite
that instead of keeping a copy.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-11 13:42:07 -05:00
Jeff Layton
f67eef8da0 nfsd: drop inode parameter from nfsd4_change_attribute()
The inode that nfs4_open_delegation() passes to this function is
wrong, which throws off the result. The inode will end up getting a
directory-style change attr instead of a regular-file-style one.

Fix up nfs4_delegation_stat() to fetch STATX_MODE, and then drop the
inode parameter from nfsd4_change_attribute(), since it's no longer
needed.

Fixes: c5967721e1 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-11 13:42:06 -05:00
Chuck Lever
202f39039a NFSD: Fix NFSv4's PUTPUBFH operation
According to RFC 8881, all minor versions of NFSv4 support PUTPUBFH.

Replace the XDR decoder for PUTPUBFH with a "noop" since we no
longer want the minorversion check, and PUTPUBFH has no arguments to
decode. (Ideally nfsd4_decode_noop should really be called
nfsd4_decode_void).

PUTPUBFH should now behave just like PUTROOTFH.

Reported-by: Cedric Blancher <cedric.blancher@gmail.com>
Fixes: e1a90ebd8b ("NFSD: Combine decode operations for v4 and v4.1")
Cc: Dan Shelton <dan.f.shelton@gmail.com>
Cc: Roland Mainz <roland.mainz@nrubsig.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20 19:31:03 -04:00
NeilBrown
438f81e0e9 nfsd: move error choice for incorrect object types to version-specific code.
If an NFS operation expects a particular sort of object (file, dir, link,
etc) but gets a file handle for a different sort of object, it must
return an error.  The actual error varies among NFS versions in non-trivial
ways.

For v2 and v3 there are ISDIR and NOTDIR errors and, for NFSv4 only,
INVAL is suitable.

For v4.0 there is also NFS4ERR_SYMLINK which should be used if a SYMLINK
was found when not expected.  This take precedence over NOTDIR.

For v4.1+ there is also NFS4ERR_WRONG_TYPE which should be used in
preference to EINVAL when none of the specific error codes apply.

When nfsd_mode_check() finds a symlink where it expected a directory it
needs to return an error code that can be converted to NOTDIR for v2 or
v3 but will be SYMLINK for v4.  It must be different from the error
code returns when it finds a symlink but expects a regular file - that
must be converted to EINVAL or SYMLINK.

So we introduce an internal error code nfserr_symlink_not_dir which each
version converts as appropriate.

nfsd_check_obj_isreg() is similar to nfsd_mode_check() except that it is
only used by NFSv4 and only for OPEN.  NFSERR_INVAL is never a suitable
error if the object is the wrong time.  For v4.0 we use nfserr_symlink
for non-dirs even if not a symlink.  For v4.1 we have nfserr_wrong_type.
We handle this difference in-place in nfsd_check_obj_isreg() as there is
nothing to be gained by delaying the choice to nfsd4_map_status().

As a result of these changes, nfsd_mode_check() doesn't need an rqstp
arg any more.

Note that NFSv4 operations are actually performed in the xdr code(!!!)
so to the only place that we can map the status code successfully is in
nfsd4_encode_operation().

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20 19:31:03 -04:00
Jeff Layton
7e8ae8486e fs/nfsd: fix update of inode attrs in CB_GETATTR
Currently, we copy the mtime and ctime to the in-core inode and then
mark the inode dirty. This is fine for certain types of filesystems, but
not all. Some require a real setattr to properly change these values
(e.g. ceph or reexported NFS).

Fix this code to call notify_change() instead, which is the proper way
to effect a setattr. There is one problem though:

In this case, the client is holding a write delegation and has sent us
attributes to update our cache. We don't want to break the delegation
for this since that would defeat the purpose. Add a new ATTR_DELEG flag
that makes notify_change bypass the try_break_deleg call.

Fixes: c5967721e1 ("NFSD: handle GETATTR conflict with write delegation")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-08-26 19:04:00 -04:00
Jeff Layton
f58bab6fd4 nfsd: ensure that nfsd4_fattr_args.context is zeroed out
If nfsd4_encode_fattr4 ends up doing a "goto out" before we get to
checking for the security label, then args.context will be set to
uninitialized junk on the stack, which we'll then try to free.
Initialize it early.

Fixes: f59388a579 ("NFSD: Add nfsd4_encode_fattr4_sec_label()")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-08-22 14:49:10 -04:00
Dan Carpenter
dbc834e5db NFSD: harden svcxdr_dupstr() and svcxdr_tmpalloc() against integer overflows
These lengths come from xdr_stream_decode_u32() and so we should be a
bit careful with them.  Use size_add() and struct_size() to avoid
integer overflows.  Saving size_add()/struct_size() results to a u32 is
unsafe because it truncates away the high bits.

Also generally storing sizes in longs is safer.  Most systems these days
use 64 bit CPUs.  It's harder for an addition to overflow 64 bits than
it is to overflow 32 bits.  Also functions like vmalloc() can
successfully allocate UINT_MAX bytes, but nothing can allocate ULONG_MAX
bytes.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-07-08 14:10:01 -04:00
Chuck Lever
cc63c21682 NFSD: Add COPY status code to OFFLOAD_STATUS response
Clients that send an OFFLOAD_STATUS might want to distinguish
between an async COPY operation that is still running, has
completed successfully, or that has failed.

The intention of this patch is to make NFSD behave like this:

 * Copy still running:
	OFFLOAD_STATUS returns NFS4_OK, the number of bytes copied
	so far, and an empty osr_status array
 * Copy completed successfully:
	OFFLOAD_STATUS returns NFS4_OK, the number of bytes copied,
	and an osr_status of NFS4_OK
 * Copy failed:
	OFFLOAD_STATUS returns NFS4_OK, the number of bytes copied,
	and an osr_status other than NFS4_OK
 * Copy operation lost, canceled, or otherwise unrecognized:
	OFFLOAD_STATUS returns NFS4ERR_BAD_STATEID

NB: Though RFC 7862 Section 11.2 lists a small set of NFS status
codes that are valid for OFFLOAD_STATUS, there do not seem to be any
explicit spec limits on the status codes that may be returned in the
osr_status field.

At this time we have no unit tests for COPY and its brethren, as
pynfs does not yet implement support for NFSv4.2.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06 09:07:23 -04:00
Jeff Layton
33a1e6ea73 nfsd: trivial GET_DIR_DELEGATION support
This adds basic infrastructure for handing GET_DIR_DELEGATION calls from
clients, including the decoders and encoders. For now, it always just
returns NFS4_OK + GDD4_UNAVAIL.

Eventually clients may start sending this operation, and it's better if
we can return GDD4_UNAVAIL instead of having to abort the whole compound.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06 09:07:17 -04:00
Chuck Lever
18180a4550 NFSD: Fix nfsd4_encode_fattr4() crasher
Ensure that args.acl is initialized early. It is used in an
unconditional call to kfree() on the way out of
nfsd4_encode_fattr4().

Reported-by: Scott Mayhew <smayhew@redhat.com>
Fixes: 83ab8678ad ("NFSD: Add struct nfsd4_fattr_args")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-25 18:06:41 -04:00
Vasily Gorbik
f488138b52 NFSD: fix endianness issue in nfsd4_encode_fattr4
The nfs4 mount fails with EIO on 64-bit big endian architectures since
v6.7. The issue arises from employing a union in the nfsd4_encode_fattr4()
function to overlay a 32-bit array with a 64-bit values based bitmap,
which does not function as intended. Address the endianness issue by
utilizing bitmap_from_arr32() to copy 32-bit attribute masks into a
bitmap in an endianness-agnostic manner.

Cc: stable@vger.kernel.org
Fixes: fce7913b13 ("NFSD: Use a bitmask loop to encode FATTR4 results")
Link: https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2060217
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-11 09:21:06 -04:00
Chuck Lever
9b350d3e34 NFSD: Clean up nfsd4_encode_replay()
Replace open-coded encoding logic with the use of conventional XDR
utility functions. Add a tracepoint to make replays observable in
field troubleshooting situations.

The WARN_ON is removed. A stack trace is of little use, as there is
only one call site for nfsd4_encode_replay(), and a buffer length
shortage here is unlikely.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-09 13:57:50 -05:00
Dai Ngo
c5967721e1 NFSD: handle GETATTR conflict with write delegation
If the GETATTR request on a file that has write delegation in effect
and the request attributes include the change info and size attribute
then the request is handled as below:

Server sends CB_GETATTR to client to get the latest change info and file
size. If these values are the same as the server's cached values then
the GETATTR proceeds as normal.

If either the change info or file size is different from the server's
cached values, or the file was already marked as modified, then:

    . update time_modify and time_metadata into file's metadata
      with current time

    . encode GETATTR as normal except the file size is encoded with
      the value returned from CB_GETATTR

    . mark the file as modified

If the CB_GETATTR fails for any reasons, the delegation is recalled
and NFS4ERR_DELAY is returned for the GETATTR.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:32 -05:00
Jorge Mora
31e4bb8fb8 NFSD: fix LISTXATTRS returning more bytes than maxcount
The maxcount is the maximum number of bytes for the LISTXATTRS4resok
result. This includes the cookie and the count for the name array,
thus subtract 12 bytes from the maxcount: 8 (cookie) + 4 (array count)
when filling up the name array.

Fixes: 23e50fe3a5 ("nfsd: implement the xattr functions and en/decode logic")
Signed-off-by: Jorge Mora <mora@netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:07 -05:00
Jorge Mora
2f73f37d66 NFSD: fix LISTXATTRS returning a short list with eof=TRUE
If the XDR buffer is not large enough to fit all attributes
and the remaining bytes left in the XDR buffer (xdrleft) is
equal to the number of bytes for the current attribute, then
the loop will prematurely exit without setting eof to FALSE.
Also in this case, adding the eof flag to the buffer will
make the reply 4 bytes larger than lsxa_maxcount.

Need to check if there are enough bytes to fit not only the
next attribute name but also the eof as well.

Fixes: 23e50fe3a5 ("nfsd: implement the xattr functions and en/decode logic")
Signed-off-by: Jorge Mora <mora@netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:07 -05:00
Jorge Mora
61ab5e0758 NFSD: change LISTXATTRS cookie encoding to big-endian
Function nfsd4_listxattr_validate_cookie() expects the cookie
as an offset to the list thus it needs to be encoded in big-endian.

Fixes: 23e50fe3a5 ("nfsd: implement the xattr functions and en/decode logic")
Signed-off-by: Jorge Mora <mora@netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:06 -05:00
Jorge Mora
52a357db80 NFSD: fix nfsd4_listxattr_validate_cookie
If LISTXATTRS is sent with a correct cookie but a small maxcount,
this could lead function nfsd4_listxattr_validate_cookie to
return NFS4ERR_BAD_COOKIE. If maxcount = 20, then second check
on function gives RHS = 3 thus any cookie larger than 3 returns
NFS4ERR_BAD_COOKIE.

There is no need to validate the cookie on the return XDR buffer
since attribute referenced by cookie will be the first in the
return buffer.

Fixes: 23e50fe3a5 ("nfsd: implement the xattr functions and en/decode logic")
Signed-off-by: Jorge Mora <mora@netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:06 -05:00
Chuck Lever
a2c91753a4 NFSD: Modify NFSv4 to use nfsd_read_splice_ok()
Avoid the use of an atomic bitop, and prepare for adding a run-time
switch for using splice reads.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:25 -05:00
Chuck Lever
862bee84d7 NFSD: Revert 6c41d9a9bd
For some reason, the wait_on_bit() in nfsd4_deleg_getattr_conflict()
is waiting forever, preventing a clean server shutdown. The
requesting client might also hang waiting for a reply to the
conflicting GETATTR.

Invoking wait_on_bit() in an nfsd thread context is a hazard. The
correct fix is to replace this wait_on_bit() call site with a
mechanism that defers the conflicting GETATTR until the CB_GETATTR
completes or is known to have failed.

That will require some surgery and extended testing and it's late
in the v6.7-rc cycle, so I'm reverting now in favor of trying again
in a subsequent kernel release.

This is my fault: I should have recognized the ramifications of
calling wait_on_bit() in here before accepting this patch.

Thanks to Dai Ngo <dai.ngo@oracle.com> for diagnosing the issue.

Reported-by: Wolfgang Walter <linux-nfs@stwm.de>
Closes: https://lore.kernel.org/linux-nfs/e3d43ecdad554fbdcaa7181833834f78@stwm.de/
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-12-18 11:22:16 -05:00