-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ3vs1AAKCRCRxhvAZXjc
omdqAP9Mn4HF85p5X7WRtUgrF7MGQft3EBfWE+sUxCMTc49NGQD/Ti7hqGNleEih
MmjUjLZSG1e3lFHYQm0nqmjO2RexbQ0=
=Li7D
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.13-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Relax assertions on failure to encode file handles
The ->encode_fh() method can fail for various reasons. None of them
warrant a WARN_ON().
- Fix overlayfs file handle encoding by allowing encoding an fid from
an inode without an alias
- Make sure fuse_dir_open() handles FOPEN_KEEP_CACHE. If it's not
specified fuse needs to invaludate the directory inode page cache
- Fix qnx6 so it builds with gcc-15
- Various fixes for netfslib and ceph and nfs filesystems:
- Ignore silly rename files from afs and nfs when building header
archives
- Fix read result collection in netfslib with multiple subrequests
- Handle ENOMEM for netfslib buffered reads
- Fix oops in nfs_netfs_init_request()
- Parse the secctx command immediately in cachefiles
- Remove a redundant smp_rmb() in netfslib
- Handle recursion in read retry in netfslib
- Fix clearing of folio_queue
- Fix missing cancellation of copy-to_cache when the cache for a
file is temporarly disabled in netfslib
- Sanity check the hfs root record
- Fix zero padding data issues in concurrent write scenarios
- Fix is_mnt_ns_file() after converting nsfs to path_from_stashed()
- Fix missing declaration of init_files
- Increase I/O priority when writing revoke records in jbd2
- Flush filesystem device before updating tail sequence in jbd2
* tag 'vfs-6.13-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits)
ovl: support encoding fid from inode with no alias
ovl: pass realinode to ovl_encode_real_fh() instead of realdentry
fuse: respect FOPEN_KEEP_CACHE on opendir
netfs: Fix is-caching check in read-retry
netfs: Fix the (non-)cancellation of copy when cache is temporarily disabled
netfs: Fix ceph copy to cache on write-begin
netfs: Work around recursion by abandoning retry if nothing read
netfs: Fix missing barriers by using clear_and_wake_up_bit()
netfs: Remove redundant use of smp_rmb()
cachefiles: Parse the "secctx" immediately
nfs: Fix oops in nfs_netfs_init_request() when copying to cache
netfs: Fix enomem handling in buffered reads
netfs: Fix non-contiguous donation between completed reads
kheaders: Ignore silly-rename files
fs: relax assertions on failure to encode file handles
fs: fix missing declaration of init_files
fs: fix is_mnt_ns_file()
iomap: fix zero padding data issue in concurrent append writes
iomap: pass byte granular end position to iomap_add_to_ioend
jbd2: flush filesystem device before updating tail sequence
...
syzkaller reported recursion with a loop of three calls (netfs_rreq_assess,
netfs_retry_reads and netfs_rreq_terminated) hitting the limit of the stack
during an unbuffered or direct I/O read.
There are a number of issues:
(1) There is no limit on the number of retries.
(2) A subrequest is supposed to be abandoned if it does not transfer
anything (NETFS_SREQ_NO_PROGRESS), but that isn't checked under all
circumstances.
(3) The actual root cause, which is this:
if (atomic_dec_and_test(&rreq->nr_outstanding))
netfs_rreq_terminated(rreq, ...);
When we do a retry, we bump the rreq->nr_outstanding counter to
prevent the final cleanup phase running before we've finished
dispatching the retries. The problem is if we hit 0, we have to do
the cleanup phase - but we're in the cleanup phase and end up
repeating the retry cycle, hence the recursion.
Work around the problem by limiting the number of retries. This is based
on Lizhi Xu's patch[1], and makes the following changes:
(1) Replace NETFS_SREQ_NO_PROGRESS with NETFS_SREQ_MADE_PROGRESS and make
the filesystem set it if it managed to read or write at least one byte
of data. Clear this bit before issuing a subrequest.
(2) Add a ->retry_count member to the subrequest and increment it any time
we do a retry.
(3) Remove the NETFS_SREQ_RETRYING flag as it is superfluous with
->retry_count. If the latter is non-zero, we're doing a retry.
(4) Abandon a subrequest if retry_count is non-zero and we made no
progress.
(5) Use ->retry_count in both the write-side and the read-size.
[?] Question: Should I set a hard limit on retry_count in both read and
write? Say it hits 50, we always abandon it. The problem is that
these changes only mitigate the issue. As long as it made at least one
byte of progress, the recursion is still an issue. This patch
mitigates the problem, but does not fix the underlying cause. I have
patches that will do that, but it's an intrusive fix that's currently
pending for the next merge window.
The oops generated by KASAN looks something like:
BUG: TASK stack guard page was hit at ffffc9000482ff48 (stack is ffffc90004830000..ffffc90004838000)
Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN NOPTI
...
RIP: 0010:mark_lock+0x25/0xc60 kernel/locking/lockdep.c:4686
...
mark_usage kernel/locking/lockdep.c:4646 [inline]
__lock_acquire+0x906/0x3ce0 kernel/locking/lockdep.c:5156
lock_acquire.part.0+0x11b/0x380 kernel/locking/lockdep.c:5825
local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
___slab_alloc+0x123/0x1880 mm/slub.c:3695
__slab_alloc.constprop.0+0x56/0xb0 mm/slub.c:3908
__slab_alloc_node mm/slub.c:3961 [inline]
slab_alloc_node mm/slub.c:4122 [inline]
kmem_cache_alloc_noprof+0x2a7/0x2f0 mm/slub.c:4141
radix_tree_node_alloc.constprop.0+0x1e8/0x350 lib/radix-tree.c:253
idr_get_free+0x528/0xa40 lib/radix-tree.c:1506
idr_alloc_u32+0x191/0x2f0 lib/idr.c:46
idr_alloc+0xc1/0x130 lib/idr.c:87
p9_tag_alloc+0x394/0x870 net/9p/client.c:321
p9_client_prepare_req+0x19f/0x4d0 net/9p/client.c:644
p9_client_zc_rpc.constprop.0+0x105/0x880 net/9p/client.c:793
p9_client_read_once+0x443/0x820 net/9p/client.c:1570
p9_client_read+0x13f/0x1b0 net/9p/client.c:1534
v9fs_issue_read+0x115/0x310 fs/9p/vfs_addr.c:74
netfs_retry_read_subrequests fs/netfs/read_retry.c:60 [inline]
netfs_retry_reads+0x153a/0x1d00 fs/netfs/read_retry.c:232
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
...
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_dispatch_unbuffered_reads fs/netfs/direct_read.c:103 [inline]
netfs_unbuffered_read fs/netfs/direct_read.c:127 [inline]
netfs_unbuffered_read_iter_locked+0x12f6/0x19b0 fs/netfs/direct_read.c:221
netfs_unbuffered_read_iter+0xc5/0x100 fs/netfs/direct_read.c:256
v9fs_file_read_iter+0xbf/0x100 fs/9p/vfs_file.c:361
do_iter_readv_writev+0x614/0x7f0 fs/read_write.c:832
vfs_readv+0x4cf/0x890 fs/read_write.c:1025
do_preadv fs/read_write.c:1142 [inline]
__do_sys_preadv fs/read_write.c:1192 [inline]
__se_sys_preadv fs/read_write.c:1187 [inline]
__x64_sys_preadv+0x22d/0x310 fs/read_write.c:1187
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
Fixes: ee4cdf7ba8 ("netfs: Speed up buffered reading")
Closes: https://syzkaller.appspot.com/bug?extid=1fc6f64c40a9d143cfb6
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241108034020.3695718-1-lizhi.xu@windriver.com/ [1]
Link: https://lore.kernel.org/r/20241213135013.2964079-9-dhowells@redhat.com
Tested-by: syzbot+885c03ad650731743489@syzkaller.appspotmail.com
Suggested-by: Lizhi Xu <lizhi.xu@windriver.com>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: v9fs@lists.linux.dev
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Reported-by: syzbot+885c03ad650731743489@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
With recent netfs apis changes, the bytes written
value was not getting updated in /proc/fs/cifs/Stats.
Fix this by updating tcon->bytes in write operations.
Fixes: 3ee1a1fc39 ("cifs: Cut over to using netfslib")
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Replace default hardcoded value for cifsAttrs with ATTR_ARCHIVE macro
Use SMB2_LEASE_KEY_SIZE macro for leasekey size in smb2_lease_break
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
cifs_tree_connect() no longer uses ioctl, so allow sessions to be
reconnected when sending ioctls.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
We can access local_nls directly from @tcon->ses, so there is no need
to pass it as parameter in cifs_tree_connect().
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
smb2_set_next_command() no longer squashes request iovs into a single
iov, so the bounds check can be dropped.
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The NT ACL format for an SMB3 POSIX Extensions chmod() is a single ACE with the
magic S-1-5-88-3-mode SID:
NT Security Descriptor
Revision: 1
Type: 0x8004, Self Relative, DACL Present
Offset to owner SID: 56
Offset to group SID: 124
Offset to SACL: 0
Offset to DACL: 20
Owner: S-1-5-21-3177838999-3893657415-1037673384-1000
Group: S-1-22-2-1000
NT User (DACL) ACL
Revision: NT4 (2)
Size: 36
Num ACEs: 1
NT ACE: S-1-5-88-3-438, flags 0x00, Access Allowed, mask 0x00000000
Type: Access Allowed
NT ACE Flags: 0x00
Size: 28
Access required: 0x00000000
SID: S-1-5-88-3-438
Owner and Group should be NULL, but the server is not required to fail the
request if they are present.
Signed-off-by: Ralph Boehme <slow@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
When using encryption, either enforced by the server or when using
'seal' mount option, the client will squash all compound request buffers
down for encryption into a single iov in smb2_set_next_command().
SMB2_ioctl_init() allocates a small buffer (448 bytes) to hold the
SMB2_IOCTL request in the first iov, and if the user passes an input
buffer that is greater than 328 bytes, smb2_set_next_command() will
end up writing off the end of @rqst->iov[0].iov_base as shown below:
mount.cifs //srv/share /mnt -o ...,seal
ln -s $(perl -e "print('a')for 1..1024") /mnt/link
BUG: KASAN: slab-out-of-bounds in
smb2_set_next_command.cold+0x1d6/0x24c [cifs]
Write of size 4116 at addr ffff8881148fcab8 by task ln/859
CPU: 1 UID: 0 PID: 859 Comm: ln Not tainted 6.12.0-rc3 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.16.3-2.fc40 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x80
? smb2_set_next_command.cold+0x1d6/0x24c [cifs]
print_report+0x156/0x4d9
? smb2_set_next_command.cold+0x1d6/0x24c [cifs]
? __virt_addr_valid+0x145/0x310
? __phys_addr+0x46/0x90
? smb2_set_next_command.cold+0x1d6/0x24c [cifs]
kasan_report+0xda/0x110
? smb2_set_next_command.cold+0x1d6/0x24c [cifs]
kasan_check_range+0x10f/0x1f0
__asan_memcpy+0x3c/0x60
smb2_set_next_command.cold+0x1d6/0x24c [cifs]
smb2_compound_op+0x238c/0x3840 [cifs]
? kasan_save_track+0x14/0x30
? kasan_save_free_info+0x3b/0x70
? vfs_symlink+0x1a1/0x2c0
? do_symlinkat+0x108/0x1c0
? __pfx_smb2_compound_op+0x10/0x10 [cifs]
? kmem_cache_free+0x118/0x3e0
? cifs_get_writable_path+0xeb/0x1a0 [cifs]
smb2_get_reparse_inode+0x423/0x540 [cifs]
? __pfx_smb2_get_reparse_inode+0x10/0x10 [cifs]
? rcu_is_watching+0x20/0x50
? __kmalloc_noprof+0x37c/0x480
? smb2_create_reparse_symlink+0x257/0x490 [cifs]
? smb2_create_reparse_symlink+0x38f/0x490 [cifs]
smb2_create_reparse_symlink+0x38f/0x490 [cifs]
? __pfx_smb2_create_reparse_symlink+0x10/0x10 [cifs]
? find_held_lock+0x8a/0xa0
? hlock_class+0x32/0xb0
? __build_path_from_dentry_optional_prefix+0x19d/0x2e0 [cifs]
cifs_symlink+0x24f/0x960 [cifs]
? __pfx_make_vfsuid+0x10/0x10
? __pfx_cifs_symlink+0x10/0x10 [cifs]
? make_vfsgid+0x6b/0xc0
? generic_permission+0x96/0x2d0
vfs_symlink+0x1a1/0x2c0
do_symlinkat+0x108/0x1c0
? __pfx_do_symlinkat+0x10/0x10
? strncpy_from_user+0xaa/0x160
__x64_sys_symlinkat+0xb9/0xf0
do_syscall_64+0xbb/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f08d75c13bb
Reported-by: David Howells <dhowells@redhat.com>
Fixes: e77fe73c7e ("cifs: we can not use small padding iovs together with encryption")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
For extra channels, point ->secmech.{enc,dec} to the primary
server ones.
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
Make the write RPC tracepoints use the same trace macro complexes as the
read tracepoints and display the netfs request and subrequest IDs where
available (see commit 519be98971 "cifs: Add a tracepoint to track credits
involved in R/W requests").
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <stfrench@microsoft.com>
cc: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmbpAwkACgkQiiy9cAdy
T1FJhgv+PX+IIGyNNW0I3f3ZzIWqc1DCwxXHCa3gvr7TKimJ71AGbEdzFZZzl3AJ
CdxSLf2NQ6tBUxl65QuMC7XykqQXKvNnQEDPoQcHfFgTtYJi+zng1dDvvXSfFbWW
m2Hql1w6MNFeKlFBavbA6MI94MnZqE5J/yCtWqw3LvEn4l2JwYrAzS5Lw9qjtcER
DmlOsrEFgpsFhhpnyPZXJxaWKZIDG2OuG61LWkqyhvLOTtuFuc9cEsTWPdeRYAT6
KKh5z58wqG2JG0IkVjG1foBclv0zcZgUzqOr2/tzbabYye991kLnUitaTwd+u8xS
pTbVIw1E91sFEqVsr2IpnLUq68MKaahlNfHkNJD0dqaMKfGOujqtNRFw82Yki4w5
aTosgECyUiGKgwuE8HLtwlJaE4EizVdrqQiP2cUOrtuWPvOvnY7vjWKC8kmSM0Z/
u0ov6JdirVlnFE3dlS0i6ywKaolsrrPYUTbv4ihjQiGHtm+VjonH8VYsdg8sUV0e
5/+cyqaF
=B6Et
-----END PGP SIGNATURE-----
Merge tag 'v6.12-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client updates from Steve French:
- cleanups (moving duplicated code, removing unused code etc)
- fixes relating to "sfu" mount options (for better handling special
file types)
- SMB3.1.1 compression fixes/improvements
* tag 'v6.12-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (24 commits)
smb: client: fix compression heuristic functions
cifs: Update SFU comments about fifos and sockets
cifs: Add support for creating SFU symlinks
smb: use LIST_HEAD() to simplify code
cifs: Recognize SFU socket type
cifs: Show debug message when SFU Fifo type was detected
cifs: Put explicit zero byte into SFU block/char types
cifs: Add support for reading SFU symlink location
cifs: Fix recognizing SFU symlinks
smb: client: compress: fix an "illegal accesses" issue
smb: client: compress: fix a potential issue of freeing an invalid pointer
smb: client: compress: LZ77 code improvements cleanup
smb: client: insert compression check/call on write requests
smb3: mark compression as CONFIG_EXPERIMENTAL and fix missing compression operation
cifs: Remove obsoleted declaration for cifs_dir_open
smb: client: Use min() macro
cifs: convert to use ERR_CAST()
smb: add comment to STATUS_MCA_OCCURED
smb: move SMB2 Status code to common header file
smb: move some duplicate definitions to common/smbacl.h
...
Fix an upstream merge resolution issue[1]. The NETFS_SREQ_HIT_EOF flag,
and code to set it, got added via two different paths. The original path
saw it added in the netfslib read improvements[2], but it was also added,
and slightly differently, in a fix that was committed before v6.11:
1da29f2c39
netfs, cifs: Fix handling of short DIO read
However, the code added to smb2_readv_callback() to set the flag in didn't
get removed when the netfs read improvements series was rebased to take
account of the cifs fixes. The proposed merge resolution[2] deleted it
rather than rebase the patches.
Fix this by removing the redundant lines. Code to set the bit that derives
from the fix patch is still there, a few lines above in the source.
Fixes: 35219bc5c7 ("Merge tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <stfrench@microsoft.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/CAHk-=wjr8fxk20-wx=63mZruW1LTvBvAKya1GQ1EhyzXb-okMA@mail.gmail.com/ [1]
Link: https://lore.kernel.org/linux-fsdevel/20240913-vfs-netfs-39ef6f974061@brauner/ [2]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On smb2_async_writev(), set CIFS_COMPRESS_REQ on request flags if
should_compress() returns true.
On smb_send_rqst() check the flags, and compress and send the request to
the server.
(*) If the compression fails with -EMSGSIZE (i.e. compressed size is >=
uncompressed size), the original uncompressed request is sent instead.
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
There are only 4 different definitions between the client and server:
- STATUS_SERVER_UNAVAILABLE: from client/smb2status.h
- STATUS_FILE_NOT_AVAILABLE: from client/smb2status.h
- STATUS_NO_PREAUTH_INTEGRITY_HASH_OVERLAP: from server/smbstatus.h
- STATUS_INVALID_LOCK_RANGE: from server/smbstatus.h
Rename client/smb2status.h to common/smb2status.h, and merge the
2 different definitions of server to common header file.
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Preparation for moving acl definitions to new common header file.
Use the following shell command to rename:
find fs/smb/client -type f -exec sed -i \
's/struct cifs_ace/struct smb_ace/g' {} +
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Preparation for moving acl definitions to new common header file.
Use the following shell command to rename:
find fs/smb/client -type f -exec sed -i \
's/struct cifs_sid/struct smb_sid/g' {} +
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Preparation for moving acl definitions to new common header file.
Use the following shell command to rename:
find fs/smb/client -type f -exec sed -i \
's/struct cifs_ntsd/struct smb_ntsd/g' {} +
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Improve the efficiency of buffered reads in a number of ways:
(1) Overhaul the algorithm in general so that it's a lot more compact and
split the read submission code between buffered and unbuffered
versions. The unbuffered version can be vastly simplified.
(2) Read-result collection is handed off to a work queue rather than being
done in the I/O thread. Multiple subrequests can be processes
simultaneously.
(3) When a subrequest is collected, any folios it fully spans are
collected and "spare" data on either side is donated to either the
previous or the next subrequest in the sequence.
Notes:
(*) Readahead expansion is massively slows down fio, presumably because it
causes a load of extra allocations, both folio and xarray, up front
before RPC requests can be transmitted.
(*) RDMA with cifs does appear to work, both with SIW and RXE.
(*) PG_private_2-based reading and copy-to-cache is split out into its own
file and altered to use folio_queue. Note that the copy to the cache
now creates a new write transaction against the cache and adds the
folios to be copied into it. This allows it to use part of the
writeback I/O code.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-20-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
Short DIO reads, particularly in relation to cifs, are not being handled
correctly by cifs and netfslib. This can be tested by doing a DIO read of
a file where the size of read is larger than the size of the file. When it
crosses the EOF, it gets a short read and this gets retried, and in the
case of cifs, the retry read fails, with the failure being translated to
ENODATA.
Fix this by the following means:
(1) Add a flag, NETFS_SREQ_HIT_EOF, for the filesystem to set when it
detects that the read did hit the EOF.
(2) Make the netfslib read assessment stop processing subrequests when it
encounters one with that flag set.
(3) Return rreq->transferred, the accumulated contiguous amount read to
that point, to userspace for a DIO read.
(4) Make cifs set the flag and clear the error if the read RPC returned
ENODATA.
(5) Make cifs set the flag and clear the error if a short read occurred
without error and the read-to file position is now at the remote inode
size.
Fixes: 69c3c023af ("cifs: Implement netfslib hooks")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
When netfslib asks cifs to issue a read operation, it prefaces this with a
call to ->clamp_length() which cifs uses to negotiate credits, providing
receive capacity on the server; however, in the event that a read op needs
reissuing, netfslib doesn't call ->clamp_length() again as that could
shorten the subrequest, leaving a gap.
This causes the retried read to be done with zero credits which causes the
server to reject it with STATUS_INVALID_PARAMETER. This is a problem for a
DIO read that is requested that would go over the EOF. The short read will
be retried, causing EINVAL to be returned to the user when it fails.
Fix this by making cifs_req_issue_read() negotiate new credits if retrying
(NETFS_SREQ_RETRYING now gets set in the read side as well as the write
side in this instance).
This isn't sufficient, however: the new credits might not be sufficient to
complete the remainder of the read, so also add an additional field,
rreq->actual_len, that holds the actual size of the op we want to perform
without having to alter subreq->len.
We then rely on repeated short reads being retried until we finish the read
or reach the end of file and make a zero-length read.
Also fix a couple of places where the subrequest start and length need to
be altered by the amount so far transferred when being used.
Fixes: 69c3c023af ("cifs: Implement netfslib hooks")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
rqst.rq_iter needs to be truncated otherwise we'll
also send the bytes into the stream socket...
This is the logic behind rqst.rq_npages = 0, which was removed in
"cifs: Change the I/O paths to use an iterator rather than a page list"
(d08089f649).
Cc: stable@vger.kernel.org
Fixes: d08089f649 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Fixes: d08089f649 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
This happens when called from SMB2_read() while using rdma
and reaching the rdma_readwrite_threshold.
Cc: stable@vger.kernel.org
Fixes: a6559cc1d3 ("cifs: split out smb3_use_rdma_offload() helper")
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Setting encryption as required in security flags was broken.
For example (to require all mounts to be encrypted by setting):
"echo 0x400c5 > /proc/fs/cifs/SecurityFlags"
Would return "Invalid argument" and log "Unsupported security flags"
This patch fixes that (e.g. allowing overriding the default for
SecurityFlags 0x00c5, including 0x40000 to require seal, ie
SMB3.1.1 encryption) so now that works and forces encryption
on subsequent mounts.
Acked-by: Bharath SM <bharathsm@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
There are cases where services need to remount (or change their
credentials files) when keys have expired, but it can be helpful
to have a dynamic trace point to make it easier to notify the
service to refresh the storage account key.
Here is sample output, one from mount with bad password, one
from a reconnect where the password has been changed or expired
and reconnect fails (requiring remount with new storage account key)
TASK-PID CPU# ||||| TIMESTAMP FUNCTION
| | | ||||| | |
mount.cifs-11362 [000] ..... 6000.241620: smb3_key_expired:
rc=-13 user=testpassu conn_id=0x2 server=localhost addr=127.0.0.1:445
kworker/4:0-8458 [004] ..... 6044.892283: smb3_key_expired:
rc=-13 user=testpassu conn_id=0x3 server=localhost addr=127.0.0.1:445
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Add a tracepoint to track the credit changes and server in_flight value
involved in the lifetime of a R/W request, logging it against the
request/subreq debugging ID. This requires the debugging IDs to be
recorded in the cifs_credits struct.
The tracepoint can be enabled with:
echo 1 >/sys/kernel/debug/tracing/events/cifs/smb3_rw_credits/enable
Also add a three-state flag to struct cifs_credits to note if we're
interested in determining when the in_flight contribution ends and, if so,
to track whether we've decremented the contribution yet.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
When a subrequest is marked for needing retry, netfs will call
cifs_prepare_write() which will make cifs repick the server for the op
before renegotiating credits; it then calls cifs_issue_write() which
invokes smb2_async_writev() - which re-repicks the server.
If a different server is then selected, this causes the increment of
server->in_flight to happen against one record and the decrement to happen
against another, leading to misaccounting.
Fix this by just removing the repick code in smb2_async_writev(). As this
is only called from netfslib-driven code, cifs_prepare_write() should
always have been called first, and so server should never be NULL and the
preparatory step is repeated in the event that we do a retry.
The problem manifests as a warning looking something like:
WARNING: CPU: 4 PID: 72896 at fs/smb/client/smb2ops.c:97 smb2_add_credits+0x3f0/0x9e0 [cifs]
...
RIP: 0010:smb2_add_credits+0x3f0/0x9e0 [cifs]
...
smb2_writev_callback+0x334/0x560 [cifs]
cifs_demultiplex_thread+0x77a/0x11b0 [cifs]
kthread+0x187/0x1d0
ret_from_fork+0x34/0x60
ret_from_fork_asm+0x1a/0x30
Which may be triggered by a number of different xfstests running against an
Azure server in multichannel mode. generic/249 seems the most repeatable,
but generic/215, generic/249 and generic/308 may also show it.
Fixes: 3ee1a1fc39 ("cifs: Cut over to using netfslib")
Cc: stable@vger.kernel.org
Reported-by: Steve French <smfrench@gmail.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Aurelien Aptel <aaptel@suse.com>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Move the reference pid from the cifs_io_subrequest struct to the
cifs_io_request struct as it's the same for all subreqs of a particular
request.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Defer read completion from the I/O thread to the cifsiod thread so as not
to slow down the I/O thread. This restores the behaviour of v6.9.
Fixes: 3ee1a1fc39 ("cifs: Cut over to using netfslib")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
There's now no need to make sure subreq->io_iter is advanced to match
subreq->transferred before calling one of the netfs subrequest termination
functions as the check has been removed netfslib and the iterator is reset
prior to retrying a subreq.
Fixes: 3ee1a1fc39 ("cifs: Cut over to using netfslib")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Make the cifs filesystem use netfslib to handle reading and writing on
behalf of cifs. The changes include:
(1) Various read_iter/write_iter type functions are turned into wrappers
around netfslib API functions or are pointed directly at those
functions:
cifs_file_direct{,_nobrl}_ops switch to use
netfs_unbuffered_read_iter and netfs_unbuffered_write_iter.
Large pieces of code that will be removed are #if'd out and will be removed
in subsequent patches.
[?] Why does cifs mark the page dirty in the destination buffer of a DIO
read? Should that happen automatically? Does netfs need to do that?
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Replace the 'replay' bool in cifs_writedata (now cifs_io_subrequest) with a
flag in the netfs_io_subrequest flags.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Use more fields from netfs_io_subrequest instead of those incorporated into
cifs_io_subrequest from cifs_readdata and cifs_writedata.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Replace the cifs_writedata struct with the same wrapper around
netfs_io_subrequest that was used to replace cifs_readdata.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Netfslib has a facility whereby the allocation for netfs_io_subrequest can
be increased to so that filesystem-specific data can be tagged on the end.
Prepare to use this by making a struct, cifs_io_subrequest, that wraps
netfs_io_subrequest, and absorb struct cifs_readdata into it.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Add tracing for the refcounting/lifecycle of the cifs_tcon struct, marking
different events with different labels and giving each tcon its own debug
ID so that the tracelines corresponding to individual tcons can be
distinguished. This can be enabled with:
echo 1 >/sys/kernel/debug/tracing/events/cifs/smb3_tcon_ref/enable
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
There are various use cases that are becoming more common in which password
changes are scheduled on a server(s) periodically but the clients connected
to this server need to stay connected (even in the face of brief network
reconnects) due to mounts which can not be easily unmounted and mounted at
will, and servers that do password rotation do not always have the ability
to tell the clients exactly when to the new password will be effective,
so add support for an alt password ("password2=") on mount (and also
remount) so that we can anticipate the upcoming change to the server
without risking breaking existing mounts.
An alternative would have been to use the kernel keyring for this but the
processes doing the reconnect do not have access to the keyring but do
have access to the ses structure.
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
In the current implementation, CIFS close sends a close to the
server and does not check for the success of the server close.
This patch adds functionality to check for server close return
status and retries in case of an EBUSY or EAGAIN error.
This can help avoid handle leaks
Cc: stable@vger.kernel.org
Signed-off-by: Ritvik Budhiraja <rbudhiraja@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Some code paths for querying server interfaces make a false
assumption that it will only get called for SMB3+. Since this
function now can get called from a generic code paths, the correct
thing to do is to have specific handler for this functionality
per SMB dialect, and call this handler.
This change adds such a handler and implements this handler only
for SMB 3.0 and 3.1.1.
Cc: stable@vger.kernel.org
Cc: Jan Čermák <sairon@sairon.cz>
Reported-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Change "compress=" mount option to a boolean flag, that, if set,
will enable negotiating compression algorithms with the server.
Do not de/compress anything for now.
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
Add support for creating special files via WSL reparse points when
using 'reparse=wsl' mount option. They're faster than NFS reparse
points because they don't require extra roundtrips to figure out what
->d_type a specific dirent is as such information is already stored in
query dir responses and then making getdents() calls faster.
Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
There are cases where a session is disconnected and password has changed
on the server (or expired) for this user and this currently can not
be fixed without unmount and mounting again. This patch allows
remount to change the password (for the non Kerberos case, Kerberos
ticket refresh is handled differently) when the session is disconnected
and the user can not reconnect due to still using old password.
Future patches should also allow us to setup the keyring (cifscreds)
to have an "alternate password" so we would be able to change
the password before the session drops (without the risk of races
between when the password changes and the disconnect occurs -
ie cases where the old password is still needed because the new
password has not fully rolled out to all servers yet).
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
File open requests made to the server contain a
CreateGuid, which is used by the server to identify
the open request. If the same request needs to be
replayed, it needs to be sent with the same CreateGuid
in the durable handle v2 context.
Without doing so, we could end up leaking handles on
the server when:
1. multichannel is used AND
2. connection goes down, but not for all channels
This is because the replayed open request would have a
new CreateGuid and the server will treat this as a new
request and open a new handle.
This change fixes this by reusing the existing create_guid
stored in the cached fid struct.
REF: MS-SMB2 4.9 Replay Create Request on an Alternate Channel
Fixes: 4f1fffa237 ("cifs: commands that are retried should have replay flag set")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Send query dir requests with an info level of
SMB_FIND_FILE_FULL_DIRECTORY_INFO rather than
SMB_FIND_FILE_DIRECTORY_INFO when the client is generating its own
inode numbers (e.g. noserverino) so that reparse tags still
can be parsed directly from the responses, but server won't
send UniqueId (server inode number)
Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
In order to scale down the channels, the following sequence
of operations happen:
1. server struct is marked for terminate
2. the channel is deallocated in the ses->chans array
3. at a later point the cifsd thread actually terminates the server
Between 2 and 3, there can be calls to find the channel for
a server struct. When that happens, there can be an ugly warning
that's logged. But this is expected.
So this change does two things:
1. in cifs_ses_get_chan_index, if server->terminate is set, return
2. always make sure server->terminate is set with chan_lock held
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When the server reports query network interface info call
as unsupported following a tree connect, it means that
multichannel is unsupported, even if the server capabilities
report otherwise.
When this happens, cifs_chan_skip_or_disable is called to
disable multichannel on the client. However, we only need
to call this when multichannel is currently setup.
Fixes: f591062bdb ("cifs: handle servers that still advertise multichannel after disabling")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>