Commit graph

92 commits

Author SHA1 Message Date
Linus Torvalds
7031769e10 vfs-6.17-rc1.mmap_prepare
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCgQAKCRCRxhvAZXjc
 os+nAP9LFHUwWO6EBzHJJGEVjJvvzsbzqeYrRFamYiMc5ulPJwD+KW4RIgJa/MWO
 pcYE40CacaekD8rFWwYUyszpgmv6ewc=
 =wCwp
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull mmap_prepare updates from Christian Brauner:
 "Last cycle we introduced f_op->mmap_prepare() in c84bf6dd2b ("mm:
  introduce new .mmap_prepare() file callback").

  This is preferred to the existing f_op->mmap() hook as it does not
  require a VMA to be established yet, thus allowing the mmap logic to
  invoke this hook far, far earlier, prior to inserting a VMA into the
  virtual address space, or performing any other heavy handed operations.

  This allows for much simpler unwinding on error, and for there to be a
  single attempt at merging a VMA rather than having to possibly
  reattempt a merge based on potentially altered VMA state.

  Far more importantly, it prevents inappropriate manipulation of
  incompletely initialised VMA state, which is something that has been
  the cause of bugs and complexity in the past.

  The intent is to gradually deprecate f_op->mmap, and in that vein this
  series converts the majority of file systems to using f_op->mmap_prepare.

  Prerequisite steps are taken - firstly ensuring all checks for mmap
  capabilities use the file_has_valid_mmap_hooks() helper rather than
  directly checking for f_op->mmap (which is now not a valid check) and
  secondly updating daxdev_mapping_supported() to not require a VMA
  parameter to allow ext4 and xfs to be converted.

  Commit bb666b7c27 ("mm: add mmap_prepare() compatibility layer for
  nested file systems") handles the nasty edge-case of nested file
  systems like overlayfs, which introduces a compatibility shim to allow
  f_op->mmap_prepare() to be invoked from an f_op->mmap() callback.

  This allows for nested filesystems to continue to function correctly
  with all file systems regardless of which callback is used. Once we
  finally convert all file systems, this shim can be removed.

  As a result, ecryptfs, fuse, and overlayfs remain unaltered so they
  can nest all other file systems.

  We additionally do not update resctrl, as this requires an update to
  remap_pfn_range() (or an alternative to it), which we defer to a later
  series. Equally, we do not update cramfs, which needs a mixed mapping
  insertion with the same issue, nor do we update procfs, hugetlbfs,
  sysfs or kernfs, all of which require VMAs for internal state and
  hooks. We shall return to all of these later."
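A rough userspace model may help show why calling the hook before any VMA exists simplifies error handling. Everything below (`struct mapping_desc`, `struct mapping`, `model_mmap()` and the two example hooks) is hypothetical and is not the kernel's `.mmap_prepare()` interface. It only illustrates the ordering: the hook edits a descriptor, and the mapping object is built from that descriptor once, after the hook has succeeded.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical descriptor: what a prepare-style hook may inspect and adjust
 * before any mapping object exists. Not the kernel's descriptor type. */
struct mapping_desc {
	unsigned long start, end;
	unsigned long flags;
};

/* Hypothetical "VMA": only created once the descriptor is final. */
struct mapping {
	unsigned long start, end;
	unsigned long flags;
};

typedef int (*prepare_fn)(struct mapping_desc *desc);

/* Model of the mmap path: call the prepare hook first, then allocate the
 * mapping in one step. If the hook fails, nothing has been allocated yet,
 * so there is nothing to unwind. */
static struct mapping *model_mmap(prepare_fn prepare, unsigned long start,
				  unsigned long end, unsigned long flags,
				  int *err)
{
	struct mapping_desc desc = { .start = start, .end = end, .flags = flags };
	struct mapping *m;

	if (prepare) {
		*err = prepare(&desc);
		if (*err)
			return NULL;	/* early failure: plain return */
	}

	m = malloc(sizeof(*m));
	if (!m) {
		*err = -ENOMEM;
		return NULL;
	}
	/* The mapping is built from the final descriptor exactly once. */
	m->start = desc.start;
	m->end = desc.end;
	m->flags = desc.flags;
	*err = 0;
	return m;
}

/* Example hooks: one adjusts the flags, one refuses the mapping. */
#define MODEL_FLAG_READONLY 0x1UL

static int prepare_readonly(struct mapping_desc *desc)
{
	desc->flags |= MODEL_FLAG_READONLY;
	return 0;
}

static int prepare_refuse(struct mapping_desc *desc)
{
	(void)desc;
	return -EINVAL;
}
```

Because a failing hook returns before anything is allocated, the error path is a plain return, which is the property the pull message attributes to `.mmap_prepare()`.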

* tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  doc: update porting, vfs documentation to describe mmap_prepare()
  fs: replace mmap hook with .mmap_prepare for simple mappings
  fs: convert most other generic_file_*mmap() users to .mmap_prepare()
  fs: convert simple use of generic_file_*_mmap() to .mmap_prepare()
  mm/filemap: introduce generic_file_*_mmap_prepare() helpers
  fs/xfs: transition from deprecated .mmap hook to .mmap_prepare
  fs/ext4: transition from deprecated .mmap hook to .mmap_prepare
  fs/dax: make it possible to check dev dax support without a VMA
  fs: consistently use can_mmap_file() helper
  mm/nommu: use file_has_valid_mmap_hooks() helper
  mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
2025-07-28 13:43:25 -07:00
Wang Zhaolong
705c79101c smb: client: fix use-after-free in cifs_oplock_break
A race condition can occur in cifs_oplock_break() leading to a
use-after-free of the cinode structure when unmounting:

  cifs_oplock_break()
    _cifsFileInfo_put(cfile)
      cifsFileInfo_put_final()
        cifs_sb_deactive()
          [last ref, start releasing sb]
            kill_sb()
              kill_anon_super()
                generic_shutdown_super()
                  evict_inodes()
                    dispose_list()
                      evict()
                        destroy_inode()
                          call_rcu(&inode->i_rcu, i_callback)
    spin_lock(&cinode->open_file_lock)  <- OK
                            [later] i_callback()
                              cifs_free_inode()
                                kmem_cache_free(cinode)
    spin_unlock(&cinode->open_file_lock)  <- UAF
    cifs_done_oplock_break(cinode)       <- UAF

The issue occurs when umount has already released its reference to the
superblock. When _cifsFileInfo_put() calls cifs_sb_deactive(), this
releases the last reference, triggering the immediate cleanup of all
inodes under RCU. However, cifs_oplock_break() continues to access the
cinode after this point, resulting in use-after-free.

Fix this by holding an extra reference to the superblock during the
entire oplock break operation. This ensures that the superblock and
its inodes remain valid until the oplock break completes.
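The shape of the fix can be modelled in userspace with a plain reference count. The types and functions here are hypothetical stand-ins (loosely, `file_put()` plays the role of `_cifsFileInfo_put()` and `sb_put()` that of `cifs_sb_deactive()`); the model is single-threaded and omits RCU, so it only shows why taking the extra reference up front keeps the inode alive until the worker is finished with it.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a superblock and an inode it owns. */
struct model_inode {
	int open_files;
	int oplock_breaks;
};

struct model_sb {
	int refcount;
	struct model_inode *inode;	/* freed when the last sb ref goes */
};

static void sb_get(struct model_sb *sb)
{
	sb->refcount++;
}

/* Dropping the last reference tears down the inode, as the umount path
 * does in the trace above. */
static void sb_put(struct model_sb *sb)
{
	if (--sb->refcount == 0) {
		free(sb->inode);
		sb->inode = NULL;
	}
}

/* The last file handle put also drops a superblock reference. */
static void file_put(struct model_sb *sb)
{
	sb->inode->open_files--;
	sb_put(sb);
}

/* Oplock-break worker with the fix applied. Returns whether the inode was
 * still usable after the handle was put. */
static bool oplock_break(struct model_sb *sb)
{
	bool inode_alive;

	sb_get(sb);		/* the fix: pin the sb for the whole break */
	file_put(sb);		/* may drop what used to be the last ref */
	inode_alive = sb->inode != NULL;
	if (inode_alive)
		sb->inode->oplock_breaks++;	/* stands in for using cinode */
	sb_put(sb);		/* teardown can only happen from here on */
	return inode_alive;
}
```

Here the superblock starts with a single reference (umount has already dropped its own), so without the `sb_get()` call the inode would be freed inside `file_put()`.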

Link: https://bugzilla.kernel.org/show_bug.cgi?id=220309
Fixes: b98749cac4 ("CIFS: keep FileInfo handle live during oplock break")
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-07-13 17:16:29 -05:00
David Howells
2c4fd3d141 cifs: Fix prepare_write to negotiate wsize if needed
Fix cifs_prepare_write() to negotiate the wsize if it is unset.

Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-21 11:03:24 -05:00
Lorenzo Stoakes
9d5403b103
fs: convert most other generic_file_*mmap() users to .mmap_prepare()
Update nearly all generic_file_mmap() and generic_file_readonly_mmap()
callers to use generic_file_mmap_prepare() and
generic_file_readonly_mmap_prepare() respectively.

We update blkdev, 9p, afs, erofs, ext2, nfs, ntfs3, smb, ubifs and vboxsf
file systems this way.

Remaining users we cannot yet update are ecryptfs, fuse and cramfs. The
former two are nested file systems that must support any underlying file
system, and cramfs inserts a mixed mapping which currently requires a VMA.

Once all file systems have been converted to mmap_prepare(), we can then
update nested file systems.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/08db85970d89b17a995d2cffae96fb4cc462377f.1750099179.git.lorenzo.stoakes@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-19 13:56:57 +02:00
Paulo Alcantara
b64af6bcd3 smb: client: fix perf regression with deferred closes
A customer reported that one of their applications started failing to
open files with STATUS_INSUFFICIENT_RESOURCES due to the NetApp server
hitting the maximum number of opens to the same file that it would
allow for a single client connection.

It turned out the client was failing to reuse open handles with
deferred closes because matching ->f_flags directly, without first
masking off the O_CREAT|O_EXCL|O_TRUNC bits, broke the comparison, and
the client ended up with thousands of deferred closes to the same file.
Those bits are already satisfied by the original open, so there is no
need to check them against existing open handles.
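The flag comparison can be sketched as follows. `open_flags_match()` is a hypothetical helper, not the cifs function; it only shows the masking. As an illustration, if the stored flags of a cached handle lacked O_CREAT, an unmasked comparison could never match a new request that carries O_CREAT, even though both opens want the same access.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Bits that only matter when the file is first opened; a cached handle
 * has already satisfied them, so they must not block reuse. */
#define OPEN_ONLY_FLAGS (O_CREAT | O_EXCL | O_TRUNC)

/* Hypothetical helper: can an existing handle opened with cached_flags be
 * reused for a new open requesting new_flags? */
static bool open_flags_match(int cached_flags, int new_flags)
{
	return (cached_flags & ~OPEN_ONLY_FLAGS) ==
	       (new_flags & ~OPEN_ONLY_FLAGS);
}
```

Flags that affect how the handle behaves afterwards (access mode, O_APPEND and so on) still take part in the comparison.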

Reproducer:

 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 #include <fcntl.h>
 #include <pthread.h>

 #define NR_THREADS      4
 #define NR_ITERATIONS   2500
 #define TEST_FILE       "/mnt/1/test/dir/foo"

 static char buf[64];

 static void *worker(void *arg)
 {
         int i, j;
         int fd;

         (void)arg;
         for (i = 0; i < NR_ITERATIONS; i++) {
                 fd = open(TEST_FILE, O_WRONLY|O_CREAT|O_APPEND, 0666);
                 if (fd < 0) {
                         perror("open");
                         return NULL;
                 }
                 for (j = 0; j < 16; j++)
                         write(fd, buf, sizeof(buf));
                 close(fd);
         }
         return NULL;
 }

 int main(void)
 {
         pthread_t t[NR_THREADS];
         int fd;
         int i;

         fd = open(TEST_FILE, O_WRONLY|O_CREAT|O_TRUNC, 0666);
         if (fd < 0) {
                 perror("open");
                 return 1;
         }
         close(fd);
         memset(buf, 'a', sizeof(buf));
         for (i = 0; i < NR_THREADS; i++)
                 pthread_create(&t[i], NULL, worker, NULL);
         for (i = 0; i < NR_THREADS; i++)
                 pthread_join(t[i], NULL);
         return 0;
 }

Before patch:

$ mount.cifs //srv/share /mnt/1 -o ...
$ mkdir -p /mnt/1/test/dir
$ gcc repro.c && ./a.out
...
number of opens: 1391

After patch:

$ mount.cifs //srv/share /mnt/1 -o ...
$ mkdir -p /mnt/1/test/dir
$ gcc repro.c && ./a.out
...
number of opens: 1

Cc: linux-cifs@vger.kernel.org
Cc: David Howells <dhowells@redhat.com>
Cc: Jay Shin <jaeshin@redhat.com>
Cc: Pierguido Lambri <plambri@redhat.com>
Fixes: b8ea3b1ff5 ("smb: enable reuse of deferred file handles for write operations")
Acked-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-12 12:42:48 -05:00
Linus Torvalds
0fb34422b5 vfs-6.16-rc1.netfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBPUAAKCRCRxhvAZXjc
 ouMEAQCrviYPG/WMtPTH7nBIbfVQTfNEXt/TvN7u7OjXb+RwRAEAwe9tLy4GrS/t
 GuvUPWAthbhs77LTvxj6m3Gf49BOVgQ=
 =6FqN
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.16-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull netfs updates from Christian Brauner:

 - The main API document has been extensively updated/rewritten

 - Fix an oops in write-retry due to mis-resetting the I/O iterator

 - Fix the recording of transferred bytes for short DIO reads

 - Fix a request's work item to not require a reference, thereby
   avoiding the need to get rid of it in BH/IRQ context

 - Fix waiting and waking to be consistent about the waitqueue used

 - Remove NETFS_SREQ_SEEK_DATA_READ, NETFS_INVALID_WRITE,
   NETFS_ICTX_WRITETHROUGH, NETFS_READ_HOLE_CLEAR,
   NETFS_RREQ_DONT_UNLOCK_FOLIOS, and NETFS_RREQ_BLOCKED

 - Reorder structs to eliminate holes

 - Remove netfs_io_request::ractl

 - Only provide proc_link field if CONFIG_PROC_FS=y

 - Remove folio_queue::marks3

 - Fix undifferentiation of DIO reads from unbuffered reads

* tag 'vfs-6.16-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  netfs: Fix undifferentiation of DIO reads from unbuffered reads
  netfs: Fix wait/wake to be consistent about the waitqueue used
  netfs: Fix the request's work item to not require a ref
  netfs: Fix setting of transferred bytes with short DIO reads
  netfs: Fix oops in write-retry from mis-resetting the subreq iterator
  fs/netfs: remove unused flag NETFS_RREQ_BLOCKED
  fs/netfs: remove unused flag NETFS_RREQ_DONT_UNLOCK_FOLIOS
  folio_queue: remove unused field `marks3`
  fs/netfs: declare field `proc_link` only if CONFIG_PROC_FS=y
  fs/netfs: remove `netfs_io_request.ractl`
  fs/netfs: reorder struct fields to eliminate holes
  fs/netfs: remove unused enum choice NETFS_READ_HOLE_CLEAR
  fs/netfs: remove unused flag NETFS_ICTX_WRITETHROUGH
  fs/netfs: remove unused source NETFS_INVALID_WRITE
  fs/netfs: remove unused flag NETFS_SREQ_SEEK_DATA_READ
2025-06-02 15:04:06 -07:00
David Howells
db26d62d79
netfs: Fix undifferentiation of DIO reads from unbuffered reads
On cifs, "DIO reads" (specified by O_DIRECT) need to be differentiated from
"unbuffered reads" (specified by cache=none in the mount parameters).  The
difference is flagged in the protocol and the server may behave
differently: Windows Server will, for example, mandate that DIO reads are
block aligned.

Fix this by adding a NETFS_UNBUFFERED_READ to differentiate this from
NETFS_DIO_READ, parallelling the write differentiation that already exists.
cifs will then do the right thing.

Fixes: 016dc8516a ("netfs: Implement unbuffered/DIO read support")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/3444961.1747987072@warthog.procyon.org.uk
Reviewed-by: "Paulo Alcantara (Red Hat)" <pc@manguebit.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
cc: Steve French <sfrench@samba.org>
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23 10:35:03 +02:00
David Howells
20d72b00ca
netfs: Fix the request's work item to not require a ref
When the netfs_io_request struct's work item is queued, it must be supplied
with a ref to the work item struct to prevent it being deallocated whilst
on the queue or whilst it is being processed.  This is tricky to manage as
we have to get a ref before we try and queue it and then we may find it's
already queued and is thus already holding a ref - in which case we have to
try and get rid of the ref again.

The problem comes if we're in BH or IRQ context and need to drop the ref:
if netfs_put_request() reduces the count to 0, we have to do the cleanup -
but the cleanup may need to wait.

Fix this by adding a new work item to the request, ->cleanup_work, and
dispatching that when the refcount hits zero.  That can then synchronously
cancel any outstanding work on the main work item before doing the cleanup.
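A single-threaded sketch of the pattern: dropping the last reference only queues a cleanup item, and the potentially blocking work (here, a stand-in for cancelling the main work item) runs later from a context that may sleep. The structure, the one-slot queue and all function names are hypothetical simplifications, not netfslib code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a request with a main and a cleanup work item. */
struct model_request {
	int refcount;
	bool work_pending;	/* main work item queued */
	bool cleanup_pending;	/* cleanup work item queued */
	bool cleaned_up;
};

/* Tiny single-slot "workqueue" standing in for the real one. */
static struct model_request *queued_cleanup;

/* Safe from a context that must not sleep: dropping the last reference
 * only queues the cleanup item, it never waits. */
static void model_put_request(struct model_request *rq)
{
	if (--rq->refcount == 0) {
		rq->cleanup_pending = true;
		queued_cleanup = rq;
	}
}

/* Runs later in process context, where waiting is allowed. */
static void model_run_cleanup_work(void)
{
	struct model_request *rq = queued_cleanup;

	if (!rq)
		return;
	queued_cleanup = NULL;
	/* Stand-in for synchronously cancelling the main work item. */
	rq->work_pending = false;
	rq->cleanup_pending = false;
	rq->cleaned_up = true;
}
```

Because the main work item no longer needs its own reference, the put path stays the same whether or not the main item happens to be queued.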

Adding a new work item also deals with another problem upstream where it's
sometimes changing the work func in the put function and requeuing it -
which has occasionally in the past caused the cleanup to happen
incorrectly.

As a bonus, this allows us to get rid of the 'was_async' parameter from a
bunch of functions.  This indicated whether the put function might not be
permitted to sleep.

Fixes: 3d3c950467 ("netfs: Provide readahead and readpage netfs helpers")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250519090707.2848510-4-dhowells@redhat.com
cc: Paulo Alcantara <pc@manguebit.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Steve French <stfrench@microsoft.com>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21 14:35:20 +02:00
Paulo Alcantara
3965c23773 smb: client: fix zero rsize error messages
cifs_prepare_read() might be called with a disconnected channel, where
TCP_Server_Info::max_read is set to zero due to reconnect, so calling
->negotiate_rsize() will set @rsize to the default minimum IO size
(64KiB) and log:

	CIFS: VFS: SMB: Zero rsize calculated, using minimum value 65536

If the reconnect happens in the cifsd thread, cifs_renegotiate_iosize()
will end up being called and @rsize will be set to the expected value.

Since we can't rely on the value of @server->max_read by the time we
call cifs_prepare_read(), only call ->negotiate_rsize() if
@cifs_sb->ctx->rsize is zero.
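The rule the fix applies can be sketched as below. The names are hypothetical, and negotiation is simplified to "use the server's max_read, or the 64 KiB minimum if it is zero"; real negotiation also takes mount options and other limits into account.

```c
/* Hypothetical constant; 64 KiB matches the minimum quoted in the log. */
#define MODEL_MIN_IO_SIZE (64u * 1024u)

/* Simplified stand-in for ->negotiate_rsize(): fall back to the minimum
 * when the server reports zero (e.g. while the channel reconnects). */
static unsigned int model_negotiate_rsize(unsigned int server_max_read)
{
	return server_max_read ? server_max_read : MODEL_MIN_IO_SIZE;
}

/* The fix: only negotiate when no rsize is set yet, so a transient zero
 * max_read cannot replace an already-valid value. */
static unsigned int model_prepare_read_rsize(unsigned int *ctx_rsize,
					     unsigned int server_max_read)
{
	if (*ctx_rsize == 0)
		*ctx_rsize = model_negotiate_rsize(server_max_read);
	return *ctx_rsize;
}
```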

Reported-by: Steve French <stfrench@microsoft.com>
Fixes: c59f7c9661 ("smb: client: ensure aligned IO sizes")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-05-14 19:26:38 -05:00
Paulo Alcantara
c59f7c9661 smb: client: ensure aligned IO sizes
Make all IO sizes a multiple of PAGE_SIZE, whether negotiated with the
server or passed via the rsize, wsize and bsize mount options, to
avoid breaking DIO reads and writes against servers that enforce
alignment as specified in MS-FSA 2.1.5.3 and 2.1.5.4.
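One plausible way to enforce the alignment is to round each size down to a page multiple while never going below a single page. `align_io_size()` is hypothetical, and the clamp-to-one-page behaviour is an assumption, not a description of the cifs code. In userspace the page size could come from `sysconf(_SC_PAGESIZE)`; it is a parameter here so the arithmetic is easy to check.

```c
/* Round an I/O size down to a multiple of page_size, but never return
 * less than one page. Assumed policy, for illustration only. */
static unsigned long align_io_size(unsigned long size, unsigned long page_size)
{
	unsigned long aligned = size - (size % page_size);

	return aligned ? aligned : page_size;
}
```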

Cc: linux-cifs@vger.kernel.org
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-05-01 08:35:58 -05:00
Chunjie Zhu
262b73ef44 smb3 client: fix open hardlink on deferred close file error
The following Python script results in unexpected behaviour when run on
a CIFS filesystem against a Windows Server:

    # Create file
    fd = os.open('test', os.O_WRONLY|os.O_CREAT)
    os.write(fd, b'foo')
    os.close(fd)

    # Open and close the file to leave a pending deferred close
    fd = os.open('test', os.O_RDONLY|os.O_DIRECT)
    os.close(fd)

    # Try to open the file via a hard link
    os.link('test', 'new')
    newfd = os.open('new', os.O_RDONLY|os.O_DIRECT)

The final open returns EINVAL due to the server returning
STATUS_INVALID_PARAMETER. The root cause of this is that the client
caches lease keys per inode, but the spec requires them to be related to
the filename which causes problems when hard links are involved:

From MS-SMB2 section 3.3.5.9.11:

"The server MUST attempt to locate a Lease by performing a lookup in the
LeaseTable.LeaseList using the LeaseKey in the
SMB2_CREATE_REQUEST_LEASE_V2 as the lookup key. If a lease is found,
Lease.FileDeleteOnClose is FALSE, and Lease.Filename does not match the
file name for the incoming request, the request MUST be failed with
STATUS_INVALID_PARAMETER"
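The server-side rule quoted above can be modelled as a small lookup. The structure and status enum are hypothetical, and the filename check is a plain case-sensitive `strcmp()`, which is a simplification. With a per-inode lease key, opening `new` presents the same key that is already associated with `test`, so the lookup fails.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of one server-side lease entry. */
struct model_lease {
	unsigned char key[16];		/* SMB2 lease keys are 16 bytes */
	const char *filename;
	bool delete_on_close;
};

enum model_status { MODEL_STATUS_OK, MODEL_STATUS_INVALID_PARAMETER };

/* Apply the quoted rule: a lease found by key that is not delete-on-close
 * and whose filename differs from the request's fails the create. */
static enum model_status check_lease(const struct model_lease *table,
				     size_t n, const unsigned char key[16],
				     const char *filename)
{
	for (size_t i = 0; i < n; i++) {
		if (memcmp(table[i].key, key, 16) != 0)
			continue;
		if (!table[i].delete_on_close &&
		    strcmp(table[i].filename, filename) != 0)
			return MODEL_STATUS_INVALID_PARAMETER;
		return MODEL_STATUS_OK;
	}
	return MODEL_STATUS_OK;		/* no lease with this key */
}
```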

On the client side, we first check the context of the file open; if it
hits the above conditions, we close all open files that belong to the
same inode before opening the hard link.

Cc: stable@vger.kernel.org
Signed-off-by: Chunjie Zhu <chunjie.zhu@cloud.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-13 17:24:55 -05:00
Linus Torvalds
8b175e2e18 14 smb3/cifs client fixes and minor update to maintainers file
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmfpzSUACgkQiiy9cAdy
 T1GTHQwAhjeBIlGw2kaJDHBn32BDSpmw7r4D6gGUU+cv4sL4O2flRaDmshpGvO3r
 vgUxI8VFwLRvPem7QHr9aseFBIu6jQwWzI2tgkq+XW4LeyRtcgWff8dT8bQPc3b9
 t/z1wAqZhlr8MY5mma+aHjWsdRZNYzMWNFSWURpDylqhhNFxUbl/u24RF08VG+It
 bqBi+RyNIX2u0jHAuSUKUW0xFImp+YSEqg/TqYw10vZ4ChtfYtCX5YcbQNHls2XE
 IA7p0uOfFLrLmTmmw95A8rLtDlREb9rLcD2bLeBR2qFGnbrZFvCg917S0WchTU58
 P2UnKAZJqEhnMBefuXZ/LGKju5bnLAV6YGl/lKPf53UE71C9r5zBID3YgeweKiYS
 aWEjlY/FeC/Gb7iniRDBWE2BCaI6Sp7y/CmLucy58xrGhpPoXlliDj2FRCWWAFi4
 zk2rCempLa+uiIbIQReLclWbxA/ysqMJLwbtEKGa/le45LdtxAKkiTNLJ3MQciwd
 s6+i344q
 =Mjh/
 -----END PGP SIGNATURE-----

Merge tag '6.15-rc-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client updates from Steve French:

 - Fix for network namespace refcount leak

 - Multichannel fix and minor multichannel debug message cleanup

 - Fix potential null ptr reference in SMB3 close

 - Fix for special file handling when reparse points not supported by
   server

 - Two ACL fixes one for stricter ACE validation, one for incorrect
   perms requested

 - Three RFC1001 fixes: one for SMB3 mounts on port 139, one for better
   default hostname, and one for better session response processing

 - Minor update to email address for MAINTAINERS file

 - Allow disabling Unicode for access to old SMB1 servers

 - Three minor cleanups

* tag '6.15-rc-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Add new mount option -o nounicode to disable SMB1 UNICODE mode
  cifs: Set default Netbios RFC1001 server name to hostname in UNC
  smb: client: Fix netns refcount imbalance causing leaks and use-after-free
  cifs: add validation check for the fields in smb_aces
  CIFS: Propagate min offload along with other parameters from primary to secondary channels.
  cifs: Improve establishing SMB connection with NetBIOS session
  cifs: Fix establishing NetBIOS session for SMB2+ connection
  cifs: Fix getting DACL-only xattr system.cifs_acl and system.smb3_acl
  cifs: Check if server supports reparse points before using them
  MAINTAINERS: reorder preferred email for Steve French
  cifs: avoid NULL pointer dereference in dbg call
  smb: client: Remove redundant check in smb2_is_path_accessible()
  smb: client: Remove redundant check in cifs_oplock_break()
  smb: mark the new channel addition log as informational log with cifs_info
  smb: minor cleanup to remove unused function declaration
2025-03-31 17:38:34 -07:00
Linus Torvalds
99c21beaab vfs-6.15-rc1.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90p4AAKCRCRxhvAZXjc
 ojMIAP9atkG3u7+490+NGWLdulQlaHnD51Owa9MiW87UfKpsTQEArwi/NrJqXJNT
 PFQ2xIa5TxG+9haChR89w3kjZ6b/hgs=
 =iDkx
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "Features:

   - Add CONFIG_DEBUG_VFS infrastructure:
      - Catch invalid modes in open
      - Use the new debug macros in inode_set_cached_link()
      - Use debug-only asserts around fd allocation and install

   - Place f_ref to 3rd cache line in struct file to resolve false
     sharing

  Cleanups:

   - Start using anon_inode_getfile_fmode() helper in various places

   - Don't take f_lock during SEEK_CUR if exclusion is guaranteed by
     f_pos_lock

   - Add unlikely() to kcmp()

   - Remove legacy ->remount_fs method from ecryptfs after port to the
     new mount api

   - Remove invalidate_inodes() in favour of evict_inodes()

   - Simplify ep_busy_loop() by removing an unused argument

   - Avoid mmap sem relocks when coredumping with many missing pages

   - Inline getname()

   - Inline new_inode_pseudo() and de-staticize alloc_inode()

   - Dodge an atomic in putname if ref == 1

   - Consistently deref the files table with rcu_dereference_raw()

   - Dedup handling of struct filename init and refcounts bumps

   - Use wq_has_sleeper() in end_dir_add()

   - Drop the lock trip around I_NEW wake up in evict()

   - Load the ->i_sb pointer once in inode_sb_list_{add,del}

   - Predict not reaching the limit in alloc_empty_file()

   - Tidy up do_sys_openat2() with likely/unlikely

   - Call inode_sb_list_add() outside of inode hash lock

   - Sort out fd allocation vs dup2 race commentary

   - Turn page_offset() into a wrapper around folio_pos()

   - Remove locking in exportfs around ->get_parent() call

   - try_lookup_one_len() does not need any locks in autofs

   - Fix return type of several functions from long to int in open

   - Fix return type of several functions from long to int in ioctls

  Fixes:

   - Fix watch queue accounting mismatch"
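The "dodge an atomic" item refers to a general pattern that can be sketched in userspace with C11 atomics: a caller that observes a reference count of 1 holds the only reference and can free the object without an atomic read-modify-write. This assumes no new reference can be taken without already holding one. `struct model_name` and `model_putname()` are hypothetical and are not the kernel's `putname()`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object. */
struct model_name {
	atomic_int refcnt;
	char *str;
};

/* Returns true if the object was freed. */
static bool model_putname(struct model_name *name)
{
	int refs = atomic_load_explicit(&name->refcnt, memory_order_relaxed);

	/* refs == 1: we hold the only reference, so nobody else can change
	 * the count and the atomic decrement can be skipped. */
	if (refs != 1) {
		/* Shared: decrement atomically; free only if ours was last. */
		if (atomic_fetch_sub_explicit(&name->refcnt, 1,
					      memory_order_acq_rel) != 1)
			return false;
	}
	free(name->str);
	free(name);
	return true;
}
```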

* tag 'vfs-6.15-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits)
  fs: sort out fd allocation vs dup2 race commentary, take 2
  fs: call inode_sb_list_add() outside of inode hash lock
  fs: tidy up do_sys_openat2() with likely/unlikely
  fs: predict not reaching the limit in alloc_empty_file()
  fs: load the ->i_sb pointer once in inode_sb_list_{add,del}
  fs: drop the lock trip around I_NEW wake up in evict()
  fs: use wq_has_sleeper() in end_dir_add()
  VFS/autofs: try_lookup_one_len() does not need any locks
  fs: dedup handling of struct filename init and refcounts bumps
  fs: consistently deref the files table with rcu_dereference_raw()
  exportfs: remove locking around ->get_parent() call.
  fs: use debug-only asserts around fd allocation and install
  fs: dodge an atomic in putname if ref == 1
  vfs: Remove invalidate_inodes()
  ecryptfs: remove NULL remount_fs from super_operations
  watch_queue: fix pipe accounting mismatch
  fs: place f_ref to 3rd cache line in struct file to resolve false sharing
  epoll: simplify ep_busy_loop by removing always 0 argument
  fs: Turn page_offset() into a wrapper around folio_pos()
  kcmp: improve performance adding an unlikely hint to task comparisons
  ...
2025-03-24 09:13:50 -07:00
Ivan Abramov
9fb2e20e4f smb: client: Remove redundant check in cifs_oplock_break()
There is an unnecessary NULL check of inode in cifs_oplock_break(), since
there are multiple dereferences of cinode prior to it.

Based on usage of cifs_oplock_break() in cifs_new_fileinfo() we can safely
assume that inode is not NULL, so there is no need to check inode in
cifs_oplock_break() at all.

Therefore, this redundant check can be removed.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Ivan Abramov <i.abramov@mt-integration.ru>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-03-24 09:45:07 -05:00
Jan Kara
93fd0d46cb
vfs: Remove invalidate_inodes()
The function can be replaced by evict_inodes(). The only difference is
that evict_inodes() skips inodes with a positive refcount without
touching ->i_lock, but the two are equivalent because evict_inodes()
repeats the refcount check after grabbing ->i_lock.
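The equivalence argued above rests on a check-then-recheck pattern, sketched here with a pthread mutex standing in for ->i_lock. The types and `maybe_evict()` are hypothetical: the unlocked read is only an optimisation that skips busy inodes cheaply, and correctness comes from repeating the check under the lock.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical inode stand-in with its own lock. */
struct model_inode {
	pthread_mutex_t lock;
	atomic_int count;	/* reference count */
	bool evicted;
};

static bool maybe_evict(struct model_inode *inode)
{
	/* Cheap unlocked check: skip busy inodes without taking the lock. */
	if (atomic_load_explicit(&inode->count, memory_order_relaxed) > 0)
		return false;

	pthread_mutex_lock(&inode->lock);
	/* The count may have changed since the unlocked read: re-check. */
	if (atomic_load(&inode->count) > 0) {
		pthread_mutex_unlock(&inode->lock);
		return false;
	}
	inode->evicted = true;
	pthread_mutex_unlock(&inode->lock);
	return true;
}
```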

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20250307144318.28120-2-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-08 12:19:22 +01:00
Shyam Prasad N
f1bf10d7e9 cifs: pick channels for individual subrequests
The netfs library could break down a read request into
multiple subrequests. When multichannel is used, there is
potential to improve performance when each of these
subrequests picks a different channel.

Today we call cifs_pick_channel when the main read request
is initialized in cifs_init_request. This change moves this to
cifs_prepare_read, which is the right place to pick a channel since
it gets called for each subrequest.
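Per-subrequest selection can be sketched as follows. The policy (fewest in-flight requests, scanning from a rotating start index) is an assumption for illustration and may differ from what cifs_pick_channel() actually does; the types and functions are hypothetical.

```c
#include <stddef.h>

/* Hypothetical channel stand-in: only tracks in-flight requests. */
struct model_channel {
	unsigned int in_flight;
};

/* Pick the least-loaded channel, starting from a rotating index so that
 * ties are spread across channels. */
static size_t pick_channel(const struct model_channel *chans, size_t n,
			   size_t *rotor)
{
	size_t start = (*rotor)++ % n;
	size_t best = start;

	for (size_t i = 1; i < n; i++) {
		size_t idx = (start + i) % n;

		if (chans[idx].in_flight < chans[best].in_flight)
			best = idx;
	}
	return best;
}

/* Each subrequest picks its own channel, instead of all of them
 * inheriting one channel chosen for the parent request. */
static void issue_subrequests(struct model_channel *chans, size_t n,
			      size_t nr_subreqs, size_t *rotor,
			      size_t *chosen)
{
	for (size_t s = 0; s < nr_subreqs; s++) {
		chosen[s] = pick_channel(chans, n, rotor);
		chans[chosen[s]].in_flight++;
	}
}
```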

Interestingly, cifs_prepare_write already does channel selection
for individual subrequests, but it looks like this was missed for
reads. This is especially important when multichannel is used with
an increased rasize.

In my test setup, with rasize set to 8MB, a sequential read
of a large file took 11.5s without this change. With the
change, it completed in 9s. The difference is even more significant
with a bigger rasize.

Cc: <stable@vger.kernel.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-02-11 18:51:07 -06:00
Linus Torvalds
ca56a74a31 vfs-6.14-rc1.netfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ4pRKQAKCRCRxhvAZXjc
 ov2dAQCULWjTBWdF8Ro2bfNeXzWvUUnSPjoLJ9B4xlrOB9c2MAEAiwkKHkzAxUco
 hCvaRJc3H2ze2wrgbIABPKB2noQVVwk=
 =4ojv
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.14-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs netfs updates from Christian Brauner:
 "This contains read performance improvements and support for monolithic
  single-blob objects that have to be read/written as such (e.g. AFS
  directory contents). The implementation of the two parts is interwoven
  as each makes the other possible.

   - Read performance improvements

     The read performance improvements are intended to recover some
     loss of performance detected in cifs and, to a lesser extent, in afs.

     The problem is that we queue too many work items during the
     collection of read results: each individual subrequest is collected
     by its own work item, and then they have to interact with each
     other when a series of subrequests don't exactly align with the
     pattern of folios that are being read by the overall request.

     Whilst the processing of the pages covered by individual
     subrequests as they complete potentially allows folios to be woken
     in parallel and with minimum delay, it can shuffle wakeups for
     sequential reads out of order - and that is the most common I/O
     pattern.

     The final assessment and cleanup of an operation is then held up
     until the last I/O completes - and for a synchronous sequential
     operation, this means the bouncing around of work items just adds
     latency.

     Two changes have been made to make this work:

     (1) All collection is now done in a single "work item" that works
         progressively through the subrequests as they complete (and
         also dispatches retries as necessary).

     (2) For readahead and AIO, this work item is queued on a workqueue
         and can run in parallel with the ultimate consumer of the data;
         for synchronous direct or unbuffered reads, the collection is
         run in the application thread and not offloaded.

     Functions such as smb2_readv_callback() then just tell netfslib
     that the subrequest has terminated; netfslib does a minimal bit of
     processing on the spot - stat counting and tracing mostly - and
     then queues/wakes up the worker. This simplifies the logic as the
     collector just walks sequentially through the subrequests as they
     complete and walks through the folios, if buffered, unlocking them
     as it goes. It also keeps to a minimum the amount of latency
     injected into the filesystem's low-level I/O handling.

     The way netfs supports filesystems using the deprecated
     PG_private_2 flag is changed: folios are flagged and added to a
     write request as they complete and that takes care of scheduling
     the writes to the cache. The originating read request can then just
     unlock the pages whatever happens.

   - Single-blob object support

     Single-blob objects are files for which the content of the file
     must be read from or written to the server in a single operation
     because reading them in parts may yield inconsistent results. AFS
     directories are an example of this as there exists the possibility
     that the contents are generated on the fly and would differ between
     reads or might change due to third party interference.

     Such objects will be written to and retrieved from the cache if one
     is present, though we allow (and may need) multiple subrequests to
     do so. The important part is that reads from and writes to the
     *server* are monolithic.

     Single blob reading is, for the moment, fully synchronous and does
     result collection in the application thread and, also for the
     moment, the API is supplied the buffer in the form of a folio_queue
     chain rather than using the pagecache.

   - Related afs changes

     This series makes a number of changes to the kafs filesystem,
     primarily in the area of directory handling:

      - AFS's FetchData RPC reply processing is made partially
        asynchronous which allows the netfs_io_request's outstanding
        operation counter to be removed as part of reducing the
        collection to a single work item.

      - Directory and symlink reading are plumbed through netfslib using
        the single-blob object API and are now cacheable with fscache.
        This also allows the afs_read struct to be eliminated and
        netfs_io_subrequest to be used directly instead.

      - Directory and symlink content are now stored in a folio_queue
        buffer rather than in the pagecache. This means we don't require
        the RCU read lock and xarray iteration to access it, and folios
        won't randomly disappear under us because the VM wants them
        back.

      - The vnode operation lock is changed from a mutex struct to a
        private lock implementation. The problem is that the lock now
        needs to be dropped in a separate thread and mutexes don't
        permit that.

      - When a new directory or symlink is created, we now initialise it
        locally and mark it valid rather than downloading it (we know
        what it's likely to look like).

      - We now use the in-directory hashtable to reduce the number of
        entries we need to scan when doing a lookup. The edit routines
        have to maintain the hash chains.

      - Cancellation (e.g. by signal) of an async call after the
        rxrpc_call has been set up is now offloaded to the worker thread
        as there will be a notification from rxrpc upon completion. This
        avoids a double cleanup.

   - A "rolling buffer" implementation is created to abstract out the
     two separate folio_queue chaining implementations I had (one for
     read and one for write).

   - Functions are provided to create/extend a buffer in a folio_queue
     chain and tear it down again.

     This is used to handle AFS directories, but could also be used to
     create bounce buffers for content crypto and transport crypto.

   - The was_async argument is dropped from netfs_read_subreq_terminated()

     Instead we wake the read collection work item by either queuing it
     or waking up the app thread.

   - We don't need to use BH-excluding locks when communicating between
     the issuing thread and the collection thread as neither of them now
     runs in BH context.

   - Also included are a number of new tracepoints and a split of the
     netfslib write collection code that puts retrying into its own file
     (it gets more complicated with content encryption).

   - There are also some minor AFS fixes included: fixing the AFS
     directory format struct layout, reducing some directory
     over-invalidation and making afs_rmdir() translate EEXIST to
     ENOTEMPTY (servers return EEXIST because ENOTEMPTY is not
     available on all the systems they support).

   - Finally, there's a patch to try and detect entry into the folio
     unlock function with no folio_queue structs in the buffer (which
     isn't allowed in the cases that can get there).

     This is a debugging patch, but should have minimal overhead"

* tag 'vfs-6.14-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits)
  netfs: Report on NULL folioq in netfs_writeback_unlock_folios()
  afs: Add a tracepoint for afs_read_receive()
  afs: Locally initialise the contents of a new symlink on creation
  afs: Use the contained hashtable to search a directory
  afs: Make afs_mkdir() locally initialise a new directory's content
  netfs: Change the read result collector to only use one work item
  afs: Make {Y,}FS.FetchData an asynchronous operation
  afs: Fix cleanup of immediately failed async calls
  afs: Eliminate afs_read
  afs: Use netfslib for symlinks, allowing them to be cached
  afs: Use netfslib for directories
  afs: Make afs_init_request() get a key if not given a file
  netfs: Add support for caching single monolithic objects such as AFS dirs
  netfs: Add functions to build/clean a buffer in a folio_queue
  afs: Add more tracepoints to do with tracking validity
  cachefiles: Add auxiliary data trace
  cachefiles: Add some subrequest tracepoints
  netfs: Remove some extraneous directory invalidations
  afs: Fix directory format encoding struct
  afs: Fix EEXIST error returned from afs_rmdir() to be ENOTEMPTY
  ...
2025-01-20 09:29:11 -08:00
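The vnode operation lock change in this merge is motivated by mutexes not permitting release from a thread other than the owner. The following userspace sketch, assuming POSIX threads, shows one way to build an ownerless lock: the "held" state is a flag guarded by an internal mutex and condition variable, so a completion thread may release a lock that an issuing thread acquired. All names (op_lock_*) are hypothetical; this is not the afs code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ownerless lock: "held" is a plain flag, so the thread that
 * releases it need not be the thread that acquired it (unlike a mutex). */
struct op_lock {
	pthread_mutex_t guard;	/* protects 'held'; only ever held briefly */
	pthread_cond_t freed;	/* signalled when 'held' goes false */
	bool held;
};

void op_lock_init(struct op_lock *l)
{
	pthread_mutex_init(&l->guard, NULL);
	pthread_cond_init(&l->freed, NULL);
	l->held = false;
}

void op_lock_acquire(struct op_lock *l)
{
	pthread_mutex_lock(&l->guard);
	while (l->held)
		pthread_cond_wait(&l->freed, &l->guard);
	l->held = true;
	pthread_mutex_unlock(&l->guard);
}

bool op_lock_try_acquire(struct op_lock *l)
{
	bool ok;

	pthread_mutex_lock(&l->guard);
	ok = !l->held;
	if (ok)
		l->held = true;
	pthread_mutex_unlock(&l->guard);
	return ok;
}

/* May be called from any thread, e.g. a completion worker. */
void op_lock_release(struct op_lock *l)
{
	pthread_mutex_lock(&l->guard);
	assert(l->held);
	l->held = false;
	pthread_cond_signal(&l->freed);
	pthread_mutex_unlock(&l->guard);
}

/* Example worker: releases a lock that a different thread acquired. */
void *op_lock_release_from_thread(void *arg)
{
	op_lock_release(arg);
	return NULL;
}
```

A kernel implementation would use kernel wait primitives rather than pthreads; the point of the sketch is only that no owning thread is recorded, so the release may happen elsewhere.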
Bharath SM
b8ea3b1ff5 smb: enable reuse of deferred file handles for write operations
Previously, deferred file handles were reused only for read
operations, this commit extends to reusing deferred handles
for write operations. By reusing these handles we can reduce
the need for open/close operations over the wire.

Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-12-23 08:05:39 -06:00
David Howells
31fc366aa7 netfs: Drop the was_async arg from netfs_read_subreq_terminated()
Drop the was_async argument from netfs_read_subreq_terminated().  Almost
every caller is in process context and passes false; some filesystems
delegate the call to a workqueue to avoid doing the work in their network
message queue parsing thread.

The only exception is netfs_cache_read_terminated() which handles
completion in the cache - which is usually a callback from the backing
filesystem in softirq context, though it can be from process context if an
error occurred.  In this case, delegate to a workqueue.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wiVC5Cgyz6QKXFu6fTaA6h4CjexDR-OV9kL6Vo5x9v8=A@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-10-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20 22:34:03 +01:00
David Howells
360157829e netfs: Drop the error arg from netfs_read_subreq_terminated()
Drop the error argument from netfs_read_subreq_terminated() in favour of
passing the value in subreq->error.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-9-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20 22:34:03 +01:00
Shen Lichuan
e9f49feefb smb: client: Correct typos in multiple comments across various files
Fix some confusing typos identified with codespell, as follows:
the details are as follows:

-in the code comments:
fs/smb/client/cifsacl.h:58: inheritence ==> inheritance
fs/smb/client/cifsencrypt.c:242: origiginal ==> original
fs/smb/client/cifsfs.c:164: referece ==> reference
fs/smb/client/cifsfs.c:292: ned ==> need
fs/smb/client/cifsglob.h:779: initital ==> initial
fs/smb/client/cifspdu.h:784: altetnative ==> alternative
fs/smb/client/cifspdu.h:2409: conrol ==> control
fs/smb/client/cifssmb.c:1218: Expirement ==> Experiment
fs/smb/client/cifssmb.c:3021: conver ==> convert
fs/smb/client/cifssmb.c:3998: asterik ==> asterisk
fs/smb/client/file.c:2505: useable ==> usable
fs/smb/client/fs_context.h:263: timemout ==> timeout
fs/smb/client/misc.c:257: responsbility ==> responsibility
fs/smb/client/netmisc.c:1006: divisable ==> divisible
fs/smb/client/readdir.c:556: endianess ==> endianness
fs/smb/client/readdir.c:818: bu ==> by
fs/smb/client/smb2ops.c:2180: snaphots ==> snapshots
fs/smb/client/smb2ops.c:3586: otions ==> options
fs/smb/client/smb2pdu.c:2979: timestaps ==> timestamps
fs/smb/client/smb2pdu.c:4574: memmory ==> memory
fs/smb/client/smb2transport.c:699: origiginal ==> original
fs/smb/client/smbdirect.c:222: happenes ==> happens
fs/smb/client/smbdirect.c:1347: registartions ==> registrations
fs/smb/client/smbdirect.h:114: accoutning ==> accounting

Signed-off-by: Shen Lichuan <shenlichuan@vivo.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-10-02 17:52:24 -05:00
Linus Torvalds
4e0373f1f9 24 smb3 client fixes, about half cleanup, and SMB3.1.1 compression improvements, and also fixes for special file types with sfu mount option
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmbpAwkACgkQiiy9cAdy
 T1FJhgv+PX+IIGyNNW0I3f3ZzIWqc1DCwxXHCa3gvr7TKimJ71AGbEdzFZZzl3AJ
 CdxSLf2NQ6tBUxl65QuMC7XykqQXKvNnQEDPoQcHfFgTtYJi+zng1dDvvXSfFbWW
 m2Hql1w6MNFeKlFBavbA6MI94MnZqE5J/yCtWqw3LvEn4l2JwYrAzS5Lw9qjtcER
 DmlOsrEFgpsFhhpnyPZXJxaWKZIDG2OuG61LWkqyhvLOTtuFuc9cEsTWPdeRYAT6
 KKh5z58wqG2JG0IkVjG1foBclv0zcZgUzqOr2/tzbabYye991kLnUitaTwd+u8xS
 pTbVIw1E91sFEqVsr2IpnLUq68MKaahlNfHkNJD0dqaMKfGOujqtNRFw82Yki4w5
 aTosgECyUiGKgwuE8HLtwlJaE4EizVdrqQiP2cUOrtuWPvOvnY7vjWKC8kmSM0Z/
 u0ov6JdirVlnFE3dlS0i6ywKaolsrrPYUTbv4ihjQiGHtm+VjonH8VYsdg8sUV0e
 5/+cyqaF
 =B6Et
 -----END PGP SIGNATURE-----

Merge tag 'v6.12-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client updates from Steve French:

 - cleanups (moving duplicated code, removing unused code etc)

 - fixes relating to "sfu" mount options (for better handling of special
   file types)

 - SMB3.1.1 compression fixes/improvements

* tag 'v6.12-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (24 commits)
  smb: client: fix compression heuristic functions
  cifs: Update SFU comments about fifos and sockets
  cifs: Add support for creating SFU symlinks
  smb: use LIST_HEAD() to simplify code
  cifs: Recognize SFU socket type
  cifs: Show debug message when SFU Fifo type was detected
  cifs: Put explicit zero byte into SFU block/char types
  cifs: Add support for reading SFU symlink location
  cifs: Fix recognizing SFU symlinks
  smb: client: compress: fix an "illegal accesses" issue
  smb: client: compress: fix a potential issue of freeing an invalid pointer
  smb: client: compress: LZ77 code improvements cleanup
  smb: client: insert compression check/call on write requests
  smb3: mark compression as CONFIG_EXPERIMENTAL and fix missing compression operation
  cifs: Remove obsoleted declaration for cifs_dir_open
  smb: client: Use min() macro
  cifs: convert to use ERR_CAST()
  smb: add comment to STATUS_MCA_OCCURED
  smb: move SMB2 Status code to common header file
  smb: move some duplicate definitions to common/smbacl.h
  ...
2024-09-19 06:53:40 +02:00
Hongbo Li
21dcbc17eb smb: use LIST_HEAD() to simplify code
list_head can be initialized automatically with LIST_HEAD()
instead of calling INIT_LIST_HEAD(). No functional impact.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-15 10:42:45 -05:00
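A userspace re-creation of the two idioms, modelled on the definitions in include/linux/list.h, where LIST_HEAD() declares a list head and initialises it in one step. This is a simplified sketch: the kernel's INIT_LIST_HEAD() also uses WRITE_ONCE(), which is omitted here, and list_head_demo() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Static initialiser: an empty list points at itself in both directions. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Declare and initialise in a single statement. */
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

/* Runtime initialiser (the kernel version writes ->next with WRITE_ONCE). */
static inline void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static inline bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Both forms leave an empty, self-referencing list head. */
void list_head_demo(void)
{
	struct list_head a;
	LIST_HEAD(b);			/* after: one step */

	INIT_LIST_HEAD(&a);		/* before: declare, then initialise */
	assert(list_empty(&a));
	assert(list_empty(&b));
	assert(b.next == &b && b.prev == &b);
}
```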
David Howells
ee4cdf7ba8 netfs: Speed up buffered reading
Improve the efficiency of buffered reads in a number of ways:

 (1) Overhaul the algorithm in general so that it's a lot more compact and
     split the read submission code between buffered and unbuffered
     versions.  The unbuffered version can be vastly simplified.

 (2) Read-result collection is handed off to a work queue rather than being
     done in the I/O thread.  Multiple subrequests can be processed
     simultaneously.

 (3) When a subrequest is collected, any folios it fully spans are
     collected and "spare" data on either side is donated to either the
     previous or the next subrequest in the sequence.

Notes:

 (*) Readahead expansion massively slows down fio, presumably because it
     causes a load of extra allocations, both folio and xarray, up front
     before RPC requests can be transmitted.

 (*) RDMA with cifs does appear to work, both with SIW and RXE.

 (*) PG_private_2-based reading and copy-to-cache is split out into its own
     file and altered to use folio_queue.  Note that the copy to the cache
     now creates a new write transaction against the cache and adds the
     folios to be copied into it.  This allows it to use part of the
     writeback I/O code.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-20-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12 12:20:41 +02:00
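Item (3) can be illustrated with simple range arithmetic. The sketch below splits a subrequest's byte range into the folios it wholly covers plus leftover bytes on either side, assuming (unlike real folios) that every folio has the same power-of-two size; split_subreq() and struct span_split are hypothetical names, not netfslib API.

```c
#include <assert.h>
#include <stdint.h>

/* Result of splitting a byte range against fixed-size folios. */
struct span_split {
	uint64_t first_full;	/* first wholly covered folio (if nr_full > 0) */
	uint64_t nr_full;	/* number of wholly covered folios */
	uint64_t spare_before;	/* bytes in a partial folio at the front */
	uint64_t spare_after;	/* bytes in a partial folio at the back */
};

/* Split [start, start + len) into fully spanned folios plus leftovers.
 * folio_size must be a power of two (an assumption of this sketch). */
struct span_split split_subreq(uint64_t start, uint64_t len, uint64_t folio_size)
{
	struct span_split s = {0};
	uint64_t mask = folio_size - 1;
	uint64_t end = start + len;
	uint64_t lo, hi;

	assert(folio_size != 0 && (folio_size & mask) == 0);
	lo = (start + mask) & ~mask;	/* first folio boundary at or after start */
	hi = end & ~mask;		/* last folio boundary at or before end */
	if (lo > end)
		lo = end;		/* range ends before the next boundary */
	if (hi < lo)
		hi = lo;		/* no folio is wholly covered */
	s.first_full = lo / folio_size;
	s.nr_full = (hi - lo) / folio_size;
	s.spare_before = lo - start;
	s.spare_after = end - hi;
	return s;
}
```

For any input, spare_before + nr_full * folio_size + spare_after equals len, which is the property that lets leftover bytes be handed to neighbouring subrequests without losing any.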
David Howells
52d55922e0 netfs: Move max_len/max_nr_segs from netfs_io_subrequest to netfs_io_stream
Move max_len/max_nr_segs from struct netfs_io_subrequest to struct
netfs_io_stream as we only issue one subreq at a time and then don't need
these values again for that subreq unless and until we have to retry it -
in which case we want to renegotiate them.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-8-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-05 11:00:41 +02:00
David Howells
73425800ac netfs, cifs: Move CIFS_INO_MODIFIED_ATTR to netfs_inode
Move CIFS_INO_MODIFIED_ATTR to netfs_inode as NETFS_ICTX_MODIFIED_ATTR and
then make netfs_perform_write() set it.  This means that cifs doesn't need
to implement the ->post_modify() hook.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-7-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-05 11:00:41 +02:00
David Howells
6a5dcd4877 cifs: Fix lack of credit renegotiation on read retry
When netfslib asks cifs to issue a read operation, it prefaces this with a
call to ->clamp_length() which cifs uses to negotiate credits, providing
receive capacity on the server; however, in the event that a read op needs
reissuing, netfslib doesn't call ->clamp_length() again as that could
shorten the subrequest, leaving a gap.

This causes the retried read to be done with zero credits which causes the
server to reject it with STATUS_INVALID_PARAMETER.  This is a problem for a
DIO read that would extend beyond the EOF: the short read will be retried,
causing EINVAL to be returned to the user when it fails.

Fix this by making cifs_req_issue_read() negotiate new credits if retrying
(NETFS_SREQ_RETRYING now gets set in the read side as well as the write
side in this instance).

This isn't sufficient, however: the new credits might not be sufficient to
complete the remainder of the read, so also add an additional field,
rreq->actual_len, that holds the actual size of the op we want to perform
without having to alter subreq->len.

We then rely on repeated short reads being retried until we finish the read
or reach the end of file and make a zero-length read.

Also fix a couple of places where the subrequest start and length need to
be altered by the amount so far transferred when being used.

Fixes: 69c3c023af ("cifs: Implement netfslib hooks")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-28 07:47:36 -05:00
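The retry strategy described here (renegotiate credits on every issue, then keep retrying short reads until the request is satisfied or a zero-length read signals EOF) can be simulated in userspace. This sketch assumes one credit covers 64 KiB and uses hypothetical helpers grant_credits() and server_read() in place of the SMB transport; it is not the cifs implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CREDIT_BYTES 65536u	/* assumption: one credit covers 64 KiB */

/* Hypothetical simulated file on the "server". */
struct sim_file {
	const char *data;
	size_t size;
};

/* Hypothetical grant: credits needed for want_bytes, capped at 'avail'. */
static size_t grant_credits(size_t want_bytes, size_t avail)
{
	size_t want = (want_bytes + CREDIT_BYTES - 1) / CREDIT_BYTES;

	return want < avail ? want : avail;
}

/* One wire read: returns bytes copied, 0 at or beyond EOF. */
static size_t server_read(const struct sim_file *f, size_t pos, char *buf, size_t len)
{
	if (pos >= f->size)
		return 0;
	if (len > f->size - pos)
		len = f->size - pos;
	memcpy(buf, f->data + pos, len);
	return len;
}

/* Read 'len' bytes at 'pos', renegotiating credits before every attempt
 * and retrying short reads until done or a zero-length read (EOF). */
size_t read_with_retry(const struct sim_file *f, size_t pos, char *buf,
		       size_t len, size_t credits_per_try)
{
	size_t done = 0;

	while (done < len) {
		size_t credits = grant_credits(len - done, credits_per_try);
		size_t chunk = credits * CREDIT_BYTES;
		size_t got;

		assert(credits > 0);	/* never issue with zero credits */
		if (chunk > len - done)
			chunk = len - done;
		got = server_read(f, pos + done, buf + done, chunk);
		if (got == 0)
			break;		/* EOF: return the short count */
		done += got;
	}
	return done;
}
```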
Steve French
e4be320eec smb3: fix broken cached reads when posix locks
Mandatory locking is enforced for cached reads, which violates
default posix semantics, and also it is enforced inconsistently.
This affected recent versions of libreoffice, and can be
demonstrated by opening a file twice from the same client,
locking it from handle one and trying to read from it from
handle two (which fails, returning EACCES).

There is already a mount option "forcemandatorylock"
(which defaults to off), so with this change only when the user
intentionally specifies "forcemandatorylock" on mount will we
break posix semantics on read to a locked range (i.e. we will
only fail the read in this case if the user mounts with
"forcemandatorylock").

An earlier patch fixed the write path.

Fixes: 85160e03a7 ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable@vger.kernel.org
Cc: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Reported-by: abartlet@samba.org
Reported-by: Kevin Ottens <kevin.ottens@enioka.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-18 17:01:06 -05:00
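The rule adopted by this fix and the companion write fix (836bb3268d) reduces to a small decision: a byte-range lock conflict fails cached I/O only when the mount explicitly asked for forcemandatorylock. A hypothetical sketch of that decision, not the cifs code; struct mount_opts and cached_io_check() are illustrative names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of mount options. */
struct mount_opts {
	bool forcemandatorylock;	/* off unless the user asks for it */
};

/* Returns 0 if cached I/O may proceed, -EACCES if it must fail. */
int cached_io_check(const struct mount_opts *opts, bool lock_conflict)
{
	assert(opts != NULL);
	/* POSIX byte-range locks are advisory: enforce only when forced. */
	if (opts->forcemandatorylock && lock_conflict)
		return -EACCES;
	return 0;
}
```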
Linus Torvalds
e0fac5fc8b three client fixes, including two for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmbBGkoACgkQiiy9cAdy
 T1HAJAv9G2efGXOuLHuDKM4IkoUBoeAsC/o5g5sVbZfINON1Ra0vQBLmRLunhAlW
 xIY2Ln92jMdvM6wNwFcsAI5bIWTiIrjdqP/HY9kiKRU5O5NvqNWeyPEDOB3aM41O
 UXq8jNKyyyyFD1P4QJNYMeZucTZatLJVb7WRZHGDEDcVMrCWdDVcnPwnMfyNeD0w
 GndMPAAxiQxV+AoL+RgE6+nfVr4EwHI3VFG/h3FyNcaMp2ZSzYHDu/TIwmGBHq6P
 DCJyxjKMJoXKzKO+3hVp3tKzKZ9EuE3ljb8liBbZ8g6J4quCHbQWC3Mh8Jhmgav6
 1KhDRKI6vjHZwu8tWjBEgadhwcRBHMuz/YZL+zrx3QHjA/AgV20Y7oyvyXKusj9t
 G5C1bTExusdhLnEOGN4+udxjAHrMkW36R6Vux5D85WYmhR3k2AbIdZevA+mLADKU
 veTye1VAX5vy9h0atyV69Zta9aBU6q3Mhcpgrcbj0u3C/Iuu1DafrEmb5hGgW7Dw
 xnGynYax
 =af3x
 -----END PGP SIGNATURE-----

Merge tag 'v6.11-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - fix for clang warning - additional null check

 - fix for cached write with posix locks

 - flexible structure fix

* tag 'v6.11-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: smb2pdu.h: Use static_assert() to check struct sizes
  smb3: fix lock breakage for cached writes
  smb/client: avoid possible NULL dereference in cifs_free_subrequest()
2024-08-17 16:31:12 -07:00
Steve French
836bb3268d smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default posix semantics, and also it is enforced inconsistently.
This apparently breaks recent versions of libreoffice, but can
also be demonstrated by opening a file twice from the same
client, locking it from handle one and writing to it from
handle two (which fails, returning EACCES).

Since there was already a mount option "forcemandatorylock"
(which defaults to off), with this change only when the user
intentionally specifies "forcemandatorylock" on mount will we
break posix semantics on write to a locked range (i.e. we will
only fail the write in this case if the user mounts with
"forcemandatorylock").

Fixes: 85160e03a7 ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable@vger.kernel.org
Cc: Pavel Shilovsky <piastryyy@gmail.com>
Reported-by: abartlet@samba.org
Reported-by: Kevin Ottens <kevin.ottens@enioka.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-15 16:04:47 -05:00
Su Hui
74c2ab6d65 smb/client: avoid possible NULL dereference in cifs_free_subrequest()
Clang static checker (scan-build) warning:
	cifsglob.h:line 890, column 3
	Access to field 'ops' results in a dereference of a null pointer.

Commit 519be98971 ("cifs: Add a tracepoint to track credits involved in
R/W requests") adds a check for 'rdata->server', which lets clang throw this
warning about a NULL dereference.

When 'rdata->credits.value != 0 && rdata->server == NULL' happens,
add_credits_and_wake_if() will call rdata->server->ops->add_credits(),
causing a NULL dereference.  Add a check for 'rdata->server' to avoid
this.

Cc: stable@vger.kernel.org
Fixes: 69c3c023af ("cifs: Implement netfslib hooks")
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-15 15:32:30 -05:00
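The shape of the guard is easy to show in isolation: credits are returned only when there is both a non-zero credit value and a server to return them to. The structures and functions below are simplified, hypothetical stand-ins for the cifs types, not the real cifs_free_subrequest().

```c
#include <assert.h>
#include <stddef.h>

struct server;

struct server_ops {
	void (*add_credits)(struct server *srv, unsigned int n);
};

struct server {
	const struct server_ops *ops;
	unsigned int credits;
};

struct subreq {
	struct server *server;		/* may be NULL if setup failed early */
	unsigned int credit_value;
};

static void simple_add_credits(struct server *srv, unsigned int n)
{
	srv->credits += n;
}

const struct server_ops simple_ops = { .add_credits = simple_add_credits };

/* Give back unused credits, guarding against a subrequest with no server. */
void free_subreq_credits(struct subreq *sr)
{
	assert(sr != NULL);
	if (sr->credit_value && sr->server)
		sr->server->ops->add_credits(sr->server, sr->credit_value);
	sr->credit_value = 0;
}
```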
Dominique Martinet
e3786b29c5 9p: Fix DIO read through netfs
If a program is watching a file on a 9p mount, it won't see any change in
size if the file being exported by the server is changed directly in the
source filesystem, presumably because 9p doesn't have change notifications,
and because netfs skips the reads if the file is empty.

Fix this by attempting to read the full size specified when a DIO read is
requested (such as when 9p is operating in unbuffered mode) and dealing
with a short read if the EOF was less than the expected read.

To make this work, filesystems using netfslib must not set
NETFS_SREQ_CLEAR_TAIL if performing a DIO read where that read hit the EOF.
I don't want to mandatorily clear this flag in netfslib for DIO because,
say, ceph might make a read from an object that is not completely filled,
but does not reside at the end of file - and so we need to clear the
excess.

This can be tested by watching an empty file over 9p within a VM (such as
in the ktest framework):

        while true; do read content; if [ -n "$content" ]; then echo $content; break; fi; done < /host/tmp/foo

then writing something into the empty file.  The watcher should immediately
display the file content and break out of the loop.  Without this fix, it
remains in the loop indefinitely.

Fixes: 80105ed2fd ("9p: Use netfslib read/write_iter")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218916
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/1229195.1723211769@warthog.procyon.org.uk
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-13 13:53:09 +02:00
David Howells
a07d38afd1 cifs: Fix missing fscache invalidation
A network filesystem needs to implement a netfslib hook to invalidate
fscache if it's to be able to use the cache.

Fix cifs to implement the cache invalidation hook.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 3ee1a1fc39 ("cifs: Cut over to using netfslib")
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-20 13:55:29 -05:00
David Howells
519be98971 cifs: Add a tracepoint to track credits involved in R/W requests
Add a tracepoint to track the credit changes and server in_flight value
involved in the lifetime of a R/W request, logging it against the
request/subreq debugging ID.  This requires the debugging IDs to be
recorded in the cifs_credits struct.

The tracepoint can be enabled with:

	echo 1 >/sys/kernel/debug/tracing/events/cifs/smb3_rw_credits/enable

Also add a three-state flag to struct cifs_credits to note if we're
interested in determining when the in_flight contribution ends and, if so,
to track whether we've decremented the contribution yet.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-19 11:08:57 -05:00
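A three-state flag like the one described lets the code distinguish "not tracked", "tracked but not yet decremented" and "already decremented", so the in_flight contribution is dropped at most once. The enum, struct and function names in this sketch are hypothetical and do not reproduce the definitions in cifsglob.h.

```c
#include <assert.h>

enum inflight_track {
	INFLIGHT_UNTRACKED,	/* not interested in when it ends */
	INFLIGHT_PENDING,	/* interested; not yet decremented */
	INFLIGHT_DONE,		/* contribution already removed */
};

struct credits_sketch {
	unsigned int value;
	enum inflight_track track;
};

/* Drop this request's in_flight contribution exactly once. */
void end_inflight(struct credits_sketch *c, unsigned int *in_flight)
{
	if (c->track != INFLIGHT_PENDING)
		return;		/* untracked here, or already done */
	assert(*in_flight > 0);
	(*in_flight)--;
	c->track = INFLIGHT_DONE;
}
```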
David Howells
61ea6b3a31 cifs: Fix setting of zero_point after DIO write
At the moment, at the end of a DIO write, cifs calls netfs_resize_file() to
adjust the size of the file if it needs it.  This will reduce the
zero_point (the point above which we assume a read will just return zeros)
if it's more than the new i_size, but won't increase it.

With DIO writes, however, we definitely want to increase it as we have
clobbered the local pagecache and then written some data that's not
available locally.

Fix cifs to raise the zero_point to the end of a DIO or unbuffered write.

This fixes corruption seen occasionally with the generic/708 xfs-test.  In
that case, the read-back of some of the written data is being
short-circuited and replaced with zeroes.

Fixes: 3ee1a1fc39 ("cifs: Cut over to using netfslib")
Cc: stable@vger.kernel.org
Reported-by: Steve French <sfrench@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-19 11:08:57 -05:00
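The asymmetry described here (resizing only ever lowers zero_point, while a DIO write needs it raised) can be sketched with two small functions. struct inode_sketch and both functions are hypothetical simplifications of the netfs inode context, not the netfs or cifs code.

```c
#include <assert.h>
#include <stdint.h>

struct inode_sketch {
	int64_t i_size;
	int64_t zero_point;	/* reads at or beyond this may return zeros */
};

/* Resizing only ever lowers zero_point, as described for netfs_resize_file(). */
void resize_file(struct inode_sketch *ino, int64_t new_size)
{
	ino->i_size = new_size;
	if (ino->zero_point > new_size)
		ino->zero_point = new_size;
}

/* After a DIO/unbuffered write of [pos, pos + len), the server holds data
 * that is not in the pagecache, so zero_point must cover it. */
void after_dio_write(struct inode_sketch *ino, int64_t pos, int64_t len)
{
	int64_t end = pos + len;

	assert(pos >= 0 && len >= 0);
	if (end > ino->i_size)
		resize_file(ino, end);
	if (ino->zero_point < end)
		ino->zero_point = end;
}
```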
David Howells
d2c5eb57b6 cifs: Fix missing error code set
In cifs_strict_readv(), the default rc (-EACCES) is accidentally cleared by
a successful return from netfs_start_io_direct(), such that if
cifs_find_lock_conflict() fails, we don't return an error.

Fix this by resetting the default error code.

Fixes: 14b1cd2534 ("cifs: Fix locking in cifs_strict_readv()")
Cc: stable@vger.kernel.org
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-19 11:08:57 -05:00
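The bug is a common C pattern: a default error stored in rc is overwritten by an earlier successful call, so a later failure path returns 0. A hypothetical sketch showing the buggy and fixed shapes side by side; start_io() stands in for netfs_start_io_direct() and the conflict argument for the result of cifs_find_lock_conflict().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for netfs_start_io_direct(): 0 or -errno. */
static int start_io(void)
{
	return 0;
}

/* Buggy shape: the default -EACCES is overwritten by start_io(). */
int strict_read_buggy(bool conflict)
{
	int rc = -EACCES;

	rc = start_io();
	if (rc < 0)
		return rc;
	if (conflict)
		return rc;	/* returns 0: the conflict is not reported */
	return 0;		/* the read itself would happen here */
}

/* Fixed shape: reset the default error once start_io() has succeeded. */
int strict_read_fixed(bool conflict)
{
	int rc = start_io();

	if (rc < 0)
		return rc;
	assert(rc == 0);
	rc = -EACCES;
	if (conflict)
		return rc;
	return 0;
}
```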
David Howells
08f70c0a93 cifs: Fix read-performance regression by dropping readahead expansion
cifs_expand_readahead() is causing a performance regression of around 30% by
causing extra pagecache to be allocated for an inode in the readahead path
before we begin actually dispatching RPC requests, thereby delaying the
actual I/O.  The expansion is sized according to the rsize parameter, which
seems to be 4MiB on my test system; this is a big step up from the first
requests made by the fio test program.

Simple repro (look at read bandwidth number):
     fio --name=writetest --filename=/xfstest.test/foo --time_based --runtime=60 --size=16M --numjobs=1 --rw=read

Fix this by removing cifs_expand_readahead().  Readahead expansion is
mostly useful for when we're using the local cache if the local cache has a
block size greater than PAGE_SIZE, so we can dispense with it when not
caching.

Fixes: 69c3c023af ("cifs: Implement netfslib hooks")
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-02 21:23:41 -05:00
David Howells
3f59138580 cifs: Move the 'pid' from the subreq to the req
Move the reference pid from the cifs_io_subrequest struct to the
cifs_io_request struct as it's the same for all subreqs of a particular
request.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-06-20 15:25:08 -05:00
David Howells
969b3010cb cifs: Only pick a channel once per read request
In cifs, only pick a channel when setting up a read request rather than
doing so individually for every subrequest and instead use that channel for
all.  This mirrors what the code in v6.9 does.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-06-20 15:21:44 -05:00
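A minimal sketch of choosing the channel once per request and reusing it for every subrequest. The round-robin policy in pick_channel() is an assumption made for simplicity (channel selection in cifs may weigh other factors), and all names are hypothetical.

```c
#include <assert.h>

/* Hypothetical set of session channels. */
struct chan_set {
	unsigned int nr;	/* number of channels, must be > 0 */
	unsigned int next;	/* round-robin cursor (assumption) */
};

struct read_request {
	unsigned int chan;	/* chosen once, at request setup */
};

static unsigned int pick_channel(struct chan_set *cs)
{
	unsigned int c;

	assert(cs->nr > 0);
	c = cs->next;
	cs->next = (cs->next + 1) % cs->nr;
	return c;
}

void init_read_request(struct read_request *rreq, struct chan_set *cs)
{
	rreq->chan = pick_channel(cs);
}

/* Every subrequest reuses the request's channel instead of re-picking. */
unsigned int subreq_channel(const struct read_request *rreq)
{
	return rreq->chan;
}
```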
Barry Song
29433a17a7 cifs: drop the incorrect assertion in cifs_swap_rw()
Since commit 2282679fb2 ("mm: submit multipage write for SWP_FS_OPS
swap-space"), we can plug multiple pages then unplug them all together.
That means iov_iter_count(iter) could be much bigger than PAGE_SIZE; it
actually equals the size of the iov_iter_npages(iter, INT_MAX) pages.

Note this issue has nothing to do with large folios as we don't support
THP_SWPOUT to non-block devices.

Fixes: 2282679fb2 ("mm: submit multipage write for SWP_FS_OPS swap-space")
Reported-by: Christoph Hellwig <hch@lst.de>
Closes: https://lore.kernel.org/linux-mm/20240614100329.1203579-1-hch@lst.de/
Cc: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Steve French <sfrench@samba.org>
Cc: Trond Myklebust <trondmy@kernel.org>
Cc: Chuanhua Han <hanchuanhua@oppo.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Paulo Alcantara <pc@manguebit.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Shyam Prasad N <sprasad@microsoft.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Bharath SM <bharathsm@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-06-18 22:47:25 -05:00
Steve French
16e00683dc smb3: reenable swapfiles over SMB3 mounts
With the changes to folios/netfs it is now easier to reenable
swapfile support over SMB3, which fixes various xfstests.

Reviewed-by: David Howells <dhowells@redhat.com>
Suggested-by: David Howells <dhowells@redhat.com>
Fixes: e1209d3a7a ("mm: introduce ->swap_rw and use it for reads from SWP_FS_OPS swap-space")
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-05-21 11:14:55 -05:00
Steve French
14b1cd2534 cifs: Fix locking in cifs_strict_readv()
Fix to take the i_rwsem (through the netfs locking wrappers) before taking
cinode->lock_sem.

Fixes: 3ee1a1fc39 ("cifs: Cut over to using netfslib")
Reported-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-05-13 17:02:05 -05:00
David Howells
b593634424 cifs: Remove some code that's no longer used, part 3
Remove some code that was #if'd out with the netfslib conversion.  This is
split into parts for file.c as the diff generator otherwise produces a hard
to read diff for part of it where a big chunk is cut out.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
2024-05-01 18:08:22 +01:00
David Howells
2f99c0bce6 cifs: Remove some code that's no longer used, part 2
Remove some code that was #if'd out with the netfslib conversion.  This is
split into parts for file.c as the diff generator otherwise produces a hard
to read diff for part of it where a big chunk is cut out.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
2024-05-01 18:08:22 +01:00
David Howells
742b3443e2 cifs: Remove some code that's no longer used, part 1
Remove some code that was #if'd out with the netfslib conversion.  This is
split into parts for file.c as the diff generator otherwise produces a hard
to read diff for part of it where a big chunk is cut out.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
2024-05-01 18:08:21 +01:00
David Howells
3ee1a1fc39 cifs: Cut over to using netfslib
Make the cifs filesystem use netfslib to handle reading and writing on
behalf of cifs.  The changes include:

 (1) Various read_iter/write_iter type functions are turned into wrappers
     around netfslib API functions or are pointed directly at those
     functions:

	cifs_file_direct{,_nobrl}_ops switch to use
	netfs_unbuffered_read_iter and netfs_unbuffered_write_iter.

Large pieces of code are #if'd out and will be removed in subsequent
patches.

[?] Why does cifs mark the page dirty in the destination buffer of a DIO
    read?  Should that happen automatically?  Does netfs need to do that?

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
2024-05-01 18:08:21 +01:00
David Howells
69c3c023af cifs: Implement netfslib hooks
Provide implementations of the netfslib hooks that will be used by netfslib
to ask cifs to set up and perform operations.  Of particular note are:

 (*) cifs_clamp_length() - This is used to negotiate the size of the next
     subrequest in a read request, taking into account the credit available
     and the rsize.  The credits are attached to the subrequest.

 (*) cifs_req_issue_read() - This is used to issue a subrequest that has
     been set up and clamped.

 (*) cifs_prepare_write() - This prepares to fill a subrequest by picking a
     channel, reopening the file and requesting credits so that we can set
     the maximum size of the subrequest and also sets the maximum number of
     segments if we're doing RDMA.

 (*) cifs_issue_write() - This releases any unneeded credits and issues an
     asynchronous data write for the contiguous slice of file covered by
     the subrequest.  This should possibly be folded into all
     ->async_writev() ops and that called directly.

 (*) cifs_begin_writeback() - This gets the cached writable handle through
     which we do writeback (this does not affect writethrough, unbuffered
     or direct writes).

At this point, cifs is not wired up to actually *use* netfslib; that will
be done in a subsequent patch.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
2024-05-01 18:08:21 +01:00
David Howells
1a5b4edd97 cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c
Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c so that
they are colocated with similar functions rather than being split between
file.c and cifsfs.c.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
2024-05-01 18:08:20 +01:00
David Howells
dc5939de82 cifs: Replace the writedata replay bool with a netfs sreq flag
Replace the 'replay' bool in cifs_writedata (now cifs_io_subrequest) with a
flag in the netfs_io_subrequest flags.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
2024-05-01 18:08:19 +01:00
David Howells
56257334e8 cifs: Make wait_mtu_credits take size_t args
Make the wait_mtu_credits functions use size_t for the size and num
arguments rather than unsigned int, as netfslib uses size_t/ssize_t for
arguments and return values to allow for extra capacity.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: netfs@lists.linux.dev
cc: linux-mm@kvack.org
2024-05-01 18:08:19 +01:00