linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-08-05 16:54:27 +00:00

Author	SHA1	Message	Date
Linus Torvalds	57fcb7d930	vfs-6.17-rc1.fileattr Merge tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fileattr updates from Christian Brauner: "This introduces the new file_getattr() and file_setattr() system calls after lengthy discussions. Both system calls serve as successors and extensible companions to the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have started to show their age in addition to being named in a way that makes it easy to conflate them with extended attribute related operations. These syscalls allow userspace to set filesystem inode attributes on special files. One of the usage examples is the XFS quota projects. XFS has project quotas which could be attached to a directory. All new inodes in these directories inherit project ID set on parent directory. The project is created from userspace by opening and calling FS_IOC_FSSETXATTR on each inode. This is not possible for special files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left with empty project ID. Those inodes then are not shown in the quota accounting but still exist in the directory. This is not critical but in the case when special files are created in the directory with already existing project quota, these new inodes inherit extended attributes. This creates a mix of special files with and without attributes. Moreover, special files with attributes don't have a possibility to become clear or change the attributes. This, in turn, prevents userspace from re-creating quota project on these existing files. 
In addition, these new system calls allow the implementation of additional attributes that we couldn't or didn't want to fit into the legacy ioctls anymore" * tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: tighten a sanity check in file_attr_to_fileattr() tree-wide: s/struct fileattr/struct file_kattr/g fs: introduce file_getattr and file_setattr syscalls fs: prepare for extending file_get/setattr() fs: make vfs_fileattr_[get\|set] return -EOPNOTSUPP selinux: implement inode_file_[g\|s]etattr hooks lsm: introduce new hooks for setting/getting inode fsxattr fs: split fileattr related helpers into separate file	2025-07-28 15:24:14 -07:00
NeilBrown	fe4d3360f9	ovl: rename ovl_cleanup_unlocked() to ovl_cleanup() The only remaining user of ovl_cleanup() is ovl_cleanup_locked(), so we no longer need both. This patch renames ovl_cleanup() to ovl_cleanup_locked() and makes it static. ovl_cleanup_unlocked() is renamed to ovl_cleanup(). Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-22-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-18 11:10:43 +02:00
NeilBrown	2fa14cf2dc	ovl: change ovl_cleanup_and_whiteout() to take rename lock as needed Rather than locking the directory(s) before calling ovl_cleanup_and_whiteout(), change it (and ovl_whiteout()) to do the locking, so the locking can be fine grained as will be needed for proposed locking changes. Sometimes this is called to whiteout something in the index dir, in which case only that dir must be locked. In one case it is called on something in an upperdir, so two directories must be locked. We use ovl_lock_rename_workdir() for this and remove the restriction that upperdir cannot be indexdir - because now sometimes it is. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-18-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-18 11:10:42 +02:00
NeilBrown	8290fb412d	ovl: narrow locking in ovl_cleanup_index() ovl_cleanup_index() takes a lock on the directory and then does a lookup and possibly one of two different cleanups. This patch narrows the locking to use the _unlocked() versions of the lookup and one cleanup, and just takes the lock for the other cleanup. A subsequent patch will take the lock into the cleanup. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-12-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-18 11:10:41 +02:00
NeilBrown	d2c995581c	ovl: Call ovl_create_temp() without lock held. ovl currently locks a directory or two and then performs multiple actions in one or both directories. This is incompatible with proposed changes which will lock just the dentry of objects being acted on. This patch moves calls to ovl_create_temp() out of the locked regions and has it take and release the relevant lock itself. The lock that was taken before this function was called is now taken after. This means that any code between where the lock was taken and ovl_create_temp() is now unlocked. This necessitates the use of ovl_cleanup_unlocked() and the creation of ovl_lookup_upper_unlocked(). These will be used more widely in future patches. Now that the file is created before the lock is taken for rename, we need to ensure the parent wasn't changed before the lock was gained. ovl_lock_rename_workdir() is changed to optionally receive the dentries that will be involved in the rename. If either is present but has the wrong parent, an error is returned. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-4-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-18 11:10:40 +02:00
NeilBrown	9d23967b18	ovl: simplify an error path in ovl_copy_up_workdir() If ovl_copy_up_data() fails the error is not immediately handled but the code continues on to call ovl_start_write() and lock_rename(), presumably because both of these locks are needed for the cleanup. Only then (if the lock was successful) is the error checked. This makes the code a little hard to follow and could be fragile. This patch changes to handle the error after the ovl_start_write() (which cannot fail, so there aren't multiple errors to deal with). A new ovl_cleanup_unlocked() is created which takes the required directory lock. This will be used extensively in later patches. In general we need to check the parent is still correct after taking the lock (as ovl_copy_up_workdir() does after a successful lock_rename()) so that is included in ovl_cleanup_unlocked() using new ovl_parent_lock() and ovl_parent_unlock() calls (it is planned to move this API into VFS code eventually, though in a slightly different form). Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-2-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-18 11:10:40 +02:00
Amir Goldstein	083957f961	ovl: support layers on case-folding capable filesystems Case folding is often applied to subtrees and not on an entire filesystem. Disallowing layers from filesystems that support case folding is overly limiting. Replace the rule that case-folding capable filesystems are not allowed as layers with a rule that case folded directories are not allowed in a merged directory stack. Should case folding be enabled on an underlying directory while overlayfs is mounted the outcome is generally undefined. Specifically in ovl_lookup(), we check the base underlying directory and fail with -ESTALE and write a warning to kmsg if case folding is enabled on an underlying directory. Suggested-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/linux-fsdevel/20250520051600.1903319-1-kent.overstreet@linux.dev/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/20250602171702.1941891-1-amir73il@gmail.com Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-18 11:09:33 +02:00
Christian Brauner	ca115d7e75	tree-wide: s/struct fileattr/struct file_kattr/g Now that we expose struct file_attr as our uapi struct rename all the internal struct to struct file_kattr to clearly communicate that it is a kernel internal struct. This is similar to struct mount_{k}attr and others. Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-04 16:14:39 +02:00
NeilBrown	bc9241367a	VFS: change old_dir and new_dir in struct renamedata to dentrys all users of 'struct renamedata' have the dentry for the old and new directories, and often have no use for the inode except to store it in the renamedata. This patch changes struct renamedata to hold the dentry, rather than the inode, for the old and new directories, and changes callers to match. The names are also changed from a _dir suffix to _parent. This is consistent with other usage in namei.c and elsewhere. This results in the removal of several local variables and several dereferences of ->d_inode at the cost of adding ->d_inode dereferences to vfs_rename(). Acked-by: Miklos Szeredi <miklos@szeredi.hu> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/174977089072.608730.4244531834577097454@noble.neil.brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-06-16 16:30:45 +02:00
Thorsten Blum	7314166ee7	ovl: Replace offsetof() with struct_size() in ovl_stack_free() Compared to offsetof(), struct_size() provides additional compile-time checks for structs with flexible arrays (e.g., __must_be_array()). No functional changes intended. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2025-05-05 12:47:57 +02:00
Kees Cook	8a39f1c870	ovl: Check for NULL d_inode() in ovl_dentry_upper() In ovl_path_type() and ovl_is_metacopy_dentry() GCC notices that it is possible for OVL_E() to return NULL (which implies that d_inode(dentry) may be NULL). This would result in out of bounds reads via container_of(), seen with GCC 15's -Warray-bounds -fdiagnostics-details. For example: In file included from arch/x86/include/generated/asm/rwonce.h:1, from include/linux/compiler.h:339, from include/linux/export.h:5, from include/linux/linkage.h:7, from include/linux/fs.h:5, from fs/overlayfs/util.c:7: In function 'ovl_upperdentry_dereference', inlined from 'ovl_dentry_upper' at ../fs/overlayfs/util.c:305:9, inlined from 'ovl_path_type' at ../fs/overlayfs/util.c:216:6: include/asm-generic/rwonce.h:44:26: error: array subscript 0 is outside array bounds of 'struct inode[7486503276667837]' [-Werror=array-bounds=] 44 \| #define __READ_ONCE(x) ((const volatile __unqual_scalar_typeof(x) )&(x)) \| ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/asm-generic/rwonce.h:50:9: note: in expansion of macro '__READ_ONCE' 50 \| __READ_ONCE(x); \ \| ^~~~~~~~~~~ fs/overlayfs/ovl_entry.h:195:16: note: in expansion of macro 'READ_ONCE' 195 \| return READ_ONCE(oi->__upperdentry); \| ^~~~~~~~~ 'ovl_path_type': event 1 185 \| return inode ? OVL_I(inode)->oe : NULL; 'ovl_path_type': event 2 Avoid this by allowing ovl_dentry_upper() to return NULL if d_inode() is NULL, as that means the problematic dereferencing can never be reached. Note that this fixes the over-eager compiler warning in an effort to being able to enable -Warray-bounds globally. There is no known behavioral bug here. Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2025-04-30 15:56:11 +02:00
Christian Brauner	51c0bcf097	tree-wide: s/revert_creds_light()/revert_creds()/g Rename all calls to revert_creds_light() back to revert_creds(). Link: https://lore.kernel.org/r/20241125-work-cred-v2-6-68b9d38bb5b2@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-12-02 11:25:09 +01:00
Christian Brauner	6771e004b4	tree-wide: s/override_creds_light()/override_creds()/g Rename all calls to override_creds_light() back to override_creds(). Link: https://lore.kernel.org/r/20241125-work-cred-v2-5-68b9d38bb5b2@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-12-02 11:25:09 +01:00
Vasiliy Kovalev	c8b359dddb	ovl: Filter invalid inodes with missing lookup function Add a check to the ovl_dentry_weird() function to prevent the processing of directory inodes that lack the lookup function. This is important because such inodes can cause errors in overlayfs when passed to the lowerstack. Reported-by: syzbot+a8c9d476508bd14a90e5@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=a8c9d476508bd14a90e5 Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Link: https://lore.kernel.org/linux-unionfs/CAJfpegvx-oS9XGuwpJx=Xe28_jzWx5eRo1y900_ZzWY+=gGzUg@mail.gmail.com/ Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org> Cc: <stable@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2024-11-20 10:23:04 +01:00
Vinicius Costa Gomes	c5b28fc161	ovl: Optimize override/revert creds Use override_creds_light() in ovl_override_creds() and revert_creds_light() in ovl_revert_creds(). The _light() functions do not change the 'usage' of the credentials in question, as they refer to the credentials associated with the mounter, which have a longer lifetime. In ovl_setup_cred_for_create(), do not need to modify the mounter credentials (returned by override_creds_light()) 'usage' counter. Add a warning to verify that we are indeed working with the mounter credentials (stored in the superblock). Failure in this assumption means that creds may leak. Suggested-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2024-11-15 08:55:39 +01:00
Vinicius Costa Gomes	fc5a1d2287	ovl: use wrapper ovl_revert_creds() Introduce ovl_revert_creds() wrapper of revert_creds() to match callers of ovl_override_creds(). Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2024-11-11 10:45:04 +01:00
Al Viro	af58dc1f50	kernel_file_open(): get rid of inode argument always equal to ->dentry->d_inode of the path argument these days. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2024-04-15 16:03:24 -04:00
Linus Torvalds	0f1a876682	vfs-6.9.uuid Merge tag 'vfs-6.9.uuid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs uuid updates from Christian Brauner: "This adds two new ioctl()s for getting the filesystem uuid and retrieving the sysfs path based on the path of a mounted filesystem. Getting the filesystem uuid has been implemented in filesystem specific code for a while; it's now lifted as a generic ioctl" * tag 'vfs-6.9.uuid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: xfs: add support for FS_IOC_GETFSSYSFSPATH fs: add FS_IOC_GETFSSYSFSPATH fat: Hook up sb->s_uuid fs: FS_IOC_GETUUID ovl: convert to super_set_uuid() fs: super_set_uuid()	2024-03-11 11:02:06 -07:00
Kent Overstreet	dd9019604c	ovl: convert to super_set_uuid() We don't want to be setting sb->s_uuid directly anymore, as there's a length field that also has to be set, and this conversion was not completely trivial. Acked-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/r/20240207025624.1019754-3-kent.overstreet@linux.dev Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: linux-unionfs@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-02-08 21:20:11 +01:00
Amir Goldstein	420332b941	ovl: mark xwhiteouts directory with overlay.opaque='x' An opaque directory cannot have xwhiteouts, so instead of marking an xwhiteouts directory with a new xattr, overload overlay.opaque xattr for marking both opaque dir ('y') and xwhiteouts dir ('x'). This is more efficient as the overlay.opaque xattr is checked during lookup of directory anyway. This also prevents unnecessary checking the xattr when reading a directory without xwhiteouts, i.e. most of the time. Note that the xwhiteouts marker is not checked on the upper layer and on the last layer in lowerstack, where xwhiteouts are not expected. Fixes: `bc8df7a3dc` ("ovl: Add an alternative type of whiteout") Cc: <stable@vger.kernel.org> # v6.7 Reviewed-by: Alexander Larsson <alexl@redhat.com> Tested-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2024-01-23 12:39:48 +02:00
Linus Torvalds	bf4e7080ae	fix directory locking scheme on rename broken in 6.5; we really can't lock two unrelated directories without holding ->s_vfs_rename_mutex first and in case of same-parent rename of a subdirectory 6.5 ends up doing just that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Merge tag 'pull-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull rename updates from Al Viro: "Fix directory locking scheme on rename This was broken in 6.5; we really can't lock two unrelated directories without holding ->s_vfs_rename_mutex first and in case of same-parent rename of a subdirectory 6.5 ends up doing just that" * tag 'pull-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: rename(): avoid a deadlock in the case of parents having no common ancestor kill lock_two_inodes() rename(): fix the locking of subdirectories f2fs: Avoid reading renamed directory if parent does not change ext4: don't access the source subdirectory content on same-directory rename ext2: Avoid reading renamed directory if parent does not change udf_rename(): only access the child content on cross-directory rename ocfs2: Avoid touching renamed directory if parent does not change reiserfs: Avoid touching renamed directory if parent does not change	2024-01-11 20:00:22 -08:00
Al Viro	a8b0026847	rename(): avoid a deadlock in the case of parents having no common ancestor ... and fix the directory locking documentation and proof of correctness. Holding ->s_vfs_rename_mutex almost prevents ->d_parent changes; the case where we really don't want it is splicing the root of disconnected tree to somewhere. In other words, ->s_vfs_rename_mutex is sufficient to stabilize "X is an ancestor of Y" only if X and Y are already in the same tree. Otherwise it can go from false to true, and one can construct a deadlock on that. Make lock_two_directories() report an error in such case and update the callers of lock_rename()/lock_rename_child() to handle such errors. And yes, such conditions are not impossible to create ;-/ Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2023-11-25 02:54:14 -05:00
Amir Goldstein	02d70090e0	ovl: remove redundant ofs->indexdir member When the index feature is disabled, ofs->indexdir is NULL. When the index feature is enabled, ofs->indexdir has the same value as ofs->workdir and takes an extra reference. This makes the code harder to understand when it is not always clear that ofs->indexdir in one function is the same dentry as ofs->workdir in another function. Remove this redundancy, by referencing ofs->workdir directly in index helpers and by using the ovl_indexdir() accessor in generic code. Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-11-20 09:49:09 +02:00
Amir Goldstein	b28060db71	ovl: fix misformatted comment Remove misleading /** prefix from a regular comment. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202311121628.byHp8tkv-lkp@intel.com/ Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-11-14 08:09:36 +02:00
Linus Torvalds	13d88ac54d	vfs-6.7.fsid Merge tag 'vfs-6.7.fsid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fanotify fsid updates from Christian Brauner: "This work is part of the plan to enable fanotify to serve as a drop-in replacement for inotify. While inotify is available on all filesystems, fanotify currently isn't. In order to support fanotify on all filesystems two things are needed: (1) all filesystems need to support AT_HANDLE_FID (2) all filesystems need to report a non-zero f_fsid This contains (1) and allows filesystems to encode non-decodable file handles for fanotify without implementing any exportfs operations by encoding a file id of type FILEID_INO64_GEN from i_ino and i_generation. Filesystems that want to opt out of encoding non-decodable file ids for fanotify that don't support NFS export can do so by providing an empty export_operations struct. This also partially addresses (2) by generating f_fsid for simple filesystems as well as freevxfs. Remaining filesystems will be dealt with by separate patches. 
Finally, this contains the patch from the current exportfs maintainers which moves exportfs under vfs with Chuck, Jeff, and Amir as maintainers and vfs.git as tree" * tag 'vfs-6.7.fsid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: MAINTAINERS: create an entry for exportfs fs: fix build error with CONFIG_EXPORTFS=m or not defined freevxfs: derive f_fsid from bdev->bd_dev fs: report f_fsid from s_dev for "simple" filesystems exportfs: support encoding non-decodeable file handles by default exportfs: define FILEID_INO64_GEN* file handle types exportfs: make ->encode_fh() a mandatory method for NFS export exportfs: add helpers to check if filesystem can encode/decode file handles	2023-11-07 12:11:26 -08:00
Alexander Larsson	bc8df7a3dc	ovl: Add an alternative type of whiteout An xattr whiteout (called "xwhiteout" in the code) is a regular file of zero size with the "overlay.whiteout" xattr set. A file like this in a directory with the "overlay.whiteouts" xattr set will be treated the same way as a regular whiteout. The "overlay.whiteouts" directory xattr is used in order to efficiently handle overlay checks in readdir(), as we only need to check xattrs in affected directories. The advantage of this kind of whiteout is that they can be escaped using the standard overlay xattr escaping mechanism. So, a file with a "overlay.overlay.whiteout" xattr would be unescaped to "overlay.whiteout", which could then be consumed by another overlayfs as a whiteout. Overlayfs itself doesn't create whiteouts like this, but a userspace mechanism could use this alternative mechanism to convert images that may contain whiteouts to be used with overlayfs. To work as a whiteout for both regular overlayfs mounts as well as userxattr mounts both the "user.overlay.whiteout" and the "trusted.overlay.whiteout" xattrs will need to be created. Signed-off-by: Alexander Larsson <alexl@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:59 +02:00
Amir Goldstein	5b02bfc1e7	ovl: do not encode lower fh with upper sb_writers held When lower fs is a nested overlayfs, calling encode_fh() on a lower directory dentry may trigger copy up and take sb_writers on the upper fs of the lower nested overlayfs. The lower nested overlayfs may have the same upper fs as this overlayfs, so nested sb_writers lock is illegal. Move all the callers that encode lower fh to before ovl_want_write(). Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:57 +02:00
Amir Goldstein	c63e56a4a6	ovl: do not open/llseek lower file with upper sb_writers held overlayfs file open (ovl_maybe_lookup_lowerdata) and overlay file llseek take the ovl_inode_lock, without holding upper sb_writers. In case of nested lower overlay that uses same upper fs as this overlay, lockdep will warn about (possibly false positive) circular lock dependency when doing open/llseek of lower ovl file during copy up with our upper sb_writers held, because the locking order seems to be the reverse of the locking order in ovl_copy_up_start(): - lower ovl_inode_lock - upper sb_writers Let the copy up "transaction" keep an elevated mnt write count on upper mnt, but leave taking upper sb_writers to lower level helpers only when they actually need it. This avoids holding upper sb_writers during lower file open/llseek and prevents the lockdep warning. Minimizing the scope of upper sb_writers during copy up is also needed for fixing other possible deadlocks by a following patch. Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:57 +02:00
Amir Goldstein	162d064440	ovl: reorder ovl_want_write() after ovl_inode_lock() Make the locking order of ovl_inode_lock() strictly between the two vfs stacked layers, i.e.: - ovl vfs locks: sb_writers, inode_lock, ... - ovl_inode_lock - upper vfs locks: sb_writers, inode_lock, ... To that effect, move ovl_want_write() into the helpers ovl_nlink_start() and ovl_copy_up_start which currently take the ovl_inode_lock() after ovl_want_write(). Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:57 +02:00
Amir Goldstein	d08d3b3c2c	ovl: split ovl_want_write() into two helpers ovl_get_write_access() gets write access to upper mnt without taking freeze protection on upper sb and ovl_start_write() only takes freeze protection on upper sb. These helpers will be used to breakup the large ovl_want_write() scope during copy up into finer grained freeze protection scopes. Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:57 +02:00
Amir Goldstein	f7621b11e8	ovl: protect copying of realinode attributes to ovl inode ovl_copyattr() may be called concurrently from aio completion context without any lock and that could lead to overlay inode attributes getting permanently out of sync with real inode attributes. Use ovl inode spinlock to protect ovl_copyattr(). Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-10-31 00:12:55 +02:00
Amir Goldstein	66c62769bc	exportfs: add helpers to check if filesystem can encode/decode file handles The logic of whether filesystem can encode/decode file handles is open coded in many places. In preparation to changing the logic, move the open coded logic into inline helpers. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023180801.2953446-2-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-24 17:57:45 +02:00
Jeff Layton	4ddbd0f1fe	overlayfs: convert to new timestamp accessors Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-58-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-18 14:08:25 +02:00
Linus Torvalds	63580f669d	overlayfs update for 6.6 Merge tag 'ovl-update-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs updates from Amir Goldstein: - add verification feature needed by composefs (Alexander Larsson) - improve integration of overlayfs and fanotify (Amir Goldstein) - fortify some overlayfs code (Andrea Righi) * tag 'ovl-update-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: validate superblock in OVL_FS() ovl: make consistent use of OVL_FS() ovl: Kconfig: introduce CONFIG_OVERLAY_FS_DEBUG ovl: auto generate uuid for new overlay filesystems ovl: store persistent uuid/fsid with uuid=on ovl: add support for unique fsid per instance ovl: support encoding non-decodable file handles ovl: Handle verity during copy-up ovl: Validate verity xattr when resolving lowerdata ovl: Add versioned header for overlay.metacopy xattr ovl: Add framework for verity support	2023-08-30 11:54:09 -07:00
Andrea Righi	f01d08899f	ovl: make consistent use of OVL_FS() Always use OVL_FS() to retrieve the corresponding struct ovl_fs from a struct super_block. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-08-12 19:02:54 +03:00
Amir Goldstein	cbb44f0935	ovl: auto generate uuid for new overlay filesystems Add a new mount option uuid=auto, which is the default. If a persistent UUID xattr is found it is used. Otherwise, an existing overlayfs with copied up subdirs in upper dir that was never mounted with uuid=on retains the null UUID. A new overlayfs with no copied up subdirs, generates the persistent UUID on first mount. Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-08-12 19:02:50 +03:00
Amir Goldstein	d9544c1b0d	ovl: store persistent uuid/fsid with uuid=on With uuid=on, store a persistent uuid in xattr on the upper dir to give the overlayfs instance a persistent identifier. This also makes f_fsid persistent and more reliable for reporting fid info in fanotify events. uuid=on is not supported on non-upper overlayfs or with upper fs that does not support xattrs. Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-08-12 19:02:50 +03:00
Alexander Larsson	0c71faf5a6	ovl: Handle verity during copy-up During regular metacopy, if lowerdata file has fs-verity enabled, and the verity option is enabled, we add the digest to the metacopy xattr. If verity is required, and lowerdata does not have fs-verity enabled, fall back to full copy-up (or the generated metacopy would not validate). Signed-off-by: Alexander Larsson <alexl@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-08-12 19:02:38 +03:00
Alexander Larsson	184996e92e	ovl: Validate verity xattr when resolving lowerdata The new digest field in the metacopy xattr is used during lookup to record whether the header contained a digest in the OVL_HAS_DIGEST flags. When accessing file data the first time, if OVL_HAS_DIGEST is set, we reload the metadata and check that the source lowerdata inode matches the specified digest in it (according to the enabled verity options). If the verity check passes we store this info in the inode flags as OVL_VERIFIED_DIGEST, so that we can avoid doing it again if the inode remains in memory. The verification is done in ovl_maybe_validate_verity() which needs to be called in the same places as ovl_maybe_lookup_lowerdata(), so there is a new ovl_verify_lowerdata() helper that calls these in the right order, and all current callers of ovl_maybe_lookup_lowerdata() are changed to call it instead. Signed-off-by: Alexander Larsson <alexl@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-08-12 19:02:38 +03:00
Alexander Larsson	bf07089081	ovl: Add versioned header for overlay.metacopy xattr Historically overlay.metacopy was a zero-size xattr, and its existence marked a metacopy file. This change adds a versioned header with a flag field, a length and a digest. The initial use-case of this will be for validating a fs-verity digest, but the flags field could also be used later for other new features. ovl_check_metacopy_xattr() now returns the size of the xattr, emulating a size of OVL_METACOPY_MIN_SIZE for empty xattrs to distinguish it from the no-xattr case. Signed-off-by: Alexander Larsson <alexl@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-08-12 19:02:38 +03:00
Jeff Layton	9aa7111523	overlayfs: convert to ctime accessor functions In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-64-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-07-24 10:30:03 +02:00
Amir Goldstein	af5f2396b6	ovl: store enum redirect_mode in config instead of a string Do all the logic to set the mode during mount options parsing and do not keep the option string around. Use a constant_table to translate from enum redirect mode to string in preparation for new mount api option parsing. The mount option "off" is translated to either "follow" or "nofollow", depending on the "redirect_always_follow" build/module config, so in effect, there are only three possible redirect modes. This results in a minor change to the string that is displayed in show_options() - when redirect_dir is enabled by default and the user mounts with the option "redirect_dir=off", instead of displaying the mode "redirect_dir=off" in show_options(), the displayed mode will be either "redirect_dir=follow" or "redirect_dir=nofollow", depending on the value of "redirect_always_follow" build/module config. The displayed mode reflects the effective mode, so mounting overlayfs again with the displayed redirect_dir option will result in the same effective and displayed mode. Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-06-19 14:02:01 +03:00
Amir Goldstein	42dd69ae1a	ovl: implement lazy lookup of lowerdata in data-only layers Defer lookup of lowerdata in the data-only layers to first data access or before copy up. We perform lowerdata lookup before copy up even if copy up is metadata only copy up. We can further optimize this lookup later if needed. We do best effort lazy lookup of lowerdata for d_real_inode(), because this interface does not expect errors. The only current in-tree caller of d_real_inode() is trace_uprobe and this caller is likely going to be followed reading from the file, before placing uprobes on offset within the file, so lowerdata should be available when setting the uprobe. Tested-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:14 +03:00
Amir Goldstein	4166564478	ovl: prepare for lazy lookup of lowerdata inode Make the code handle the case of numlower > 1 and missing lowerdata dentry gracefully. Missing lowerdata dentry is an indication for lazy lookup of lowerdata and in that case the lowerdata_redirect path is stored in ovl_inode. Following commits will defer lookup and perform the lazy lookup on access. Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:14 +03:00
Amir Goldstein	2b21da9208	ovl: prepare to store lowerdata redirect for lazy lowerdata lookup Prepare to allow ovl_lookup() to leave the last entry in a non-dir lowerstack empty to signify lazy lowerdata lookup. In this case, ovl_lookup() stores the redirect path from metacopy to lowerdata in ovl_inode, which is going to be used later to perform the lazy lowerdata lookup. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:14 +03:00
Amir Goldstein	ab1eb5ffb7	ovl: deduplicate lowerdata and lowerstack[] The ovl_inode contains a copy of lowerdata in lowerstack[], so the lowerdata inode member can be removed. Use accessors ovl_lowerdata*() to get the lowerdata wherever the member was accessed directly. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:13 +03:00
Amir Goldstein	ac900ed4f2	ovl: deduplicate lowerpath and lowerstack[] The ovl_inode contains a copy of lowerpath in lowerstack[0], so the lowerpath member can be removed. Use accessor ovl_lowerpath() to get the lowerpath wherever the member was accessed directly. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:13 +03:00
Amir Goldstein	0af950f57f	ovl: move ovl_entry into ovl_inode The lower stacks of all the ovl inode aliases should be identical and there is redundant information in ovl_entry and ovl_inode. Move lowerstack into ovl_inode and keep only the OVL_E_FLAGS per overlay dentry. Following patches will deduplicate redundant ovl_inode fields. Note that for pure upper and negative dentries, OVL_E(dentry) may be NULL now, so it is important to use the ovl_numlower() accessor. Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:13 +03:00
Amir Goldstein	163db0da35	ovl: factor out ovl_free_entry() and ovl_stack_*() helpers In preparation for moving lowerstack into ovl_inode. Note that in ovl_lookup() the temp stack dentry refs are now cloned into the final ovl_lowerstack instead of being transferred, so cleanup always needs to call ovl_stack_free(stack). Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:13 +03:00
Amir Goldstein	5522c9c7cb	ovl: use ovl_numlower() and ovl_lowerstack() accessors This helps fortify against dereferencing a NULL ovl_entry, before we move the ovl_entry reference into ovl_inode. Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:13 +03:00

167 commits