Commit graph

153 commits

Author SHA1 Message Date
Linus Torvalds
934600daa7 vfs-6.17-rc1.ovl
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINakQAKCRCRxhvAZXjc
 okGZAP9CUQfiiT3DUq0pAeuXR2BjpjM8hnNTlO7REC/AmoDWcQD/SDZWfjP2uhtk
 TgGlT1fS5cVcRf72+8JBtT7LGmDB7wA=
 =5vdH
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull overlayfs updates from Christian Brauner:
 "This contains overlayfs updates for this cycle.

  The changes for overlayfs in here are primarily focussed on preparing
  for some proposed changes to directory locking.

  Overlayfs currently will sometimes lock a directory on the upper
  filesystem and do a few different things while holding the lock. This
  is incompatible with the new potential scheme.

  This series narrows the region of code protected by the directory
  lock, taking it multiple times when necessary. This theoretically
  opens up the possibilty of other changes happening on the upper
  filesytem between the unlock and the lock. To some extent the patches
  guard against that by checking the dentries still have the expect
  parent after retaking the lock. In general, concurrent changes to the
  upper and lower filesystems aren't supported properly anyway"

* tag 'vfs-6.17-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
  ovl: properly print correct variable
  ovl: rename ovl_cleanup_unlocked() to ovl_cleanup()
  ovl: change ovl_create_real() to receive dentry parent
  ovl: narrow locking in ovl_check_rename_whiteout()
  ovl: narrow locking in ovl_whiteout()
  ovl: change ovl_cleanup_and_whiteout() to take rename lock as needed
  ovl: narrow locking on ovl_remove_and_whiteout()
  ovl: change ovl_workdir_cleanup() to take dir lock as needed.
  ovl: narrow locking in ovl_workdir_cleanup_recurse()
  ovl: narrow locking in ovl_indexdir_cleanup()
  ovl: narrow locking in ovl_workdir_create()
  ovl: narrow locking in ovl_cleanup_index()
  ovl: narrow locking in ovl_cleanup_whiteouts()
  ovl: narrow locking in ovl_rename()
  ovl: simplify gotos in ovl_rename()
  ovl: narrow locking in ovl_create_over_whiteout()
  ovl: narrow locking in ovl_clear_empty()
  ovl: narrow locking in ovl_create_upper()
  ovl: narrow the locked region in ovl_copy_up_workdir()
  ovl: Call ovl_create_temp() without lock held.
  ...
2025-07-28 12:20:06 -07:00
Amir Goldstein
083957f961
ovl: support layers on case-folding capable filesystems
Case folding is often applied to subtrees and not on an entire
filesystem.

Disallowing layers from filesystems that support case folding is over
limiting.

Replace the rule that case-folding capable are not allowed as layers
with a rule that case folded directories are not allowed in a merged
directory stack.

Should case folding be enabled on an underlying directory while
overlayfs is mounted the outcome is generally undefined.

Specifically in ovl_lookup(), we check the base underlying directory
and fail with -ESTALE and write a warning to kmsg if an underlying
directory case folding is enabled.

Suggested-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/linux-fsdevel/20250520051600.1903319-1-kent.overstreet@linux.dev/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/20250602171702.1941891-1-amir73il@gmail.com
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18 11:09:33 +02:00
Linus Torvalds
fe78e02600 vfs-6.16-rc3.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaE/PTwAKCRCRxhvAZXjc
 oo7dAQDCEgd22Of2ibYK0wza1RE17Qjm1Qljt0tHUxki/3Gr/QD9EAJyIhEjplMj
 ntEQrlByWVw8aOVWwtSjFVq55mMLrgI=
 =SNXv
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.16-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Fix a regression in overlayfs caused by reworking the lookup_one*()
   set of helpers

 - Make sure that the name of the dentry is printed in overlayfs'
   mkdir() helper

 - Add missing iocb values to TRACE_IOCB_STRINGS define

 - Unlock the superblock during iterate_supers_type(). This was an
   accidental internal api change

 - Drop a misleading assert in file_seek_cur_needs_f_lock() helper

 - Never refuse to return PIDFD_GET_INGO when parent pid is zero

   That can trivially happen in container scenarios where the parent
   process might be located in an ancestor pid namespace

 - Don't revalidate in try_lookup_noperm() as that causes regression for
   filesystems such as cifs

 - Fix simple_xattr_list() and reset the err variable after
   security_inode_listsecurity() got called so as not to confuse
   userspace about the length of the xattr

* tag 'vfs-6.16-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: drop assert in file_seek_cur_needs_f_lock
  fs: unlock the superblock during iterate_supers_type
  ovl: fix debug print in case of mkdir error
  VFS: change try_lookup_noperm() to skip revalidation
  fs: add missing values to TRACE_IOCB_STRINGS
  fs/xattr.c: fix simple_xattr_list()
  ovl: fix regression caused by lookup helpers API changes
  pidfs: never refuse ppid == 0 in PIDFD_GET_INFO
2025-06-16 08:18:43 -07:00
Linus Torvalds
28fb80f089 overlayfs update for 6.16
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCaEKzdwAKCRDh3BK/laaZ
 PLpuAQCK2B/LsbyLslWVN6lWbQNwiPHF7l49+GjS2BaWVDxnTwEAwdpaktgg7tRI
 wsMp9CEc0lbp8lMDjHDOEqhc/Qvejg4=
 =Y7HA
 -----END PGP SIGNATURE-----

Merge tag 'ovl-update-v2-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs

Pull overlayfs update from Miklos Szeredi:

 - Fix a regression in getting the path of an open file (e.g. in
   /proc/PID/maps) for a nested overlayfs setup (André Almeida)

 - Support data-only layers and verity in a user namespace (unprivileged
   composefs use case)

 - Fix a gcc warning (Kees)

 - Cleanups

* tag 'ovl-update-v2-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
  ovl: Annotate struct ovl_entry with __counted_by()
  ovl: Replace offsetof() with struct_size() in ovl_stack_free()
  ovl: Replace offsetof() with struct_size() in ovl_cache_entry_new()
  ovl: Check for NULL d_inode() in ovl_dentry_upper()
  ovl: Use str_on_off() helper in ovl_show_options()
  ovl: don't require "metacopy=on" for "verity"
  ovl: relax redirect/metacopy requirements for lower -> data redirect
  ovl: make redirect/metacopy rejection consistent
  ovl: Fix nested backing file paths
2025-06-06 17:54:09 -07:00
Amir Goldstein
714d02b419
ovl: fix regression caused by lookup helpers API changes
The lookup helpers API was changed by merge of vfs-6.16-rc1.async.dir to
pass a non-const qstr pointer argument to lookup_one*() helpers.

All of the callers of this API were changed to pass a pointer to temp
copy of qstr, except overlays that was passing a const pointer to
dentry->d_name that was changed to pass a non-const copy instead
when doing a lookup in lower layer which is not the fs of said dentry.

This wrong use of the API caused a regression in fstest overlay/012.

Fix the regression by making a non-const copy of dentry->d_name prior
to calling the lookup API, but the API should be fixed to not allow this
class of bugs.

Cc: NeilBrown <neilb@suse.de>
Fixes: 5741909697 ("VFS: improve interface for lookup_one functions")
Fixes: 390e34bc14 ("VFS: change lookup_one_common and lookup_noperm_common to take a qstr")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/20250605101530.2336320-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-05 13:17:08 +02:00
Miklos Szeredi
5ef7bcdeec ovl: relax redirect/metacopy requirements for lower -> data redirect
Allow the special case of a redirect from a lower layer to a data layer
without having to turn on metacopy.  This makes the feature work with
userxattr, which in turn allows data layers to be usable in user
namespaces.

Minimize the risk by only enabling redirect from a single lower layer to a
data layer iff a data layer is specified.  The only way to access a data
layer is to enable this, so there's really no reason not to enable this.

This can be used safely if the lower layer is read-only and the
user.overlay.redirect xattr cannot be modified.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-04-30 10:55:27 +02:00
Miklos Szeredi
a6fcfe9bb2 ovl: make redirect/metacopy rejection consistent
When overlayfs finds a file with metacopy and/or redirect attributes and
the metacopy and/or redirect features are not enabled, then it refuses to
act on those attributes while also issuing a warning.

There was an inconsistency in not checking metacopy found from the index.

And also only warning on an upper metacopy if it found the next file on the
lower layer, while always warning for metacopy found on a lower layer.

Fix these inconsistencies and make the logic more straightforward, paving
the way for following patches to change when data redirects are allowed.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-04-30 10:55:27 +02:00
NeilBrown
fa6fe07d15
VFS: rename lookup_one_len family to lookup_noperm and remove permission check
The lookup_one_len family of functions is (now) only used internally by
a filesystem on itself either
- in a context where permission checking is irrelevant such as by a
  virtual filesystem populating itself, or xfs accessing its ORPHANAGE
  or dquota accessing the quota file; or
- in a context where a permission check (MAY_EXEC on the parent) has just
  been performed such as a network filesystem finding in "silly-rename"
  file in the same directory.  This is also the context after the
  _parentat() functions where currently lookup_one_qstr_excl() is used.

So the permission check is pointless.

The name "one_len" is unhelpful in understanding the purpose of these
functions and should be changed.  Most of the callers pass the len as
"strlen()" so using a qstr and QSTR() can simplify the code.

This patch renames these functions (include lookup_positive_unlocked()
which is part of the family despite the name) to have a name based on
"lookup_noperm".  They are changed to receive a 'struct qstr' instead
of separate name and len.  In a few cases the use of QSTR() results in a
new call to strlen().

try_lookup_noperm() takes a pointer to a qstr instead of the whole
qstr.  This is consistent with d_hash_and_lookup() (which is nearly
identical) and useful for lookup_noperm_unlocked().

The new lookup_noperm_common() doesn't take a qstr yet.  That will be
tidied up in a subsequent patch.

Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/r/20250319031545.2999807-5-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-08 11:24:36 +02:00
NeilBrown
5741909697
VFS: improve interface for lookup_one functions
The family of functions:
  lookup_one()
  lookup_one_unlocked()
  lookup_one_positive_unlocked()

appear designed to be used by external clients of the filesystem rather
than by filesystems acting on themselves as the lookup_one_len family
are used.

They are used by:
   btrfs/ioctl - which is a user-space interface rather than an internal
     activity
   exportfs - i.e. from nfsd or the open_by_handle_at interface
   overlayfs - at access the underlying filesystems
   smb/server - for file service

They should be used by nfsd (more than just the exportfs path) and
cachefs but aren't.

It would help if the documentation didn't claim they should "not be
called by generic code".

Also the path component name is passed as "name" and "len" which are
(confusingly?) separate by the "base".  In some cases the len in simply
"strlen" and so passing a qstr using QSTR() would make the calling
clearer.
Other callers do pass separate name and len which are stored in a
struct.  Sometimes these are already stored in a qstr, other times it
easily could be.

So this patch changes these three functions to receive a 'struct qstr *',
and improves the documentation.

QSTR_LEN() is added to make it easy to pass a QSTR containing a known
len.

[brauner@kernel.org: take a struct qstr pointer]
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/r/20250319031545.2999807-2-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07 09:25:32 +02:00
Linus Torvalds
a86bf2283d assorted stuff for this merge window
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZ5yJdgAKCRBZ7Krx/gZQ
 69W4AQDwgxceiQ6icx3rFhCWQigne4jdMO84kd8tNaa+xHGe1AD/WnkeChc5DqjQ
 wZWZxAAzml9SS01IcSiHWaF5fgrjlA0=
 =rXOq
 -----END PGP SIGNATURE-----

Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull misc vfs cleanups from Al Viro:
 "Two unrelated patches - one is a removal of long-obsolete include in
  overlayfs (it used to need fs/internal.h, but the extern it wanted has
  been moved back to include/linux/namei.h) and another introduces
  convenience helper constructing struct qstr by a NUL-terminated
  string"

* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  add a string-to-qstr constructor
  fs/overlayfs/namei.c: get rid of include ../internal.h
2025-02-01 15:07:56 -08:00
Al Viro
5f4e6f7f8b fs/overlayfs/namei.c: get rid of include ../internal.h
Added for the sake of vfs_path_lookup(), which is in linux/namei.h
these days.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-16 21:05:09 -05:00
Amir Goldstein
07aeefae7f
ovl: pass realinode to ovl_encode_real_fh() instead of realdentry
We want to be able to encode an fid from an inode with no alias.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20250105162404.357058-2-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-06 15:43:55 +01:00
Vinicius Costa Gomes
fc5a1d2287 ovl: use wrapper ovl_revert_creds()
Introduce ovl_revert_creds() wrapper of revert_creds() to
match callers of ovl_override_creds().

Suggested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2024-11-11 10:45:04 +01:00
Amir Goldstein
420332b941 ovl: mark xwhiteouts directory with overlay.opaque='x'
An opaque directory cannot have xwhiteouts, so instead of marking an
xwhiteouts directory with a new xattr, overload overlay.opaque xattr
for marking both opaque dir ('y') and xwhiteouts dir ('x').

This is more efficient as the overlay.opaque xattr is checked during
lookup of directory anyway.

This also prevents unnecessary checking the xattr when reading a
directory without xwhiteouts, i.e. most of the time.

Note that the xwhiteouts marker is not checked on the upper layer and
on the last layer in lowerstack, where xwhiteouts are not expected.

Fixes: bc8df7a3dc ("ovl: Add an alternative type of whiteout")
Cc: <stable@vger.kernel.org> # v6.7
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Tested-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2024-01-23 12:39:48 +02:00
Amir Goldstein
02d70090e0 ovl: remove redundant ofs->indexdir member
When the index feature is disabled, ofs->indexdir is NULL.
When the index feature is enabled, ofs->indexdir has the same value as
ofs->workdir and takes an extra reference.

This makes the code harder to understand when it is not always clear
that ofs->indexdir in one function is the same dentry as ofs->workdir
in another function.

Remove this redundancy, by referencing ofs->workdir directly in index
helpers and by using the ovl_indexdir() accessor in generic code.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-11-20 09:49:09 +02:00
Alexander Larsson
bc8df7a3dc ovl: Add an alternative type of whiteout
An xattr whiteout (called "xwhiteout" in the code) is a reguar file of
zero size with the "overlay.whiteout" xattr set. A file like this in a
directory with the "overlay.whiteouts" xattrs set will be treated the
same way as a regular whiteout.

The "overlay.whiteouts" directory xattr is used in order to
efficiently handle overlay checks in readdir(), as we only need to
checks xattrs in affected directories.

The advantage of this kind of whiteout is that they can be escaped
using the standard overlay xattr escaping mechanism. So, a file with a
"overlay.overlay.whiteout" xattr would be unescaped to
"overlay.whiteout", which could then be consumed by another overlayfs
as a whiteout.

Overlayfs itself doesn't create whiteouts like this, but a userspace
mechanism could use this alternative mechanism to convert images that
may contain whiteouts to be used with overlayfs.

To work as a whiteout for both regular overlayfs mounts as well as
userxattr mounts both the "user.overlay.whiteout*" and the
"trusted.overlay.whiteout*" xattrs will need to be created.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-31 00:12:59 +02:00
Amir Goldstein
5b02bfc1e7 ovl: do not encode lower fh with upper sb_writers held
When lower fs is a nested overlayfs, calling encode_fh() on a lower
directory dentry may trigger copy up and take sb_writers on the upper fs
of the lower nested overlayfs.

The lower nested overlayfs may have the same upper fs as this overlayfs,
so nested sb_writers lock is illegal.

Move all the callers that encode lower fh to before ovl_want_write().

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-31 00:12:57 +02:00
Andrea Righi
f01d08899f ovl: make consistent use of OVL_FS()
Always use OVL_FS() to retrieve the corresponding struct ovl_fs from a
struct super_block.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-08-12 19:02:54 +03:00
Amir Goldstein
b0504bfe1b ovl: add support for unique fsid per instance
The legacy behavior of ovl_statfs() reports the f_fsid filled by
underlying upper fs. This fsid is not unique among overlayfs instances
on the same upper fs.

With mount option uuid=on, generate a non-persistent uuid per overlayfs
instance and use it as the seed for f_fsid, similar to tmpfs.

This is useful for reporting fanotify events with fid info from different
instances of overlayfs over the same upper fs.

The old behavior of null uuid and upper fs fsid is retained with the
mount option uuid=null, which is the default.

The mount option uuid=off that disables uuid checks in underlying layers
also retains the legacy behavior.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-08-12 19:02:50 +03:00
Alexander Larsson
184996e92e ovl: Validate verity xattr when resolving lowerdata
The new digest field in the metacopy xattr is used during lookup to
record whether the header contained a digest in the OVL_HAS_DIGEST
flags.

When accessing file data the first time, if OVL_HAS_DIGEST is set, we
reload the metadata and check that the source lowerdata inode matches
the specified digest in it (according to the enabled verity
options). If the verity check passes we store this info in the inode
flags as OVL_VERIFIED_DIGEST, so that we can avoid doing it again if
the inode remains in memory.

The verification is done in ovl_maybe_validate_verity() which needs to
be called in the same places as ovl_maybe_lookup_lowerdata(), so there
is a new ovl_verify_lowerdata() helper that calls these in the right
order, and all current callers of ovl_maybe_lookup_lowerdata() are
changed to call it instead.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-08-12 19:02:38 +03:00
Alexander Larsson
bf07089081 ovl: Add versioned header for overlay.metacopy xattr
Historically overlay.metacopy was a zero-size xattr, and it's
existence marked a metacopy file. This change adds a versioned header
with a flag field, a length and a digest. The initial use-case of this
will be for validating a fs-verity digest, but the flags field could
also be used later for other new features.

ovl_check_metacopy_xattr() now returns the size of the xattr,
emulating a size of OVL_METACOPY_MIN_SIZE for empty xattrs to
distinguish it from the no-xattr case.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-08-12 19:02:38 +03:00
Amir Goldstein
af5f2396b6 ovl: store enum redirect_mode in config instead of a string
Do all the logic to set the mode during mount options parsing and
do not keep the option string around.

Use a constant_table to translate from enum redirect mode to string
in preperation for new mount api option parsing.

The mount option "off" is translated to either "follow" or "nofollow",
depending on the "redirect_always_follow" build/module config, so
in effect, there are only three possible redirect modes.

This results in a minor change to the string that is displayed
in show_options() - when redirect_dir is enabled by default and the user
mounts with the option "redirect_dir=off", instead of displaying the mode
"redirect_dir=off" in show_options(), the displayed mode will be either
"redirect_dir=follow" or "redirect_dir=nofollow", depending on the value
of "redirect_always_follow" build/module config.

The displayed mode reflects the effective mode, so mounting overlayfs
again with the dispalyed redirect_dir option will result with the same
effective and displayed mode.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-06-19 14:02:01 +03:00
Amir Goldstein
42dd69ae1a ovl: implement lazy lookup of lowerdata in data-only layers
Defer lookup of lowerdata in the data-only layers to first data access
or before copy up.

We perform lowerdata lookup before copy up even if copy up is metadata
only copy up.  We can further optimize this lookup later if needed.

We do best effort lazy lookup of lowerdata for d_real_inode(), because
this interface does not expect errors.  The only current in-tree caller
of d_real_inode() is trace_uprobe and this caller is likely going to be
followed reading from the file, before placing uprobes on offset within
the file, so lowerdata should be available when setting the uprobe.

Tested-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:14 +03:00
Amir Goldstein
2b21da9208 ovl: prepare to store lowerdata redirect for lazy lowerdata lookup
Prepare to allow ovl_lookup() to leave the last entry in a non-dir
lowerstack empty to signify lazy lowerdata lookup.

In this case, ovl_lookup() stores the redirect path from metacopy to
lowerdata in ovl_inode, which is going to be used later to perform the
lazy lowerdata lookup.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:14 +03:00
Amir Goldstein
5436ab0a86 ovl: implement lookup in data-only layers
Lookup in data-only layers only for a lower metacopy with an absolute
redirect xattr.

The metacopy xattr is not checked on files found in the data-only layers
and redirect xattr are not followed in the data-only layers.

Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:13 +03:00
Amir Goldstein
37ebf056d6 ovl: introduce data-only lower layers
Introduce the format lowerdir=lower1:lower2::lowerdata1::lowerdata2
where the lower layers on the right of the :: separators are not merged
into the overlayfs merge dirs.

Data-only lower layers are only allowed at the bottom of the stack.

The files in those layers are only meant to be accessible via absolute
redirect from metacopy files in lower layers.  Following changes will
implement lookup in the data layers.

This feature was requested for composefs ostree use case, where the
lower data layer should only be accessiable via absolute redirects
from metacopy inodes.

The lower data layers are not required to a have a unique uuid or any
uuid at all, because they are never used to compose the overlayfs inode
st_ino/st_dev.

Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:13 +03:00
Amir Goldstein
ab1eb5ffb7 ovl: deduplicate lowerdata and lowerstack[]
The ovl_inode contains a copy of lowerdata in lowerstack[], so the
lowerdata inode member can be removed.

Use accessors ovl_lowerdata*() to get the lowerdata whereever the member
was accessed directly.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:13 +03:00
Amir Goldstein
ac900ed4f2 ovl: deduplicate lowerpath and lowerstack[]
The ovl_inode contains a copy of lowerpath in lowerstack[0], so the
lowerpath member can be removed.

Use accessor ovl_lowerpath() to get the lowerpath whereever the member
was accessed directly.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:13 +03:00
Amir Goldstein
0af950f57f ovl: move ovl_entry into ovl_inode
The lower stacks of all the ovl inode aliases should be identical
and there is redundant information in ovl_entry and ovl_inode.

Move lowerstack into ovl_inode and keep only the OVL_E_FLAGS
per overlay dentry.

Following patches will deduplicate redundant ovl_inode fields.

Note that for pure upper and negative dentries, OVL_E(dentry) may be
NULL now, so it is imporatnt to use the ovl_numlower() accessor.

Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:13 +03:00
Amir Goldstein
163db0da35 ovl: factor out ovl_free_entry() and ovl_stack_*() helpers
In preparation for moving lowerstack into ovl_inode.

Note that in ovl_lookup() the temp stack dentry refs are now cloned
into the final ovl_lowerstack instead of being transferred, so cleanup
always needs to call ovl_stack_free(stack).

Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:13 +03:00
Amir Goldstein
5522c9c7cb ovl: use ovl_numlower() and ovl_lowerstack() accessors
This helps fortify against dereferencing a NULL ovl_entry,
before we move the ovl_entry reference into ovl_inode.

Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:13 +03:00
Amir Goldstein
a6ff2bc0be ovl: use OVL_E() and OVL_E_FLAGS() accessors
Instead of open coded instances, because we are about to split
the two apart.

Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:12 +03:00
Amir Goldstein
b07d5cc93e ovl: update of dentry revalidate flags after copy up
After copy up, we may need to update d_flags if upper dentry is on a
remote fs and lower dentries are not.

Add helpers to allow incremental update of the revalidate flags.

Fixes: bccece1ead ("ovl: allow remote upper")
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:12 +03:00
Christian Brauner
4609e1f18e
fs: port ->permission() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:28 +01:00
Stanislav Goriainov
cf4ef7801a ovl: Add comment on upperredirect reassignment
If memory for uperredirect was allocated with kstrdup() in upperdir != NULL
and d.redirect != NULL path, it may seem that it can be lost when
upperredirect is reassigned later, but it's not possible.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 0a2d0d3f2f ("ovl: Check redirect on index as well")
Signed-off-by: Stanislav Goriainov <goriainov@ispras.ru>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-12-08 10:49:46 +01:00
Amir Goldstein
8ea2876577 ovl: do not reconnect upper index records in ovl_indexdir_cleanup()
ovl_indexdir_cleanup() is called on mount of overayfs with nfs_export
feature to cleanup stale index records for lower and upper files that have
been deleted while overlayfs was offline.

This has the side effect (good or bad) of pre populating inode cache with
all the copied up upper inodes, while verifying the index entries.

For copied up directories, the upper file handles are decoded to conncted
upper dentries.  This has the even bigger side effect of reading the
content of all the parent upper directories which may take significantly
more time and IO than just reading the upper inodes.

Do not request connceted upper dentries for verifying upper directory index
entries, because we have no use for the connected dentry.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-12-08 10:49:46 +01:00
Al Viro
2d3430875a overlayfs: constify path
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-01 17:38:07 -04:00
William Dean
4f1196288d ovl: fix spelling mistakes
fix follow spelling misktakes:
	decendant  ==> descendant
	indentify  ==> identify
	underlaying ==> underlying

Reported-by: Hacash Robot <hacashRobot@santino.com>
Signed-off-by: William Dean <williamsukatube@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-08-02 15:41:10 +02:00
Christian Brauner
ba9ea771ec ovl: handle idmappings for layer lookup
Make the two places where lookup helpers can be called either on lower
or upper layers take the mount's idmapping into account. To this end we
pass down the mount in struct ovl_lookup_data. It can later also be used
to construct struct path for various other helpers. This is needed to
support idmapped base layers with overlay.

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-04-28 16:31:12 +02:00
Christian Brauner
dad7017a84 ovl: use ovl_path_getxattr() wrapper
Add a helper that allows to retrieve ovl xattrs from either lower or
upper layers. To stop passing mnt and dentry separately everywhere use
struct path which more accurately reflects the tight coupling between
mount and dentry in this helper. Swich over all places to pass a path
argument that can operate on either upper or lower layers. This is
needed to support idmapped base layers with overlayfs.

Some helpers are always called with an upper dentry, which is now utilized
by these helpers to create the path.  Make this usage explicit by renaming
the argument to "upperdentry" and by renaming the function as well in some
cases.  Also add a check in ovl_do_getxattr() to catch misuse of these
functions.

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-04-28 16:31:11 +02:00
Amir Goldstein
c914c0e27e ovl: use wrappers to all vfs_*xattr() calls
Use helpers ovl_*xattr() to access user/trusted.overlay.* xattrs
and use helpers ovl_do_*xattr() to access generic xattrs. This is a
preparatory patch for using idmapped base layers with overlay.

Note that a few of those places called vfs_*xattr() calls directly to
reduce the amount of debug output. But as Miklos pointed out since
overlayfs has been stable for quite some time the debug output isn't all
that relevant anymore and the additional debug in all locations was
actually quite helpful when developing this patch series.

Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-04-28 16:31:10 +02:00
Amir Goldstein
ffb24e3c65 ovl: relax lookup error on mismatch origin ftype
We get occasional reports of lookup errors due to mismatched
origin ftype from users that re-format a lower squashfs image.

Commit 13c6ad0f45 ("ovl: document lower modification caveats")
tries to discourage the practice of re-formating lower layers and
describes the expected behavior as undefined.

Commit b0e0f69731 ("ovl: restrict lower null uuid for "xino=auto"")
limits the configurations in which origin file handles are followed.

In addition to these measures, change the behavior in case of detecting
a mismatch origin ftype in lookup to issue a warning, not follow origin,
but not fail the lookup operation either.

That should make overall more users happy without any big consequences.

Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxgPq9E9xxwU2CDyHy-_yCZZeymg+3n+-6AqkGGE1YtwvQ@mail.gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-17 11:47:44 +02:00
Amir Goldstein
a0c236b117 ovl: pass ovl_fs to ovl_check_setxattr()
Instead of passing the overlay dentry.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-17 11:47:43 +02:00
Linus Torvalds
d652502ef4 overlayfs update for 5.13
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYIwTsgAKCRDh3BK/laaZ
 PDktAP41eScbCiFzXDRjXw9S7Wfd8HEct0y1p+9BUh8m3VdHfwEA0pDlJWNaJdYW
 nFixPJ5GsAfxo+1ags0vn06CUS/K4gA=
 =QlbJ
 -----END PGP SIGNATURE-----

Merge tag 'ovl-update-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs update from Miklos Szeredi:

 - Fix a regression introduced in 5.2 that resulted in valid overlayfs
   mounts being rejected with ELOOP (Too many levels of symbolic links)

 - Fix bugs found by various tools

 - Miscellaneous improvements and cleanups

* tag 'ovl-update-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: add debug print to ovl_do_getxattr()
  ovl: invalidate readdir cache on changes to dir with origin
  ovl: allow upperdir inside lowerdir
  ovl: show "userxattr" in the mount data
  ovl: trivial typo fixes in the file inode.c
  ovl: fix misspellings using codespell tool
  ovl: do not copy attr several times
  ovl: remove ovl_map_dev_ino() return value
  ovl: fix error for ovl_fill_super()
  ovl: fix missing revert_creds() on error path
  ovl: fix leaked dentry
  ovl: restrict lower null uuid for "xino=auto"
  ovl: check that upperdir path is not on a read-only mount
  ovl: plumb through flush method
2021-04-30 15:17:08 -07:00
Mickaël Salaün
eaab1d45cd ovl: fix leaked dentry
Since commit 6815f479ca ("ovl: use only uppermetacopy state in
ovl_lookup()"), overlayfs doesn't put temporary dentry when there is a
metacopy error, which leads to dentry leaks when shutting down the related
superblock:

  overlayfs: refusing to follow metacopy origin for (/file0)
  ...
  BUG: Dentry (____ptrval____){i=3f33,n=file3}  still in use (1) [unmount of overlay overlay]
  ...
  WARNING: CPU: 1 PID: 432 at umount_check.cold+0x107/0x14d
  CPU: 1 PID: 432 Comm: unmount-overlay Not tainted 5.12.0-rc5 #1
  ...
  RIP: 0010:umount_check.cold+0x107/0x14d
  ...
  Call Trace:
   d_walk+0x28c/0x950
   ? dentry_lru_isolate+0x2b0/0x2b0
   ? __kasan_slab_free+0x12/0x20
   do_one_tree+0x33/0x60
   shrink_dcache_for_umount+0x78/0x1d0
   generic_shutdown_super+0x70/0x440
   kill_anon_super+0x3e/0x70
   deactivate_locked_super+0xc4/0x160
   deactivate_super+0xfa/0x140
   cleanup_mnt+0x22e/0x370
   __cleanup_mnt+0x1a/0x30
   task_work_run+0x139/0x210
   do_exit+0xb0c/0x2820
   ? __kasan_check_read+0x1d/0x30
   ? find_held_lock+0x35/0x160
   ? lock_release+0x1b6/0x660
   ? mm_update_next_owner+0xa20/0xa20
   ? reacquire_held_locks+0x3f0/0x3f0
   ? __sanitizer_cov_trace_const_cmp4+0x22/0x30
   do_group_exit+0x135/0x380
   __do_sys_exit_group.isra.0+0x20/0x20
   __x64_sys_exit_group+0x3c/0x50
   do_syscall_64+0x45/0x70
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  ...
  VFS: Busy inodes after unmount of overlay. Self-destruct in 5 seconds.  Have a nice day...

This fix has been tested with a syzkaller reproducer.

Cc: Amir Goldstein <amir73il@gmail.com>
Cc: <stable@vger.kernel.org> # v5.8+
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 6815f479ca ("ovl: use only uppermetacopy state in ovl_lookup()")
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210329164907.2133175-1-mic@digikod.net
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 12:00:36 +02:00
Al Viro
6e3e2c4362 new helper: inode_wrong_type()
inode_wrong_type(inode, mode) returns true if setting inode->i_mode
to given value would've changed the inode type.  We have enough of
those checks open-coded to make a helper worthwhile.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-03-08 10:19:35 -05:00
Miklos Szeredi
c846af050f ovl: check privs before decoding file handle
CAP_DAC_READ_SEARCH is required by open_by_handle_at(2) so check it in
ovl_decode_real_fh() as well to prevent privilege escalation for
unprivileged overlay mounts.

[Amir] If the mounter is not capable in init ns, ovl_check_origin() and
ovl_verify_index() will not function as expected and this will break index
and nfs export features.  So check capability in ovl_can_decode_fh(), to
auto disable those features.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-12-14 15:26:14 +01:00
Kevin Locke
0a8d0b64dd ovl: warn about orphan metacopy
When the lower file of a metacopy is inaccessible, -EIO is returned.  For
users not familiar with overlayfs internals, such as myself, the meaning of
this error may not be apparent or easy to determine, since the (metacopy)
file is present and open/stat succeed when accessed outside of the overlay.

Add a rate-limited warning for orphan metacopy to give users a hint when
investigating such errors.

Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxi23Zsmfb4rCed1n=On0NNA5KZD74jjjeyz+et32sk-gg@mail.gmail.com/
Signed-off-by: Kevin Locke <kevin@kevinlocke.name>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-11-12 11:31:55 +01:00
Pavel Tikhomirov
5830fb6b54 ovl: introduce new "uuid=off" option for inodes index feature
This replaces uuid with null in overlayfs file handles and thus relaxes
uuid checks for overlay index feature. It is only possible in case there is
only one filesystem for all the work/upper/lower directories and bare file
handles from this backing filesystem are unique. In other case when we have
multiple filesystems lets just fallback to "uuid=on" which is and
equivalent of how it worked before with all uuid checks.

This is needed when overlayfs is/was mounted in a container with index
enabled (e.g.: to be able to resolve inotify watch file handles on it to
paths in CRIU), and this container is copied and started alongside with the
original one. This way the "copy" container can't have the same uuid on the
superblock and mounting the overlayfs from it later would fail.

That is an example of the problem on top of loop+ext4:

dd if=/dev/zero of=loopbackfile.img bs=100M count=10
losetup -fP loopbackfile.img
losetup -a
  #/dev/loop0: [64768]:35 (/loop-test/loopbackfile.img)
mkfs.ext4 loopbackfile.img
mkdir loop-mp
mount -o loop /dev/loop0 loop-mp
mkdir loop-mp/{lower,upper,work,merged}
mount -t overlay overlay -oindex=on,lowerdir=loop-mp/lower,\
upperdir=loop-mp/upper,workdir=loop-mp/work loop-mp/merged
umount loop-mp/merged
umount loop-mp
e2fsck -f /dev/loop0
tune2fs -U random /dev/loop0

mount -o loop /dev/loop0 loop-mp
mount -t overlay overlay -oindex=on,lowerdir=loop-mp/lower,\
upperdir=loop-mp/upper,workdir=loop-mp/work loop-mp/merged
  #mount: /loop-test/loop-mp/merged:
  #mount(2) system call failed: Stale file handle.

If you just change the uuid of the backing filesystem, overlay is not
mounting any more. In Virtuozzo we copy container disks (ploops) when
create the copy of container and we require fs uuid to be unique for a new
container.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-11-12 11:31:55 +01:00
Pavel Tikhomirov
1cdb0cb662 ovl: propagate ovl_fs to ovl_decode_real_fh and ovl_encode_real_fh
This will be used in next patch to be able to change uuid checks and add
uuid nullification based on ofs->config.index for a new "uuid=off" mode.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-11-12 11:31:55 +01:00