.. SPDX-License-Identifier: GPL-2.0

================
FUSE Passthrough
================

Introduction
============

FUSE (Filesystem in Userspace) passthrough is a feature designed to improve the
performance of FUSE filesystems for I/O operations. Typically, FUSE operations
involve communication between the kernel and a userspace FUSE daemon, which can
incur overhead. Passthrough allows certain operations on a FUSE file to bypass
the userspace daemon and be executed directly by the kernel on an underlying
"backing file".

This is achieved by the FUSE daemon registering a file descriptor (pointing to
the backing file on a lower filesystem) with the FUSE kernel module. The kernel
then returns an identifier (``backing_id``) for this registered backing file to
the daemon. When a FUSE file is subsequently opened, the FUSE daemon can, in its
response to the ``OPEN`` request, include this ``backing_id`` and set the
``FOPEN_PASSTHROUGH`` flag. This establishes a direct link between the FUSE file
and the backing file for the supported operations.

Currently, passthrough is supported for operations like ``read(2)``/``write(2)``
(via ``read_iter``/``write_iter``), ``splice(2)``, and ``mmap(2)``.

Enabling Passthrough
====================

To use FUSE passthrough:

  1. The kernel's FUSE driver must be built with ``CONFIG_FUSE_PASSTHROUGH``
     enabled.
  2. The FUSE daemon, during the ``FUSE_INIT`` handshake, must negotiate the
     ``FUSE_PASSTHROUGH`` capability and specify its desired
     ``max_stack_depth``.
  3. The (privileged) FUSE daemon uses the ``FUSE_DEV_IOC_BACKING_OPEN`` ioctl
     on its connection file descriptor (e.g., ``/dev/fuse``) to register a
     backing file descriptor and obtain a ``backing_id``.
  4. When handling an ``OPEN`` or ``CREATE`` request for a FUSE file, the daemon
     replies with the ``FOPEN_PASSTHROUGH`` flag set in
     ``fuse_open_out::open_flags`` and provides the corresponding ``backing_id``
     in ``fuse_open_out::backing_id``.
  5. The FUSE daemon should eventually call ``FUSE_DEV_IOC_BACKING_CLOSE`` with
     the ``backing_id`` to release the kernel's reference to the backing file
     once the ``backing_id`` is no longer needed for setting up passthrough
     opens.
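
Steps 3 to 5 can be sketched from the daemon's side as follows. This is a
minimal illustration under stated assumptions, not a definitive implementation.
To stay self-contained it does not include ``<linux/fuse.h>``: the
``pt_``/``PT_`` prefixed structures and constants are local stand-ins that are
assumed to mirror the UAPI definitions (``struct fuse_backing_map``,
``struct fuse_open_out``, ``FOPEN_PASSTHROUGH`` and the two ioctl numbers), and
the helper function names are hypothetical. A real daemon should use the
definitions from the kernel header.

.. code-block:: c

    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>

    /* Assumption: local mirrors of the UAPI definitions in <linux/fuse.h>. */
    #define PT_FUSE_DEV_IOC_MAGIC 229
    #define PT_FOPEN_PASSTHROUGH  (1U << 7)

    struct pt_backing_map {
        int32_t  fd;       /* backing file descriptor to register */
        uint32_t flags;    /* left zero in this sketch */
        uint64_t padding;  /* left zero in this sketch */
    };

    struct pt_open_out {
        uint64_t fh;
        uint32_t open_flags;
        int32_t  backing_id;
    };

    #define PT_BACKING_OPEN  _IOW(PT_FUSE_DEV_IOC_MAGIC, 1, struct pt_backing_map)
    #define PT_BACKING_CLOSE _IOW(PT_FUSE_DEV_IOC_MAGIC, 2, uint32_t)

    /* Step 3: register backing_fd on the FUSE connection fd.
     * Returns the backing_id on success, or -1 with errno set. */
    int pt_register_backing(int fuse_dev_fd, int backing_fd)
    {
        struct pt_backing_map map;

        memset(&map, 0, sizeof(map));
        map.fd = backing_fd;
        return ioctl(fuse_dev_fd, PT_BACKING_OPEN, &map);
    }

    /* Step 4: fill the OPEN/CREATE reply so the file uses passthrough. */
    void pt_fill_open_reply(struct pt_open_out *out, uint64_t fh,
                            int32_t backing_id)
    {
        memset(out, 0, sizeof(*out));
        out->fh = fh;
        out->open_flags = PT_FOPEN_PASSTHROUGH;
        out->backing_id = backing_id;
    }

    /* Step 5: drop the kernel's reference tied to this backing_id.
     * Returns 0 on success, or -1 with errno set. */
    int pt_unregister_backing(int fuse_dev_fd, int32_t backing_id)
    {
        uint32_t id = (uint32_t)backing_id;

        return ioctl(fuse_dev_fd, PT_BACKING_CLOSE, &id);
    }

In this sketch a daemon would call ``pt_register_backing()`` with its
``/dev/fuse`` descriptor while handling ``OPEN``, pass the returned identifier
to ``pt_fill_open_reply()``, and call ``pt_unregister_backing()`` once the
identifier is no longer needed. Both ioctl helpers are expected to fail when
the daemon lacks the required privilege or passthrough was not negotiated.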

Privilege Requirements
======================

Setting up passthrough functionality currently requires the FUSE daemon to
possess the ``CAP_SYS_ADMIN`` capability. This requirement stems from several
security and resource management considerations that are actively being
discussed and worked on. The primary reasons for this restriction are detailed
below.

Resource Accounting and Visibility
----------------------------------

The core mechanism for passthrough involves the FUSE daemon opening a file
descriptor to a backing file and registering it with the FUSE kernel module via
the ``FUSE_DEV_IOC_BACKING_OPEN`` ioctl. This ioctl returns a ``backing_id``
associated with a kernel-internal ``struct fuse_backing`` object, which holds a
reference to the backing ``struct file``.

A significant concern arises because the FUSE daemon can close its own file
descriptor to the backing file after registration. The kernel, however, will
still hold a reference to the ``struct file`` via the ``struct fuse_backing``
object as long as it's associated with a ``backing_id`` (or subsequently, with
an open FUSE file in passthrough mode).

This behavior leads to two main issues for unprivileged FUSE daemons:

  1. **Invisibility to lsof and other inspection tools**: Once the FUSE
     daemon closes its file descriptor, the open backing file held by the kernel
     becomes "hidden." Standard tools like ``lsof``, which typically inspect
     process file descriptor tables, cannot show that the file is still held
     open by the kernel on behalf of the FUSE filesystem. This makes it
     difficult for system administrators to track resource usage or debug
     issues related to open files (e.g., a file that prevents an unmount).

  2. **Bypassing RLIMIT_NOFILE**: The FUSE daemon process is subject to
     resource limits, including the maximum number of open file descriptors
     (``RLIMIT_NOFILE``). If an unprivileged daemon could register backing files
     and then close its own FDs, it could potentially cause the kernel to hold
     an unlimited number of open ``struct file`` references without these being
     accounted against the daemon's ``RLIMIT_NOFILE``. This could lead to a
     denial-of-service (DoS) by exhausting system-wide file resources.

The ``CAP_SYS_ADMIN`` requirement acts as a safeguard against these issues,
restricting this powerful capability to trusted processes.
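
The visibility problem can be illustrated from userspace alone. The sketch
below does not touch FUSE; it only shows what a per-process inspection tool
can observe, namely the entries of ``/proc/self/fd``, and how a descriptor
disappears from that view as soon as the process closes it. All function names
are hypothetical, and the example assumes a Linux system with ``/proc``
mounted.

.. code-block:: c

    #define _DEFAULT_SOURCE
    #include <dirent.h>
    #include <fcntl.h>
    #include <stdlib.h>
    #include <sys/resource.h>
    #include <unistd.h>

    /* Count the descriptors listed in /proc/self/fd, excluding the one
     * used to read the directory itself. Returns -1 on error. */
    int count_visible_fds(void)
    {
        DIR *dir = opendir("/proc/self/fd");
        struct dirent *ent;
        int own_fd, count = 0;

        if (!dir)
            return -1;
        own_fd = dirfd(dir);
        while ((ent = readdir(dir)) != NULL) {
            if (ent->d_name[0] == '.')
                continue;               /* skip "." and ".." */
            if (atoi(ent->d_name) == own_fd)
                continue;               /* skip the directory stream's fd */
            count++;
        }
        closedir(dir);
        return count;
    }

    /* Read the soft RLIMIT_NOFILE of the calling process.
     * Returns 0 on success, -1 on error. */
    int soft_nofile_limit(rlim_t *limit)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
            return -1;
        *limit = rl.rlim_cur;
        return 0;
    }

    /* Open path, then record the visible descriptor count while the file
     * is open and again after close(). Returns 0 on success, -1 on error. */
    int observe_open_close(const char *path, int *during, int *after)
    {
        int fd = open(path, O_RDONLY);

        if (fd < 0)
            return -1;
        *during = count_visible_fds();
        close(fd);
        *after = count_visible_fds();
        return 0;
    }

After ``close()`` the count returns to its previous value. A backing file that
was registered before the close would still be referenced by the kernel, but
neither this count nor the process's ``RLIMIT_NOFILE`` accounting would
reflect it.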

**NOTE**: ``io_uring`` addresses a similar issue for its "fixed files" by
making them visible via ``fdinfo`` and accounting them under the registering
user's ``RLIMIT_NOFILE``.

Filesystem Stacking and Shutdown Loops
--------------------------------------

Another concern relates to the potential for creating complex and problematic
filesystem stacking scenarios if unprivileged users could set up passthrough.
A FUSE passthrough filesystem might use a backing file that resides:

  * On the *same* FUSE filesystem.
  * On another filesystem (like OverlayFS) which itself might have an upper or
    lower layer that is a FUSE filesystem.

These configurations could create dependency loops, particularly during
filesystem shutdown or unmount sequences, leading to deadlocks or system
instability. This is conceptually similar to the risks associated with the
``LOOP_SET_FD`` ioctl, which also requires ``CAP_SYS_ADMIN``.

To mitigate this, FUSE passthrough already incorporates checks based on
filesystem stacking depth (``sb->s_stack_depth`` and ``fc->max_stack_depth``).
For example, during the ``FUSE_INIT`` handshake, the FUSE daemon can negotiate
the ``max_stack_depth`` it supports. When a backing file is registered via
``FUSE_DEV_IOC_BACKING_OPEN``, the kernel checks if the backing file's
filesystem stack depth is within the allowed limit.
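
The two depth checks described above can be modeled as a pair of pure
functions. This is a simplified sketch, not kernel code: the function names are
hypothetical, the upper bound is assumed to mirror the kernel's
``FILESYSTEM_MAX_STACK_DEPTH``, and the exact comparison and the ``-ELOOP``
error value are assumptions about how such a limit is typically enforced.

.. code-block:: c

    #include <errno.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Assumption: mirrors FILESYSTEM_MAX_STACK_DEPTH in the kernel. */
    #define SKETCH_MAX_STACK_DEPTH 2

    /* FUSE_INIT: is the max_stack_depth requested by the daemon usable? */
    bool sketch_init_depth_ok(uint32_t requested)
    {
        return requested > 0 && requested <= SKETCH_MAX_STACK_DEPTH;
    }

    /* FUSE_DEV_IOC_BACKING_OPEN: reject a backing file whose filesystem
     * already sits at or above the negotiated depth. */
    int sketch_check_backing_depth(int backing_sb_depth, int max_stack_depth)
    {
        if (backing_sb_depth >= max_stack_depth)
            return -ELOOP;
        return 0;
    }

Under this model, a daemon that negotiated a ``max_stack_depth`` of 1 can
register backing files from a non-stacked filesystem (depth 0). If the FUSE
superblock itself is assigned the negotiated depth, which is a further
assumption of this sketch, a backing file on the same FUSE filesystem has
depth 1 and is rejected, which rules out the self-referencing case listed
above.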

The ``CAP_SYS_ADMIN`` requirement provides an additional layer of security,
ensuring that only privileged users can create these potentially complex
stacking arrangements.

General Security Posture
------------------------

As a general principle for new kernel features that allow userspace to instruct
the kernel to perform direct operations on its behalf based on user-provided
file descriptors, starting with a higher privilege requirement (like
``CAP_SYS_ADMIN``) is a conservative and common security practice. This allows
the feature to be used and tested while further security implications are
evaluated and addressed.