=====================================
The Linux kernel user-space API guide
=====================================

.. _man-pages: https://www.kernel.org/doc/man-pages/

While much of the kernel's user-space API is documented elsewhere
(particularly in the man-pages_ project), some user-space information can
also be found in the kernel tree itself. This manual is intended to be the
place where this information is gathered.


System calls
============

.. toctree::
   :maxdepth: 1

   unshare
   futex2
   ebpf/index
   ioctl/index
   mseal

Security-related interfaces
===========================

.. toctree::
   :maxdepth: 1

   no_new_privs
   seccomp_filter
   landlock
   lsm
   mfd_noexec
   spec_ctrl
   tee
   check_exec

Devices and I/O
===============

.. toctree::
   :maxdepth: 1

   accelerators/ocxl
   dma-buf-alloc-exchange
   gpio/index
   iommufd
   media/index
   dcdbas
   vduse
   isapnp

Everything else
===============

.. toctree::
   :maxdepth: 1

   ELF
   netlink/index
   sysfs-platform_profile
   vduse
   futex2
   perf_ring_buffer
   ntsync

.. only:: subproject and html

   Indices
   =======

   * :ref:`genindex`