linux/kernel
Kumar Kartikeya Dwivedi 18688de203 bpf: Fix UAF due to race between btf_try_get_module and load_module
While working on code to populate kfunc BTF ID sets for module BTF from
its initcall, I noticed that by the time the initcall is invoked, the
module BTF can already be seen by userspace (and the BPF verifier). The
existing btf_try_get_module calls try_module_get which only fails if
mod->state == MODULE_STATE_GOING, i.e. it can increment module reference
when module initcall is happening in parallel.

Currently, BTF parsing happens from MODULE_STATE_COMING notifier
callback. At this point, the module initcalls have not been invoked.
The notifier callback parses and prepares the module BTF, allocates an
ID, which publishes it to userspace, and then adds it to the btf_modules
list allowing the kernel to invoke btf_try_get_module for the BTF.

However, at this point, the module has not been fully initialized (i.e.
its initcalls have not finished). The code in module.c can still fail
and free the module, without caring for other users. However, nothing
stops btf_try_get_module from succeeding between the state transition
from MODULE_STATE_COMING to MODULE_STATE_LIVE.

This leads to a use-after-free issue when BPF program loads
successfully in the state transition, load_module's do_init_module call
fails and frees the module, and BPF program fd on close calls module_put
for the freed module. Future patch has test case to verify we don't
regress in this area in future.

There are multiple points after prepare_coming_module (in load_module)
where failure can occur and module loading can return error. We
illustrate and test for the race using the last point where it can
practically occur (in module __init function).

An illustration of the race:

CPU 0                           CPU 1
			  load_module
			    notifier_call(MODULE_STATE_COMING)
			      btf_parse_module
			      btf_alloc_id	// Published to userspace
			      list_add(&btf_mod->list, btf_modules)
			    mod->init(...)
...				^
bpf_check		        |
check_pseudo_btf_id             |
  btf_try_get_module            |
    returns true                |  ...
...                             |  module __init in progress
return prog_fd                  |  ...
...                             V
			    if (ret < 0)
			      free_module(mod)
			    ...
close(prog_fd)
 ...
 bpf_prog_free_deferred
  module_put(used_btf.mod) // use-after-free

We fix this issue by setting a flag BTF_MODULE_F_LIVE, from the notifier
callback when MODULE_STATE_LIVE state is reached for the module, so that
we return NULL from btf_try_get_module for modules that are not fully
formed. Since try_module_get already checks that module is not in
MODULE_STATE_GOING state, and that is the only transition a live module
can make before being removed from btf_modules list, this is enough to
close the race and prevent the bug.

A later selftest patch crafts the race condition artifically to verify
that it has been fixed, and that verifier fails to load program (with
ENXIO).

Lastly, a couple of comments:

 1. Even if this race didn't exist, it seems more appropriate to only
    access resources (ksyms and kfuncs) of a fully formed module which
    has been initialized completely.

 2. This patch was born out of need for synchronization against module
    initcall for the next patch, so it is needed for correctness even
    without the aforementioned race condition. The BTF resources
    initialized by module initcall are set up once and then only looked
    up, so just waiting until the initcall has finished ensures correct
    behavior.

Fixes: 541c3bad8d ("bpf: Support BPF ksym variables in kernel modules")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-18 14:26:41 -08:00
..
bpf bpf: Fix UAF due to race between btf_try_get_module and load_module 2022-01-18 14:26:41 -08:00
cgroup Networking changes for 5.17. 2022-01-10 19:06:09 -08:00
configs
debug kdb: Adopt scheduler's task classification 2021-11-03 17:21:37 +00:00
dma dma-mapping updates for Linux 5.16 2021-11-09 10:56:41 -08:00
entry entry: Snapshot thread flags 2021-12-01 00:06:43 +01:00
events perf: Add a counter for number of user access events in context 2021-12-14 11:30:54 +00:00
futex futex: Fix PREEMPT_RT build 2021-10-19 17:27:05 +02:00
gcov
irq irq: remove unused flags argument from __handle_irq_event_percpu() 2022-01-07 00:25:25 +01:00
kcsan arm64: Enable KCSAN 2021-12-14 18:54:34 +00:00
livepatch Tracing updates for 5.16: 2021-11-01 20:05:19 -07:00
locking locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner() 2021-12-18 10:55:51 +01:00
power PM: hibernate: Allow ACPI hardware signature to be honoured 2021-12-08 16:06:10 +01:00
printk printk fixup for 5.16 2021-11-18 10:50:45 -08:00
rcu RCU pull request for v5.16 2021-11-01 20:25:38 -07:00
sched - Add a set of thread_info.flags accessors which snapshot it before 2022-01-10 11:34:10 -08:00
time timekeeping: Really make sure wall_to_monotonic isn't positive 2021-12-17 23:06:22 +01:00
trace Networking changes for 5.17. 2022-01-10 19:06:09 -08:00
.gitignore
acct.c kernel: remove spurious blkdev.h includes 2021-10-18 06:17:01 -06:00
async.c
audit.c audit: improve robustness of the audit queue handling 2021-12-15 13:16:39 -05:00
audit.h audit/stable-5.16 PR 20211101 2021-11-01 21:17:39 -07:00
audit_fsnotify.c fsnotify: clarify contract for create event hooks 2021-10-27 12:32:34 +02:00
audit_tree.c audit/stable-5.16 PR 20211101 2021-11-01 21:17:39 -07:00
audit_watch.c \n 2021-11-06 16:43:20 -07:00
auditfilter.c audit: add filtering for io_uring records 2021-09-19 22:34:38 -04:00
auditsc.c audit/stable-5.16 PR 20211101 2021-11-01 21:17:39 -07:00
backtracetest.c
bounds.c
capability.c
cfi.c cfi: Use rcu_read_{un}lock_sched_notrace 2021-08-11 13:11:12 -07:00
compat.c arch: remove compat_alloc_user_space 2021-09-08 15:32:35 -07:00
configs.c
context_tracking.c
cpu.c sched/scs: Reset task stack state in bringup_cpu() 2021-11-24 12:20:27 +01:00
cpu_pm.c PM: cpu: Make notifier chain use a raw_spinlock_t 2021-08-16 18:55:32 +02:00
crash_core.c kernel/crash_core: suppress unknown crashkernel parameter warning 2021-12-25 12:20:55 -08:00
crash_dump.c
cred.c ucounts: In set_cred_ucounts assume new->ucounts is non-NULL 2021-10-20 10:45:34 -05:00
delayacct.c
dma.c
exec_domain.c
exit.c Merge branch 'per_signal_struct_coredumps-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-11-03 12:15:29 -07:00
extable.c extable: use is_kernel_text() helper 2021-11-09 10:02:51 -08:00
fail_function.c
fork.c A single fix for POSIX CPU timers to address a problem where POSIX CPU 2021-11-14 10:43:38 -08:00
freezer.c
gen_kheaders.sh
groups.c
hung_task.c Merge branch 'akpm' (patches from Andrew) 2021-07-02 12:08:10 -07:00
iomem.c
irq_work.c irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT 2021-10-15 11:25:18 +02:00
jump_label.c jump_label: Fix jump_label_text_reserved() vs __init 2021-07-05 10:46:20 +02:00
kallsyms.c kallsyms: strip LTO suffixes from static functions 2021-10-04 10:58:25 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/rwlock: Provide RT variant 2021-08-17 17:50:51 +02:00
Kconfig.preempt preempt: Restore preemption model selection configs 2021-11-11 13:09:33 +01:00
kcov.c kcov: replace local_irq_save() with a local_lock_t 2021-11-09 10:02:52 -08:00
kexec.c kexec: avoid compat_alloc_user_space 2021-09-08 15:32:34 -07:00
kexec_core.c Merge branch 'rework/printk_safe-removal' into for-linus 2021-08-30 16:36:10 +02:00
kexec_elf.c
kexec_file.c memblock: add MEMBLOCK_DRIVER_MANAGED to mimic IORESOURCE_SYSRAM_DRIVER_MANAGED 2021-11-06 13:30:42 -07:00
kexec_internal.h
kheaders.c
kmod.c
kprobes.c kprobes: Limit max data_size of the kretprobe instances 2021-12-01 21:04:34 -05:00
ksysfs.c
kthread.c Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-11-10 16:15:54 -08:00
latencytop.c
Makefile Tracing updates for 5.16: 2021-11-01 20:05:19 -07:00
module-internal.h
module.c module: change to print useful messages from elf_validity_check() 2021-11-05 15:13:10 -07:00
module_signature.c
module_signing.c
notifier.c notifier: Return an error when a callback has already been registered 2021-12-29 10:37:33 +01:00
nsproxy.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
padata.c padata: Remove repeated verbose license text 2021-08-27 16:30:18 +08:00
panic.c Merge branch 'rework/printk_safe-removal' into for-linus 2021-08-30 16:36:10 +02:00
params.c params: lift param_set_uint_minmax to common code 2021-08-16 14:42:22 +02:00
pid.c pid: add pidfd_get_task() helper 2021-10-14 13:29:18 +02:00
pid_namespace.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
profile.c profiling: fix shift-out-of-bounds bugs 2021-09-08 11:50:26 -07:00
ptrace.c
range.c
reboot.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2021-11-12 11:53:16 -08:00
regset.c
relay.c
resource.c kernel/resource: disallow access to exclusive system RAM regions 2021-11-09 10:02:52 -08:00
resource_kunit.c
rseq.c KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest 2021-09-22 10:24:01 -04:00
scftorture.c scftorture: Warn on individual scf_torture_init() error conditions 2021-09-16 10:27:48 -07:00
scs.c scs: Release kasan vmalloc poison in scs_free process 2021-09-30 09:37:27 +01:00
seccomp.c Merge branch 'exit-cleanups-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-09-01 14:52:05 -07:00
signal.c signal: Skip the altstack update when not needed 2021-12-14 13:08:36 -08:00
smp.c sched: Improve wake_up_all_idle_cpus() take #2 2021-10-22 15:32:46 +02:00
smpboot.c smpboot: Replace deprecated CPU-hotplug functions. 2021-08-10 14:57:42 +02:00
smpboot.h
softirq.c timers/nohz: Last resort update jiffies on nohz_full IRQ entry 2021-12-02 15:07:22 +01:00
stackleak.c
stacktrace.c stacktrace: move filter_irq_stacks() to kernel/stacktrace.c 2021-11-06 13:30:43 -07:00
static_call.c static_call: Fix static_call_text_reserved() vs __init 2021-07-05 10:46:33 +02:00
stop_machine.c
sys.c Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
sys_ni.c futex: Implement sys_futex_waitv() 2021-10-07 13:51:11 +02:00
sysctl-test.c
sysctl.c net: Don't include filter.h from net/sock.h 2021-12-29 08:48:14 -08:00
task_work.c
taskstats.c
torture.c torture: Replace deprecated CPU-hotplug functions. 2021-08-10 10:48:07 -07:00
tracepoint.c tracepoint: Fix kerneldoc comments 2021-08-16 11:39:51 -04:00
tsacct.c mm/mmap.c: fix a data race of mm->total_vm 2021-11-06 13:30:35 -07:00
ucount.c ucounts: Fix rlimit max values check 2021-12-09 15:37:18 -06:00
uid16.c
uid16.h
umh.c
up.c
user-return-notifier.c
user.c fs/epoll: use a per-cpu counter for user's watches count 2021-09-08 11:50:27 -07:00
user_namespace.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
usermode_driver.c Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:41:14 -07:00
utsname.c
utsname_sysctl.c
watch_queue.c
watchdog.c kernel: watchdog: modify the explanation related to watchdog thread 2021-06-29 10:53:46 -07:00
watchdog_hld.c
workqueue.c Merge branch 'akpm' (patches from Andrew) 2021-11-06 14:08:17 -07:00
workqueue_internal.h workqueue: Assign a color to barrier work items 2021-08-17 07:49:10 -10:00