Deferred unwind changes for 6.17

This is the core infrastructure for the deferred unwinder that is required
 for sframes[1]. Several other patch series are based on this work although
 those patch series are not dependent on each other. In order to simplify the
 development, having this core series upstream will allow the other series to
 be worked on in parallel. The other series are:
 
 - The two patches to implement x86:
   https://lore.kernel.org/linux-trace-kernel/20250717004958.260781923@kernel.org/
   https://lore.kernel.org/linux-trace-kernel/20250717004958.432327787@kernel.org/
 
 - The s390 work:
   https://lore.kernel.org/linux-trace-kernel/20250710163522.3195293-1-jremus@linux.ibm.com/
 
 - The perf work:
   https://lore.kernel.org/linux-trace-kernel/20250718164119.089692174@kernel.org/
 
 - The ftrace work:
   https://lore.kernel.org/linux-trace-kernel/20250424192612.505622711@goodmis.org/
 
 - The sframe work:
   https://lore.kernel.org/linux-trace-kernel/20250717012848.927473176@kernel.org/
 
 And more is on the way.
 
 The core infrastructure adds the following in-kernel APIs:
 
 - int unwind_user_faultable(struct unwind_stacktrace *trace);
 
     Performs a user space stack trace that may fault user pages in.
 
 - int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func);
 
     Allows a tracer to register with the unwind deferred infrastructure.
 
 - int unwind_deferred_request(struct unwind_work *work, u64 *cookie);
 
     Used when a tracer requests a deferred trace. Can be called from interrupt
     or NMI context.
 
 - void unwind_deferred_cancel(struct unwind_work *work);
 
     Called by a tracer to unregister from the deferred unwind infrastructure.
 
 - void unwind_deferred_task_exit(struct task_struct *task);
 
     Called by task exit code to flush any pending unwind requests.
 
 - void unwind_task_init(struct task_struct *task);
 
     Called by copy_process() during fork to initialize the task struct for
     the deferred unwinder.
 
 - void unwind_task_free(struct task_struct *task);
 
     Called by __put_task_struct() to free up any resources used by the
     deferred unwinder.
 
 None of the above is actually compiled unless an architecture enables it,
 which none currently do.
 
 [1] https://sourceware.org/binutils/wiki/sframe

Merge tag 'trace-deferred-unwind-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull initial deferred unwind infrastructure from Steven Rostedt:
 "This is the core infrastructure for the deferred unwinder that is
  required for sframes[1]. Several other patch series are based on this
  work although those patch series are not dependent on each other. In
  order to simplify the development, having this core series upstream
  will allow the other series to be worked on in parallel. The other
  series are:

    - The two patches to implement x86 support [2] [3]

    - The s390 work [4]

    - The perf work [5]

    - The ftrace work [6]

    - The sframe work [7]

  And more is on the way.

  The core infrastructure adds the following in-kernel APIs:

    - int unwind_user_faultable(struct unwind_stacktrace *trace);

        Performs a user space stack trace that may fault user pages in.

    - int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func);

        Allows a tracer to register with the unwind deferred
        infrastructure.

    - int unwind_deferred_request(struct unwind_work *work, u64 *cookie);

        Used when a tracer requests a deferred trace. Can be called from
        interrupt or NMI context.

    - void unwind_deferred_cancel(struct unwind_work *work);

        Called by a tracer to unregister from the deferred unwind
        infrastructure.

    - void unwind_deferred_task_exit(struct task_struct *task);

        Called by task exit code to flush any pending unwind requests.

    - void unwind_task_init(struct task_struct *task);

        Called by copy_process() during fork to initialize the task
        struct for the deferred unwinder.

    - void unwind_task_free(struct task_struct *task);

        Called by __put_task_struct() to free up any resources used by
        the deferred unwinder.

    None of the above is actually compiled unless an architecture enables it,
    which none currently do"

Link: https://sourceware.org/binutils/wiki/sframe [1]
Link: https://lore.kernel.org/linux-trace-kernel/20250717004958.260781923@kernel.org/ [2]
Link: https://lore.kernel.org/linux-trace-kernel/20250717004958.432327787@kernel.org/ [3]
Link: https://lore.kernel.org/linux-trace-kernel/20250710163522.3195293-1-jremus@linux.ibm.com/ [4]
Link: https://lore.kernel.org/linux-trace-kernel/20250718164119.089692174@kernel.org/ [5]
Link: https://lore.kernel.org/linux-trace-kernel/20250424192612.505622711@goodmis.org/ [6]
Link: https://lore.kernel.org/linux-trace-kernel/20250717012848.927473176@kernel.org/ [7]

* tag 'trace-deferred-unwind-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  unwind: Finish up unwind when a task exits
  unwind deferred: Use SRCU unwind_deferred_task_work()
  unwind: Add USED bit to only have one conditional on way back to user space
  unwind deferred: Add unwind_completed mask to stop spurious callbacks
  unwind deferred: Use bitmask to determine which callbacks to call
  unwind_user/deferred: Make unwind deferral requests NMI-safe
  unwind_user/deferred: Add deferred unwinding interface
  unwind_user/deferred: Add unwind cache
  unwind_user/deferred: Add unwind_user_faultable()
  unwind_user: Add user space unwinding API with frame pointer support
Linus Torvalds 2025-08-01 09:46:24 -07:00
commit c6439bfaab
16 changed files with 703 additions and 0 deletions

View file

@ -26253,6 +26253,13 @@ F: Documentation/driver-api/uio-howto.rst
F: drivers/uio/
F: include/linux/uio_driver.h
USERSPACE STACK UNWINDING
M: Josh Poimboeuf <jpoimboe@kernel.org>
M: Steven Rostedt <rostedt@goodmis.org>
S: Maintained
F: include/linux/unwind*.h
F: kernel/unwind/
UTIL-LINUX PACKAGE
M: Karel Zak <kzak@redhat.com>
L: util-linux@vger.kernel.org

View file

@ -444,6 +444,13 @@ config HAVE_HARDLOCKUP_DETECTOR_ARCH
It uses the same command line parameters, and sysctl interface,
as the generic hardlockup detectors.
config UNWIND_USER
bool
config HAVE_UNWIND_USER_FP
bool
select UNWIND_USER
config HAVE_PERF_REGS
bool
help

View file

@ -59,6 +59,7 @@ mandatory-y += tlbflush.h
mandatory-y += topology.h
mandatory-y += trace_clock.h
mandatory-y += uaccess.h
mandatory-y += unwind_user.h
mandatory-y += vermagic.h
mandatory-y += vga.h
mandatory-y += video.h

View file

@ -0,0 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_GENERIC_UNWIND_USER_H
#define _ASM_GENERIC_UNWIND_USER_H
#endif /* _ASM_GENERIC_UNWIND_USER_H */

View file

@ -7,6 +7,7 @@
#include <linux/context_tracking.h>
#include <linux/tick.h>
#include <linux/kmsan.h>
#include <linux/unwind_deferred.h>
#include <asm/entry-common.h>
@ -256,6 +257,7 @@ static __always_inline void exit_to_user_mode(void)
lockdep_hardirqs_on_prepare();
instrumentation_end();
unwind_reset_info();
user_enter_irqoff();
arch_exit_to_user_mode();
lockdep_hardirqs_on(CALLER_ADDR0);

View file

@ -47,6 +47,7 @@
#include <linux/rv.h>
#include <linux/uidgid_types.h>
#include <linux/tracepoint-defs.h>
#include <linux/unwind_deferred_types.h>
#include <asm/kmap_size.h>
/* task_struct member predeclarations (sorted alphabetically): */
@ -1646,6 +1647,10 @@ struct task_struct {
struct user_event_mm *user_event_mm;
#endif
#ifdef CONFIG_UNWIND_USER
struct unwind_task_info unwind_info;
#endif
/* CPU-specific state of this task: */
struct thread_struct thread;

View file

@ -0,0 +1,81 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_UNWIND_USER_DEFERRED_H
#define _LINUX_UNWIND_USER_DEFERRED_H
#include <linux/task_work.h>
#include <linux/unwind_user.h>
#include <linux/unwind_deferred_types.h>
struct unwind_work;
typedef void (*unwind_callback_t)(struct unwind_work *work, struct unwind_stacktrace *trace, u64 cookie);
struct unwind_work {
struct list_head list;
unwind_callback_t func;
int bit;
};
#ifdef CONFIG_UNWIND_USER
enum {
UNWIND_PENDING_BIT = 0,
UNWIND_USED_BIT,
};
enum {
UNWIND_PENDING = BIT(UNWIND_PENDING_BIT),
/* Set if the unwinding was used (directly or deferred) */
UNWIND_USED = BIT(UNWIND_USED_BIT)
};
void unwind_task_init(struct task_struct *task);
void unwind_task_free(struct task_struct *task);
int unwind_user_faultable(struct unwind_stacktrace *trace);
int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func);
int unwind_deferred_request(struct unwind_work *work, u64 *cookie);
void unwind_deferred_cancel(struct unwind_work *work);
void unwind_deferred_task_exit(struct task_struct *task);
static __always_inline void unwind_reset_info(void)
{
struct unwind_task_info *info = &current->unwind_info;
unsigned long bits;
/* Was there any unwinding? */
if (unlikely(info->unwind_mask)) {
bits = info->unwind_mask;
do {
/* Is a task_work going to run again before going back */
if (bits & UNWIND_PENDING)
return;
} while (!try_cmpxchg(&info->unwind_mask, &bits, 0UL));
current->unwind_info.id.id = 0;
if (unlikely(info->cache)) {
info->cache->nr_entries = 0;
info->cache->unwind_completed = 0;
}
}
}
#else /* !CONFIG_UNWIND_USER */
static inline void unwind_task_init(struct task_struct *task) {}
static inline void unwind_task_free(struct task_struct *task) {}
static inline int unwind_user_faultable(struct unwind_stacktrace *trace) { return -ENOSYS; }
static inline int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func) { return -ENOSYS; }
static inline int unwind_deferred_request(struct unwind_work *work, u64 *cookie) { return -ENOSYS; }
static inline void unwind_deferred_cancel(struct unwind_work *work) {}
static inline void unwind_deferred_task_exit(struct task_struct *task) {}
static inline void unwind_reset_info(void) {}
#endif /* !CONFIG_UNWIND_USER */
#endif /* _LINUX_UNWIND_USER_DEFERRED_H */
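
The compare-and-exchange loop in unwind_reset_info() can be tried out in user space. What follows is a minimal sketch that uses C11 atomics in place of the kernel's try_cmpxchg(). The names SKETCH_RST_PENDING and sketch_reset_mask() are hypothetical and do not exist in the kernel, and the id and cache cleanup that the real function performs after the loop is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Mirrors UNWIND_PENDING (bit 0); the name is an assumption of this sketch. */
#define SKETCH_RST_PENDING 0x1UL

/*
 * Clear the whole mask on the way back to user space, unless a deferred
 * unwind is still pending. Returns true only if the mask was cleared.
 */
static bool sketch_reset_mask(atomic_ulong *mask)
{
	unsigned long bits;

	assert(mask);

	bits = atomic_load(mask);
	if (!bits)
		return false;	/* no unwinding happened in this entry context */

	do {
		/* Deferred work will still run: leave the mask alone. */
		if (bits & SKETCH_RST_PENDING)
			return false;
		/* On failure, bits is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(mask, &bits, 0UL));

	return true;
}
```

The loop re-checks the pending bit after every failed exchange, so a request that arrives between the load and the exchange is not wiped out.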

View file

@ -0,0 +1,39 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_UNWIND_USER_DEFERRED_TYPES_H
#define _LINUX_UNWIND_USER_DEFERRED_TYPES_H
struct unwind_cache {
unsigned long unwind_completed;
unsigned int nr_entries;
unsigned long entries[];
};
/*
* The unwind_task_id is a unique identifier that maps to a user space
* stacktrace. It is generated the first time a deferred user space
* stacktrace is requested after a task has entered the kernel and
* is cleared to zero when it exits. The mapped id will be a non-zero
* number.
*
* To simplify the generation of the 64 bit number, 32 bits will be
* the CPU it was generated on, and the other 32 bits will be a per
* cpu counter that gets incremented by two every time a new identifier
* is generated. The LSB will always be set to keep the value
* from being zero.
*/
union unwind_task_id {
struct {
u32 cpu;
u32 cnt;
};
u64 id;
};
struct unwind_task_info {
unsigned long unwind_mask;
struct unwind_cache *cache;
struct callback_head work;
union unwind_task_id id;
};
#endif /* _LINUX_UNWIND_USER_DEFERRED_TYPES_H */
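
The id layout described in the comment above can be illustrated outside the kernel. This is a simplified sketch, assuming a plain array in place of the per-CPU counter and no NMI-safe cmpxchg. All `sketch_` names are hypothetical. It follows the same arithmetic as get_cookie() in kernel/unwind/deferred.c: the counter advances by two with the low bit forced on, and the CPU half stores cpu + 1 so that it is never zero.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4	/* arbitrary size for the illustration */

/* Same shape as union unwind_task_id: two 32-bit halves viewed as one u64. */
union sketch_task_id {
	struct {
		uint32_t cpu;
		uint32_t cnt;
	};
	uint64_t id;
};

/* Stand-in for the per-CPU counter; the kernel uses DEFINE_PER_CPU(). */
static uint32_t sketch_ctx_ctr[SKETCH_NR_CPUS];

static uint64_t sketch_get_cookie(union sketch_task_id *id, unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);

	/* A cookie already exists for this entry context: reuse it. */
	if (id->cpu)
		return id->id;

	/* Advance by two and force the LSB so the count is never zero. */
	id->cnt = (sketch_ctx_ctr[cpu] + 2) | 1;
	sketch_ctx_ctr[cpu] = id->cnt;

	/* Store cpu + 1 so that CPU 0 still yields a non-zero half. */
	id->cpu = cpu + 1;
	return id->id;
}

/* Models the reset done when the task returns to user space. */
static void sketch_reset_cookie(union sketch_task_id *id)
{
	id->id = 0;
}
```

Repeated calls within one entry context return the same value, and a call after sketch_reset_cookie() produces a new one, which is the property tracers rely on to match kernel and user stack traces.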

View file

@ -0,0 +1,14 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_UNWIND_USER_H
#define _LINUX_UNWIND_USER_H
#include <linux/unwind_user_types.h>
#include <asm/unwind_user.h>
#ifndef ARCH_INIT_USER_FP_FRAME
#define ARCH_INIT_USER_FP_FRAME
#endif
int unwind_user(struct unwind_stacktrace *trace, unsigned int max_entries);
#endif /* _LINUX_UNWIND_USER_H */

View file

@ -0,0 +1,44 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_UNWIND_USER_TYPES_H
#define _LINUX_UNWIND_USER_TYPES_H
#include <linux/types.h>
/*
* Unwind types, listed in priority order: lower numbers are attempted first if
* available.
*/
enum unwind_user_type_bits {
UNWIND_USER_TYPE_FP_BIT = 0,
NR_UNWIND_USER_TYPE_BITS,
};
enum unwind_user_type {
/* Type "none" for the start of stack walk iteration. */
UNWIND_USER_TYPE_NONE = 0,
UNWIND_USER_TYPE_FP = BIT(UNWIND_USER_TYPE_FP_BIT),
};
struct unwind_stacktrace {
unsigned int nr;
unsigned long *entries;
};
struct unwind_user_frame {
s32 cfa_off;
s32 ra_off;
s32 fp_off;
bool use_fp;
};
struct unwind_user_state {
unsigned long ip;
unsigned long sp;
unsigned long fp;
enum unwind_user_type current_type;
unsigned int available_types;
bool done;
};
#endif /* _LINUX_UNWIND_USER_TYPES_H */

View file

@ -54,6 +54,7 @@ obj-y += rcu/
obj-y += livepatch/
obj-y += dma/
obj-y += entry/
obj-y += unwind/
obj-$(CONFIG_MODULES) += module/
obj-$(CONFIG_KCMP) += kcmp.o

View file

@ -68,6 +68,7 @@
#include <linux/rethook.h>
#include <linux/sysfs.h>
#include <linux/user_events.h>
#include <linux/unwind_deferred.h>
#include <linux/uaccess.h>
#include <linux/pidfs.h>
@ -938,6 +939,7 @@ void __noreturn do_exit(long code)
tsk->exit_code = code;
taskstats_exit(tsk, group_dead);
unwind_deferred_task_exit(tsk);
trace_sched_process_exit(tsk, group_dead);
/*

View file

@ -105,6 +105,7 @@
#include <uapi/linux/pidfd.h>
#include <linux/pidfs.h>
#include <linux/tick.h>
#include <linux/unwind_deferred.h>
#include <asm/pgalloc.h>
#include <linux/uaccess.h>
@ -732,6 +733,7 @@ void __put_task_struct(struct task_struct *tsk)
WARN_ON(refcount_read(&tsk->usage));
WARN_ON(tsk == current);
unwind_task_free(tsk);
sched_ext_free(tsk);
io_uring_free(tsk);
cgroup_free(tsk);
@ -2135,6 +2137,8 @@ __latent_entropy struct task_struct *copy_process(
p->bpf_ctx = NULL;
#endif
unwind_task_init(p);
/* Perform scheduler related setup. Assign this task to a CPU. */
retval = sched_fork(clone_flags, p);
if (retval)

1
kernel/unwind/Makefile Normal file
View file

@ -0,0 +1 @@
obj-$(CONFIG_UNWIND_USER) += user.o deferred.o

362
kernel/unwind/deferred.c Normal file
View file

@ -0,0 +1,362 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Deferred user space unwinding
*/
#include <linux/sched/task_stack.h>
#include <linux/unwind_deferred.h>
#include <linux/sched/clock.h>
#include <linux/task_work.h>
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/sizes.h>
#include <linux/slab.h>
#include <linux/mm.h>
/*
* For requesting a deferred user space stack trace from NMI context
* the architecture must support a safe cmpxchg in NMI context.
* Architectures that lack one cannot request a deferred user space
* stack trace from NMI context; any attempt to do so returns -EINVAL.
*/
#if defined(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG)
# define CAN_USE_IN_NMI 1
static inline bool try_assign_cnt(struct unwind_task_info *info, u32 cnt)
{
u32 old = 0;
return try_cmpxchg(&info->id.cnt, &old, cnt);
}
#else
# define CAN_USE_IN_NMI 0
/* When NMIs are not allowed, this always succeeds */
static inline bool try_assign_cnt(struct unwind_task_info *info, u32 cnt)
{
info->id.cnt = cnt;
return true;
}
#endif
/* Make the cache fit in a 4K page */
#define UNWIND_MAX_ENTRIES \
((SZ_4K - sizeof(struct unwind_cache)) / sizeof(long))
/* Guards adding to or removing from the list of callbacks */
static DEFINE_MUTEX(callback_mutex);
static LIST_HEAD(callbacks);
#define RESERVED_BITS (UNWIND_PENDING | UNWIND_USED)
/* Zero'd bits are available for assigning callback users */
static unsigned long unwind_mask = RESERVED_BITS;
DEFINE_STATIC_SRCU(unwind_srcu);
static inline bool unwind_pending(struct unwind_task_info *info)
{
return test_bit(UNWIND_PENDING_BIT, &info->unwind_mask);
}
/*
* This is a unique percpu identifier for a given task entry context.
* Conceptually, it's incremented every time the CPU enters the kernel from
* user space, so that each "entry context" on the CPU gets a unique ID. In
* reality, as an optimization, it's only incremented on demand for the first
* deferred unwind request after a given entry-from-user.
*
* It's combined with the CPU id to make a systemwide-unique "context cookie".
*/
static DEFINE_PER_CPU(u32, unwind_ctx_ctr);
/*
* The context cookie is a unique identifier that is assigned to a user
* space stacktrace. As the user space stacktrace remains the same while
* the task is in the kernel, the cookie is an identifier for the stacktrace.
* Although it is possible for the stacktrace to get another cookie if another
* request is made after the cookie was cleared and before reentering user
* space.
*/
static u64 get_cookie(struct unwind_task_info *info)
{
u32 cnt = 1;
if (info->id.cpu)
return info->id.id;
/* LSB is always set to ensure 0 is an invalid value */
cnt |= __this_cpu_read(unwind_ctx_ctr) + 2;
if (try_assign_cnt(info, cnt)) {
/* Update the per cpu counter */
__this_cpu_write(unwind_ctx_ctr, cnt);
}
/* Interrupts are disabled, the CPU will always be the same */
info->id.cpu = smp_processor_id() + 1; /* Must be non zero */
return info->id.id;
}
/**
* unwind_user_faultable - Produce a user stacktrace in faultable context
* @trace: The descriptor that will store the user stacktrace
*
* This must be called in a known faultable context (usually when entering
* or exiting user space). Depending on the available implementations
* the @trace will be loaded with the addresses of the user space stacktrace
* if it can be found.
*
* Return: 0 on success and negative on error
* On success @trace will contain the user space stacktrace
*/
int unwind_user_faultable(struct unwind_stacktrace *trace)
{
struct unwind_task_info *info = &current->unwind_info;
struct unwind_cache *cache;
/* Should always be called from faultable context */
might_fault();
if (!current->mm)
return -EINVAL;
if (!info->cache) {
info->cache = kzalloc(struct_size(cache, entries, UNWIND_MAX_ENTRIES),
GFP_KERNEL);
if (!info->cache)
return -ENOMEM;
}
cache = info->cache;
trace->entries = cache->entries;
if (cache->nr_entries) {
/*
* The user stack has already been previously unwound in this
* entry context. Skip the unwind and use the cache.
*/
trace->nr = cache->nr_entries;
return 0;
}
trace->nr = 0;
unwind_user(trace, UNWIND_MAX_ENTRIES);
cache->nr_entries = trace->nr;
/* Clear nr_entries on way back to user space */
set_bit(UNWIND_USED_BIT, &info->unwind_mask);
return 0;
}
static void process_unwind_deferred(struct task_struct *task)
{
struct unwind_task_info *info = &task->unwind_info;
struct unwind_stacktrace trace;
struct unwind_work *work;
unsigned long bits;
u64 cookie;
if (WARN_ON_ONCE(!unwind_pending(info)))
return;
/* Clear pending bit but make sure to have the current bits */
bits = atomic_long_fetch_andnot(UNWIND_PENDING,
(atomic_long_t *)&info->unwind_mask);
/*
* From here on out, the callback must always be called, even if it's
* just an empty trace.
*/
trace.nr = 0;
trace.entries = NULL;
unwind_user_faultable(&trace);
if (info->cache)
bits &= ~(info->cache->unwind_completed);
cookie = info->id.id;
guard(srcu)(&unwind_srcu);
list_for_each_entry_srcu(work, &callbacks, list,
srcu_read_lock_held(&unwind_srcu)) {
if (test_bit(work->bit, &bits)) {
work->func(work, &trace, cookie);
if (info->cache)
info->cache->unwind_completed |= BIT(work->bit);
}
}
}
static void unwind_deferred_task_work(struct callback_head *head)
{
process_unwind_deferred(current);
}
void unwind_deferred_task_exit(struct task_struct *task)
{
struct unwind_task_info *info = &current->unwind_info;
if (!unwind_pending(info))
return;
process_unwind_deferred(task);
task_work_cancel(task, &info->work);
}
/**
* unwind_deferred_request - Request a user stacktrace on task kernel exit
* @work: Unwind descriptor requesting the trace
* @cookie: The cookie of the first request made for this task
*
* Schedule a user space unwind to be done in task work before exiting the
* kernel.
*
* The returned @cookie output is the generated cookie of the very first
* request for a user space stacktrace for this task since it entered the
* kernel. It can be from a request by any caller of this infrastructure.
* Its value will also be passed to the callback function. It can be
* used to stitch kernel and user stack traces together in post-processing.
*
* It's valid to call this function multiple times for the same @work within
* the same task entry context. Each call will return the same cookie
* while the task hasn't left the kernel. If the callback is not pending
* because it has already been previously called for the same entry context,
* it is not called again; the request returns 1 with the same cookie.
*
* Return: 0 if the callback successfully was queued.
* 1 if the callback is pending or was already executed.
* Negative if there's an error.
* @cookie holds the cookie of the first request by any user
*/
int unwind_deferred_request(struct unwind_work *work, u64 *cookie)
{
struct unwind_task_info *info = &current->unwind_info;
unsigned long old, bits;
long bit; /* signed, so that the check for a cancelled work (bit == -1) can trigger */
int ret;
*cookie = 0;
if ((current->flags & (PF_KTHREAD | PF_EXITING)) ||
!user_mode(task_pt_regs(current)))
return -EINVAL;
/*
* NMI requires having safe cmpxchg operations.
* Trigger a warning to make it obvious that an architecture
* is using this in NMI when it should not be.
*/
if (WARN_ON_ONCE(!CAN_USE_IN_NMI && in_nmi()))
return -EINVAL;
/* Do not allow cancelled works to request again */
bit = READ_ONCE(work->bit);
if (WARN_ON_ONCE(bit < 0))
return -EINVAL;
/* Only need the mask now */
bit = BIT(bit);
guard(irqsave)();
*cookie = get_cookie(info);
old = READ_ONCE(info->unwind_mask);
/* Is this already queued or executed */
if (old & bit)
return 1;
/*
* This work's bit hasn't been set yet. Now set it with the PENDING
* bit and fetch the current value of unwind_mask. If either the
* work's bit or PENDING was already set, then this is already queued
* to have a callback.
*/
bits = UNWIND_PENDING | bit;
old = atomic_long_fetch_or(bits, (atomic_long_t *)&info->unwind_mask);
if (old & bits) {
/*
* If the work's bit was set, whatever set it had better
* have also set pending and queued a callback.
*/
WARN_ON_ONCE(!(old & UNWIND_PENDING));
return old & bit;
}
/* The work has been claimed, now schedule it. */
ret = task_work_add(current, &info->work, TWA_RESUME);
if (WARN_ON_ONCE(ret))
WRITE_ONCE(info->unwind_mask, 0);
return ret;
}
void unwind_deferred_cancel(struct unwind_work *work)
{
struct task_struct *g, *t;
int bit;
if (!work)
return;
bit = work->bit;
/* No work should be using a reserved bit */
if (WARN_ON_ONCE(BIT(bit) & RESERVED_BITS))
return;
guard(mutex)(&callback_mutex);
list_del_rcu(&work->list);
/* Do not allow any more requests and prevent callbacks */
work->bit = -1;
__clear_bit(bit, &unwind_mask);
synchronize_srcu(&unwind_srcu);
guard(rcu)();
/* Clear this bit from all threads */
for_each_process_thread(g, t) {
clear_bit(bit, &t->unwind_info.unwind_mask);
if (t->unwind_info.cache)
clear_bit(bit, &t->unwind_info.cache->unwind_completed);
}
}
int unwind_deferred_init(struct unwind_work *work, unwind_callback_t func)
{
memset(work, 0, sizeof(*work));
guard(mutex)(&callback_mutex);
/* See if there's a bit in the mask available */
if (unwind_mask == ~0UL)
return -EBUSY;
work->bit = ffz(unwind_mask);
__set_bit(work->bit, &unwind_mask);
list_add_rcu(&work->list, &callbacks);
work->func = func;
return 0;
}
void unwind_task_init(struct task_struct *task)
{
struct unwind_task_info *info = &task->unwind_info;
memset(info, 0, sizeof(*info));
init_task_work(&info->work, unwind_deferred_task_work);
info->unwind_mask = 0;
}
void unwind_task_free(struct task_struct *task)
{
struct unwind_task_info *info = &task->unwind_info;
kfree(info->cache);
task_work_cancel(task, &info->work);
}
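
The way unwind_deferred_request() claims the pending bit can be modelled in user space. The sketch below assumes C11 atomics instead of the kernel's atomic_long_fetch_or(), and a counter in place of task_work_add(). The names sketch_request(), SKETCH_REQ_PENDING and SKETCH_REQ_RESERVED are hypothetical. Cookie handling, the NMI checks and interrupt disabling are omitted.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define SKETCH_REQ_PENDING  0x1UL	/* mirrors UNWIND_PENDING (bit 0) */
#define SKETCH_REQ_RESERVED 0x3UL	/* mirrors UNWIND_PENDING | UNWIND_USED */

/*
 * Returns 0 if this tracer's callback is now queued, 1 if it was already
 * pending or executed. *nr_queued counts how often the deferred work
 * would really have been scheduled.
 */
static int sketch_request(atomic_ulong *mask, unsigned int work_bit,
			  unsigned int *nr_queued)
{
	unsigned long bit, bits, old;

	assert(work_bit < sizeof(unsigned long) * CHAR_BIT);
	bit = 1UL << work_bit;
	/* Tracers never own a reserved bit. */
	assert(!(bit & SKETCH_REQ_RESERVED));

	/* This tracer already asked in this entry context. */
	if (atomic_load(mask) & bit)
		return 1;

	/* Set our bit together with PENDING and look at the old value. */
	bits = SKETCH_REQ_PENDING | bit;
	old = atomic_fetch_or(mask, bits);
	if (old & bits) {
		/* Someone else already queued the work; nothing to schedule. */
		return (old & bit) ? 1 : 0;
	}

	/* First requester: the kernel calls task_work_add() at this point. */
	(*nr_queued)++;
	return 0;
}
```

Only the requester that observes neither bit set schedules the work, so several tracers asking in the same entry context share a single deferred unwind.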

128
kernel/unwind/user.c Normal file
View file

@ -0,0 +1,128 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Generic interfaces for unwinding user space
*/
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/sched/task_stack.h>
#include <linux/unwind_user.h>
#include <linux/uaccess.h>
static const struct unwind_user_frame fp_frame = {
ARCH_INIT_USER_FP_FRAME
};
#define for_each_user_frame(state) \
for (unwind_user_start(state); !(state)->done; unwind_user_next(state))
static int unwind_user_next_fp(struct unwind_user_state *state)
{
const struct unwind_user_frame *frame = &fp_frame;
unsigned long cfa, fp, ra;
unsigned int shift;
if (frame->use_fp) {
if (state->fp < state->sp)
return -EINVAL;
cfa = state->fp;
} else {
cfa = state->sp;
}
/* Get the Canonical Frame Address (CFA) */
cfa += frame->cfa_off;
/* stack going in wrong direction? */
if (cfa <= state->sp)
return -EINVAL;
/* Make sure that the address is word aligned */
shift = sizeof(long) == 4 ? 2 : 3;
if (cfa & ((1 << shift) - 1))
return -EINVAL;
/* Find the Return Address (RA) */
if (get_user(ra, (unsigned long __user *)(cfa + frame->ra_off)))
return -EINVAL;
if (frame->fp_off && get_user(fp, (unsigned long __user *)(cfa + frame->fp_off)))
return -EINVAL;
state->ip = ra;
state->sp = cfa;
if (frame->fp_off)
state->fp = fp;
return 0;
}
static int unwind_user_next(struct unwind_user_state *state)
{
unsigned long iter_mask = state->available_types;
unsigned int bit;
if (state->done)
return -EINVAL;
for_each_set_bit(bit, &iter_mask, NR_UNWIND_USER_TYPE_BITS) {
enum unwind_user_type type = BIT(bit);
state->current_type = type;
switch (type) {
case UNWIND_USER_TYPE_FP:
if (!unwind_user_next_fp(state))
return 0;
continue;
default:
WARN_ONCE(1, "Undefined unwind bit %d", bit);
break;
}
break;
}
/* No successful unwind method. */
state->current_type = UNWIND_USER_TYPE_NONE;
state->done = true;
return -EINVAL;
}
static int unwind_user_start(struct unwind_user_state *state)
{
struct pt_regs *regs = task_pt_regs(current);
memset(state, 0, sizeof(*state));
if ((current->flags & PF_KTHREAD) || !user_mode(regs)) {
state->done = true;
return -EINVAL;
}
if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP))
state->available_types |= UNWIND_USER_TYPE_FP;
state->ip = instruction_pointer(regs);
state->sp = user_stack_pointer(regs);
state->fp = frame_pointer(regs);
return 0;
}
int unwind_user(struct unwind_stacktrace *trace, unsigned int max_entries)
{
struct unwind_user_state state;
trace->nr = 0;
if (!max_entries)
return -EINVAL;
if (current->flags & PF_KTHREAD)
return 0;
for_each_user_frame(&state) {
trace->entries[trace->nr++] = state.ip;
if (trace->nr >= max_entries)
break;
}
return 0;
}
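
The frame-pointer step in unwind_user_next_fp() can be exercised against a simulated stack. This is a sketch under stated assumptions: the stack is an array of words with a made-up base address, sketch_read() stands in for get_user(), and the frame offsets (CFA two words above the frame pointer, return address one word below the CFA, saved frame pointer two words below it) are an x86-64-style layout chosen for illustration, since this series leaves ARCH_INIT_USER_FP_FRAME empty. All `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_WS ((long)sizeof(long))

struct sketch_frame { long cfa_off, ra_off, fp_off; };

/* Assumed layout, not taken from this series. */
static const struct sketch_frame sketch_fp_frame = {
	2 * SKETCH_WS, -SKETCH_WS, -2 * SKETCH_WS
};

struct sketch_stack {
	const unsigned long *words;
	size_t nr_words;
	unsigned long base;	/* simulated address of words[0] */
};

struct sketch_state { unsigned long ip, sp, fp; };

/* Stand-in for get_user(): fails on out-of-range or unaligned reads. */
static int sketch_read(const struct sketch_stack *st, unsigned long addr,
		       unsigned long *val)
{
	size_t idx;

	if (addr < st->base || (addr - st->base) % sizeof(long))
		return -1;
	idx = (addr - st->base) / sizeof(long);
	if (idx >= st->nr_words)
		return -1;
	*val = st->words[idx];
	return 0;
}

static int sketch_next_fp(const struct sketch_stack *st, struct sketch_state *s)
{
	unsigned long cfa, ra, fp;

	if (s->fp < s->sp)
		return -1;
	cfa = s->fp + sketch_fp_frame.cfa_off;
	/* The stack must move toward higher addresses and stay aligned. */
	if (cfa <= s->sp || (cfa & (sizeof(long) - 1)))
		return -1;
	if (sketch_read(st, cfa + sketch_fp_frame.ra_off, &ra) ||
	    sketch_read(st, cfa + sketch_fp_frame.fp_off, &fp))
		return -1;
	s->ip = ra;
	s->sp = cfa;
	s->fp = fp;
	return 0;
}

/* Records the starting ip, then one ip per successfully unwound frame. */
static unsigned int sketch_unwind(const struct sketch_stack *st,
				  struct sketch_state s,
				  unsigned long *entries, unsigned int max)
{
	unsigned int nr = 0;

	assert(st && entries);
	if (!max)
		return 0;
	do {
		entries[nr++] = s.ip;
	} while (nr < max && !sketch_next_fp(st, &s));
	return nr;
}
```

As in unwind_user(), the walk stops at the first frame that fails a sanity check or when the entry limit is reached.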