linux/arch/arm64/include/asm/ptrace.h

369 lines
9.4 KiB
C
Raw Permalink Normal View History

/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Based on arch/arm/include/asm/ptrace.h
*
* Copyright (C) 1996-2003 Russell King
* Copyright (C) 2012 ARM Ltd.
*/
#ifndef __ASM_PTRACE_H
#define __ASM_PTRACE_H
#include <asm/cpufeature.h>
#include <uapi/asm/ptrace.h>
/* Current Exception Level values, as contained in CurrentEL */
#define CurrentEL_EL1 (1 << 2)
#define CurrentEL_EL2 (2 << 2)
arm64: head.S: always initialize PSTATE As with SCTLR_ELx and other control registers, some PSTATE bits are UNKNOWN out-of-reset, and we may not be able to rely on hardware or firmware to initialize them to our liking prior to entry to the kernel, e.g. in the primary/secondary boot paths and return from idle/suspend. It would be more robust (and easier to reason about) if we consistently initialized PSTATE to a default value, as we do with control registers. This will ensure that the kernel is not adversely affected by bits it is not aware of, e.g. when support for a feature such as PAN/UAO is disabled. This patch ensures that PSTATE is consistently initialized at boot time via an ERET. This is not intended to relax the existing requirements (e.g. DAIF bits must still be set prior to entering the kernel). For features detected dynamically (which may require system-wide support), it is still necessary to subsequently modify PSTATE. As ERET is not always a Context Synchronization Event, an ISB is placed before each exception return to ensure updates to control registers have taken effect. This handles the kernel being entered with SCTLR_ELx.EOS clear (or any future control bits being in an UNKNOWN state). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201113124937.20574-6-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-11-13 12:49:25 +00:00
#define INIT_PSTATE_EL1 \
(PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT | PSR_MODE_EL1h)
#define INIT_PSTATE_EL2 \
(PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT | PSR_MODE_EL2h)
arm64: irqchip/gic-v3: Select priorities at boot time The distributor and PMR/RPR can present different views of the interrupt priority space dependent upon the values of GICD_CTLR.DS and SCR_EL3.FIQ. Currently we treat the distributor's view of the priority space as canonical, and when the two differ we change the way we handle values in the PMR/RPR, using the `gic_nonsecure_priorities` static key to decide what to do. This approach works, but it's sub-optimal. When using pseudo-NMI we manipulate the distributor rarely, and we manipulate the PMR/RPR registers very frequently in code spread out throughout the kernel (e.g. local_irq_{save,restore}()). It would be nicer if we could use fixed values for the PMR/RPR, and dynamically choose the values programmed into the distributor. This patch changes the GICv3 driver and arm64 code accordingly. PMR values are chosen at compile time, and the GICv3 driver determines the appropriate values to program into the distributor at boot time. This removes the need for the `gic_nonsecure_priorities` static key and results in smaller and better generated code for saving/restoring the irqflags. Before this patch, local_irq_disable() compiles to: | 0000000000000000 <outlined_local_irq_disable>: | 0: d503201f nop | 4: d50343df msr daifset, #0x3 | 8: d65f03c0 ret | c: d503201f nop | 10: d2800c00 mov x0, #0x60 // #96 | 14: d5184600 msr icc_pmr_el1, x0 | 18: d65f03c0 ret | 1c: d2801400 mov x0, #0xa0 // #160 | 20: 17fffffd b 14 <outlined_local_irq_disable+0x14> After this patch, local_irq_disable() compiles to: | 0000000000000000 <outlined_local_irq_disable>: | 0: d503201f nop | 4: d50343df msr daifset, #0x3 | 8: d65f03c0 ret | c: d2801800 mov x0, #0xc0 // #192 | 10: d5184600 msr icc_pmr_el1, x0 | 14: d65f03c0 ret ... with 3 fewer instructions per call. For defconfig + CONFIG_PSEUDO_NMI=y, this results in a minor saving of ~4K of text, and will make it easier to make further improvements to the way we manipulate irqflags and DAIF bits. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240617111841.2529370-6-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de>
2024-06-17 12:18:41 +01:00
#include <linux/irqchip/arm-gic-v3-prio.h>
#define GIC_PRIO_IRQON GICV3_PRIO_UNMASKED
#define GIC_PRIO_IRQOFF GICV3_PRIO_IRQ
arm64: irqchip/gic-v3: Select priorities at boot time The distributor and PMR/RPR can present different views of the interrupt priority space dependent upon the values of GICD_CTLR.DS and SCR_EL3.FIQ. Currently we treat the distributor's view of the priority space as canonical, and when the two differ we change the way we handle values in the PMR/RPR, using the `gic_nonsecure_priorities` static key to decide what to do. This approach works, but it's sub-optimal. When using pseudo-NMI we manipulate the distributor rarely, and we manipulate the PMR/RPR registers very frequently in code spread out throughout the kernel (e.g. local_irq_{save,restore}()). It would be nicer if we could use fixed values for the PMR/RPR, and dynamically choose the values programmed into the distributor. This patch changes the GICv3 driver and arm64 code accordingly. PMR values are chosen at compile time, and the GICv3 driver determines the appropriate values to program into the distributor at boot time. This removes the need for the `gic_nonsecure_priorities` static key and results in smaller and better generated code for saving/restoring the irqflags. Before this patch, local_irq_disable() compiles to: | 0000000000000000 <outlined_local_irq_disable>: | 0: d503201f nop | 4: d50343df msr daifset, #0x3 | 8: d65f03c0 ret | c: d503201f nop | 10: d2800c00 mov x0, #0x60 // #96 | 14: d5184600 msr icc_pmr_el1, x0 | 18: d65f03c0 ret | 1c: d2801400 mov x0, #0xa0 // #160 | 20: 17fffffd b 14 <outlined_local_irq_disable+0x14> After this patch, local_irq_disable() compiles to: | 0000000000000000 <outlined_local_irq_disable>: | 0: d503201f nop | 4: d50343df msr daifset, #0x3 | 8: d65f03c0 ret | c: d2801800 mov x0, #0xc0 // #192 | 10: d5184600 msr icc_pmr_el1, x0 | 14: d65f03c0 ret ... with 3 fewer instructions per call. For defconfig + CONFIG_PSEUDO_NMI=y, this results in a minor saving of ~4K of text, and will make it easier to make further improvements to the way we manipulate irqflags and DAIF bits. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240617111841.2529370-6-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de>
2024-06-17 12:18:41 +01:00
#define GIC_PRIO_PSR_I_SET GICV3_PRIO_PSR_I_SET
/* Additional SPSR bits not exposed in the UABI */
#define PSR_MODE_THREAD_BIT (1 << 0)
#define PSR_IL_BIT (1 << 20)
/* AArch32-specific ptrace requests */
#define COMPAT_PTRACE_GETREGS 12
#define COMPAT_PTRACE_SETREGS 13
#define COMPAT_PTRACE_GET_THREAD_AREA 22
#define COMPAT_PTRACE_SET_SYSCALL 23
#define COMPAT_PTRACE_GETVFPREGS 27
#define COMPAT_PTRACE_SETVFPREGS 28
#define COMPAT_PTRACE_GETHBPREGS 29
#define COMPAT_PTRACE_SETHBPREGS 30
/* SPSR_ELx bits for exceptions taken from AArch32 */
#define PSR_AA32_MODE_MASK 0x0000001f
#define PSR_AA32_MODE_USR 0x00000010
#define PSR_AA32_MODE_FIQ 0x00000011
#define PSR_AA32_MODE_IRQ 0x00000012
#define PSR_AA32_MODE_SVC 0x00000013
#define PSR_AA32_MODE_ABT 0x00000017
#define PSR_AA32_MODE_HYP 0x0000001a
#define PSR_AA32_MODE_UND 0x0000001b
#define PSR_AA32_MODE_SYS 0x0000001f
#define PSR_AA32_T_BIT 0x00000020
#define PSR_AA32_F_BIT 0x00000040
#define PSR_AA32_I_BIT 0x00000080
#define PSR_AA32_A_BIT 0x00000100
#define PSR_AA32_E_BIT 0x00000200
#define PSR_AA32_PAN_BIT 0x00400000
#define PSR_AA32_SSBS_BIT 0x00800000
#define PSR_AA32_DIT_BIT 0x01000000
#define PSR_AA32_Q_BIT 0x08000000
#define PSR_AA32_V_BIT 0x10000000
#define PSR_AA32_C_BIT 0x20000000
#define PSR_AA32_Z_BIT 0x40000000
#define PSR_AA32_N_BIT 0x80000000
#define PSR_AA32_IT_MASK 0x0600fc00 /* If-Then execution state mask */
#define PSR_AA32_GE_MASK 0x000f0000
#ifdef CONFIG_CPU_BIG_ENDIAN
#define PSR_AA32_ENDSTATE PSR_AA32_E_BIT
#else
#define PSR_AA32_ENDSTATE 0
#endif
/* AArch32 CPSR bits, as seen in AArch32 */
#define COMPAT_PSR_DIT_BIT 0x00200000
/*
* These are 'magic' values for PTRACE_PEEKUSR that return info about where a
* process is located in memory.
*/
#define COMPAT_PT_TEXT_ADDR 0x10000
#define COMPAT_PT_DATA_ADDR 0x10004
#define COMPAT_PT_TEXT_END_ADDR 0x10008
/*
* If pt_regs.syscallno == NO_SYSCALL, then the thread is not executing
* a syscall -- i.e., its most recent entry into the kernel from
* userspace was not via SVC, or otherwise a tracer cancelled the syscall.
*
* This must have the value -1, for ABI compatibility with ptrace etc.
*/
#define NO_SYSCALL (-1)
#ifndef __ASSEMBLY__
#include <linux/bug.h>
#include <linux/types.h>
#include <asm/stacktrace/frame.h>
/* sizeof(struct user) for AArch32 */
#define COMPAT_USER_SZ 296
/* Architecturally defined mapping between AArch32 and AArch64 registers */
#define compat_usr(x) regs[(x)]
#define compat_fp regs[11]
#define compat_sp regs[13]
#define compat_lr regs[14]
#define compat_sp_hyp regs[15]
#define compat_lr_irq regs[16]
#define compat_sp_irq regs[17]
#define compat_lr_svc regs[18]
#define compat_sp_svc regs[19]
#define compat_lr_abt regs[20]
#define compat_sp_abt regs[21]
#define compat_lr_und regs[22]
#define compat_sp_und regs[23]
#define compat_r8_fiq regs[24]
#define compat_r9_fiq regs[25]
#define compat_r10_fiq regs[26]
#define compat_r11_fiq regs[27]
#define compat_r12_fiq regs[28]
#define compat_sp_fiq regs[29]
#define compat_lr_fiq regs[30]
static inline unsigned long compat_psr_to_pstate(const unsigned long psr)
{
unsigned long pstate;
pstate = psr & ~COMPAT_PSR_DIT_BIT;
if (psr & COMPAT_PSR_DIT_BIT)
pstate |= PSR_AA32_DIT_BIT;
return pstate;
}
static inline unsigned long pstate_to_compat_psr(const unsigned long pstate)
{
unsigned long psr;
psr = pstate & ~PSR_AA32_DIT_BIT;
if (pstate & PSR_AA32_DIT_BIT)
psr |= COMPAT_PSR_DIT_BIT;
return psr;
}
/*
* This struct defines the way the registers are stored on the stack during an
* exception. struct user_pt_regs must form a prefix of struct pt_regs.
*/
struct pt_regs {
union {
struct user_pt_regs user_regs;
struct {
u64 regs[31];
u64 sp;
u64 pc;
u64 pstate;
};
};
u64 orig_x0;
arm64: syscallno is secretly an int, make it official The upper 32 bits of the syscallno field in thread_struct are handled inconsistently, being sometimes zero extended and sometimes sign-extended. In fact, only the lower 32 bits seem to have any real significance for the behaviour of the code: it's been OK to handle the upper bits inconsistently because they don't matter. Currently, the only place I can find where those bits are significant is in calling trace_sys_enter(), which may be unintentional: for example, if a compat tracer attempts to cancel a syscall by passing -1 to (COMPAT_)PTRACE_SET_SYSCALL at the syscall-enter-stop, it will be traced as syscall 4294967295 rather than -1 as might be expected (and as occurs for a native tracer doing the same thing). Elsewhere, reads of syscallno cast it to an int or truncate it. There's also a conspicuous amount of code and casting to bodge around the fact that although semantically an int, syscallno is stored as a u64. Let's not pretend any more. In order to preserve the stp x instruction that stores the syscall number in entry.S, this patch special-cases the layout of struct pt_regs for big endian so that the newly 32-bit syscallno field maps onto the low bits of the stored value. This is not beautiful, but benchmarking of the getpid syscall on Juno suggests indicates a minor slowdown if the stp is split into an stp x and stp w. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-01 15:35:53 +01:00
s32 syscallno;
u32 pmr;
arm64: pt_regs: remove stale big-endian layout For historical reasons the layout of struct pt_regs depends on the configured endianness, with the order of the 'syscallno' and 'unused2' fields varying dependent upon whether __AARCH64EB__ is defined. We no longer depend on the order of these two fields and can remove the ifdeffery. The current conditional layout was introduced in commit: 35d0e6fb4d219d64 ("arm64: syscallno is secretly an int, make it official") At the time, this was necessary so that the entry assembly could use a single STP instruction to save the pt_regs::{orig_x0,syscallno} fields, without logic that was conditional on the endianness of the kernel: | el0_svc_naked: | stp x0, xscno, [sp, #S_ORIG_X0] // save the original x0 and syscall number This logic was converted to C in commit: f37099b6992a0b81 ("arm64: convert syscall trace logic to C") Since that commit, we no longer manipulate pt_regs::orig_x0 from assembly, and only manipulate pt_regs::syscallno as a 32-bit quantity early in the kernel_entry assembly: | /* Not in a syscall by default (el0_svc overwrites for real syscall) */ | .if \el == 0 | mov w21, #NO_SYSCALL | str w21, [sp, #S_SYSCALLNO] | .endif Given the above, there's no longer a need for the layout of pt_regs::{syscallno,unused2} to depend on the endianness of the kernel. This patch removes the ifdeffery and places 'syscallno' before 'unused2' regardless of the endianess of the kernel. At the same time, 'unused2' is renamed to 'unused', as it is the only unused field within pt_regs. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Puranjay Mohan <puranjay12@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20241017092538.1859841-3-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-17 10:25:30 +01:00
arm64: uaccess: remove set_fs() Now that the uaccess primitives dont take addr_limit into account, we have no need to manipulate this via set_fs() and get_fs(). Remove support for these, along with some infrastructure this renders redundant. We no longer need to flip UAO to access kernel memory under KERNEL_DS, and head.S unconditionally clears UAO for all kernel configurations via an ERET in init_kernel_el. Thus, we don't need to dynamically flip UAO, nor do we need to context-switch it. However, we still need to adjust PAN during SDEI entry. Masking of __user pointers no longer needs to use the dynamic value of addr_limit, and can use a constant derived from the maximum possible userspace task size. A new TASK_SIZE_MAX constant is introduced for this, which is also used by core code. In configurations supporting 52-bit VAs, this may include a region of unusable VA space above a 48-bit TTBR0 limit, but never includes any portion of TTBR1. Note that TASK_SIZE_MAX is an exclusive limit, while USER_DS and KERNEL_DS were inclusive limits, and is converted to a mask by subtracting one. As the SDEI entry code repurposes the otherwise unnecessary pt_regs::orig_addr_limit field to store the TTBR1 of the interrupted context, for now we rename that to pt_regs::sdei_ttbr1. In future we can consider factoring that out. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: James Morse <james.morse@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201202131558.39270-10-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-12-02 13:15:55 +00:00
u64 sdei_ttbr1;
arm64: stacktrace: unwind exception boundaries When arm64's stack unwinder encounters an exception boundary, it uses the pt_regs::stackframe created by the entry code, which has a copy of the PC and FP at the time the exception was taken. The unwinder doesn't know anything about pt_regs, and reports the PC from the stackframe, but does not report the LR. The LR is only guaranteed to contain the return address at function call boundaries, and can be used as a scratch register at other times, so the LR at an exception boundary may or may not be a legitimate return address. It would be useful to report the LR value regardless, as it can be helpful when debugging, and in future it will be helpful for reliable stacktrace support. This patch changes the way we unwind across exception boundaries, allowing both the PC and LR to be reported. The entry code creates a frame_record_meta structure embedded within pt_regs, which the unwinder uses to find the pt_regs. The unwinder can then extract pt_regs::pc and pt_regs::lr as two separate unwind steps before continuing with a regular walk of frame records. When a PC is unwound from pt_regs::lr, dump_backtrace() will log this with an "L" marker so that it can be identified easily. For example, an unwind across an exception boundary will appear as follows: | el1h_64_irq+0x6c/0x70 | _raw_spin_unlock_irqrestore+0x10/0x60 (P) | __aarch64_insn_write+0x6c/0x90 (L) | aarch64_insn_patch_text_nosync+0x28/0x80 ... with a (P) entry for pt_regs::pc, and an (L) entry for pt_regs:lr. Note that the LR may be stale at the point of the exception, for example, shortly after a return: | el1h_64_irq+0x6c/0x70 | default_idle_call+0x34/0x180 (P) | default_idle_call+0x28/0x180 (L) | do_idle+0x204/0x268 ... where the LR points a few instructions before the current PC. This plays nicely with all the other unwind metadata tracking. With the ftrace_graph profiler enabled globally, and kretprobes installed on generic_handle_domain_irq() and do_interrupt_handler(), a backtrace triggered by magic-sysrq + L reports: | Call trace: | show_stack+0x20/0x40 (CF) | dump_stack_lvl+0x60/0x80 (F) | dump_stack+0x18/0x28 | nmi_cpu_backtrace+0xfc/0x140 | nmi_trigger_cpumask_backtrace+0x1c8/0x200 | arch_trigger_cpumask_backtrace+0x20/0x40 | sysrq_handle_showallcpus+0x24/0x38 (F) | __handle_sysrq+0xa8/0x1b0 (F) | handle_sysrq+0x38/0x50 (F) | pl011_int+0x460/0x5a8 (F) | __handle_irq_event_percpu+0x60/0x220 (F) | handle_irq_event+0x54/0xc0 (F) | handle_fasteoi_irq+0xa8/0x1d0 (F) | generic_handle_domain_irq+0x34/0x58 (F) | gic_handle_irq+0x54/0x140 (FK) | call_on_irq_stack+0x24/0x58 (F) | do_interrupt_handler+0x88/0xa0 | el1_interrupt+0x34/0x68 (FK) | el1h_64_irq_handler+0x18/0x28 | el1h_64_irq+0x6c/0x70 | default_idle_call+0x34/0x180 (P) | default_idle_call+0x28/0x180 (L) | do_idle+0x204/0x268 | cpu_startup_entry+0x3c/0x50 (F) | rest_init+0xe4/0xf0 | start_kernel+0x744/0x750 | __primary_switched+0x88/0x98 Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Puranjay Mohan <puranjay12@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20241017092538.1859841-11-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-17 10:25:38 +01:00
struct frame_record_meta stackframe;
/* Only valid for some EL1 exceptions. */
u64 lockdep_hardirqs;
u64 exit_rcu;
};
/* For correct stack alignment, pt_regs has to be a multiple of 16 bytes. */
static_assert(IS_ALIGNED(sizeof(struct pt_regs), 16));
static inline bool in_syscall(struct pt_regs const *regs)
{
return regs->syscallno != NO_SYSCALL;
}
static inline void forget_syscall(struct pt_regs *regs)
{
regs->syscallno = NO_SYSCALL;
}
#define MAX_REG_OFFSET offsetof(struct pt_regs, pstate)
#define arch_has_single_step() (1)
#ifdef CONFIG_COMPAT
#define compat_thumb_mode(regs) \
(((regs)->pstate & PSR_AA32_T_BIT))
#else
#define compat_thumb_mode(regs) (0)
#endif
#define user_mode(regs) \
(((regs)->pstate & PSR_MODE_MASK) == PSR_MODE_EL0t)
#define compat_user_mode(regs) \
(((regs)->pstate & (PSR_MODE32_BIT | PSR_MODE_MASK)) == \
(PSR_MODE32_BIT | PSR_MODE_EL0t))
#define processor_mode(regs) \
((regs)->pstate & PSR_MODE_MASK)
#define irqs_priority_unmasked(regs) \
(system_uses_irq_prio_masking() ? \
(regs)->pmr == GIC_PRIO_IRQON : \
true)
#define interrupts_enabled(regs) \
(!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs))
#define fast_interrupts_enabled(regs) \
(!((regs)->pstate & PSR_F_BIT))
static inline unsigned long user_stack_pointer(struct pt_regs *regs)
{
if (compat_user_mode(regs))
return regs->compat_sp;
return regs->sp;
}
arm64: Kprobes with single stepping support Add support for basic kernel probes(kprobes) and jump probes (jprobes) for ARM64. Kprobes utilizes software breakpoint and single step debug exceptions supported on ARM v8. A software breakpoint is placed at the probe address to trap the kernel execution into the kprobe handler. ARM v8 supports enabling single stepping before the break exception return (ERET), with next PC in exception return address (ELR_EL1). The kprobe handler prepares an executable memory slot for out-of-line execution with a copy of the original instruction being probed, and enables single stepping. The PC is set to the out-of-line slot address before the ERET. With this scheme, the instruction is executed with the exact same register context except for the PC (and DAIF) registers. Debug mask (PSTATE.D) is enabled only when single stepping a recursive kprobe, e.g.: during kprobes reenter so that probed instruction can be single stepped within the kprobe handler -exception- context. The recursion depth of kprobe is always 2, i.e. upon probe re-entry, any further re-entry is prevented by not calling handlers and the case counted as a missed kprobe). Single stepping from the x-o-l slot has a drawback for PC-relative accesses like branching and symbolic literals access as the offset from the new PC (slot address) may not be ensured to fit in the immediate value of the opcode. Such instructions need simulation, so reject probing them. Instructions generating exceptions or cpu mode change are rejected for probing. Exclusive load/store instructions are rejected too. Additionally, the code is checked to see if it is inside an exclusive load/store sequence (code from Pratyush). System instructions are mostly enabled for stepping, except MSR/MRS accesses to "DAIF" flags in PSTATE, which are not safe for probing. This also changes arch/arm64/include/asm/ptrace.h to use include/asm-generic/ptrace.h. Thanks to Steve Capper and Pratyush Anand for several suggested Changes. Signed-off-by: Sandeepa Prabhu <sandeepa.s.prabhu@gmail.com> Signed-off-by: David A. Long <dave.long@linaro.org> Signed-off-by: Pratyush Anand <panand@redhat.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-08 12:35:48 -04:00
extern int regs_query_register_offset(const char *name);
extern unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs,
unsigned int n);
/**
* regs_get_register() - get register value from its offset
* @regs: pt_regs from which register value is gotten
* @offset: offset of the register.
*
* regs_get_register returns the value of a register whose offset from @regs.
* The @offset is the offset of the register in struct pt_regs.
* If @offset is bigger than MAX_REG_OFFSET, this returns 0.
*/
static inline u64 regs_get_register(struct pt_regs *regs, unsigned int offset)
{
u64 val = 0;
WARN_ON(offset & 7);
offset >>= 3;
switch (offset) {
case 0 ... 30:
val = regs->regs[offset];
break;
case offsetof(struct pt_regs, sp) >> 3:
val = regs->sp;
break;
case offsetof(struct pt_regs, pc) >> 3:
val = regs->pc;
break;
case offsetof(struct pt_regs, pstate) >> 3:
val = regs->pstate;
break;
default:
val = 0;
}
return val;
}
/*
* Read a register given an architectural register index r.
* This handles the common case where 31 means XZR, not SP.
*/
static inline unsigned long pt_regs_read_reg(const struct pt_regs *regs, int r)
{
return (r == 31) ? 0 : regs->regs[r];
}
/*
* Write a register given an architectural register index r.
* This handles the common case where 31 means XZR, not SP.
*/
static inline void pt_regs_write_reg(struct pt_regs *regs, int r,
unsigned long val)
{
if (r != 31)
regs->regs[r] = val;
}
/* Valid only for Kernel mode traps. */
static inline unsigned long kernel_stack_pointer(struct pt_regs *regs)
{
return regs->sp;
}
static inline unsigned long regs_return_value(struct pt_regs *regs)
{
arm64: fix compat syscall return truncation Due to inconsistencies in the way we manipulate compat GPRs, we have a few issues today: * For audit and tracing, where error codes are handled as a (native) long, negative error codes are expected to be sign-extended to the native 64-bits, or they may fail to be matched correctly. Thus a syscall which fails with an error may erroneously be identified as failing. * For ptrace, *all* compat return values should be sign-extended for consistency with 32-bit arm, but we currently only do this for negative return codes. * As we may transiently set the upper 32 bits of some compat GPRs while in the kernel, these can be sampled by perf, which is somewhat confusing. This means that where a syscall returns a pointer above 2G, this will be sign-extended, but will not be mistaken for an error as error codes are constrained to the inclusive range [-4096, -1] where no user pointer can exist. To fix all of these, we must consistently use helpers to get/set the compat GPRs, ensuring that we never write the upper 32 bits of the return code, and always sign-extend when reading the return code. This patch does so, with the following changes: * We re-organise syscall_get_return_value() to always sign-extend for compat tasks, and reimplement syscall_get_error() atop. We update syscall_trace_exit() to use syscall_get_return_value(). * We consistently use syscall_set_return_value() to set the return value, ensureing the upper 32 bits are never set unexpectedly. * As the core audit code currently uses regs_return_value() rather than syscall_get_return_value(), we special-case this for compat_user_mode(regs) such that this will do the right thing. Going forward, we should try to move the core audit code over to syscall_get_return_value(). Cc: <stable@vger.kernel.org> Reported-by: He Zhe <zhe.he@windriver.com> Reported-by: weiyuchen <weiyuchen3@huawei.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210802104200.21390-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-08-02 11:42:00 +01:00
unsigned long val = regs->regs[0];
/*
* Audit currently uses regs_return_value() instead of
* syscall_get_return_value(). Apply the same sign-extension here until
* audit is updated to use syscall_get_return_value().
*/
if (compat_user_mode(regs))
val = sign_extend64(val, 31);
return val;
}
static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
{
regs->regs[0] = rc;
}
/**
* regs_get_kernel_argument() - get Nth function argument in kernel
* @regs: pt_regs of that context
* @n: function argument number (start from 0)
*
* regs_get_argument() returns @n th argument of the function call.
*
* Note that this chooses the most likely register mapping. In very rare
* cases this may not return correct data, for example, if one of the
* function parameters is 16 bytes or bigger. In such cases, we cannot
* get access the parameter correctly and the register assignment of
* subsequent parameters will be shifted.
*/
static inline unsigned long regs_get_kernel_argument(struct pt_regs *regs,
unsigned int n)
{
#define NR_REG_ARGUMENTS 8
if (n < NR_REG_ARGUMENTS)
return pt_regs_read_reg(regs, n);
return 0;
}
/* We must avoid circular header include via sched.h */
struct task_struct;
int valid_user_regs(struct user_pt_regs *regs, struct task_struct *task);
static inline unsigned long instruction_pointer(struct pt_regs *regs)
{
return regs->pc;
}
static inline void instruction_pointer_set(struct pt_regs *regs,
unsigned long val)
{
regs->pc = val;
}
arm64: Kprobes with single stepping support Add support for basic kernel probes(kprobes) and jump probes (jprobes) for ARM64. Kprobes utilizes software breakpoint and single step debug exceptions supported on ARM v8. A software breakpoint is placed at the probe address to trap the kernel execution into the kprobe handler. ARM v8 supports enabling single stepping before the break exception return (ERET), with next PC in exception return address (ELR_EL1). The kprobe handler prepares an executable memory slot for out-of-line execution with a copy of the original instruction being probed, and enables single stepping. The PC is set to the out-of-line slot address before the ERET. With this scheme, the instruction is executed with the exact same register context except for the PC (and DAIF) registers. Debug mask (PSTATE.D) is enabled only when single stepping a recursive kprobe, e.g.: during kprobes reenter so that probed instruction can be single stepped within the kprobe handler -exception- context. The recursion depth of kprobe is always 2, i.e. upon probe re-entry, any further re-entry is prevented by not calling handlers and the case counted as a missed kprobe). Single stepping from the x-o-l slot has a drawback for PC-relative accesses like branching and symbolic literals access as the offset from the new PC (slot address) may not be ensured to fit in the immediate value of the opcode. Such instructions need simulation, so reject probing them. Instructions generating exceptions or cpu mode change are rejected for probing. Exclusive load/store instructions are rejected too. Additionally, the code is checked to see if it is inside an exclusive load/store sequence (code from Pratyush). System instructions are mostly enabled for stepping, except MSR/MRS accesses to "DAIF" flags in PSTATE, which are not safe for probing. This also changes arch/arm64/include/asm/ptrace.h to use include/asm-generic/ptrace.h. Thanks to Steve Capper and Pratyush Anand for several suggested Changes. Signed-off-by: Sandeepa Prabhu <sandeepa.s.prabhu@gmail.com> Signed-off-by: David A. Long <dave.long@linaro.org> Signed-off-by: Pratyush Anand <panand@redhat.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-08 12:35:48 -04:00
static inline unsigned long frame_pointer(struct pt_regs *regs)
{
return regs->regs[29];
}
#define procedure_link_pointer(regs) ((regs)->regs[30])
static inline void procedure_link_pointer_set(struct pt_regs *regs,
unsigned long val)
{
procedure_link_pointer(regs) = val;
}
extern unsigned long profile_pc(struct pt_regs *regs);
#endif /* __ASSEMBLY__ */
#endif