/* SPDX-License-Identifier: GPL-2.0-only */
/*
 * bpf_jit.h: BPF JIT compiler for PPC
 *
 * Copyright 2011 Matt Evans <matt@ozlabs.org>, IBM Corporation
 *	     2016 Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
 */
#ifndef _BPF_JIT_H
#define _BPF_JIT_H

#ifndef __ASSEMBLY__

#include <asm/types.h>
#include <asm/ppc-opcode.h>
powerpc64/bpf: Add support for bpf trampolines

Add support for bpf_arch_text_poke() and arch_prepare_bpf_trampoline()
for 64-bit powerpc. While the code is generic, BPF trampolines are only
enabled on 64-bit powerpc. 32-bit powerpc will need testing and some
updates.

BPF Trampolines adhere to the existing ftrace ABI utilizing a
two-instruction profiling sequence, as well as the newer ABI utilizing a
three-instruction profiling sequence enabling return with a 'blr'. The
trampoline code itself closely follows x86 implementation.

BPF prog JIT is extended to mimic 64-bit powerpc approach for ftrace
having a single nop at function entry, followed by the function
profiling sequence out-of-line and a separate long branch stub for calls
to trampolines that are out of range. A dummy_tramp is provided to
simplify synchronization similar to arm64.

When attaching a bpf trampoline to a bpf prog, we can patch up to three
things:
- the nop at bpf prog entry to go to the out-of-line stub
- the instruction in the out-of-line stub to either call the bpf trampoline
  directly, or to branch to the long_branch stub.
- the trampoline address before the long_branch stub.

We do not need any synchronization here since we always have a valid
branch target regardless of the order in which the above stores are
seen. dummy_tramp ensures that the long_branch stub goes to a valid
destination on other cpus, even when the branch to the long_branch stub
is seen before the updated trampoline address.

However, when detaching a bpf trampoline from a bpf prog, or if changing
the bpf trampoline address, we need synchronization to ensure that other
cpus can no longer branch into the older trampoline so that it can be
safely freed. bpf_tramp_image_put() uses rcu_tasks to ensure all cpus
make forward progress, but we still need to ensure that other cpus
execute isync (or some CSI) so that they don't go back into the
trampoline again. While here, update the stale comment that describes
the redzone usage in ppc64 BPF JIT.

Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-18-hbathini@linux.ibm.com
2024-10-30 12:38:50 +05:30
#include <linux/build_bug.h>

#ifdef CONFIG_PPC64_ELF_ABI_V1
#define FUNCTION_DESCR_SIZE	24
#else
#define FUNCTION_DESCR_SIZE	0
#endif

#define CTX_NIA(ctx) ((unsigned long)ctx->idx * 4)
#define SZL			sizeof(unsigned long)
#define BPF_INSN_SAFETY		64

#define PLANT_INSTR(d, idx, instr) \
	do { if (d) { (d)[idx] = instr; } idx++; } while (0)
#define EMIT(instr)		PLANT_INSTR(image, ctx->idx, instr)

/* Long jump; (unconditional 'branch') */
#define PPC_JMP(dest) \
	do { \
		long offset = (long)(dest) - CTX_NIA(ctx); \
		if ((dest) != 0 && !is_offset_in_branch_range(offset)) { \
			pr_err_ratelimited("Branch offset 0x%lx (@%u) out of range\n", offset, ctx->idx); \
			return -ERANGE; \
		} \
		EMIT(PPC_RAW_BRANCH(offset)); \
	} while (0)

/* "cond" here covers BO:BI fields. */
#define PPC_BCC_SHORT(cond, dest) \
	do { \
		long offset = (long)(dest) - CTX_NIA(ctx); \
		if ((dest) != 0 && !is_offset_in_cond_branch_range(offset)) { \
			pr_err_ratelimited("Conditional branch offset 0x%lx (@%u) out of range\n", offset, ctx->idx); \
			return -ERANGE; \
		} \
		EMIT(PPC_INST_BRANCH_COND | (((cond) & 0x3ff) << 16) | (offset & 0xfffc)); \
	} while (0)

/* Sign-extended 32-bit immediate load */
#define PPC_LI32(d, i)		do { \
		if ((int)(uintptr_t)(i) >= -32768 && \
				(int)(uintptr_t)(i) < 32768) \
			EMIT(PPC_RAW_LI(d, i)); \
		else { \
			EMIT(PPC_RAW_LIS(d, IMM_H(i))); \
			if (IMM_L(i)) \
				EMIT(PPC_RAW_ORI(d, d, IMM_L(i))); \
		} } while(0)
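
The instruction selection in PPC_LI32 can be checked with a small user-space model. The sketch below is not kernel code: it assumes IMM_H and IMM_L (not defined in this header) select the upper and lower 16 bits of the immediate, models `li` and `lis` as sign-extending their 16-bit operand, and uses hypothetical `sk_` names throughout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the value left in the register and the
 * number of instructions the macro would have emitted. */
struct sk_li32_result {
	int64_t value;
	int count;
};

/* Sign-extend a 16-bit immediate, as li and lis do with their operand. */
static int64_t sk_sext16(uint32_t imm)
{
	return (int16_t)(uint16_t)(imm & 0xffff);
}

/* Model of PPC_LI32: one li for small values, otherwise lis plus an
 * optional ori (skipped when the low half is zero). */
static struct sk_li32_result sk_li32(int32_t i)
{
	struct sk_li32_result r = { 0, 0 };
	uint32_t u = (uint32_t)i;

	if (i >= -32768 && i < 32768) {
		r.value = sk_sext16(u);			/* li d, i */
		r.count = 1;
	} else {
		r.value = sk_sext16(u >> 16) * 65536;	/* lis d, IMM_H(i) */
		r.count = 1;
		if (u & 0xffff) {
			r.value |= (int64_t)(u & 0xffff); /* ori d, d, IMM_L(i) */
			r.count++;
		}
	}
	/* The register must end up holding the sign-extended immediate. */
	assert(r.value == (int64_t)i);
	return r;
}
```

In this model, 0x12345678 costs two instructions, while 0x12340000 needs only the `lis` because its low half is zero.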

#ifdef CONFIG_PPC64
#define PPC_LI64(d, i)		do { \
		if ((long)(i) >= -2147483648 && \
				(long)(i) < 2147483648) \
			PPC_LI32(d, i); \
		else { \
			if (!((uintptr_t)(i) & 0xffff800000000000ULL)) \
				EMIT(PPC_RAW_LI(d, ((uintptr_t)(i) >> 32) & \
						0xffff)); \
			else { \
				EMIT(PPC_RAW_LIS(d, ((uintptr_t)(i) >> 48))); \
				if ((uintptr_t)(i) & 0x0000ffff00000000ULL) \
					EMIT(PPC_RAW_ORI(d, d, \
					  ((uintptr_t)(i) >> 32) & 0xffff)); \
			} \
			EMIT(PPC_RAW_SLDI(d, d, 32)); \
			if ((uintptr_t)(i) & 0x00000000ffff0000ULL) \
				EMIT(PPC_RAW_ORIS(d, d, \
					 ((uintptr_t)(i) >> 16) & 0xffff)); \
			if ((uintptr_t)(i) & 0x000000000000ffffULL) \
				EMIT(PPC_RAW_ORI(d, d, (uintptr_t)(i) & \
							0xffff)); \
		} } while (0)
#define PPC_LI_ADDR	PPC_LI64

#ifndef CONFIG_PPC_KERNEL_PCREL
#define PPC64_LOAD_PACA() \
	EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, kernel_toc)))
#else
#define PPC64_LOAD_PACA() do {} while (0)
#endif
#else
#define PPC_LI64(d, i)	BUILD_BUG()
#define PPC_LI_ADDR	PPC_LI32
#define PPC64_LOAD_PACA() BUILD_BUG()
#endif

/*
 * The fly in the ointment of code size changing from pass to pass is
 * avoided by padding the short branch case with a NOP.  If code size differs
 * with different branch reaches we will have the issue of code moving from
 * one pass to the next and will need a few passes to converge on a stable
 * state.
 */
#define PPC_BCC(cond, dest)	do { \
		if (is_offset_in_cond_branch_range((long)(dest) - CTX_NIA(ctx))) { \
			PPC_BCC_SHORT(cond, dest); \
			EMIT(PPC_RAW_NOP()); \
		} else { \
			/* Flip the 'T or F' bit to invert comparison */ \
			PPC_BCC_SHORT(cond ^ COND_CMP_TRUE, CTX_NIA(ctx) + 2*4); \
			PPC_JMP(dest); \
		} } while(0)
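
The fixed two-instruction footprint that the comment above describes can be illustrated outside the kernel. The following is a minimal sketch, not the JIT's code: the `sk_` names are hypothetical, the raw opcode values are assumptions of this sketch, the conditional range is taken to be a signed 16-bit, word-aligned byte offset, and the range check that PPC_JMP performs on the long jump is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Raw encodings; treat these values as assumptions of this sketch. */
#define SK_INST_BC	0x40800000u	/* conditional branch template */
#define SK_INST_B	0x48000000u	/* unconditional branch */
#define SK_INST_NOP	0x60000000u	/* nop (ori r0,r0,0) */
#define SK_CMP_TRUE	0x100u		/* plays the role of COND_CMP_TRUE */

/* Hypothetical stand-in for is_offset_in_cond_branch_range(). */
static int sk_in_cond_range(long off)
{
	return off >= -0x8000 && off <= 0x7fff && !(off & 0x3);
}

/*
 * Emit a conditional branch from byte address nia to dest.
 * Always writes exactly two words, so code size never depends on reach.
 */
static unsigned int sk_emit_bcc(uint32_t out[2], unsigned int cond,
				long nia, long dest)
{
	long off = dest - nia;

	assert(!(nia & 0x3) && !(dest & 0x3));
	if (sk_in_cond_range(off)) {
		/* Short form: conditional branch, padded with a nop. */
		out[0] = SK_INST_BC | ((cond & 0x3ff) << 16) |
			 ((uint32_t)off & 0xfffc);
		out[1] = SK_INST_NOP;
	} else {
		/* Inverted test skips over the long jump (+8 bytes). */
		out[0] = SK_INST_BC | (((cond ^ SK_CMP_TRUE) & 0x3ff) << 16) | 8;
		/* The jump sits one instruction later: offset is off - 4. */
		out[1] = SK_INST_B | ((uint32_t)(off - 4) & 0x03fffffc);
	}
	return 2;
}
```

With `cond` set to 0x102 (CR0_EQ with the true bit), a target 0x100 bytes ahead yields a conditional branch plus a nop, while a target 0x10000 bytes ahead yields an inverted branch over an unconditional jump.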

/* To create a branch condition, select a bit of cr0... */
#define CR0_LT		0
#define CR0_GT		1
#define CR0_EQ		2
/* ...and modify BO[3] */
#define COND_CMP_TRUE	0x100
#define COND_CMP_FALSE	0x000
/* Together, they make all required comparisons: */
#define COND_GT		(CR0_GT | COND_CMP_TRUE)
#define COND_GE		(CR0_LT | COND_CMP_FALSE)
#define COND_EQ		(CR0_EQ | COND_CMP_TRUE)
#define COND_NE		(CR0_EQ | COND_CMP_FALSE)
#define COND_LT		(CR0_LT | COND_CMP_TRUE)
#define COND_LE		(CR0_GT | COND_CMP_FALSE)

#define SEEN_FUNC	0x20000000 /* might call external helpers */
#define SEEN_TAILCALL	0x40000000 /* uses tail calls */

struct codegen_context {
	/*
	 * This is used to track register usage as well
	 * as calls to external helpers.
	 * - register usage is tracked with corresponding
	 *   bits (r3-r31)
	 * - rest of the bits can be used to track other
	 *   things -- for now, we use bits 0 to 2
	 *   encoded in SEEN_* macros above
	 */
	unsigned int seen;
	unsigned int idx;
	unsigned int stack_size;
	int b2p[MAX_BPF_JIT_REG + 2];
	unsigned int exentry_idx;
	unsigned int alt_exit_addr;
};

#define bpf_to_ppc(r)	(ctx->b2p[r])

bpf ppc32: Add BPF_PROBE_MEM support for JIT
BPF load instruction with BPF_PROBE_MEM mode can cause a fault
inside kernel. Append exception table for such instructions
within BPF program.
Unlike other archs, which use the extable 'fixup' field to pass dest_reg
and nip, BPF exception table on PowerPC follows the generic PowerPC
exception table design, where it populates both fixup and extable
sections within BPF program. fixup section contains 3 instructions,
first 2 instructions clear dest_reg (lower & higher 32-bit registers)
and last instruction jumps to next instruction in the BPF code.
extable 'insn' field contains relative offset of the instruction and
'fixup' field contains relative offset of the fixup entry. Example
layout of BPF program with extable present:
+------------------+
| |
| |
0x4020 -->| lwz r28,4(r4) |
| |
| |
0x40ac -->| lwz r3,0(r24) |
| lwz r4,4(r24) |
| |
| |
|------------------|
0x4278 -->| li r28,0 | \
| li r27,0 | | fixup entry
| b 0x4024 | /
0x4284 -->| li r4,0 |
| li r3,0 |
| b 0x40b4 |
|------------------|
0x4290 -->| insn=0xfffffd90 | \ extable entry
| fixup=0xffffffe4 | /
0x4298 -->| insn=0xfffffe14 |
| fixup=0xffffffe8 |
+------------------+
(Addresses shown here are chosen at random; they are not real.)
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-8-hbathini@linux.ibm.com
2021-10-12 18:00:55 +05:30
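
The relative offsets in the example layout above can be reproduced with a little arithmetic. This is a sketch under stated assumptions, not the kernel's extable code: the struct and the `sk_` helpers are hypothetical, each field is assumed to hold a signed 32-bit offset from the field's own address to its target, and the `fixup` field is assumed to sit 4 bytes after `insn`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of a relative exception-table entry. */
struct sk_extable_entry {
	int32_t insn;	/* offset from &insn to the faulting instruction */
	int32_t fixup;	/* offset from &fixup to the fixup code */
};

/* Offset stored in a field located at field_addr that points at target. */
static int32_t sk_rel_off(uint32_t field_addr, uint32_t target)
{
	return (int32_t)(target - field_addr);
}

/* Build the entry that lives at entry_addr. */
static struct sk_extable_entry sk_make_entry(uint32_t entry_addr,
					     uint32_t insn_addr,
					     uint32_t fixup_addr)
{
	struct sk_extable_entry e;

	assert(!(entry_addr & 0x3));
	e.insn = sk_rel_off(entry_addr, insn_addr);
	e.fixup = sk_rel_off(entry_addr + 4, fixup_addr);
	return e;
}

/* Recover the absolute target from a stored relative offset. */
static uint32_t sk_resolve(uint32_t field_addr, int32_t off)
{
	return field_addr + (uint32_t)off;
}
```

For the entry at 0x4290, the load at 0x4020 and the fixup at 0x4278 give insn=0xfffffd90 and fixup=0xffffffe4, matching the diagram; the entry at 0x4298 works out the same way.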
#ifdef CONFIG_PPC32
#define BPF_FIXUP_LEN	3 /* Three instructions => 12 bytes */
#else
#define BPF_FIXUP_LEN	2 /* Two instructions => 8 bytes */
#endif

static inline bool bpf_is_seen_register(struct codegen_context *ctx, int i)
{
	return ctx->seen & (1 << (31 - i));
}

static inline void bpf_set_seen_register(struct codegen_context *ctx, int i)
{
	ctx->seen |= 1 << (31 - i);
}

powerpc/bpf: Reallocate BPF registers to volatile registers when possible on PPC32
When the BPF routine doesn't call any function, the non-volatile
registers can be reallocated to volatile registers in order to
avoid having to save and restore them on the stack.
Before this patch, the test #359 ADD default X is:
0: 7c 64 1b 78 mr r4,r3
4: 38 60 00 00 li r3,0
8: 94 21 ff b0 stwu r1,-80(r1)
c: 60 00 00 00 nop
10: 92 e1 00 2c stw r23,44(r1)
14: 93 01 00 30 stw r24,48(r1)
18: 93 21 00 34 stw r25,52(r1)
1c: 93 41 00 38 stw r26,56(r1)
20: 39 80 00 00 li r12,0
24: 39 60 00 00 li r11,0
28: 3b 40 00 00 li r26,0
2c: 3b 20 00 00 li r25,0
30: 7c 98 23 78 mr r24,r4
34: 7c 77 1b 78 mr r23,r3
38: 39 80 00 42 li r12,66
3c: 39 60 00 00 li r11,0
40: 7d 8c d2 14 add r12,r12,r26
44: 39 60 00 00 li r11,0
48: 7d 83 63 78 mr r3,r12
4c: 82 e1 00 2c lwz r23,44(r1)
50: 83 01 00 30 lwz r24,48(r1)
54: 83 21 00 34 lwz r25,52(r1)
58: 83 41 00 38 lwz r26,56(r1)
5c: 38 21 00 50 addi r1,r1,80
60: 4e 80 00 20 blr
After this patch, the same test has become:
0: 7c 64 1b 78 mr r4,r3
4: 38 60 00 00 li r3,0
8: 94 21 ff b0 stwu r1,-80(r1)
c: 60 00 00 00 nop
10: 39 80 00 00 li r12,0
14: 39 60 00 00 li r11,0
18: 39 00 00 00 li r8,0
1c: 38 e0 00 00 li r7,0
20: 7c 86 23 78 mr r6,r4
24: 7c 65 1b 78 mr r5,r3
28: 39 80 00 42 li r12,66
2c: 39 60 00 00 li r11,0
30: 7d 8c 42 14 add r12,r12,r8
34: 39 60 00 00 li r11,0
38: 7d 83 63 78 mr r3,r12
3c: 38 21 00 50 addi r1,r1,80
40: 4e 80 00 20 blr
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b94562d7d2bb21aec89de0c40bb3cd91054b65a2.1616430991.git.christophe.leroy@csgroup.eu
2021-03-22 16:37:53 +00:00
static inline void bpf_clear_seen_register(struct codegen_context *ctx, int i)
{
	ctx->seen &= ~(1 << (31 - i));
}
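
The three helpers above share one bit layout: register rN is tracked at bit (31 - N) of `seen`. The sketch below models that layout in user space; `struct sk_ctx` and the `sk_` functions are hypothetical stand-ins, and an unsigned type is used for the mask purely as a choice of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the 'seen' field of struct codegen_context. */
struct sk_ctx {
	uint32_t seen;
};

/* Register rN maps to bit (31 - N), so r31 is the least significant bit. */
static uint32_t sk_reg_bit(int reg)
{
	assert(reg >= 0 && reg <= 31);
	return UINT32_C(1) << (31 - reg);
}

static bool sk_is_seen(const struct sk_ctx *ctx, int reg)
{
	return (ctx->seen & sk_reg_bit(reg)) != 0;
}

static void sk_set_seen(struct sk_ctx *ctx, int reg)
{
	ctx->seen |= sk_reg_bit(reg);
}

static void sk_clear_seen(struct sk_ctx *ctx, int reg)
{
	ctx->seen &= ~sk_reg_bit(reg);
}
```

Under this layout the positions for r2 and r1 are 0x20000000 and 0x40000000, the same values as SEEN_FUNC and SEEN_TAILCALL. Since only r3-r31 are tracked as registers, the flags and the register bits do not collide.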

void bpf_jit_init_reg_mapping(struct codegen_context *ctx);
int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context *ctx, u64 func);
int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct codegen_context *ctx,
		       u32 *addrs, int pass, bool extra_pass);
void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx);
void bpf_jit_realloc_regs(struct codegen_context *ctx);
int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr);

int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, u32 *fimage, int pass,
			  struct codegen_context *ctx, int insn_idx,
			  int jmp_off, int dst_reg);

#endif

#endif