Commit graph

291 commits

Author SHA1 Message Date
Alexei Starovoitov
886178a33a Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc3
Cross-merge BPF, perf and other fixes after downstream PRs.
It restores BPF CI to green after critical fix
commit bc4394e5e7 ("perf: Fix the throttle error of some clock events")

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-26 09:49:39 -07:00
Willem de Bruijn
d4adf1c9ee bpf: Adjust free target to avoid global starvation of LRU map
BPF_MAP_TYPE_LRU_HASH can recycle most recent elements well before the
map is full, due to percpu reservations and force shrink before
neighbor stealing. Once a CPU is unable to borrow from the global map,
it will once steal one elem from a neighbor and after that each time
flush this one element to the global list and immediately recycle it.

Batch value LOCAL_FREE_TARGET (128) will exhaust a 10K element map
with 79 CPUs. CPU 79 will observe this behavior even while its
neighbors hold 78 * 127 + 1 * 15 == 9921 free elements (99%).

CPUs need not be active concurrently. The issue can appear with
affinity migration, e.g., irqbalance. Each CPU can reserve and then
hold onto its 128 elements indefinitely.

Avoid global list exhaustion by limiting aggregate percpu caches to
half of map size, by adjusting LOCAL_FREE_TARGET based on cpu count.
This change has no effect on sufficiently large tables.

Similar to LOCAL_NR_SCANS and lru->nr_scans, introduce a map variable
lru->free_target. The extra field fits in a hole in struct bpf_lru.
The cacheline is already warm where read in the hot path. The field is
only accessed with the lru lock held.

Tested-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/r/20250618215803.3587312-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-18 18:50:14 -07:00
Yonghong Song
50034d9362 docs/bpf: Default cpu version changed from v1 to v3 in llvm 20
The default cpu version is changed from v1 to v3 in llvm version 20.
See [1] for more detailed reasoning. Update bpf_devel_QA.rst so
developers can find such information easily.

  [1] https://github.com/llvm/llvm-project/pull/107008

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250612043049.2411989-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:44 -07:00
Eslam Khafagy
c9b03a1100 bpf, doc: Improve wording of docs
The phrase "dividing -1" is one I find confusing.  E.g.,
"INT_MIN dividing -1" sounds like "-1 / INT_MIN" rather than the inverse.
"divided by" instead of "dividing" assuming the inverse is meant.

Signed-off-by: Eslam Khafagy <eslam.medhat1993@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250607222434.227890-1-eslam.medhat1993@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-11 12:30:41 -07:00
Eslam Khafagy
e41079f53e Documentation: Fix spelling mistake.
Fix typo "desination => destination"
in file
Documentation/bpf/standardization/instruction-set.rst

Signed-off-by: Eslam Khafagy <eslam.medhat1993@gmail.com>
Acked-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20250606100511.368450-1-eslam.medhat1993@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-06 19:22:36 -07:00
Kumar Kartikeya Dwivedi
bc049387b4 bpf: Add support for __prog argument suffix to pass in prog->aux
Instead of hardcoding the list of kfuncs that need prog->aux passed to
them with a combination of fixup_kfunc_call adjustment + __ign suffix,
combine both in __prog suffix, which ignores the argument passed in, and
fixes it up to the prog->aux. This allows kfuncs to have the prog->aux
passed into them without having to touch the verifier.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250513142812.1021591-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-13 18:47:54 -07:00
Khaled Elnaggar
79af71c5fe docs: bpf: Fix bullet point formatting warning
Fix indentation for a bullet list item in bpf_iterators.rst.
According to reStructuredText rules, bullet list item bodies must be
consistently indented relative to the bullet. The indentation of the
first line after the bullet determines the alignment for the rest of
the item body.

Reported by smatch:
  /linux/Documentation/bpf/bpf_iterators.rst:55: WARNING: Bullet list ends without a blank line; unexpected unindent. [docutils]

Fixes: 7220eabff8 ("bpf, docs: document open-coded BPF iterators")
Signed-off-by: Khaled Elnaggar <khaledelnaggarlinux@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250513015901.475207-1-khaledelnaggarlinux@gmail.com
2025-05-13 08:53:02 -07:00
Andrii Nakryiko
7220eabff8 bpf, docs: document open-coded BPF iterators
Extract BPF open-coded iterators documentation spread out across a few
original commit messages ([0], [1]) into a dedicated doc section under
Documentation/bpf/bpf_iterators.rst. Also make explicit expectation that
BPF iterator program type should be accompanied by a corresponding
open-coded BPF iterator implementation, going forward.

  [0] https://lore.kernel.org/all/20230308184121.1165081-3-andrii@kernel.org/
  [1] https://lore.kernel.org/all/20230308184121.1165081-4-andrii@kernel.org/

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250509180350.2604946-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-09 22:49:15 -07:00
Alexei Starovoitov
224ee86639 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc4
Cross-merge bpf and other fixes after downstream PRs.

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-28 08:40:45 -07:00
Alexei Starovoitov
f88886de09 bpf: Add namespace to BPF internal symbols
Add namespace to BPF internal symbols used by light skeleton
to prevent abuse and document with the code their allowed usage.

Fixes: b1d18a7574 ("bpf: Extend sys_bpf commands for bpf_syscall programs.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20250425014542.62385-1-alexei.starovoitov@gmail.com
2025-04-25 09:21:23 -07:00
WangYuli
4cc2048214 bpf, docs: Fix non-standard line break
Even though the kernel's coding-style document does not explicitly
state this, we generally put a newline after the semicolon of every
C language statement to enhance code readability.

Adjust the placement of newlines to adhere to this convention.

Reported-by: Chen Linxuan <chenlinxuan@uniontech.com>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Link: https://lore.kernel.org/r/DB66473733449DB0+20250423030632.17626-1-wangyuli@uniontech.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-24 07:46:04 -07:00
T.J. Mercier
dc438a9bc7 bpf, docs: Fix broken link to renamed bpf_iter_task_vmas.c
This file was renamed from bpf_iter_task_vma.c.

Fixes: 45b38941c8 ("selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c")
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250304204520.201115-1-tjmercier@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:56 -07:00
Yonghong Song
b123480eec docs/bpf: Document some special sdiv/smod operations
Patch [1] fixed possible kernel crash due to specific sdiv/smod operations
in bpf program. The following are related operations and the expected results
of those operations:
  - LLONG_MIN/-1 = LLONG_MIN
  - INT_MIN/-1 = INT_MIN
  - LLONG_MIN%-1 = 0
  - INT_MIN%-1 = 0

Those operations are replaced with codes which won't cause
kernel crash. This patch documents what operations may cause exception and
what replacement operations are.

  [1] https://lore.kernel.org/all/20240913150326.1187788-1-yonghong.song@linux.dev/

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20241107170924.2944681-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-26 07:54:49 -08:00
Ihor Solodrai
ea70faa1f2 docs/bpf: Document the semantics of BTF tags with kind_flag
Explain the meaning of kind_flag in BTF type_tags and decl_tags.
Update uapi btf.h kind_flag comment to reflect the changes.

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250130201239.1429648-3-ihor.solodrai@linux.dev
2025-02-05 16:17:59 -08:00
Abhinav Saxena
5249b164e6 bpf: Remove trailing whitespace in verifier.rst
Remove trailing whitespace in Documentation/bpf/verifier.rst.

Signed-off-by: Abhinav Saxena <xandfury@gmail.com>
Link: https://lore.kernel.org/r/20241107063708.106340-2-xandfury@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11 08:17:48 -08:00
Alan Maguire
8a0cfd8adf docs/bpf: Add description of .BTF.base section
Now that .BTF.base sections are generated for out-of-tree kernel
modules (provided pahole supports the "distilled_base" BTF feature),
document .BTF.base and its role in supporting resilient split BTF
and BTF relocation.

Changes since v1:

- updated formatting, corrected typo, used BTF ID[s] consistently
  (Andrii)

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241028091543.2175967-1-alan.maguire@oracle.com
2024-10-29 13:15:36 -07:00
Donald Hunter
6182e0b80f docs/bpf: Add missing BPF program types to docs
Update the table of program types in the libbpf documentation with the
recently added program types.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240912095944.6386-1-donald.hunter@gmail.com
2024-09-12 10:56:41 -07:00
Will Hawkins
c229c17a76 docs/bpf: Add constant values for linkages
Make the values of the symbolic constants that define the valid linkages
for functions and variables explicit.

Signed-off-by: Will Hawkins <hawkinsw@obs.cr>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240911055033.2084881-1-hawkinsw@obs.cr
2024-09-11 13:47:04 -07:00
Yiming Xiang
89dd9bb255 docs/bpf: Fix a typo in verifier.rst
In verifier.rst, there is a typo in section 'Register parentage chains'.
Caller saved registers are r0-r5, callee saved registers are r6-r9.

Here by context it means callee saved registers rather than caller saved
registers. This may confuse users.

Signed-off-by: Yiming Xiang <kxiang@umich.edu>
Link: https://lore.kernel.org/r/20240829031712.198489-1-kxiang@umich.edu
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-08-29 12:19:30 -07:00
Dave Thaler
04efaebd72 bpf, docs: Address comments from IETF Area Directors
This patch does the following to address IETF feedback:

* Remove mention of "program type" and reference future
  docs (and mention platform-specific docs exist) for
  helper functions and BTF. Addresses Roman Danyliw's
  comments based on GENART review from Ines Robles [0].

* Add reference for endianness as requested by John
  Scudder [1].

* Added bit numbers to top of 32-bit wide format diagrams
  as requested by Paul Wouters [2].

* Added more text about why BPF doesn't stand for anything, based
  on text from ebpf.io [3], as requested by Eric Vyncke and
  Gunter Van de Velde [4].

* Replaced "htobe16" (and similar) and the direction-specific
  description with just "be16" (and similar) and a direction-agnostic
  description, to match the direction-agnostic description in
  the Byteswap Instructions section. Based on feedback from Eric
  Vyncke [5].

[0] https://mailarchive.ietf.org/arch/msg/bpf/DvDgDWOiwk05OyNlWlAmELZFPlM/

[1] https://mailarchive.ietf.org/arch/msg/bpf/eKNXpU4jCLjsbZDSw8LjI29M3tM/

[2] https://mailarchive.ietf.org/arch/msg/bpf/hGk8HkYxeZTpdu9qW_MvbGKj7WU/

[3] https://ebpf.io/what-is-ebpf/#what-do-ebpf-and-bpf-stand-for

[4] https://mailarchive.ietf.org/arch/msg/bpf/i93lzdN3ewnzzS_JMbinCIYxAIU/

[5] https://mailarchive.ietf.org/arch/msg/bpf/KBWXbMeDcSrq4vsKR_KkBbV6hI4/

Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Dave Thaler <dthaler1968@googlemail.com>
Link: https://lore.kernel.org/r/20240623150453.10613-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-06-23 09:10:26 -07:00
Mykyta Yatsenko
eb4e772627 libbpf: Configure log verbosity with env variable
Configure logging verbosity by setting LIBBPF_LOG_LEVEL environment
variable, which is applied only to default logger. Once user set their
custom logging callback, it is up to them to handle filtering.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240524131840.114289-1-yatsenko@meta.com
2024-05-28 16:25:06 -07:00
Dave Thaler
e245ef8a0b bpf, docs: Fix instruction.rst indentation
The table captions patch corrected indented most tables to work with
the table directive for adding a caption but missed two of them.

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240526061815.22497-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-26 09:58:42 -07:00
Dave Thaler
f980f13e4e bpf, docs: Clarify call local offset
In the Jump instructions section it explains that the offset is
"relative to the instruction following the jump instruction".
But the program-local section confusingly said "referenced by
offset from the call instruction, similar to JA".

This patch updates that sentence with consistent wording, saying
it's relative to the instruction following the call instruction.

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240525153332.21355-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-25 10:41:57 -07:00
Dave Thaler
6a6d8b6f00 bpf, docs: Add table captions
As suggested by Ines Robles in his IETF GENART review at
https://datatracker.ietf.org/doc/review-ietf-bpf-isa-02-genart-lc-robles-2024-05-16/

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Link: https://lore.kernel.org/r/20240524164618.18894-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-25 10:39:31 -07:00
Dave Thaler
4e1215d9a1 bpf, docs: clarify sign extension of 64-bit use of 32-bit imm
imm is defined as a 32-bit signed integer.

{MOV, K, ALU64} says it does "dst = src" (where src is 'imm') and it
does do dst = (s64)imm, which in that sense does sign extend imm. The MOVSX
instruction is explained as sign extending, so added the example of
{MOV, K, ALU64} to make this more clear.

{JLE, K, JMP} says it does "PC += offset if dst <= src" (where src is 'imm',
and the comparison is unsigned). This was apparently ambiguous to some
readers as to whether the comparison was "dst <= (u64)(u32)imm" or
"dst <= (u64)(s64)imm" so added an example to make this more clear.

v1 -> v2: Address comments from Yonghong

Signed-off-by: Dave Thaler <dthaler1968@googlemail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240520215255.10595-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-25 10:39:03 -07:00
Dave Thaler
a985fdca5e bpf, docs: Use RFC 2119 language for ISA requirements
Per IETF convention and discussion at LSF/MM/BPF, use MUST etc.
keywords as requested by IETF Area Director review.  Also as
requested, indicate that documenting BTF is out of scope of this
document and will be covered by a separate IETF specification.

Added paragraph about the terminology that is required IETF boilerplate
and must be worded exactly as such.

Signed-off-by: Dave Thaler <dthaler1968@googlemail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240517165855.4688-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-25 10:38:35 -07:00
Dave Thaler
4652072e7b bpf, docs: Move sentence about returning R0 to abi.rst
As discussed at LSF/MM/BPF, the sentence about using R0 for returning
values from calls is part of the calling convention and belongs in
abi.rst.  Any further additions or clarifications to this text are left
for future patches on abi.rst.  The current patch is simply to unblock
progression of instruction-set.rst to a standard.

In contrast, the restriction of register numbers to the range 0-10
is untouched, left in the instruction-set.rst definition of the
src_reg and dst_reg fields.

Signed-off-by: Dave Thaler <dthaler1968@googlemail.com>
Link: https://lore.kernel.org/r/20240517153445.3914-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-25 10:37:49 -07:00
Puranjay Mohan
7a8030057f bpf, docs: Fix the description of 'src' in ALU instructions
An ALU instruction's source operand can be the value in the source
register or the 32-bit immediate value encoded in the instruction. This
is controlled by the 's' bit of the 'opcode'.

The current description explicitly uses the phrase 'value of the source
register' when defining the meaning of 'src'.

Change the description to use 'source operand' in place of 'value of the
source register'.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Dave Thaler <dthaler1968@gmail.com>
Link: https://lore.kernel.org/r/20240514130303.113607-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-15 09:34:54 -07:00
Dave Thaler
07801a24e2 bpf, docs: Clarify PC use in instruction-set.rst
This patch elaborates on the use of PC by expanding the PC acronym,
explaining the units, and the relative position to which the offset
applies.

Signed-off-by: Dave Thaler <dthaler1968@googlemail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20240426231126.5130-1-dthaler1968@gmail.com
2024-04-29 11:54:42 +02:00
Dave Thaler
e51b907d40 bpf, docs: Add introduction for use in the ISA Internet Draft
The proposed intro paragraph text is derived from the first paragraph
of the IETF BPF WG charter at https://datatracker.ietf.org/wg/bpf/about/

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240422190942.24658-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-25 19:15:21 -07:00
Dave Thaler
735f5b8a7c bpf, docs: Fix formatting nit in instruction-set.rst
Other places that had pseudocode were prefixed with ::
so as to appear in a literal block, but one place was inconsistent.
This patch fixes that inconsistency.

Signed-off-by: Dave Thaler <dthaler1968@googlemail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240419213826.7301-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-21 10:09:12 -07:00
Dave Thaler
db50040d09 bpf, docs: Clarify helper ID and pointer terms in instruction-set.rst
Per IETF 119 meeting discussion and mailing list discussion at
https://mailarchive.ietf.org/arch/msg/bpf/2JwWQwFdOeMGv0VTbD0CKWwAOEA/
the following changes are made.

First, say call by "static ID" rather than call by "address"

Second, change "pointer" to "address"

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240419203617.6850-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-21 10:08:56 -07:00
Dave Thaler
00d5d22a5b bpf, docs: Editorial nits in instruction-set.rst
This patch addresses a number of editorial nits including
spelling, punctuation, grammar, and wording consistency issues
in instruction-set.rst.

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240405155245.3618-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-05 10:42:30 -07:00
Dave Thaler
0ef05e258b bpf, docs: Rename legacy conformance group to packet
There could be other legacy conformance groups in the future,
so use a more descriptive name.  The status of the conformance
group in the IANA registry is what designates it as legacy,
not the name of the group.

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Link: https://lore.kernel.org/r/20240302012229.16452-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2024-03-04 14:31:06 +01:00
Dave Thaler
4e73e1bc1a bpf, docs: Use IETF format for field definitions in instruction-set.rst
In preparation for publication as an IETF RFC, the WG chairs asked me
to convert the document to use IETF packet format for field layout, so
this patch attempts to make it consistent with other IETF documents.

Some fields that are not byte aligned were previously inconsistent
in how values were defined.  Some were defined as the value of the
byte containing the field (like 0x20 for a field holding the high
four bits of the byte), and others were defined as the value of the
field itself (like 0x2).  This PR makes them be consistent in using
just the values of the field itself, which is IETF convention.

As a result, some of the defines that used BPF_* would no longer
match the value in the spec, and so this patch also drops the BPF_*
prefix to avoid confusion with the defines that are the full-byte
equivalent values.  For consistency, BPF_* is then dropped from
other fields too.  BPF_<foo> is thus the Linux implementation-specific
define for <foo> as it appears in the BPF ISA specification.

The syntax BPF_ADD | BPF_X | BPF_ALU only worked for full-byte
values so the convention {ADD, X, ALU} is proposed for referring
to field values instead.

Also replace the redundant "LSB bits" with "least significant bits".

A preview of what the resulting Internet Draft would look like can
be seen at:
https://htmlpreview.github.io/?https://raw.githubusercontent.com/dthaler/ebp
f-docs-1/format/draft-ietf-bpf-isa.html

v1->v2: Fix sphinx issue as recommended by David Vernet

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240301222337.15931-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-02 21:40:14 -08:00
Kees Cook
896880ff30 bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
Replace deprecated 0-length array in struct bpf_lpm_trie_key with
flexible array. Found with GCC 13:

../kernel/bpf/lpm_trie.c:207:51: warning: array subscript i is outside array bounds of 'const __u8[0]' {aka 'const unsigned char[]'} [-Warray-bounds=]
  207 |                                        *(__be16 *)&key->data[i]);
      |                                                   ^~~~~~~~~~~~~
../include/uapi/linux/swab.h:102:54: note: in definition of macro '__swab16'
  102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
      |                                                      ^
../include/linux/byteorder/generic.h:97:21: note: in expansion of macro '__be16_to_cpu'
   97 | #define be16_to_cpu __be16_to_cpu
      |                     ^~~~~~~~~~~~~
../kernel/bpf/lpm_trie.c:206:28: note: in expansion of macro 'be16_to_cpu'
  206 |                 u16 diff = be16_to_cpu(*(__be16 *)&node->data[i]
^
      |                            ^~~~~~~~~~~
In file included from ../include/linux/bpf.h:7:
../include/uapi/linux/bpf.h:82:17: note: while referencing 'data'
   82 |         __u8    data[0];        /* Arbitrary size */
      |                 ^~~~

And found at run-time under CONFIG_FORTIFY_SOURCE:

  UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:218:49
  index 0 is out of range for type '__u8 [*]'

Changing struct bpf_lpm_trie_key is difficult since has been used by
userspace. For example, in Cilium:

	struct egress_gw_policy_key {
	        struct bpf_lpm_trie_key lpm_key;
	        __u32 saddr;
	        __u32 daddr;
	};

While direct references to the "data" member haven't been found, there
are static initializers what include the final member. For example,
the "{}" here:

        struct egress_gw_policy_key in_key = {
                .lpm_key = { 32 + 24, {} },
                .saddr   = CLIENT_IP,
                .daddr   = EXTERNAL_SVC_IP & 0Xffffff,
        };

To avoid the build time and run time warnings seen with a 0-sized
trailing array for struct bpf_lpm_trie_key, introduce a new struct
that correctly uses a flexible array for the trailing bytes,
struct bpf_lpm_trie_key_u8. As part of this, include the "header"
portion (which is just the "prefixlen" member), so it can be used
by anything building a bpf_lpr_trie_key that has trailing members that
aren't a u8 flexible array (like the self-test[1]), which is named
struct bpf_lpm_trie_key_hdr.

Unfortunately, C++ refuses to parse the __struct_group() helper, so
it is not possible to define struct bpf_lpm_trie_key_hdr directly in
struct bpf_lpm_trie_key_u8, so we must open-code the union directly.

Adjust the kernel code to use struct bpf_lpm_trie_key_u8 through-out,
and for the selftest to use struct bpf_lpm_trie_key_hdr. Add a comment
to the UAPI header directing folks to the two new options.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Closes: https://paste.debian.net/hidden/ca500597/
Link: https://lore.kernel.org/all/202206281009.4332AA33@keescook/ [1]
Link: https://lore.kernel.org/bpf/20240222155612.it.533-kees@kernel.org
2024-02-29 22:52:43 +01:00
Dave Thaler
89ee838130 bpf, docs: specify which BPF_ABS and BPF_IND fields were zero
Specifying which fields were unused allows IANA to only list as deprecated
instructions that were actually used, leaving the rest as unassigned and
possibly available for future use for something else.

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240221175419.16843-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-22 09:11:49 -08:00
Dave Thaler
c1bb68f6b2 bpf, docs: Fix typos in instruction-set.rst
* "BPF ADD" should be "BPF_ADD".
* "src" should be "src_reg" in several places.  The latter is the field name
  in the instruction.  The former refers to the value of the register, or the
  immediate.
* Add '' around field names in one sentence, for consistency with the rest
  of the document.

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240221173535.16601-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-22 09:07:37 -08:00
Dave Thaler
dc8543b597 bpf, docs: Update ISA document title
* Use "Instruction Set Architecture (ISA)" instead of "Instruction Set
  Specification"
* Remove version number

As previously discussed on the mailing list at
https://mailarchive.ietf.org/arch/msg/bpf/SEpn3OL9TabNRn-4rDX9A6XVbjM/

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20240208221449.12274-1-dthaler1968@gmail.com
2024-02-13 23:14:15 +01:00
Dave Thaler
563918a0e3 bpf, docs: Fix typos in instructions-set.rst
* "imm32" should just be "imm"
* Add blank line to fix formatting error reported by Stephen Rothwell [0]

[0]: https://lore.kernel.org/bpf/20240206153301.4ead0bad@canb.auug.org.au/T/#u

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240206045146.4965-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-06 07:44:59 -08:00
Dave Thaler
2d9a925d0f bpf, docs: Expand set of initial conformance groups
This patch attempts to update the ISA specification according
to the latest mailing list discussion about conformance groups,
in a way that is intended to be consistent with IANA registry
processes and IETF 118 WG meeting discussion.

It does the following:
* Split basic into base32 and base64 for 32-bit vs 64-bit base
  instructions
* Split division/multiplication/modulo instructions out of base groups
* Split atomic instructions out of base groups

There may be additional changes as discussion continues,
but there seems to be consensus on the principles above.

v1->v2: fixed typo pointed out by David Vernet

v2->v3: Moved multiplication to same groups as division/modulo

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240202221110.3872-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-05 14:44:45 -08:00
Dave Thaler
088a464ed5 bpf, docs: Clarify which legacy packet instructions existed
As discussed on the BPF IETF mailing list (see link), this patch updates
the "Legacy BPF Packet access instructions" section to clarify which
instructions are deprecated (vs which were never defined and so are not
deprecated).

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: David Vernet <void@manifault.com>
Link: https://mailarchive.ietf.org/arch/msg/bpf/5LnnKm093cGpOmDI9TnLQLBXyys
Link: https://lore.kernel.org/bpf/20240131033759.3634-1-dthaler1968@gmail.com
2024-02-01 11:32:13 +01:00
Daniel Xu
6f3189f38a bpf: treewide: Annotate BPF kfuncs in BTF
This commit marks kfuncs as such inside the .BTF_ids section. The upshot
of these annotations is that we'll be able to automatically generate
kfunc prototypes for downstream users. The process is as follows:

1. In source, use BTF_KFUNCS_START/END macro pair to mark kfuncs
2. During build, pahole injects into BTF a "bpf_kfunc" BTF_DECL_TAG for
   each function inside BTF_KFUNCS sets
3. At runtime, vmlinux or module BTF is made available in sysfs
4. At runtime, bpftool (or similar) can look at provided BTF and
   generate appropriate prototypes for functions with "bpf_kfunc" tag

To ensure future kfunc are similarly tagged, we now also return error
inside kfunc registration for untagged kfuncs. For vmlinux kfuncs,
we also WARN(), as initcall machinery does not handle errors.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Acked-by: Benjamin Tissoires <bentiss@kernel.org>
Link: https://lore.kernel.org/r/e55150ceecbf0a5d961e608941165c0bee7bc943.1706491398.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-31 20:40:56 -08:00
Yonghong Song
ced33f2cfa docs/bpf: Improve documentation of 64-bit immediate instructions
For 64-bit immediate instruction, 'BPF_IMM | BPF_DW | BPF_LD' and
src_reg=[0-6], the current documentation describes the 64-bit
immediate is constructed by:

  imm64 = (next_imm << 32) | imm

But actually imm64 is only used when src_reg=0. For all other
variants (src_reg != 0), 'imm' and 'next_imm' have separate special
encoding requirement and imm64 cannot be easily used to describe
instruction semantics.

This patch clarifies that 64-bit immediate instructions use
two 32-bit immediate values instead of a 64-bit immediate value,
so later describing individual 64-bit immediate instructions
becomes less confusing.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Dave Thaler <dthaler1968@gmail.com>
Link: https://lore.kernel.org/bpf/20240127194629.737589-1-yonghong.song@linux.dev
2024-01-29 16:54:33 +01:00
Dave Thaler
e48f0f4a9b bpf, docs: Clarify definitions of various instructions
Clarify definitions of several instructions:

* BPF_NEG does not support BPF_X
* BPF_CALL does not support BPF_JMP32 or BPF_X
* BPF_EXIT does not support BPF_X
* BPF_JA does not support BPF_X (was implied but not explicitly stated)

Also fix a typo in the wide instruction figure where the field is
actually named "opcode" not "code".

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240126040050.8464-1-dthaler1968@gmail.com
2024-01-26 19:05:38 +01:00
Dave Thaler
20e109ea98 bpf, docs: Clarify that MOVSX is only for BPF_X not BPF_K
Per discussion on the mailing list at
https://mailarchive.ietf.org/arch/msg/bpf/uQiqhURdtxV_ZQOTgjCdm-seh74/
the MOVSX operation is only defined to support register extension.

The document didn't previously state this and incorrectly implied
that one could use an immediate value.

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240118232954.27206-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23 15:10:08 -08:00
Yonghong Song
88031b929c docs/bpf: Fix an incorrect statement in verifier.rst
In verifier.rst, I found an incorrect statement (maybe a typo) in section
'Liveness marks tracking'. Basically, the wrong register is attributed
to have a read mark. This may confuse the user.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240111052136.3440417-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23 14:40:23 -08:00
Dave Thaler
81777efbf5 Introduce concept of conformance groups
The discussion of what the actual conformance groups should be
is still in progress, so this is just part 1 which only uses
"legacy" for deprecated instructions and "basic" for everything
else.  Subsequent patches will add more groups as discussion
continues.

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240108214231.5280-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23 14:40:22 -08:00
Linus Torvalds
5b9b41617b Another moderately busy cycle for documentation, including:
- The minimum Sphinx requirement has been raised to 2.4.4, following a
   warning that was added in 6.2.
 
 - Some reworking of the Documentation/process front page to, hopefully,
   make it more useful.
 
 - Various kernel-doc tweaks to, for example, make it deal properly with
   __counted_by annotations.
 
 - We have also restored a warning for documentation of nonexistent
   structure members that disappeared a while back.  That had the delightful
   consequence of adding some 600 warnings to the docs build.  A sustained
   effort by Randy, Vegard, and myself has addressed almost all of those,
   bringing the documentation back into sync with the code.  The fixes are
   going through the appropriate maintainer trees.
 
 - Various improvements to the HTML rendered docs, including automatic links
   to Git revisions and a nice new pulldown to make translations easy to
   access.
 
 - Speaking of translations, more of those for Spanish and Chinese.
 
 ...plus the usual stream of documentation updates and typo fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmWcRKMPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YTKIH/AxBt/3iWt40dPf18arZHLU6tdUbmg01ttef
 CNKWkniCmABGKc//KYDXvjZMRDt0YlrS0KgUzrb8nIQTBlZG40D+88EwjXE0HeGP
 xt1Fk7OPOiJEqBZ3HEe0PDVfOiA+4yR6CmDKklCJuKg77X9atklneBwPUw/cOASk
 CWj+BdbwPBiSNQv48Lp87rGusKwnH/g0MN2uS0z9MPr1DYjM1K8+ngZjGW24lZHt
 qs5yhP43mlZGBF/lwNJXQp/xhnKAqJ9XwylBX9Wmaoxaz9yyzNVsADGvROMudgzi
 9YB+Jdy7Z0JSrVoLIRhUuDOv7aW8vk+8qLmGJt2aTIsqehbQ6pk=
 =fCtT
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.8' of git://git.lwn.net/linux

Pull documentation update from Jonathan Corbet:
 "Another moderately busy cycle for documentation, including:

   - The minimum Sphinx requirement has been raised to 2.4.4, following
     a warning that was added in 6.2

   - Some reworking of the Documentation/process front page to,
     hopefully, make it more useful

   - Various kernel-doc tweaks to, for example, make it deal properly
     with __counted_by annotations

   - We have also restored a warning for documentation of nonexistent
     structure members that disappeared a while back. That had the
     delightful consequence of adding some 600 warnings to the docs
     build. A sustained effort by Randy, Vegard, and myself has
     addressed almost all of those, bringing the documentation back into
     sync with the code. The fixes are going through the appropriate
     maintainer trees

   - Various improvements to the HTML rendered docs, including automatic
     links to Git revisions and a nice new pulldown to make translations
     easy to access

   - Speaking of translations, more of those for Spanish and Chinese

  ... plus the usual stream of documentation updates and typo fixes"

* tag 'docs-6.8' of git://git.lwn.net/linux: (57 commits)
  MAINTAINERS: use tabs for indent of CONFIDENTIAL COMPUTING THREAT MODEL
  A reworked process/index.rst
  ring-buffer/Documentation: Add documentation on buffer_percent file
  Translated the RISC-V architecture boot documentation.
  Docs: remove mentions of fdformat from util-linux
  Docs/zh_CN: Fix the meaning of DEBUG to pr_debug()
  Documentation: move driver-api/dcdbas to userspace-api/
  Documentation: move driver-api/isapnp to userspace-api/
  Documentation/core-api : fix typo in workqueue
  Documentation/trace: Fixed typos in the ftrace FLAGS section
  kernel-doc: handle a void function without producing a warning
  scripts/get_abi.pl: ignore some temp files
  docs: kernel_abi.py: fix command injection
  scripts/get_abi: fix source path leak
  CREDITS, MAINTAINERS, docs/process/howto: Update man-pages' maintainer
  docs: translations: add translations links when they exist
  kernel-doc: Align quick help and the code
  MAINTAINERS: add reviewer for Spanish translations
  docs: ignore __counted_by attribute in structure definitions
  scripts: kernel-doc: Clarify missing struct member description
  ..
2024-01-11 19:46:52 -08:00
David Vernet
a6de18f310 bpf: Add bpf_cpumask_weight() kfunc
It can be useful to query how many bits are set in a cpumask. For
example, if you want to perform special logic for the last remaining
core that's set in a mask. Let's therefore add a new
bpf_cpumask_weight() kfunc which checks how many bits are set in a mask.

Signed-off-by: David Vernet <void@manifault.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20231207210843.168466-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09 21:37:33 -08:00