License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                               # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier                               # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                               # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note                         270
GPL-2.0+ WITH Linux-syscall-note                        169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
LGPL-2.1+ WITH Linux-syscall-note                        15
GPL-1.0+ WITH Linux-syscall-note                         14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
LGPL-2.0+ WITH Linux-syscall-note                         4
LGPL-2.1 WITH Linux-syscall-note                          3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
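The default rule from the first heuristic (no license traces found by either scanner) can be sketched as below. This is only an illustration of the rule as described above, not the tooling that was actually used; the function name `default_spdx_id` is hypothetical, and matching the `*/uapi/*` pattern with a plain substring search is an assumption.

```c
#include <string.h>

/*
 * Hypothetical helper: pick the identifier for a file in which neither
 * scanner found any license text. The top level COPYING license applies,
 * with the syscall note added for files under a uapi directory.
 */
static const char *default_spdx_id(const char *path)
{
	/* Assumption: "*\/uapi\/*" is treated as "path contains /uapi/". */
	if (strstr(path, "/uapi/") != NULL)
		return "GPL-2.0 WITH Linux-syscall-note";

	return "GPL-2.0";
}
```

With this sketch, `include/uapi/linux/perf_event.h` would map to "GPL-2.0 WITH Linux-syscall-note" and `tools/perf/util/annotate.h` to "GPL-2.0".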
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
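The script itself is not shown here. As a rough sketch of the header versus source distinction it had to make, the following assumes the comment styles from the kernel's license rules documentation (`/* ... */` in headers, `// ...` in .c files); the helper names `has_suffix` and `spdx_tag_line` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Return non-zero when `s` ends with `suffix`. */
static int has_suffix(const char *s, const char *suffix)
{
	size_t n = strlen(s), m = strlen(suffix);

	return n >= m && strcmp(s + n - m, suffix) == 0;
}

/*
 * Hypothetical helper: write the SPDX tag line for `path` into `buf`.
 * Returns the snprintf() result, or -1 for file types this sketch
 * does not cover.
 */
static int spdx_tag_line(const char *path, const char *id,
			 char *buf, size_t len)
{
	if (has_suffix(path, ".h"))
		return snprintf(buf, len, "/* SPDX-License-Identifier: %s */", id);
	if (has_suffix(path, ".c"))
		return snprintf(buf, len, "// SPDX-License-Identifier: %s", id);

	return -1;
}
```

For a header such as annotate.h and the identifier "GPL-2.0", this yields the same form as the first line of the file shown below.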
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 15:07:57 +01:00
/* SPDX-License-Identifier: GPL-2.0 */
2011-02-04 09:45:46 -02:00
#ifndef __PERF_ANNOTATE_H
#define __PERF_ANNOTATE_H

#include <stdbool.h>
2012-04-25 14:16:03 -03:00
#include <stdint.h>
2019-01-22 10:47:38 -02:00
#include <stdio.h>
2014-04-25 21:31:02 +02:00
#include <linux/types.h>
2011-02-04 09:45:46 -02:00
#include <linux/list.h>
#include <linux/rbtree.h>
2018-08-04 15:05:17 +02:00
#include <asm/bug.h>
2019-01-22 10:47:38 -02:00
#include "symbol_conf.h"
2022-08-26 09:42:36 -07:00
#include "mutex.h"
2019-09-25 09:14:46 +08:00
#include "spark.h"
2024-03-04 15:08:12 -08:00
#include "hashmap.h"
2024-03-29 14:58:10 -07:00
#include "disasm.h"
2024-08-13 09:02:03 -07:00
#include "branch.h"
2011-02-04 09:45:46 -02:00

2019-01-22 10:47:38 -02:00
struct hist_browser_timer;
struct hist_entry;
struct map;
struct map_symbol;
struct addr_map_symbol;
struct option;
struct perf_sample;
2019-07-21 13:23:51 +02:00
struct evsel;
2019-01-22 10:47:38 -02:00
struct symbol;
2023-12-12 16:13:13 -08:00
struct annotated_data_type;
2016-11-24 11:16:06 -03:00

2018-03-15 10:26:17 -03:00
#define ANNOTATION__IPC_WIDTH 6
#define ANNOTATION__CYCLES_WIDTH 6
perf annotate: Create hotkey 'c' to show min/max cycles
In the 'perf annotate' view, a new hotkey 'c' is created for showing the
min/max cycles.
For example, when pressing 'c', the annotate view is:
Percent│ IPC Cycle(min/max)
│
│
│ Disassembly of section .text:
│
│ 000000000003aab0 <random@@GLIBC_2.2.5>:
8.22 │3.92 sub $0x18,%rsp
│3.92 mov $0x1,%esi
│3.92 xor %eax,%eax
│3.92 cmpl $0x0,argp_program_version_hook@@G
│3.92 1(2/1) ↓ je 20
│ lock cmpxchg %esi,__abort_msg@@GLIBC_P
│ ↓ jne 29
│ ↓ jmp 43
│1.10 20: cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+
8.93 │1.10 1(5/1) ↓ je 43
When pressing 'c' again, the annotate view is switched back:
Percent│ IPC Cycle
│
│
│ Disassembly of section .text:
│
│ 000000000003aab0 <random@@GLIBC_2.2.5>:
8.22 │3.92 sub $0x18,%rsp
│3.92 mov $0x1,%esi
│3.92 xor %eax,%eax
│3.92 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x
│3.92 1 ↓ je 20
│ lock cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
│ ↓ jne 29
│ ↓ jmp 43
│1.10 20: cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
8.93 │1.10 1 ↓ je 43
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1526569118-14217-3-git-send-email-yao.jin@linux.intel.com
[ Rename all maxmin to minmax ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-17 22:58:38 +08:00
#define ANNOTATION__MINMAX_CYCLES_WIDTH 19
perf annotate: Compute average IPC and IPC coverage per symbol
Add support to 'perf report' annotate view or 'perf annotate --stdio2'
to aggregate the IPC derived from timed LBRs per symbol. We compute the
average IPC and the IPC coverage percentage.
For example:
$ perf annotate --stdio2
Percent IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%)
Disassembly of section .text:
000000000003aac0 <random@@GLIBC_2.2.5>:
8.32 3.28 sub $0x18,%rsp
3.28 mov $0x1,%esi
3.28 xor %eax,%eax
3.28 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
11.57 3.28 1 ↓ je 20
lock cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
↓ jne 29
↓ jmp 43
11.57 1.10 20: cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
0.00 1.10 1 ↓ je 43
29: lea __abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
sub $0x80,%rsp
→ callq __lll_lock_wait_private
add $0x80,%rsp
0.00 3.00 43: lea __ctype_b@GLIBC_2.2.5+0x38,%rdi
3.00 lea 0xc(%rsp),%rsi
8.49 3.00 1 → callq __random_r
7.91 1.94 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
0.00 1.94 1 ↓ je 68
lock decl __abort_msg@@GLIBC_PRIVATE+0x8a0
↓ jne 70
↓ jmp 8a
0.00 2.00 68: decl __abort_msg@@GLIBC_PRIVATE+0x8a0
21.56 2.00 1 ↓ je 8a
70: lea __abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
sub $0x80,%rsp
→ callq __lll_unlock_wake_private
add $0x80,%rsp
21.56 2.90 8a: movslq 0xc(%rsp),%rax
2.90 add $0x18,%rsp
9.03 2.90 1 ← retq
It shows for this symbol the average IPC is 2.30 and the IPC coverage is
54.8%.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1543586097-27632-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-30 21:54:54 +08:00
#define ANNOTATION__AVG_IPC_WIDTH 36
perf annotate: Display the branch counter histogram
Display the branch counter histogram in the annotation view.
Press 'B' to display the branch counter's abbreviation list as well.
Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }',
4000 Hz, Event count (approx.):
f3 /home/sdp/test/tchain_edit [Percent: local period]
Percent │ IPC Cycle Branch Counter (Average IPC: 1.39, IPC Coverage: 29.4%)
│ 0000000000401755 <f3>:
0.00 0.00 │ endbr64
│ push %rbp
│ mov %rsp,%rbp
│ movl $0x0,-0x4(%rbp)
0.00 0.00 │1.33 3 |A |- | ↓ jmp 25
11.03 11.03 │ 11: mov -0x4(%rbp),%eax
│ and $0x1,%eax
│ test %eax,%eax
17.13 17.13 │2.41 1 |A |- | ↓ je 21
│ addl $0x1,-0x4(%rbp)
21.84 21.84 │2.22 2 |AA |- | ↓ jmp 25
17.13 17.13 │ 21: addl $0x1,-0x4(%rbp)
21.84 21.84 │ 25: cmpl $0x270f,-0x4(%rbp)
11.03 11.03 │0.61 3 |A |- | ↑ jle 11
│ nop
│ pop %rbp
0.00 0.00 │0.24 20 |AA |B | ← ret
Originally-by: Tinghao Zhang <tinghao.zhang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240813160208.2493643-8-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-13 09:02:06 -07:00
#define ANNOTATION__BR_CNTR_WIDTH 30
perf report: Support interactive annotation of code without symbols
For perf report on stripped binaries it is currently impossible to do
annotation. The annotation state is all tied to symbols, but there are
either no symbols, or symbols are not covering all the code.
We should support the annotation functionality even without symbols.
This patch fakes a symbol and the symbol name is the string of address.
After that, we just follow current annotation working flow.
For example,
1. perf report
Overhead Command Shared Object Symbol
20.67% div libc-2.27.so [.] __random_r
17.29% div libc-2.27.so [.] __random
10.59% div div [.] 0x0000000000000628
9.25% div div [.] 0x0000000000000612
6.11% div div [.] 0x0000000000000645
2. Select the line of "10.59% div div [.] 0x0000000000000628" and ENTER.
Annotate 0x0000000000000628
Zoom into div thread
Zoom into div DSO (use the 'k' hotkey to zoom directly into the kernel)
Browse map details
Run scripts for samples of symbol [0x0000000000000628]
Run scripts for all samples
Switch to another data file in PWD
Exit
3. Select the "Annotate 0x0000000000000628" and ENTER.
Percent│
│
│
│ Disassembly of section .text:
│
│ 0000000000000628 <.text+0x68>:
│ divsd %xmm4,%xmm0
│ divsd %xmm3,%xmm1
│ movsd (%rsp),%xmm2
│ addsd %xmm1,%xmm0
│ addsd %xmm2,%xmm0
│ movsd %xmm0,(%rsp)
Now we can see the dump of object starting from 0x628.
v5:
---
Remove the hotkey 'a' implementation from this patch. It
will be moved to a separate patch.
v4:
---
1. Support the hotkey 'a'. When we press 'a' on address,
now it supports the annotation.
2. Change the patch title from
"Support interactive annotation of code without symbols" to
"perf report: Support interactive annotation of code without symbols"
v3:
---
Keep just the ANNOTATION_DUMMY_LEN, and remove the
opts->annotate_dummy_len since it's the "maybe in future
we will provide" feature.
v2:
---
Fix a crash issue when annotating an address in "unknown" object.
The steps to reproduce this issue:
perf record -e cycles:u ls
perf report
75.29% ls ld-2.27.so [.] do_lookup_x
23.64% ls ld-2.27.so [.] __GI___tunables_init
1.04% ls [unknown] [k] 0xffffffff85c01210
0.03% ls ld-2.27.so [.] _start
When annotating 0xffffffff85c01210, the crash happens.
v2 adds checking for ms->map in add_annotate_opt(). If the object is
"unknown", ms->map is NULL.
Committer notes:
Renamed new_annotate_sym() to symbol__new_unresolved().
Use PRIx64 to fix this issue in some 32-bit arches:
ui/browsers/hists.c: In function 'symbol__new_unresolved':
ui/browsers/hists.c:2474:38: error: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'u64' {aka 'long long unsigned int'} [-Werror=format=]
snprintf(name, sizeof(name), "%-#.*lx", BITS_PER_LONG / 4, addr);
~~~~~~^ ~~~~
%-#.*llx
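The portability fix described in the committer notes can be illustrated with a small standalone sketch. It uses `PRIx64` from `<inttypes.h>` so the conversion matches a 64-bit value on both 32-bit and 64-bit targets. The function name `format_unresolved_name` and the `SKETCH_BITS_PER_LONG` macro are placeholders for illustration, not the names used in ui/browsers/hists.c.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the kernel's BITS_PER_LONG; assumed to be 64 here. */
#define SKETCH_BITS_PER_LONG 64

/*
 * Format an address as the name of a fake symbol. PRIx64 expands to the
 * right length modifier for uint64_t ("lx" or "llx"), which avoids the
 * -Werror=format failure quoted above.
 */
static int format_unresolved_name(char *name, size_t size, uint64_t addr)
{
	return snprintf(name, size, "%-#.*" PRIx64,
			SKETCH_BITS_PER_LONG / 4, addr);
}
```

For the address 0x628 this produces "0x0000000000000628", the same form as the symbol names in the perf report listing above.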
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200227043939.4403-3-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-27 12:39:38 +08:00
#define ANNOTATION_DUMMY_LEN 256
2018-03-15 10:26:17 -03:00

perf disasm: Allow configuring what disassemblers to use
The perf tools annotation code used for a long time parsing the output
of binutils's objdump (or its reimplementations, like llvm's) to then
parse and augment it with samples, allow navigation, etc.
More recently disassemblers from the capstone and llvm (libraries, not
parsing the output of tools using those libraries to mimic binutils's
objdump output) were introduced.
So when all those methods are available, there is a static preference
for a series of attempts of disassembling a binary, with the 'llvm,
capstone, objdump' sequence being hard coded.
This patch allows users to change that sequence, specifying via a 'perf
config' 'annotate.disassemblers' entry which and in what order
disassemblers should be attempted.
As alluded to in the comments in the source code of this series, this
flexibility is useful for users and developers alike, eliminating the
requirement to rebuild the tool with some specific set of libraries to
see how the output of disassembling would be for one of these methods.
root@x1:~# rm -f ~/.perfconfig
root@x1:~# perf annotate -v --stdio2 update_load_avg
<SNIP>
symbol__disassemble:
filename=/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux,
sym=update_load_avg, start=0xffffffffb6148fe0, en>
annotating [0x6ff7170]
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux :
[0x7407ca0] update_load_avg
Disassembled with llvm
annotate.disassemblers=llvm,capstone,objdump
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent 0xffffffff81148fe0 <update_load_avg>:
1.61 pushq %r15
pushq %r14
1.00 pushq %r13
movl %edx,%r13d
1.90 pushq %r12
pushq %rbp
movq %rsi,%rbp
pushq %rbx
movq %rdi,%rbx
subq $0x18,%rsp
15.14 movl 0x1a4(%rdi),%eax
root@x1:~# perf config annotate.disassemblers=capstone
root@x1:~# cat ~/.perfconfig
# this file is auto-generated.
[annotate]
disassemblers = capstone
root@x1:~#
root@x1:~# perf annotate -v --stdio2 update_load_avg
<SNIP>
Disassembled with capstone
annotate.disassemblers=capstone
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent 0xffffffff81148fe0 <update_load_avg>:
1.61 pushq %r15
pushq %r14
1.00 pushq %r13
movl %edx,%r13d
1.90 pushq %r12
pushq %rbp
movq %rsi,%rbp
pushq %rbx
movq %rdi,%rbx
subq $0x18,%rsp
15.14 movl 0x1a4(%rdi),%eax
root@x1:~# perf config annotate.disassemblers=objdump,capstone
root@x1:~# perf config annotate.disassemblers
annotate.disassemblers=objdump,capstone
root@x1:~# cat ~/.perfconfig
# this file is auto-generated.
[annotate]
disassemblers = objdump,capstone
root@x1:~# perf annotate -v --stdio2 update_load_avg
Executing: objdump --start-address=0xffffffff81148fe0 \
--stop-address=0xffffffff811497aa \
-d --no-show-raw-insn -S -C "$1"
Disassembled with objdump
annotate.disassemblers=objdump,capstone
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent
Disassembly of section .text:
ffffffff81148fe0 <update_load_avg>:
#define DO_ATTACH 0x4
ffffffff81148fe0 <update_load_avg>:
#define DO_ATTACH 0x4
#define DO_DETACH 0x8
/* Update task and its cfs_rq load average */
static inline void update_load_avg(struct cfs_rq *cfs_rq,
struct sched_entity *se,
int flags)
{
1.61 push %r15
push %r14
1.00 push %r13
mov %edx,%r13d
1.90 push %r12
push %rbp
mov %rsi,%rbp
push %rbx
mov %rdi,%rbx
sub $0x18,%rsp
}
/* rq->task_clock normalized against any time
this cfs_rq has spent throttled */
static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
{
if (unlikely(cfs_rq->throttle_count))
15.14 mov 0x1a4(%rdi),%eax
root@x1:~#
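The 'annotate.disassemblers' value shown in the session above is a comma separated, ordered list. A minimal sketch of turning such a string into an ordered array follows; it is not the parser from this patch, and the names `lookup_disassembler`, `parse_disassemblers` and `SKETCH_MAX_DISASSEMBLERS` are hypothetical. Rejecting unknown names is a design choice of the sketch.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_DISASSEMBLERS 3

static const char *const known_disassemblers[] = {
	"llvm", "capstone", "objdump",
};

/* Return the known name equal to the first `len` bytes of `s`, or NULL. */
static const char *lookup_disassembler(const char *s, size_t len)
{
	size_t i;
	size_t nr = sizeof(known_disassemblers) / sizeof(known_disassemblers[0]);

	for (i = 0; i < nr; i++) {
		const char *name = known_disassemblers[i];

		if (strlen(name) == len && strncmp(name, s, len) == 0)
			return name;
	}
	return NULL;
}

/*
 * Split a list such as "objdump,capstone" into `out`, keeping the given
 * order. Returns the number of entries, or -1 on an unknown or empty
 * name, or when more than SKETCH_MAX_DISASSEMBLERS entries are listed.
 */
static int parse_disassemblers(const char *str,
			       const char *out[SKETCH_MAX_DISASSEMBLERS])
{
	int nr = 0;

	while (*str) {
		size_t len = strcspn(str, ",");
		const char *name = lookup_disassembler(str, len);

		if (name == NULL || nr == SKETCH_MAX_DISASSEMBLERS)
			return -1;
		out[nr++] = name;
		str += len;
		if (*str == ',')
			str++;
	}
	return nr;
}
```

A caller would then try `out[0]`, `out[1]`, ... in turn until one disassembler succeeds, which is the ordering behaviour the commit message describes.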
After adding a way to select the disassembler from the command line a
'perf test' comparing the output of the various disassemblers should be
introduced, to test these codebases.
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Link: https://lore.kernel.org/r/20241111151734.1018476-4-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-11-11 12:17:34 -03:00
// llvm, capstone, objdump
#define MAX_DISASSEMBLERS 3

2018-03-15 10:03:34 -03:00
struct annotation_options {
	bool hide_src_code,
	     use_offset,
	     jump_arrows,
2018-05-25 17:28:37 -03:00
	     print_lines,
	     full_path,
2018-03-15 10:03:34 -03:00
	     show_linenr,
2021-02-15 12:34:46 +01:00
	     show_fileloc,
2018-03-15 10:03:34 -03:00
	     show_nr_jumps,
2018-05-28 11:42:59 -03:00
	     show_minmax_cycle,
	     show_asm_raw,
2024-08-13 09:02:06 -07:00
	     show_br_cntr,
2022-09-23 10:31:42 -07:00
	     annotate_src,
	     full_addr;
2018-04-11 10:30:03 -03:00
	u8 offset_level;
2024-11-11 12:17:34 -03:00
	u8 nr_disassemblers;
2018-05-25 17:28:37 -03:00
	int min_pcnt;
	int max_lines;
	int context;
2023-03-28 16:55:41 -07:00
	char *objdump_path;
	char *disassembler_style;
perf disasm: Allow configuring what disassemblers to use
The perf tools annotation code used for a long time parsing the output
of binutils's objdump (or its reimplementations, like llvm's) to then
parse and augment it with samples, allow navigation, etc.
More recently disassemblers from the capstone and llvm (libraries, not
parsing the output of tools using those libraries to mimic binutils's
objdump output) were introduced.
So when all those methods are available, there is a static preference
for a series of attempts of disassembling a binary, with the 'llvm,
capstone, objdump' sequence being hard coded.
This patch allows users to change that sequence, specifying via a 'perf
config' 'annotate.disassemblers' entry which and in what order
disassemblers should be attempted.
As alluded to in the comments in the source code of this series, this
flexibility is useful for users and developers alike, elliminating the
requirement to rebuild the tool with some specific set of libraries to
see how the output of disassembling would be for one of these methods.
root@x1:~# rm -f ~/.perfconfig
root@x1:~# perf annotate -v --stdio2 update_load_avg
<SNIP>
symbol__disassemble:
filename=/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux,
sym=update_load_avg, start=0xffffffffb6148fe0, en>
annotating [0x6ff7170]
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux :
[0x7407ca0] update_load_avg
Disassembled with llvm
annotate.disassemblers=llvm,capstone,objdump
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent 0xffffffff81148fe0 <update_load_avg>:
1.61 pushq %r15
pushq %r14
1.00 pushq %r13
movl %edx,%r13d
1.90 pushq %r12
pushq %rbp
movq %rsi,%rbp
pushq %rbx
movq %rdi,%rbx
subq $0x18,%rsp
15.14 movl 0x1a4(%rdi),%eax
root@x1:~# perf config annotate.disassemblers=capstone
root@x1:~# cat ~/.perfconfig
# this file is auto-generated.
[annotate]
disassemblers = capstone
root@x1:~#
root@x1:~# perf annotate -v --stdio2 update_load_avg
<SNIP>
Disassembled with capstone
annotate.disassemblers=capstone
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent 0xffffffff81148fe0 <update_load_avg>:
1.61 pushq %r15
pushq %r14
1.00 pushq %r13
movl %edx,%r13d
1.90 pushq %r12
pushq %rbp
movq %rsi,%rbp
pushq %rbx
movq %rdi,%rbx
subq $0x18,%rsp
15.14 movl 0x1a4(%rdi),%eax
root@x1:~# perf config annotate.disassemblers=objdump,capstone
root@x1:~# perf config annotate.disassemblers
annotate.disassemblers=objdump,capstone
root@x1:~# cat ~/.perfconfig
# this file is auto-generated.
[annotate]
disassemblers = objdump,capstone
root@x1:~# perf annotate -v --stdio2 update_load_avg
Executing: objdump --start-address=0xffffffff81148fe0 \
--stop-address=0xffffffff811497aa \
-d --no-show-raw-insn -S -C "$1"
Disassembled with objdump
annotate.disassemblers=objdump,capstone
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent
Disassembly of section .text:
ffffffff81148fe0 <update_load_avg>:
#define DO_ATTACH 0x4
ffffffff81148fe0 <update_load_avg>:
#define DO_ATTACH 0x4
#define DO_DETACH 0x8
/* Update task and its cfs_rq load average */
static inline void update_load_avg(struct cfs_rq *cfs_rq,
struct sched_entity *se,
int flags)
{
1.61 push %r15
push %r14
1.00 push %r13
mov %edx,%r13d
1.90 push %r12
push %rbp
mov %rsi,%rbp
push %rbx
mov %rdi,%rbx
sub $0x18,%rsp
}
/* rq->task_clock normalized against any time
this cfs_rq has spent throttled */
static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
{
if (unlikely(cfs_rq->throttle_count))
15.14 mov 0x1a4(%rdi),%eax
root@x1:~#
After adding a way to select the disassembler from the command line, a
'perf test' comparing the output of the various disassemblers should be
introduced, to test these codebases.
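The annotate.disassemblers value shown in the session above is a comma-separated list that is tried in order. The following is a minimal sketch of splitting such a string into a fixed-size array. The helper name split_disassemblers() and the limit EX_MAX_DISASSEMBLERS (8) are assumptions made for illustration and are not taken from the perf sources.

```c
#include <stdlib.h>
#include <string.h>

#define EX_MAX_DISASSEMBLERS 8	/* assumed limit, for illustration only */

/*
 * Hypothetical helper: split a list such as "objdump,capstone" into out[],
 * preserving order and skipping empty entries.
 * Returns the number of names stored, or -1 on allocation failure.
 * The caller owns the returned strings and releases them with free().
 */
static int split_disassemblers(const char *list, char *out[EX_MAX_DISASSEMBLERS])
{
	int n = 0;

	while (*list != '\0' && n < EX_MAX_DISASSEMBLERS) {
		size_t len = strcspn(list, ",");	/* length up to the next comma */

		if (len > 0) {
			char *name = malloc(len + 1);

			if (name == NULL) {
				while (n > 0)
					free(out[--n]);
				return -1;
			}
			memcpy(name, list, len);
			name[len] = '\0';
			out[n++] = name;
		}
		list += len;
		if (*list == ',')
			list++;
	}
	return n;
}
```

Calling split_disassemblers("objdump,capstone", names) stores "objdump" first and "capstone" second, which matches the order in which the config entry lists them.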
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Link: https://lore.kernel.org/r/20241111151734.1018476-4-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-11-11 12:17:34 -03:00
|
|
|
const char *disassemblers_str;
|
|
|
|
const char *disassemblers[MAX_DISASSEMBLERS];
|
2020-01-07 13:04:44 -08:00
|
|
|
const char *prefix;
|
|
|
|
const char *prefix_strip;
|
2018-08-04 15:05:13 +02:00
|
|
|
unsigned int percent_type;
|
2018-03-15 10:03:34 -03:00
|
|
|
};
|
|
|
|
|
2023-11-28 09:54:34 -08:00
|
|
|
extern struct annotation_options annotate_opts;
|
|
|
|
|
2018-04-11 10:30:03 -03:00
|
|
|
enum {
|
|
|
|
ANNOTATION__OFFSET_JUMP_TARGETS = 1,
|
|
|
|
ANNOTATION__OFFSET_CALL,
|
|
|
|
ANNOTATION__MAX_OFFSET_LEVEL,
|
|
|
|
};
|
|
|
|
|
|
|
|
#define ANNOTATION__MIN_OFFSET_LEVEL ANNOTATION__OFFSET_JUMP_TARGETS
|
|
|
|
|
2013-03-05 14:53:30 +09:00
|
|
|
struct annotation;
|
|
|
|
|
2017-10-11 17:01:39 +02:00
|
|
|
struct sym_hist_entry {
|
|
|
|
u64 nr_samples;
|
|
|
|
u64 period;
|
|
|
|
};
|
|
|
|
|
2018-08-04 15:05:09 +02:00
|
|
|
enum {
|
|
|
|
PERCENT_HITS_LOCAL,
|
2018-08-04 15:05:10 +02:00
|
|
|
PERCENT_HITS_GLOBAL,
|
2018-08-04 15:05:11 +02:00
|
|
|
PERCENT_PERIOD_LOCAL,
|
2018-08-04 15:05:12 +02:00
|
|
|
PERCENT_PERIOD_GLOBAL,
|
2018-08-04 15:05:09 +02:00
|
|
|
PERCENT_MAX,
|
|
|
|
};
|
|
|
|
|
2017-10-11 17:01:39 +02:00
|
|
|
struct annotation_data {
|
2018-08-04 15:05:09 +02:00
|
|
|
double percent[PERCENT_MAX];
|
2017-10-11 17:01:41 +02:00
|
|
|
double percent_sum;
|
2017-10-11 17:01:39 +02:00
|
|
|
struct sym_hist_entry he;
|
|
|
|
};
|
|
|
|
|
2023-11-03 12:19:03 -07:00
|
|
|
struct cycles_info {
|
|
|
|
float ipc;
|
|
|
|
u64 avg;
|
|
|
|
u64 max;
|
|
|
|
u64 min;
|
|
|
|
};
|
|
|
|
|
2017-10-11 17:01:25 +02:00
|
|
|
struct annotation_line {
|
|
|
|
struct list_head node;
|
2017-10-11 17:01:36 +02:00
|
|
|
struct rb_node rb_node;
|
2017-10-11 17:01:26 +02:00
|
|
|
s64 offset;
|
|
|
|
char *line;
|
|
|
|
int line_nr;
|
2021-02-15 12:34:46 +01:00
|
|
|
char *fileloc;
|
2017-10-11 17:01:41 +02:00
|
|
|
char *path;
|
2023-11-03 12:19:03 -07:00
|
|
|
struct cycles_info *cycles;
|
perf annotate: Display the branch counter histogram
Display the branch counter histogram in the annotation view.
Press 'B' to display the branch counter's abbreviation list as well.
Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }',
4000 Hz, Event count (approx.):
f3 /home/sdp/test/tchain_edit [Percent: local period]
Percent │ IPC Cycle Branch Counter (Average IPC: 1.39, IPC Coverage: 29.4%)
│ 0000000000401755 <f3>:
0.00 0.00 │ endbr64
│ push %rbp
│ mov %rsp,%rbp
│ movl $0x0,-0x4(%rbp)
0.00 0.00 │1.33 3 |A |- | ↓ jmp 25
11.03 11.03 │ 11: mov -0x4(%rbp),%eax
│ and $0x1,%eax
│ test %eax,%eax
17.13 17.13 │2.41 1 |A |- | ↓ je 21
│ addl $0x1,-0x4(%rbp)
21.84 21.84 │2.22 2 |AA |- | ↓ jmp 25
17.13 17.13 │ 21: addl $0x1,-0x4(%rbp)
21.84 21.84 │ 25: cmpl $0x270f,-0x4(%rbp)
11.03 11.03 │0.61 3 |A |- | ↑ jle 11
│ nop
│ pop %rbp
0.00 0.00 │0.24 20 |AA |B | ← ret
Originally-by: Tinghao Zhang <tinghao.zhang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240813160208.2493643-8-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-13 09:02:06 -07:00
|
|
|
int num_aggr;
|
|
|
|
int br_cntr_nr;
|
|
|
|
u64 *br_cntr;
|
|
|
|
struct evsel *evsel;
|
2023-11-03 12:19:03 -07:00
|
|
|
int jump_sources;
|
2018-03-15 15:43:18 -03:00
|
|
|
u32 idx;
|
|
|
|
int idx_asm;
|
2018-08-04 15:05:05 +02:00
|
|
|
int data_nr;
|
2020-05-15 12:29:26 -05:00
|
|
|
struct annotation_data data[];
|
2017-10-11 17:01:25 +02:00
|
|
|
};
|
|
|
|
|
2012-04-15 15:24:39 -03:00
|
|
|
struct disasm_line {
|
2017-10-11 17:01:25 +02:00
|
|
|
struct ins ins;
|
|
|
|
struct ins_operands ops;
|
perf annotate: Add disasm_line__parse() to parse raw instruction for powerpc
Currently, the perf tool infrastructure uses the disasm_line__parse()
function to parse a disassembled line.
Example snippet from objdump:
objdump --start-address=<address> --stop-address=<address> -d --no-show-raw-insn -C <vmlinux>
c0000000010224b4: lwz r10,0(r9)
This line "lwz r10,0(r9)" is parsed to extract the instruction name,
register names and offset.
On powerpc, data type profiling uses the raw instruction instead of the
objdump output to identify the instruction category and extract the
source/target registers.
Example: 38 01 81 e8 ld r4,312(r1)
Here "38 01 81 e8" is the raw instruction representation. Add function
"disasm_line__parse_powerpc" to handle parsing of the raw instruction.
Also update "struct disasm_line" to save the binary code.
With the change, the function captures:
line -> "38 01 81 e8 ld r4,312(r1)"
raw instruction "38 01 81 e8"
The raw instruction is used later to extract the register and offset
fields. Macros are added to extract the opcode and register fields.
"struct disasm_line" is updated to carry a 32-bit union of "bytes" and
"raw_insn" that holds the raw code (raw).
The function "disasm_line__parse_powerpc" fills in the raw instruction hex
value and can use the macros to get the opcode. There are no changes to the
existing code paths that parse the disassembled code. The size of the raw
instruction depends on the architecture.
On powerpc, parsing the disasm line needs to handle both reading binary
code directly from the DSO and parsing the objdump result. Hence the logic
is added in a separate function instead of updating "disasm_line__parse".
Architectures that use the instruction name with the present approach are
not affected. Since this approach targets powerpc, the macro implementation
is added only for powerpc for now.
Since disasm_line__parse is also used in other cases (perf annotate) and
not only for data type profiling, the powerpc callback includes changes to
work with binary code as well as the mnemonic representation.
Also, if the DSO read fails and libcapstone is not supported, the approach
falls back to objdump. Hence the patch also has changes to ensure the
objdump option works well.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-5-atrajeev@linux.vnet.ibm.com
[ Add check for strndup() result ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-07-18 14:13:47 +05:30
|
|
|
union {
|
|
|
|
u8 bytes[4];
|
|
|
|
u32 raw_insn;
|
|
|
|
} raw;
|
2017-10-11 17:01:37 +02:00
|
|
|
/* This needs to be at the end. */
|
|
|
|
struct annotation_line al;
|
2011-02-04 09:45:46 -02:00
|
|
|
};
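The commit message above describes keeping the raw powerpc instruction in "struct disasm_line" and using macros to pull out the opcode and register fields. As a rough sketch of that idea, the following decodes the example bytes "38 01 81 e8" (ld r4,312(r1)). The EX_PPC_* macro names and ex_raw_insn() are hypothetical, and the byte order (least significant byte listed first) is an assumption based on the example dump.

```c
#include <stdint.h>

/* Hypothetical field extractors for a 32-bit powerpc instruction word. */
#define EX_PPC_OP(insn)	(((insn) >> 26) & 0x3f)		/* primary opcode */
#define EX_PPC_RT(insn)	(((insn) >> 21) & 0x1f)		/* target register */
#define EX_PPC_RA(insn)	(((insn) >> 16) & 0x1f)		/* base register */
#define EX_PPC_DS(insn)	((int16_t)((insn) & 0xfffc))	/* DS-form displacement */

/*
 * Assemble the instruction word from the bytes in the order they are
 * listed in the dump, assuming the first byte is the least significant.
 */
static uint32_t ex_raw_insn(const uint8_t bytes[4])
{
	return (uint32_t)bytes[0] |
	       (uint32_t)bytes[1] << 8 |
	       (uint32_t)bytes[2] << 16 |
	       (uint32_t)bytes[3] << 24;
}
```

For the bytes 38 01 81 e8 this gives the word 0xe8810138, whose fields decode to primary opcode 58, target register 4, base register 1 and displacement 312, in line with the mnemonic shown in the commit message.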
|
|
|
|
|
2024-03-29 14:58:10 -07:00
|
|
|
void annotation_line__add(struct annotation_line *al, struct list_head *head);
|
|
|
|
|
2018-08-04 15:05:09 +02:00
|
|
|
static inline double annotation_data__percent(struct annotation_data *data,
|
|
|
|
unsigned int which)
|
|
|
|
{
|
|
|
|
return which < PERCENT_MAX ? data->percent[which] : -1;
|
|
|
|
}
|
|
|
|
|
2018-08-04 15:05:17 +02:00
|
|
|
static inline const char *percent_type_str(unsigned int type)
|
|
|
|
{
|
|
|
|
static const char *str[PERCENT_MAX] = {
|
|
|
|
"local hits",
|
|
|
|
"global hits",
|
|
|
|
"local period",
|
|
|
|
"global period",
|
|
|
|
};
|
|
|
|
|
|
|
|
if (WARN_ON(type >= PERCENT_MAX))
|
|
|
|
return "N/A";
|
|
|
|
|
|
|
|
return str[type];
|
|
|
|
}
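The four percent types (local hits, global hits, local period, global period) index the percent[] array read by annotation_data__percent(). The sketch below shows one plausible way to fill such an array. It assumes that "local" means relative to the symbol's totals and "global" means relative to the totals of the whole event; all ex_* names are hypothetical and this is not the perf implementation.

```c
enum {
	EX_PERCENT_HITS_LOCAL,
	EX_PERCENT_HITS_GLOBAL,
	EX_PERCENT_PERIOD_LOCAL,
	EX_PERCENT_PERIOD_GLOBAL,
	EX_PERCENT_MAX,
};

struct ex_counts {
	unsigned long long nr_samples;
	unsigned long long period;
};

/* Percentage of part in whole, 0 when the denominator is empty. */
static double ex_ratio(unsigned long long part, unsigned long long whole)
{
	return whole ? 100.0 * part / whole : 0.0;
}

/* Fill all four percent types for one line (assumed semantics, see above). */
static void ex_calc_percent(double percent[EX_PERCENT_MAX],
			    const struct ex_counts *line,
			    const struct ex_counts *symbol,
			    const struct ex_counts *total)
{
	percent[EX_PERCENT_HITS_LOCAL]    = ex_ratio(line->nr_samples, symbol->nr_samples);
	percent[EX_PERCENT_HITS_GLOBAL]   = ex_ratio(line->nr_samples, total->nr_samples);
	percent[EX_PERCENT_PERIOD_LOCAL]  = ex_ratio(line->period, symbol->period);
	percent[EX_PERCENT_PERIOD_GLOBAL] = ex_ratio(line->period, total->period);
}

/* Bounds-checked read, mirroring the shape of annotation_data__percent(). */
static double ex_percent(const double percent[EX_PERCENT_MAX], unsigned int which)
{
	return which < EX_PERCENT_MAX ? percent[which] : -1;
}
```

Returning -1 for an out-of-range type keeps the accessor total, so a caller can treat a negative value as "not available" rather than reading past the array.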
|
|
|
|
|
2017-10-11 17:01:37 +02:00
|
|
|
static inline struct disasm_line *disasm_line(struct annotation_line *al)
|
|
|
|
{
|
|
|
|
return al ? container_of(al, struct disasm_line, al) : NULL;
|
|
|
|
}
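disasm_line() recovers the enclosing "struct disasm_line" from a pointer to its embedded "struct annotation_line" member. A self-contained sketch of that pattern follows. ex_container_of() is a simplified stand-in for the kernel-style container_of() macro, and the ex_* structs are hypothetical reductions of the real ones.

```c
#include <stddef.h>

/* Simplified stand-in for container_of(): step back from member to owner. */
#define ex_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct ex_line {
	long offset;
};

struct ex_disasm {
	int opcode;
	struct ex_line al;	/* embedded generic line */
};

/* NULL-safe conversion from the embedded line to its owner. */
static struct ex_disasm *ex_disasm_line(struct ex_line *al)
{
	return al ? ex_container_of(al, struct ex_disasm, al) : NULL;
}
```

Generic list code can therefore work on the embedded line type only, while code that needs instruction details converts back to the containing struct.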
|
|
|
|
|
perf annotate: Add "_local" to jump/offset validation routines
Because they all really check if we can access data structures/visual
constructs where a "jump" instruction targets code in the same function,
i.e. things like:
__pthread_mutex_lock /usr/lib64/libpthread-2.26.so
1.95 │ mov __pthread_force_elision,%ecx
│ ┌──test %ecx,%ecx
0.07 │ ├──je 60
│ │ test $0x300,%esi
│ │↓ jne 60
│ │ or $0x100,%esi
│ │ mov %esi,0x10(%rdi)
│ 42:│ mov %esi,%edx
│ │ lea 0x16(%r8),%rsi
│ │ mov %r8,%rdi
│ │ and $0x80,%edx
│ │ add $0x8,%rsp
│ │→ jmpq __lll_lock_elision
│ │ nop
0.29 │ 60:└─→and $0x80,%esi
0.07 │ mov $0x1,%edi
0.29 │ xor %eax,%eax
2.53 │ lock cmpxchg %edi,(%r8)
And not things like that "jmpq __lll_lock_elision", which should instead behave
like a "call" instruction and "jump" to the disassembly of "__lll_lock_elision".
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-3cwx39u3h66dfw9xjrlt7ca2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-23 10:12:33 -03:00
|
|
|
/*
|
|
|
|
* Is this offset in the same function as the line it is used in?
|
|
|
|
* asm functions jump to other functions, for instance.
|
|
|
|
*/
|
|
|
|
static inline bool disasm_line__has_local_offset(const struct disasm_line *dl)
|
2012-04-25 14:16:03 -03:00
|
|
|
{
|
2018-03-23 10:12:33 -03:00
|
|
|
return dl->ops.target.offset_avail && !dl->ops.target.outside;
|
2012-04-25 14:16:03 -03:00
|
|
|
}
|
|
|
|
|
2018-03-23 10:12:33 -03:00
|
|
|
/*
|
|
|
|
* Can we draw an arrow from the jump to its target, for instance? I.e.
|
|
|
|
* is the jump and its target in the same function?
|
|
|
|
*/
|
|
|
|
bool disasm_line__is_valid_local_jump(struct disasm_line *dl, struct symbol *sym);
|
2018-03-15 15:31:56 -03:00
|
|
|
|
2017-10-11 17:01:34 +02:00
|
|
|
struct annotation_line *
|
|
|
|
annotation_line__next(struct annotation_line *pos, struct list_head *head);
|
2018-03-15 17:04:53 -03:00
|
|
|
|
2018-03-15 23:14:51 -03:00
|
|
|
struct annotation_write_ops {
|
|
|
|
bool first_line, current_entry, change_color;
|
|
|
|
int width;
|
|
|
|
void *obj;
|
|
|
|
int (*set_color)(void *obj, int color);
|
|
|
|
void (*set_percent_color)(void *obj, double percent, bool current);
|
|
|
|
int (*set_jumps_percent_color)(void *obj, int nr, bool current);
|
|
|
|
void (*printf)(void *obj, const char *fmt, ...);
|
|
|
|
void (*write_graph)(void *obj, int graph);
|
|
|
|
};
|
|
|
|
|
2018-03-15 19:12:39 -03:00
|
|
|
void annotation_line__write(struct annotation_line *al, struct annotation *notes,
|
2023-11-28 09:54:37 -08:00
|
|
|
struct annotation_write_ops *ops);
|
2018-03-15 17:04:53 -03:00
|
|
|
|
2018-04-03 15:19:47 -03:00
|
|
|
int __annotation__scnprintf_samples_period(struct annotation *notes,
|
|
|
|
char *bf, size_t size,
|
2019-07-21 13:23:51 +02:00
|
|
|
struct evsel *evsel,
|
2018-04-03 15:19:47 -03:00
|
|
|
bool show_freq);
|
|
|
|
|
2012-04-15 15:52:18 -03:00
|
|
|
size_t disasm__fprintf(struct list_head *head, FILE *fp);
|
2019-07-21 13:23:51 +02:00
|
|
|
void symbol__calc_percent(struct symbol *sym, struct evsel *evsel);
|
2011-02-04 09:45:46 -02:00
|
|
|
|
2024-03-04 15:08:15 -08:00
|
|
|
/**
|
|
|
|
* struct sym_hist - symbol histogram information for an event
|
|
|
|
*
|
|
|
|
* @nr_samples: Total number of samples.
|
|
|
|
* @period: Sum of sample periods.
|
|
|
|
*/
|
2011-02-04 09:45:46 -02:00
|
|
|
struct sym_hist {
|
2017-07-20 06:36:51 +09:00
|
|
|
u64 nr_samples;
|
2017-07-20 17:18:05 -03:00
|
|
|
u64 period;
|
2011-02-04 09:45:46 -02:00
|
|
|
};
|
|
|
|
|
2024-03-04 15:08:15 -08:00
|
|
|
/**
|
|
|
|
* struct cyc_hist - (CPU) cycle histogram for a basic block
|
|
|
|
*
|
|
|
|
* @start: Start address of current block (if known).
|
|
|
|
* @cycles: Sum of cycles for the longest basic block.
|
|
|
|
* @cycles_aggr: Total cycles for this address.
|
|
|
|
* @cycles_max: Max cycles for this address.
|
|
|
|
* @cycles_min: Min cycles for this address.
|
|
|
|
* @cycles_spark: History of cycles for the longest basic block.
|
|
|
|
* @num: Number of samples for the longest basic block.
|
|
|
|
* @num_aggr: Total number of samples for this address.
|
|
|
|
* @have_start: Whether the current branch info has a start address.
|
|
|
|
* @reset: Number of resets due to a different start address.
|
|
|
|
*
|
|
|
|
* If a sample has branch_stack and cycles info, perf can construct basic blocks
|
|
|
|
* between two adjacent branches. It'd have start and end addresses but
|
|
|
|
* sometimes the start address may not be available. So the cycles are
|
|
|
|
* accounted at the end address. If multiple basic blocks end at the same
|
|
|
|
* address, it will take the longest one.
|
|
|
|
*
|
|
|
|
* The @start, @cycles, @cycles_spark and @num fields are used for the longest
|
|
|
|
* block only. Other fields are used for all cases.
|
|
|
|
*
|
|
|
|
* See __symbol__account_cycles().
|
|
|
|
*/
|
2015-07-18 08:24:48 -07:00
|
|
|
struct cyc_hist {
|
|
|
|
u64 start;
|
|
|
|
u64 cycles;
|
|
|
|
u64 cycles_aggr;
|
2018-05-17 22:58:37 +08:00
|
|
|
u64 cycles_max;
|
|
|
|
u64 cycles_min;
|
2019-09-25 09:14:46 +08:00
|
|
|
s64 cycles_spark[NUM_SPARKS];
|
2015-07-18 08:24:48 -07:00
|
|
|
u32 num;
|
|
|
|
u32 num_aggr;
|
|
|
|
u8 have_start;
|
|
|
|
/* 1 byte padding */
|
|
|
|
u16 reset;
|
|
|
|
};
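The comment above says cycles are accounted at the block's end address and that, when several basic blocks end at the same address, the longest one is kept. The following is a deliberately simplified sketch of that rule. It leaves out the reset heuristics and the spark history, uses hypothetical ex_* names, and should not be read as the logic of __symbol__account_cycles().

```c
#include <stdint.h>

struct ex_cyc_hist {
	uint64_t start;		/* start of the longest block seen so far */
	uint64_t cycles;	/* cycle sum for that longest block */
	uint64_t cycles_aggr;	/* cycle sum for every block ending here */
	uint32_t num;		/* samples of the longest block */
	uint32_t num_aggr;	/* samples of every block ending here */
	uint8_t  have_start;	/* whether start is known */
};

/* Account one sample for the block [start, end]; ch belongs to the end address. */
static void ex_account_cycles(struct ex_cyc_hist *ch, uint64_t start,
			      unsigned int cycles)
{
	/* The aggregate fields are updated for every sample. */
	ch->num_aggr++;
	ch->cycles_aggr += cycles;

	if (start) {
		/* An earlier start means a longer block ending at this address. */
		if (ch->have_start && ch->start < start)
			return;		/* the recorded block is longer, skip */

		/* First known start, or a longer block: restart the counts. */
		if (!ch->have_start || ch->start > start) {
			ch->num = 0;
			ch->cycles = 0;
		}
		ch->have_start = 1;
		ch->start = start;
	}

	/* Samples without a known start are simply added as-is in this sketch. */
	ch->num++;
	ch->cycles += cycles;
}
```

With this split, dividing cycles by num gives the average for the longest block, while cycles_aggr and num_aggr describe every block that ends at the address.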
|
|
|
|
|
2024-03-04 15:08:15 -08:00
|
|
|
/**
|
|
|
|
* struct annotated_source - symbols with hits have this attached as in annotation
|
2011-02-04 13:43:24 -02:00
|
|
|
*
|
2024-03-04 15:08:15 -08:00
|
|
|
* @source: List head for annotation_line (embedded in disasm_line).
|
|
|
|
* @histograms: Array of symbol histograms per event to maintain the total number
|
|
|
|
* of samples and period.
|
|
|
|
* @nr_histograms: This may not be the same as evsel->evlist->core.nr_entries if
|
perf annotate: Add comment about annotated_src->nr_histograms
When we have multiple groups in an evlist, say:
$ perf stat -e '{cycles,instructions},{cache-references,cache-misses}' sleep 1
Performance counter stats for 'sleep 1':
343,134 cycles:u
249,292 instructions:u # 0.73 insn per cycle
15,556 cache-references:u
8,925 cache-misses:u # 57.373 % of all cache refs
1.000957550 seconds time elapsed
$
Then the perf_evsel instances for the two group leaders ("cycles" and
"cache-references") will have evsel->nr_members set to 2, while
evsel->evlist->nr_entries will be set to 4, so we can't use
evsel->evlist->nr_entries everywhere, as event groups need to be taken
into account.
But this probably requires us to audit at least the forced-group code,
where we want all of the events to be in a "group", to see them all on
the screen, one column for each, even knowing that they were not
necessarily scheduled to count at the same time by the kernel perf
subsystem.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-2g0vwqnc49wl4ttjk8dvpgcc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-25 11:27:38 -03:00
|
|
|
* we have more than one group in an evlist, where we will want
|
|
|
|
* to see each group separately, that is why symbol__annotate2()
|
|
|
|
* sets src->nr_histograms to evsel->nr_members.
|
2024-03-04 15:08:15 -08:00
|
|
|
* @samples: Hash map of sym_hist_entry. Keyed by event index and offset in symbol.
|
2024-04-04 10:57:15 -07:00
|
|
|
* @nr_events: Number of events in the current output.
|
2024-03-04 15:08:15 -08:00
|
|
|
* @nr_entries: Number of annotation_line in the source list.
|
|
|
|
* @nr_asm_entries: Number of annotation_line with actual asm instruction in the
|
|
|
|
* source list.
|
2024-04-04 10:57:14 -07:00
|
|
|
* @max_jump_sources: Maximum number of jump instructions targeting the same
|
|
|
|
* instruction.
|
2024-04-04 10:57:13 -07:00
|
|
|
* @widths: Precalculated width of each column in the TUI output.
|
2011-02-04 13:43:24 -02:00
|
|
|
*
|
2024-03-04 15:08:15 -08:00
|
|
|
* disasm_lines are allocated, percentages calculated and all sorted by percentage
|
2011-02-04 13:43:24 -02:00
|
|
|
* when the annotation is about to be presented, so the percentages are for
|
|
|
|
* one of the entries in the histogram array, i.e. for the event/counter being
|
|
|
|
* presented. It is deallocated right after symbol__{tui,tty,etc}_annotate
|
|
|
|
* returns.
|
|
|
|
*/
|
2011-02-08 13:27:39 -02:00
|
|
|
struct annotated_source {
|
2023-11-03 12:19:06 -07:00
|
|
|
struct list_head source;
|
|
|
|
struct sym_hist *histograms;
|
2024-03-04 15:08:12 -08:00
|
|
|
struct hashmap *samples;
|
2023-11-03 12:19:06 -07:00
|
|
|
int nr_histograms;
|
2024-04-04 10:57:15 -07:00
|
|
|
int nr_events;
|
2023-11-03 12:19:06 -07:00
|
|
|
int nr_entries;
|
|
|
|
int nr_asm_entries;
|
2024-04-04 10:57:14 -07:00
|
|
|
int max_jump_sources;
|
2024-04-04 10:57:16 -07:00
|
|
|
u64 start;
|
2024-04-04 10:57:13 -07:00
|
|
|
struct {
|
|
|
|
u8 addr;
|
|
|
|
u8 jumps;
|
|
|
|
u8 target;
|
|
|
|
u8 min_addr;
|
|
|
|
u8 max_addr;
|
|
|
|
u8 max_ins_name;
|
|
|
|
u16 max_line_len;
|
|
|
|
} widths;
|
2011-02-08 13:27:39 -02:00
|
|
|
};
|
|
|
|
|
2024-04-04 10:57:10 -07:00
|
|
|
struct annotation_line *annotated_source__get_line(struct annotated_source *src,
|
|
|
|
s64 offset);
|
|
|
|
|
2024-08-13 09:02:03 -07:00
|
|
|
/* A branch counter once saturated */
|
|
|
|
#define ANNOTATION__BR_CNTR_SATURATED_FLAG (1ULL << 63)
|
|
|
|
|
2024-03-04 15:08:15 -08:00
|
|
|
/**
|
|
|
|
* struct annotated_branch - basic block and IPC information for a symbol.
|
|
|
|
*
|
|
|
|
* @hit_cycles: Total executed cycles.
|
|
|
|
* @hit_insn: Total number of instructions executed.
|
|
|
|
* @total_insn: Number of instructions in the function.
|
|
|
|
* @cover_insn: Number of distinct, actually executed instructions.
|
|
|
|
* @cycles_hist: Array of cyc_hist for each instruction.
|
|
|
|
* @max_coverage: Maximum number of covered basic blocks (used for block-range).
|
2024-08-13 09:02:03 -07:00
|
|
|
* @br_cntr: Array of the occurrences of events (branch counters) during a block.
|
2024-03-04 15:08:15 -08:00
|
|
|
*
|
|
|
|
* This struct is used by two different code paths when the sample has branch stack
|
|
|
|
* and cycles information. annotation__compute_ipc() calculates average IPC
|
|
|
|
* using @hit_insn / @hit_cycles. The actual coverage can be calculated using
|
|
|
|
* @cover_insn / @total_insn. The @cycles_hist can give IPC for each (longest)
|
|
|
|
* basic block ending at the given address.
|
|
|
|
* process_basic_block() calculates coverage of instructions (or basic blocks)
|
|
|
|
* in the function.
|
|
|
|
*/
|
2023-11-03 12:19:04 -07:00
|
|
|
struct annotated_branch {
|
perf annotate: Compute average IPC and IPC coverage per symbol
Add support to 'perf report' annotate view or 'perf annotate --stdio2'
to aggregate the IPC derived from timed LBRs per symbol. We compute the
average IPC and the IPC coverage percentage.
For example:
$ perf annotate --stdio2
Percent IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%)
Disassembly of section .text:
000000000003aac0 <random@@GLIBC_2.2.5>:
8.32 3.28 sub $0x18,%rsp
3.28 mov $0x1,%esi
3.28 xor %eax,%eax
3.28 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
11.57 3.28 1 ↓ je 20
lock cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
↓ jne 29
↓ jmp 43
11.57 1.10 20: cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
0.00 1.10 1 ↓ je 43
29: lea __abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
sub $0x80,%rsp
→ callq __lll_lock_wait_private
add $0x80,%rsp
0.00 3.00 43: lea __ctype_b@GLIBC_2.2.5+0x38,%rdi
3.00 lea 0xc(%rsp),%rsi
8.49 3.00 1 → callq __random_r
7.91 1.94 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
0.00 1.94 1 ↓ je 68
lock decl __abort_msg@@GLIBC_PRIVATE+0x8a0
↓ jne 70
↓ jmp 8a
0.00 2.00 68: decl __abort_msg@@GLIBC_PRIVATE+0x8a0
21.56 2.00 1 ↓ je 8a
70: lea __abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
sub $0x80,%rsp
→ callq __lll_unlock_wake_private
add $0x80,%rsp
21.56 2.90 8a: movslq 0xc(%rsp),%rax
2.90 add $0x18,%rsp
9.03 2.90 1 ← retq
It shows for this symbol the average IPC is 2.30 and the IPC coverage is
54.8%.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1543586097-27632-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-30 21:54:54 +08:00
|
|
|
u64 hit_cycles;
|
|
|
|
u64 hit_insn;
|
|
|
|
unsigned int total_insn;
|
|
|
|
unsigned int cover_insn;
|
2023-11-03 12:19:04 -07:00
|
|
|
struct cyc_hist *cycles_hist;
|
2023-11-03 12:19:05 -07:00
|
|
|
u64 max_coverage;
|
2024-08-13 09:02:03 -07:00
|
|
|
u64 *br_cntr;
|
2023-11-03 12:19:04 -07:00
|
|
|
};
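The kernel-doc comment for struct annotated_branch gives two ratios: average IPC as @hit_insn / @hit_cycles and coverage as @cover_insn / @total_insn. They can be sketched as below. The ex_* function names are hypothetical, and returning 0 for an empty denominator is an assumption of this sketch.

```c
#include <stdint.h>

/* Average IPC: instructions executed per cycle spent, 0 if no cycles. */
static double ex_average_ipc(uint64_t hit_insn, uint64_t hit_cycles)
{
	return hit_cycles ? (double)hit_insn / (double)hit_cycles : 0.0;
}

/* Coverage in percent: distinct executed instructions over all instructions. */
static double ex_ipc_coverage(unsigned int cover_insn, unsigned int total_insn)
{
	return total_insn ? 100.0 * cover_insn / total_insn : 0.0;
}
```

These correspond to the "Average IPC" and "IPC Coverage" values printed in the annotate header line of the example outputs.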
|
|
|
|
|
|
|
|
struct LOCKABLE annotation {
|
2011-02-08 13:27:39 -02:00
|
|
|
struct annotated_source *src;
|
2023-11-03 12:19:04 -07:00
|
|
|
struct annotated_branch *branch;
|
2011-02-04 09:45:46 -02:00
|
|
|
};
|
|
|
|
|
2023-06-14 21:07:15 -07:00
|
|
|
static inline void annotation__init(struct annotation *notes __maybe_unused)
|
|
|
|
{
|
|
|
|
}
|
2021-11-11 19:51:24 -08:00
|
|
|
void annotation__exit(struct annotation *notes);
|
|
|
|
|
2023-06-14 21:07:15 -07:00
|
|
|
void annotation__lock(struct annotation *notes) EXCLUSIVE_LOCK_FUNCTION(*notes);
|
|
|
|
void annotation__unlock(struct annotation *notes) UNLOCK_FUNCTION(*notes);
|
|
|
|
bool annotation__trylock(struct annotation *notes) EXCLUSIVE_TRYLOCK_FUNCTION(true, *notes);
|
|
|
|
|
2018-03-15 10:35:04 -03:00
|
|
|
static inline int annotation__cycles_width(struct annotation *notes)
|
|
|
|
{
|
2023-11-28 09:54:40 -08:00
|
|
|
if (notes->branch && annotate_opts.show_minmax_cycle)
|
perf annotate: Create hotkey 'c' to show min/max cycles
In the 'perf annotate' view, a new hotkey 'c' is created for showing the
min/max cycles.
For example, after pressing 'c', the annotate view is:
Percent│ IPC Cycle(min/max)
│
│
│ Disassembly of section .text:
│
│ 000000000003aab0 <random@@GLIBC_2.2.5>:
8.22 │3.92 sub $0x18,%rsp
│3.92 mov $0x1,%esi
│3.92 xor %eax,%eax
│3.92 cmpl $0x0,argp_program_version_hook@@G
│3.92 1(2/1) ↓ je 20
│ lock cmpxchg %esi,__abort_msg@@GLIBC_P
│ ↓ jne 29
│ ↓ jmp 43
│1.10 20: cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+
8.93 │1.10 1(5/1) ↓ je 43
After pressing 'c' again, the annotate view switches back:
Percent│ IPC Cycle
│
│
│ Disassembly of section .text:
│
│ 000000000003aab0 <random@@GLIBC_2.2.5>:
8.22 │3.92 sub $0x18,%rsp
│3.92 mov $0x1,%esi
│3.92 xor %eax,%eax
│3.92 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x
│3.92 1 ↓ je 20
│ lock cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
│ ↓ jne 29
│ ↓ jmp 43
│1.10 20: cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
8.93 │1.10 1 ↓ je 43
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1526569118-14217-3-git-send-email-yao.jin@linux.intel.com
[ Rename all maxmin to minmax ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-17 22:58:38 +08:00
|
|
|
return ANNOTATION__IPC_WIDTH + ANNOTATION__MINMAX_CYCLES_WIDTH;
|
|
|
|
|
2023-11-03 12:19:04 -07:00
|
|
|
return notes->branch ? ANNOTATION__IPC_WIDTH + ANNOTATION__CYCLES_WIDTH : 0;
|
2018-03-15 10:35:04 -03:00
|
|
|
}
|
|
|
|
|
2018-03-15 12:41:39 -03:00
|
|
|
static inline int annotation__pcnt_width(struct annotation *notes)
|
|
|
|
{
|
2024-08-03 14:13:30 -07:00
|
|
|
return (symbol_conf.show_total_period ? 12 : 8) * notes->src->nr_events;
|
2018-03-15 12:41:39 -03:00
|
|
|
}
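annotation__pcnt_width() reserves one percent column per event, using 12 characters per column when total periods are shown and 8 otherwise. A standalone sketch of the same arithmetic is shown here; ex_pcnt_width() is a hypothetical name and takes the two inputs as parameters instead of reading symbol_conf and the annotation.

```c
#include <stdbool.h>

/* Width of the percent area: one column per event, wider for total periods. */
static int ex_pcnt_width(bool show_total_period, int nr_events)
{
	return (show_total_period ? 12 : 8) * nr_events;
}
```

For a group of two events, such as the branch-instructions and branch-misses pair in the earlier example output, this gives 16 characters in the default mode and 24 when total periods are shown.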
|
|
|
|
|
2023-11-28 09:54:40 -08:00
|
|
|
static inline bool annotation_line__filter(struct annotation_line *al)
|
2018-03-16 10:39:24 -03:00
|
|
|
{
|
2023-11-28 09:54:40 -08:00
|
|
|
return annotate_opts.hide_src_code && al->offset == -1;
|
2018-03-16 10:39:24 -03:00
|
|
|
}
|
2018-03-15 15:59:01 -03:00
|
|
|
|
2024-08-13 09:02:06 -07:00
|
|
|
static inline u8 annotation__br_cntr_width(void)
|
|
|
|
{
|
|
|
|
return annotate_opts.show_br_cntr ? ANNOTATION__BR_CNTR_WIDTH : 0;
|
|
|
|
}
|
|
|
|
|
2018-03-15 16:19:59 -03:00
|
|
|
void annotation__update_column_widths(struct annotation *notes);
|
2022-09-23 10:31:42 -07:00
|
|
|
void annotation__toggle_full_addr(struct annotation *notes, struct map_symbol *ms);
|
2018-03-15 11:46:23 -03:00
|
|
|
|
2018-05-24 16:28:29 -03:00
|
|
|
static inline struct sym_hist *annotated_source__histogram(struct annotated_source *src, int idx)
|
|
|
|
{
|
2024-03-04 15:08:14 -08:00
|
|
|
return &src->histograms[idx];
|
2018-05-24 16:28:29 -03:00
|
|
|
}
|
|
|
|
|
2011-02-04 13:43:24 -02:00
|
|
|
static inline struct sym_hist *annotation__histogram(struct annotation *notes, int idx)
|
|
|
|
{
|
2018-05-24 16:28:29 -03:00
|
|
|
return annotated_source__histogram(notes->src, idx);
|
2011-02-04 13:43:24 -02:00
|
|
|
}
|
|
|
|
|
2024-03-04 15:08:13 -08:00
|
|
|
static inline struct sym_hist_entry *
|
|
|
|
annotated_source__hist_entry(struct annotated_source *src, int idx, u64 offset)
|
|
|
|
{
|
|
|
|
struct sym_hist_entry *entry;
|
|
|
|
long key = offset << 16 | idx;
|
|
|
|
|
|
|
|
if (!hashmap__find(src->samples, key, &entry))
|
|
|
|
return NULL;
|
|
|
|
return entry;
|
|
|
|
}
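annotated_source__hist_entry() builds its hash key by shifting the offset left by 16 bits and OR-ing in the event index. The sketch below packs and unpacks such a key. It uses a fixed 64-bit type where the original uses long, and it assumes the event index fits in 16 bits; the ex_* names are hypothetical.

```c
#include <stdint.h>

/* Pack: offset in the upper bits, event index in the low 16 bits. */
static uint64_t ex_make_key(uint64_t offset, int idx)
{
	return offset << 16 | (uint64_t)idx;
}

/* Unpack the offset part of a key. */
static uint64_t ex_key_offset(uint64_t key)
{
	return key >> 16;
}

/* Unpack the event index part of a key. */
static int ex_key_idx(uint64_t key)
{
	return (int)(key & 0xffff);
}
```

Because the index occupies the low 16 bits, an index of 65536 or more would collide with the offset bits, so callers of a scheme like this need to keep the number of events below that bound.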
|
|
|
|
|
2011-02-04 09:45:46 -02:00
|
|
|
static inline struct annotation *symbol__annotation(struct symbol *sym)
|
|
|
|
{
|
2015-01-14 20:18:05 +09:00
|
|
|
return (void *)sym - symbol_conf.priv_size;
|
2011-02-04 09:45:46 -02:00
|
|
|
}
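symbol__annotation() finds the annotation by subtracting symbol_conf.priv_size from the symbol pointer, which implies the private area is allocated immediately in front of the symbol. A minimal sketch of that layout follows, using hypothetical ex_* types and a compile-time EX_PRIV_SIZE in place of the runtime setting. It assumes the private size keeps the symbol suitably aligned.

```c
#include <stddef.h>
#include <stdlib.h>

struct ex_annotation {
	long nr_hits;		/* private data stored in front of the symbol */
};

struct ex_symbol {
	unsigned long start;
};

/* Stand-in for symbol_conf.priv_size (assumed to preserve alignment). */
#define EX_PRIV_SIZE sizeof(struct ex_annotation)

/* Allocate the private area and the symbol in one zeroed chunk. */
static struct ex_symbol *ex_symbol_new(unsigned long start)
{
	char *mem = calloc(1, EX_PRIV_SIZE + sizeof(struct ex_symbol));
	struct ex_symbol *sym;

	if (mem == NULL)
		return NULL;
	sym = (struct ex_symbol *)(mem + EX_PRIV_SIZE);
	sym->start = start;
	return sym;
}

/* Step back from the symbol to the private area that precedes it. */
static struct ex_annotation *ex_symbol_annotation(struct ex_symbol *sym)
{
	return (struct ex_annotation *)((char *)sym - EX_PRIV_SIZE);
}

/* Free using the address that calloc() originally returned. */
static void ex_symbol_delete(struct ex_symbol *sym)
{
	free((char *)sym - EX_PRIV_SIZE);
}
```

The design lets tools that need no per-symbol private data set the size to zero and pay nothing, while the symbol pointer itself stays usable everywhere else.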
|
|
|
|
|
2017-07-20 16:28:53 -03:00
|
|
|
int addr_map_symbol__inc_samples(struct addr_map_symbol *ams, struct perf_sample *sample,
|
2019-07-21 13:23:51 +02:00
|
|
|
struct evsel *evsel);
|
2013-12-18 16:48:29 -03:00
|
|
|
|
2023-11-03 12:19:05 -07:00
|
|
|
struct annotated_branch *annotation__get_branch(struct annotation *notes);
|
|
|
|
|
2015-07-18 08:24:48 -07:00
|
|
|
int addr_map_symbol__account_cycles(struct addr_map_symbol *ams,
|
|
|
|
struct addr_map_symbol *start,
|
2024-08-13 09:02:03 -07:00
|
|
|
unsigned cycles,
|
|
|
|
struct evsel *evsel,
|
|
|
|
u64 br_cntr);
|
2015-07-18 08:24:48 -07:00
|
|
|
|
2017-07-20 16:28:53 -03:00
|
|
|
int hist_entry__inc_addr_samples(struct hist_entry *he, struct perf_sample *sample,
|
2019-07-21 13:23:51 +02:00
|
|
|
struct evsel *evsel, u64 addr);
|
2013-12-18 17:10:15 -03:00
|
|
|
|
2018-05-24 17:33:18 -03:00
|
|
|
struct annotated_source *symbol__hists(struct symbol *sym, int nr_hists);
|
2011-02-06 14:54:44 -02:00
|
|
|
void symbol__annotate_zero_histograms(struct symbol *sym);
|
2011-02-04 09:45:46 -02:00
|
|
|
|
2019-11-04 11:10:00 -03:00
|
|
|
int symbol__annotate(struct map_symbol *ms,
|
2020-02-04 10:22:28 +05:30
|
|
|
struct evsel *evsel,
|
2017-12-11 12:46:11 -03:00
|
|
|
struct arch **parch);
|
2019-11-04 11:10:00 -03:00
|
|
|
int symbol__annotate2(struct map_symbol *ms,
|
2019-07-21 13:23:51 +02:00
|
|
|
struct evsel *evsel,
|
2018-03-15 16:54:11 -03:00
|
|
|
struct arch **parch);
|
2013-12-18 17:10:15 -03:00
|
|
|
|
2016-07-29 16:27:18 -03:00
|
|
|
enum symbol_disassemble_errno {
|
|
|
|
SYMBOL_ANNOTATE_ERRNO__SUCCESS = 0,
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Choose an arbitrary negative big number not to clash with standard
|
|
|
|
* errno since SUS requires the errno has distinct positive values.
|
|
|
|
* See 'Issue 6' in the link below.
|
|
|
|
*
|
|
|
|
* http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/errno.h.html
|
|
|
|
*/
|
|
|
|
__SYMBOL_ANNOTATE_ERRNO__START = -10000,
|
|
|
|
|
|
|
|
SYMBOL_ANNOTATE_ERRNO__NO_VMLINUX = __SYMBOL_ANNOTATE_ERRNO__START,
|
perf annotate: Enable annotation of BPF programs
In symbol__disassemble(), DSO_BINARY_TYPE__BPF_PROG_INFO dso calls into
a new function symbol__disassemble_bpf(), where annotation line
information is filled based on the bpf_prog_info and btf data saved in
given perf_env.
symbol__disassemble_bpf() uses binutils's libopcodes to disassemble bpf
programs.
Committer testing:
After fixing this:
- u64 *addrs = (u64 *)(info_linear->info.jited_ksyms);
+ u64 *addrs = (u64 *)(uintptr_t)(info_linear->info.jited_ksyms);
Detected when crossbuilding to a 32-bit arch.
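The cast fix above matters because a 64-bit integer field cannot be converted directly to a pointer on a 32-bit target without a size-mismatch warning. A minimal sketch of the round trip through uintptr_t follows; struct demo_info and its helpers are hypothetical and only stand in for a structure that carries a pointer in a 64-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an ABI struct that stores a pointer as a u64 */
struct demo_info {
	uint64_t addrs;		/* address of an array of uint64_t */
	uint32_t nr_addrs;
};

static void demo_info__set(struct demo_info *info, uint64_t *array, uint32_t nr)
{
	/* pointer -> uintptr_t -> u64: the widening step works on 32-bit and 64-bit */
	info->addrs = (uint64_t)(uintptr_t)array;
	info->nr_addrs = nr;
}

static uint64_t *demo_info__addrs(const struct demo_info *info)
{
	/* the stored value must fit in a pointer, otherwise bits would be lost */
	assert(info->addrs == (uint64_t)(uintptr_t)info->addrs);

	/* u64 -> uintptr_t -> pointer: narrow to pointer width before casting */
	return (uint64_t *)(uintptr_t)info->addrs;
}
```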
And making all this dependent on HAVE_LIBBFD_SUPPORT and
HAVE_LIBBPF_SUPPORT:
1) Have a BPF program running, one that has BTF info, etc, I used
the tools/perf/examples/bpf/augmented_raw_syscalls.c put in place
by 'perf trace'.
# grep -B1 augmented_raw ~/.perfconfig
[trace]
add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
#
# perf trace -e *mmsg
dnf/6245 sendmmsg(20, 0x7f5485a88030, 2, MSG_NOSIGNAL) = 2
NetworkManager/10055 sendmmsg(22<socket:[1056822]>, 0x7f8126ad1bb0, 2, MSG_NOSIGNAL) = 2
2) Then do a 'perf record' system wide for a while:
# perf record -a
^C[ perf record: Woken up 68 times to write data ]
[ perf record: Captured and wrote 19.427 MB perf.data (366891 samples) ]
#
3) Check that we captured BPF and BTF info in the perf.data file:
# perf report --header-only | grep 'b[pt]f'
# event : name = cycles:ppp, , id = { 294789, 294790, 294791, 294792, 294793, 294794, 294795, 294796 }, size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|CPU|PERIOD, read_format = ID, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
# bpf_prog_info of id 13
# bpf_prog_info of id 14
# bpf_prog_info of id 15
# bpf_prog_info of id 16
# bpf_prog_info of id 17
# bpf_prog_info of id 18
# bpf_prog_info of id 21
# bpf_prog_info of id 22
# bpf_prog_info of id 41
# bpf_prog_info of id 42
# btf info of id 2
#
4) Check which programs got recorded:
# perf report | grep bpf_prog | head
0.16% exe bpf_prog_819967866022f1e1_sys_enter [k] bpf_prog_819967866022f1e1_sys_enter
0.14% exe bpf_prog_c1bd85c092d6e4aa_sys_exit [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
0.08% fuse-overlayfs bpf_prog_819967866022f1e1_sys_enter [k] bpf_prog_819967866022f1e1_sys_enter
0.07% fuse-overlayfs bpf_prog_c1bd85c092d6e4aa_sys_exit [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
0.01% clang-4.0 bpf_prog_c1bd85c092d6e4aa_sys_exit [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
0.01% clang-4.0 bpf_prog_819967866022f1e1_sys_enter [k] bpf_prog_819967866022f1e1_sys_enter
0.00% clang bpf_prog_c1bd85c092d6e4aa_sys_exit [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
0.00% runc bpf_prog_819967866022f1e1_sys_enter [k] bpf_prog_819967866022f1e1_sys_enter
0.00% clang bpf_prog_819967866022f1e1_sys_enter [k] bpf_prog_819967866022f1e1_sys_enter
0.00% sh bpf_prog_c1bd85c092d6e4aa_sys_exit [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
#
This was with the default --sort order for 'perf report', which is:
--sort comm,dso,symbol
If we just look for the symbol, for instance:
# perf report --sort symbol | grep bpf_prog | head
0.26% [k] bpf_prog_819967866022f1e1_sys_enter - -
0.24% [k] bpf_prog_c1bd85c092d6e4aa_sys_exit - -
#
or the DSO:
# perf report --sort dso | grep bpf_prog | head
0.26% bpf_prog_819967866022f1e1_sys_enter
0.24% bpf_prog_c1bd85c092d6e4aa_sys_exit
#
We'll see the two BPF programs that augmented_raw_syscalls.o puts in
place, one attached to the raw_syscalls:sys_enter and another to the
raw_syscalls:sys_exit tracepoints, as expected.
Now we can finally do, from the command line, annotation for one of
those two symbols, with the original BPF program source code intermixed
with the disassembled JITed code:
# perf annotate --stdio2 bpf_prog_819967866022f1e1_sys_enter
Samples: 950 of event 'cycles:ppp', 4000 Hz, Event count (approx.): 553756947, [percent: local period]
bpf_prog_819967866022f1e1_sys_enter() bpf_prog_819967866022f1e1_sys_enter
Percent int sys_enter(struct syscall_enter_args *args)
53.41 push %rbp
0.63 mov %rsp,%rbp
0.31 sub $0x170,%rsp
1.93 sub $0x28,%rbp
7.02 mov %rbx,0x0(%rbp)
3.20 mov %r13,0x8(%rbp)
1.07 mov %r14,0x10(%rbp)
0.61 mov %r15,0x18(%rbp)
0.11 xor %eax,%eax
1.29 mov %rax,0x20(%rbp)
0.11 mov %rdi,%rbx
return bpf_get_current_pid_tgid();
2.02 → callq *ffffffffda6776d9
2.76 mov %eax,-0x148(%rbp)
mov %rbp,%rsi
int sys_enter(struct syscall_enter_args *args)
add $0xfffffffffffffeb8,%rsi
return bpf_map_lookup_elem(pids, &pid) != NULL;
movabs $0xffff975ac2607800,%rdi
1.26 → callq *ffffffffda6789e9
cmp $0x0,%rax
2.43 → je 0
add $0x38,%rax
0.21 xor %r13d,%r13d
if (pid_filter__has(&pids_filtered, getpid()))
0.81 cmp $0x0,%rax
→ jne 0
mov %rbp,%rdi
probe_read(&augmented_args.args, sizeof(augmented_args.args), args);
2.22 add $0xfffffffffffffeb8,%rdi
0.11 mov $0x40,%esi
0.32 mov %rbx,%rdx
2.74 → callq *ffffffffda658409
syscall = bpf_map_lookup_elem(&syscalls, &augmented_args.args.syscall_nr);
0.22 mov %rbp,%rsi
1.69 add $0xfffffffffffffec0,%rsi
syscall = bpf_map_lookup_elem(&syscalls, &augmented_args.args.syscall_nr);
movabs $0xffff975bfcd36000,%rdi
add $0xd0,%rdi
0.21 mov 0x0(%rsi),%eax
0.93 cmp $0x200,%rax
→ jae 0
0.10 shl $0x3,%rax
0.11 add %rdi,%rax
0.11 → jmp 0
xor %eax,%eax
if (syscall == NULL || !syscall->enabled)
1.07 cmp $0x0,%rax
→ je 0
if (syscall == NULL || !syscall->enabled)
6.57 movzbq 0x0(%rax),%rdi
if (syscall == NULL || !syscall->enabled)
cmp $0x0,%rdi
0.95 → je 0
mov $0x40,%r8d
switch (augmented_args.args.syscall_nr) {
mov -0x140(%rbp),%rdi
switch (augmented_args.args.syscall_nr) {
cmp $0x2,%rdi
→ je 0
cmp $0x101,%rdi
→ je 0
cmp $0x15,%rdi
→ jne 0
case SYS_OPEN: filename_arg = (const void *)args->args[0];
mov 0x10(%rbx),%rdx
→ jmp 0
case SYS_OPENAT: filename_arg = (const void *)args->args[1];
mov 0x18(%rbx),%rdx
if (filename_arg != NULL) {
cmp $0x0,%rdx
→ je 0
xor %edi,%edi
augmented_args.filename.reserved = 0;
mov %edi,-0x104(%rbp)
augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
mov %rbp,%rdi
add $0xffffffffffffff00,%rdi
augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
mov $0x100,%esi
→ callq *ffffffffda658499
mov $0x148,%r8d
augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
mov %eax,-0x108(%rbp)
augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
mov %rax,%rdi
shl $0x20,%rdi
shr $0x20,%rdi
if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
cmp $0xff,%rdi
→ ja 0
len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
add $0x48,%rax
len &= sizeof(augmented_args.filename.value) - 1;
and $0xff,%rax
mov %rax,%r8
mov %rbp,%rcx
return perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
add $0xfffffffffffffeb8,%rcx
mov %rbx,%rdi
movabs $0xffff975fbd72d800,%rsi
mov $0xffffffff,%edx
→ callq *ffffffffda658ad9
mov %rax,%r13
}
mov %r13,%rax
0.72 mov 0x0(%rbp),%rbx
mov 0x8(%rbp),%r13
1.16 mov 0x10(%rbp),%r14
0.10 mov 0x18(%rbp),%r15
0.42 add $0x28,%rbp
0.54 leaveq
0.54 ← retq
#
Please see 'man perf-config' to see how to control what should be seen,
via ~/.perfconfig [annotate] section, for instance, one can suppress the
source code and see just the disassembly, etc.
Alternatively, use the TUI by just using 'perf annotate', press
'/bpf_prog' to see the bpf symbols, press enter and do the interactive
annotation, which allows for dumping to a file after selecting the
various output tunables, for instance, the above without source code
intermixed, plus showing all the instruction offsets:
# perf annotate bpf_prog_819967866022f1e1_sys_enter
Then press: 's' to hide the source code + 'O' twice to show all
instruction offsets, then 'P' to print to the
bpf_prog_819967866022f1e1_sys_enter.annotation file, which will have:
# cat bpf_prog_819967866022f1e1_sys_enter.annotation
bpf_prog_819967866022f1e1_sys_enter() bpf_prog_819967866022f1e1_sys_enter
Event: cycles:ppp
53.41 0: push %rbp
0.63 1: mov %rsp,%rbp
0.31 4: sub $0x170,%rsp
1.93 b: sub $0x28,%rbp
7.02 f: mov %rbx,0x0(%rbp)
3.20 13: mov %r13,0x8(%rbp)
1.07 17: mov %r14,0x10(%rbp)
0.61 1b: mov %r15,0x18(%rbp)
0.11 1f: xor %eax,%eax
1.29 21: mov %rax,0x20(%rbp)
0.11 25: mov %rdi,%rbx
2.02 28: → callq *ffffffffda6776d9
2.76 2d: mov %eax,-0x148(%rbp)
33: mov %rbp,%rsi
36: add $0xfffffffffffffeb8,%rsi
3d: movabs $0xffff975ac2607800,%rdi
1.26 47: → callq *ffffffffda6789e9
4c: cmp $0x0,%rax
2.43 50: → je 0
52: add $0x38,%rax
0.21 56: xor %r13d,%r13d
0.81 59: cmp $0x0,%rax
5d: → jne 0
63: mov %rbp,%rdi
2.22 66: add $0xfffffffffffffeb8,%rdi
0.11 6d: mov $0x40,%esi
0.32 72: mov %rbx,%rdx
2.74 75: → callq *ffffffffda658409
0.22 7a: mov %rbp,%rsi
1.69 7d: add $0xfffffffffffffec0,%rsi
84: movabs $0xffff975bfcd36000,%rdi
8e: add $0xd0,%rdi
0.21 95: mov 0x0(%rsi),%eax
0.93 98: cmp $0x200,%rax
9f: → jae 0
0.10 a1: shl $0x3,%rax
0.11 a5: add %rdi,%rax
0.11 a8: → jmp 0
aa: xor %eax,%eax
1.07 ac: cmp $0x0,%rax
b0: → je 0
6.57 b6: movzbq 0x0(%rax),%rdi
bb: cmp $0x0,%rdi
0.95 bf: → je 0
c5: mov $0x40,%r8d
cb: mov -0x140(%rbp),%rdi
d2: cmp $0x2,%rdi
d6: → je 0
d8: cmp $0x101,%rdi
df: → je 0
e1: cmp $0x15,%rdi
e5: → jne 0
e7: mov 0x10(%rbx),%rdx
eb: → jmp 0
ed: mov 0x18(%rbx),%rdx
f1: cmp $0x0,%rdx
f5: → je 0
f7: xor %edi,%edi
f9: mov %edi,-0x104(%rbp)
ff: mov %rbp,%rdi
102: add $0xffffffffffffff00,%rdi
109: mov $0x100,%esi
10e: → callq *ffffffffda658499
113: mov $0x148,%r8d
119: mov %eax,-0x108(%rbp)
11f: mov %rax,%rdi
122: shl $0x20,%rdi
126: shr $0x20,%rdi
12a: cmp $0xff,%rdi
131: → ja 0
133: add $0x48,%rax
137: and $0xff,%rax
13d: mov %rax,%r8
140: mov %rbp,%rcx
143: add $0xfffffffffffffeb8,%rcx
14a: mov %rbx,%rdi
14d: movabs $0xffff975fbd72d800,%rsi
157: mov $0xffffffff,%edx
15c: → callq *ffffffffda658ad9
161: mov %rax,%r13
164: mov %r13,%rax
0.72 167: mov 0x0(%rbp),%rbx
16b: mov 0x8(%rbp),%r13
1.16 16f: mov 0x10(%rbp),%r14
0.10 173: mov 0x18(%rbp),%r15
0.42 177: add $0x28,%rbp
0.54 17b: leaveq
0.54 17c: ← retq
Another cool way to test all this is to simply use 'perf top', look for
those symbols, go there and press enter to annotate it live :-)
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190312053051.2690567-13-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-11 22:30:48 -07:00
|
|
|
SYMBOL_ANNOTATE_ERRNO__NO_LIBOPCODES_FOR_BPF,
|
2019-09-30 15:48:12 -03:00
|
|
|
SYMBOL_ANNOTATE_ERRNO__ARCH_INIT_CPUID_PARSING,
|
|
|
|
SYMBOL_ANNOTATE_ERRNO__ARCH_INIT_REGEXP,
|
2019-09-30 16:04:21 -03:00
|
|
|
SYMBOL_ANNOTATE_ERRNO__BPF_INVALID_FILE,
|
|
|
|
SYMBOL_ANNOTATE_ERRNO__BPF_MISSING_BTF,
|
2016-07-29 16:27:18 -03:00
|
|
|
|
|
|
|
__SYMBOL_ANNOTATE_ERRNO__END,
|
|
|
|
};
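The enum above keeps tool-specific error codes in a negative range so that they cannot collide with the positive standard errno values. A minimal sketch of a formatter that dispatches on that split follows; the demo_* names and message strings are hypothetical and this is not the body of symbol__strerror_disassemble().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error space mirroring the layout of the enum above */
enum demo_errno {
	DEMO_ERRNO__SUCCESS = 0,
	__DEMO_ERRNO__START = -10000,
	DEMO_ERRNO__NO_VMLINUX = __DEMO_ERRNO__START,
	DEMO_ERRNO__NO_LIBOPCODES,
	__DEMO_ERRNO__END,
};

/* One message per custom code, indexed by (errnum - __DEMO_ERRNO__START) */
static const char *const demo_errno_str[] = {
	"no vmlinux found",
	"no libopcodes support",
};

/* Format errnum into buf; returns 0 on success, -1 for an unknown code */
static int demo_strerror(int errnum, char *buf, size_t buflen)
{
	assert(buf != NULL && buflen > 0);

	if (errnum >= 0) {
		/* standard errno values are non-negative, so delegate to strerror() */
		snprintf(buf, buflen, "%s", strerror(errnum));
		return 0;
	}
	if (errnum < __DEMO_ERRNO__START || errnum >= __DEMO_ERRNO__END)
		return -1;

	snprintf(buf, buflen, "%s", demo_errno_str[errnum - __DEMO_ERRNO__START]);
	return 0;
}
```

Because the custom range starts at a fixed negative base, a plain array indexed by the distance from that base is enough to map codes to messages in this sketch.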
|
|
|
|
|
2019-11-04 11:10:00 -03:00
|
|
|
int symbol__strerror_disassemble(struct map_symbol *ms, int errnum, char *buf, size_t buflen);
|
2016-07-29 16:27:18 -03:00
|
|
|
|
2023-11-28 09:54:37 -08:00
|
|
|
int symbol__annotate_printf(struct map_symbol *ms, struct evsel *evsel);
|
2011-02-06 14:54:44 -02:00
|
|
|
void symbol__annotate_zero_histogram(struct symbol *sym, int evidx);
|
2011-02-08 13:27:39 -02:00
|
|
|
void symbol__annotate_decay_histogram(struct symbol *sym, int evidx);
|
2017-10-11 17:01:38 +02:00
|
|
|
void annotated_source__purge(struct annotated_source *as);
|
2011-02-04 09:45:46 -02:00
|
|
|
|
2023-11-28 09:54:37 -08:00
|
|
|
int map_symbol__annotation_dump(struct map_symbol *ms, struct evsel *evsel);
|
perf annotate browser: Add 'P' hotkey to dump annotation to file
Just like we have in the histograms browser used as the main screen for
'perf top --tui' and 'perf report --tui', to print the current
annotation to a file with a name composed of the symbol name and the
".annotation" suffix.
Here is one example of pressing 'A' on 'perf top' to live annotate a
kernel function and then press 'P' to dump that annotation, the
resulting file:
# cat _raw_spin_lock_irqsave.annotation
_raw_spin_lock_irqsave() /proc/kcore
Event: cycles:ppp
7.14 nop
21.43 push %rbx
7.14 pushfq
pop %rax
nop
mov %rax,%rbx
cli
nop
xor %eax,%eax
mov $0x1,%edx
64.29 lock cmpxchg %edx,(%rdi)
test %eax,%eax
↓ jne 2b
mov %rbx,%rax
pop %rbx
← retq
2b: mov %eax,%esi
→ callq queued_spin_lock_slowpath
mov %rbx,%rax
pop %rbx
← retq
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-zzmnrwugb5vtk7bvg0rbx150@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-16 16:57:47 -03:00
|
|
|
|
2014-02-20 10:32:53 +09:00
|
|
|
bool ui__has_annotation(void);
|
|
|
|
|
2023-11-28 09:54:37 -08:00
|
|
|
int symbol__tty_annotate(struct map_symbol *ms, struct evsel *evsel);
|
2011-02-04 09:45:46 -02:00
|
|
|
|
2023-11-28 09:54:37 -08:00
|
|
|
int symbol__tty_annotate2(struct map_symbol *ms, struct evsel *evsel);
|
2018-03-15 23:44:34 -03:00
|
|
|
|
2013-09-30 12:07:11 +02:00
|
|
|
#ifdef HAVE_SLANG_SUPPORT
|
2019-11-04 11:10:00 -03:00
|
|
|
int symbol__tui_annotate(struct map_symbol *ms, struct evsel *evsel,
|
2023-11-28 09:54:38 -08:00
|
|
|
struct hist_browser_timer *hbt);
|
2012-09-28 18:32:02 +09:00
|
|
|
#else
|
2019-11-04 11:10:00 -03:00
|
|
|
static inline int symbol__tui_annotate(struct map_symbol *ms __maybe_unused,
|
2019-07-21 13:23:51 +02:00
|
|
|
struct evsel *evsel __maybe_unused,
|
2023-11-28 09:54:38 -08:00
|
|
|
struct hist_browser_timer *hbt __maybe_unused)
|
2011-02-04 09:45:46 -02:00
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2023-11-28 09:54:39 -08:00
|
|
|
void annotation_options__init(void);
|
|
|
|
void annotation_options__exit(void);
|
2023-03-28 16:55:40 -07:00
|
|
|
|
2023-11-28 09:54:39 -08:00
|
|
|
void annotation_config__init(void);
|
2018-03-16 14:33:38 -03:00
|
|
|
|
2018-08-04 15:05:20 +02:00
|
|
|
int annotate_parse_percent_type(const struct option *opt, const char *_str,
|
|
|
|
int unset);
|
2020-01-07 13:04:44 -08:00
|
|
|
|
2023-11-28 09:54:39 -08:00
|
|
|
int annotate_check_args(void);
|
2020-01-07 13:04:44 -08:00
|
|
|
|
2023-12-12 16:13:12 -08:00
|
|
|
/**
|
|
|
|
* struct annotated_op_loc - Location info of instruction operand
|
2024-01-16 22:26:51 -08:00
|
|
|
* @reg1: First register in the operand
|
|
|
|
* @reg2: Second register in the operand
|
2023-12-12 16:13:12 -08:00
|
|
|
* @offset: Memory access offset in the operand
|
2024-03-18 22:51:08 -07:00
|
|
|
* @segment: Segment selector register
|
2023-12-12 16:13:12 -08:00
|
|
|
* @mem_ref: Whether the operand accesses memory
|
2024-01-16 22:26:51 -08:00
|
|
|
* @multi_regs: Whether the second register is used
|
2024-03-18 22:51:08 -07:00
|
|
|
* @imm: Whether the operand is an immediate value (in offset)
|
2023-12-12 16:13:12 -08:00
|
|
|
*/
|
|
|
|
struct annotated_op_loc {
|
2024-01-16 22:26:51 -08:00
|
|
|
int reg1;
|
|
|
|
int reg2;
|
2023-12-12 16:13:12 -08:00
|
|
|
int offset;
|
2024-03-18 22:51:08 -07:00
|
|
|
u8 segment;
|
2023-12-12 16:13:12 -08:00
|
|
|
bool mem_ref;
|
2024-01-16 22:26:51 -08:00
|
|
|
bool multi_regs;
|
2024-03-18 22:51:08 -07:00
|
|
|
bool imm;
|
2023-12-12 16:13:12 -08:00
|
|
|
};
|
|
|
|
|
|
|
|
enum annotated_insn_ops {
|
|
|
|
INSN_OP_SOURCE = 0,
|
|
|
|
INSN_OP_TARGET = 1,
|
|
|
|
|
|
|
|
INSN_OP_MAX,
|
|
|
|
};
|
|
|
|
|
2024-03-18 22:51:08 -07:00
|
|
|
enum annotated_x86_segment {
|
|
|
|
INSN_SEG_NONE = 0,
|
|
|
|
|
|
|
|
INSN_SEG_X86_CS,
|
|
|
|
INSN_SEG_X86_DS,
|
|
|
|
INSN_SEG_X86_ES,
|
|
|
|
INSN_SEG_X86_FS,
|
|
|
|
INSN_SEG_X86_GS,
|
|
|
|
INSN_SEG_X86_SS,
|
|
|
|
};
|
|
|
|
|
2023-12-12 16:13:12 -08:00
|
|
|
/**
|
|
|
|
* struct annotated_insn_loc - Location info of instruction
|
|
|
|
* @ops: Array of location info for source and target operands
|
|
|
|
*/
|
|
|
|
struct annotated_insn_loc {
|
|
|
|
struct annotated_op_loc ops[INSN_OP_MAX];
|
|
|
|
};
|
|
|
|
|
|
|
|
#define for_each_insn_op_loc(insn_loc, i, op_loc) \
|
|
|
|
for (i = INSN_OP_SOURCE, op_loc = &(insn_loc)->ops[i]; \
|
|
|
|
i < INSN_OP_MAX; \
|
|
|
|
i++, op_loc++)
|
|
|
|
|
|
|
|
/* Get detailed location info in the instruction */
|
|
|
|
int annotate_get_insn_location(struct arch *arch, struct disasm_line *dl,
|
|
|
|
struct annotated_insn_loc *loc);
|
|
|
|
|
2023-12-12 16:13:13 -08:00
|
|
|
/* Returns a data type from the sample instruction (if any) */
|
|
|
|
struct annotated_data_type *hist_entry__get_data_type(struct hist_entry *he);
|
|
|
|
|
perf annotate: Add --insn-stat option for debugging
This is for debugging purposes. It'd be useful to see per-instruction
level success/failure stats.
$ perf annotate --data-type --insn-stat
Annotate Instruction stats
total 264, ok 143 (54.2%), bad 121 (45.8%)
Name : Good Bad
-----------------------------------------------------------
movq : 45 31
movl : 22 11
popq : 0 19
cmpl : 16 3
addq : 8 7
cmpq : 11 3
cmpxchgl : 3 7
cmpxchgq : 8 0
incl : 3 3
movzbl : 4 2
incq : 4 2
decl : 6 0
...
Committer notes:
So these are about being able to find the type for accesses from these
instructions, we should improve the naming, but it is for debugging, we
can improve this later:
@@ -3726,6 +3759,10 @@ struct annotated_data_type *hist_entry__get_data_type(struct hist_entry *he)
continue;
mem_type = find_data_type(ms, ip, op_loc->reg, op_loc->offset);
+ if (mem_type)
+ istat->good++;
+ else
+ istat->bad++;
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-18-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-12 16:13:23 -08:00
|
|
|
struct annotated_item_stat {
|
|
|
|
struct list_head list;
|
|
|
|
char *name;
|
|
|
|
int good;
|
|
|
|
int bad;
|
|
|
|
};
|
|
|
|
extern struct list_head ann_insn_stat;
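The good and bad counters kept per instruction name can be summed into a summary line of the form "total N, ok G (x%), bad B (y%)". A minimal sketch follows, assuming a plain array in place of the list_head based list; struct demo_item_stat and demo_stat_summary() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified stand-in for struct annotated_item_stat, without the list node */
struct demo_item_stat {
	const char *name;
	int good;
	int bad;
};

/* Write the summary line into buf and return the total number of samples */
static int demo_stat_summary(const struct demo_item_stat *stats, int nr,
			     char *buf, size_t buflen)
{
	int good = 0, bad = 0, total, i;

	assert(stats != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		good += stats[i].good;
		bad += stats[i].bad;
	}
	total = good + bad;
	if (total == 0) {
		/* avoid dividing by zero when nothing was recorded */
		snprintf(buf, buflen, "total 0");
		return 0;
	}
	snprintf(buf, buflen, "total %d, ok %d (%.1f%%), bad %d (%.1f%%)",
		 total, good, 100.0 * good / total, bad, 100.0 * bad / total);
	return total;
}
```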
|
|
|
|
|
2024-01-16 22:26:54 -08:00
|
|
|
/* Calculate PC-relative address */
|
|
|
|
u64 annotate_calc_pcrel(struct map_symbol *ms, u64 ip, int offset,
|
|
|
|
struct disasm_line *dl);
|
|
|
|
|
2024-03-18 22:50:59 -07:00
|
|
|
/**
|
|
|
|
* struct annotated_basic_block - Basic block of instructions
|
|
|
|
* @list: List node
|
|
|
|
* @begin: start instruction in the block
|
|
|
|
* @end: end instruction in the block
|
|
|
|
*/
|
|
|
|
struct annotated_basic_block {
|
|
|
|
struct list_head list;
|
|
|
|
struct disasm_line *begin;
|
|
|
|
struct disasm_line *end;
|
|
|
|
};
|
|
|
|
|
|
|
|
/* Get a list of basic blocks from src to dst addresses */
|
|
|
|
int annotate_get_basic_blocks(struct symbol *sym, s64 src, s64 dst,
|
|
|
|
struct list_head *head);
|
|
|
|
|
2024-08-05 16:46:48 -07:00
|
|
|
void debuginfo_cache__delete(void);
|
|
|
|
|
perf report: Display the branch counter histogram
Reusing the existing --total-cycles option to display the branch
counters. Add a new PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER to display
the logged branch counter events. They are shown right after all the
cycle-related annotations.
Extend the 'struct block_info' to store and pass the branch counter
related information.
annotation_br_cntr_entry() prints the histogram of each branch
counter event. If the number of logged events is less than 4, the
abbr name is printed that many times. Otherwise, '+' stands for
more than 3 events.
Assume the number of logged events is less than 4.
The annotation_br_cntr_abbr_list() prints the branch counter's
abbreviation list. Press 'B' to display the list in the TUI mode.
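The cell format described above can be sketched as follows. This is a hypothetical formatter written from the description and the sample output only, not perf's annotation_br_cntr_entry(); the 4-character width and the threshold of 3 are assumptions taken from that text.

```c
#include <assert.h>
#include <string.h>

#define DEMO_BR_CNTR_WIDTH 4	/* assumed column width of one histogram cell */

/*
 * Hypothetical formatter: fill buf with the abbreviation repeated once
 * per logged event, '-' when no event occurred, and a trailing '+' once
 * the count exceeds 3. The cell is padded with spaces to a fixed width.
 */
static void demo_br_cntr_cell(char *buf, size_t buflen, char abbr, unsigned int count)
{
	unsigned int i, n = count > 3 ? 3 : count;

	assert(buf != NULL && buflen > DEMO_BR_CNTR_WIDTH);
	memset(buf, ' ', DEMO_BR_CNTR_WIDTH);
	buf[DEMO_BR_CNTR_WIDTH] = '\0';

	if (count == 0) {
		buf[0] = '-';
		return;
	}
	for (i = 0; i < n; i++)
		buf[i] = abbr;
	if (count > 3)
		buf[3] = '+';
}
```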
$ perf record -e "{branch-instructions:ppp,branch-misses}:S" -j any,counter
$ perf report --total-cycles --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }'
# Event count (approx.): 1610046
#
# Branch counter abbr list:
# branch-instructions:ppp = A
# branch-misses = B
# '-' No event occurs
# '+' Event occurrences may be lost due to branch counter saturated
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range]
# ............... .............. ........... .......... .............. ..................
#
57.55% 2.5M 0.00% 3 |A |- | ...
25.27% 1.1M 0.00% 2 |AA |- | ...
15.61% 667.2K 0.00% 1 |A |- | ...
0.16% 6.9K 0.81% 575 |A |- | ...
0.16% 6.8K 1.38% 977 |AA |- | ...
0.16% 6.8K 0.04% 28 |AA |B | ...
0.15% 6.6K 1.33% 946 |A |- | ...
0.11% 4.5K 0.06% 46 |AAA+|- | ...
0.10% 4.4K 0.88% 624 |A |- | ...
0.09% 3.7K 0.74% 524 |AAA+|B | ...
With -v applied,
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range]
# ............... .............. ........... .......... .............. ..................
#
57.55% 2.5M 0.00% 3 A=1 ,B=- ...
25.27% 1.1M 0.00% 2 A=2 ,B=- ...
15.61% 667.2K 0.00% 1 A=1 ,B=- ...
0.16% 6.9K 0.81% 575 A=1 ,B=- ...
0.16% 6.8K 1.38% 977 A=2 ,B=- ...
0.16% 6.8K 0.04% 28 A=2 ,B=1 ...
0.15% 6.6K 1.33% 946 A=1 ,B=- ...
0.11% 4.5K 0.06% 46 A=3+,B=- ...
0.10% 4.4K 0.88% 624 A=1 ,B=- ...
0.09% 3.7K 0.74% 524 A=3+,B=1 ...
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240813160208.2493643-7-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-13 09:02:05 -07:00
|
|
|
int annotation_br_cntr_entry(char **str, int br_cntr_nr, u64 *br_cntr,
|
|
|
|
int num_aggr, struct evsel *evsel);
|
|
|
|
int annotation_br_cntr_abbr_list(char **str, struct evsel *evsel, bool header);
|
2011-02-04 09:45:46 -02:00
|
|
|
#endif /* __PERF_ANNOTATE_H */
|