trace: Update documentation

Revamp the documentation for the new features, including a description of
the new features and documentation for the trace command.

Signed-off-by: Simon Glass <sjg@chromium.org>
This commit is contained in:
Simon Glass 2023-01-15 14:16:01 -07:00 committed by Tom Rini
parent 9cea4797ae
commit e46b72436c
9 changed files with 382 additions and 64 deletions

1
.gitattributes vendored
View file

@ -4,3 +4,4 @@
*.bmp binary
*.ttf binary
*.gz binary
*.png binary

Binary file not shown.

After

Width:  |  Height:  |  Size: 47 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 31 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 25 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 29 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 28 KiB

View file

@ -15,12 +15,16 @@ Overview
The trace feature uses GCC's instrument-functions feature to trace all
function entry/exit points. These are then recorded in a memory buffer.
The memory buffer can be saved to the host over a network link using
tftpput or by writing to an attached memory device such as MMC.
tftpput or by writing to an attached storage device such as MMC.
On the host, the file is first converted with a tool called 'proftool',
which extracts useful information from it. The resulting trace output
resembles that emitted by Linux's ftrace feature, so can be visually
displayed by pytimechart.
displayed by kernelshark (see kernelshark_) and used with
'trace-cmd report' (see trace_cmd_).
It is also possible to produce a flame graph for use with flamegraph.pl
(see flamegraph_pl_).
Quick-start using Sandbox
@ -30,7 +34,7 @@ Sandbox is a build of U-Boot that can run under Linux so it is a convenient
way of trying out tracing before you use it on your actual board. To do
this, follow these steps:
Add the following to config/sandbox_defconfig
Add the following to `config/sandbox_defconfig`:
.. code-block:: c
@ -75,7 +79,7 @@ a trace:
16 maximum observed call depth
15 call depth limit
1,275,767 calls not traced due to depth
=>trace calls 0 e00000
=>trace calls 1000000 e00000
Call list dumped to 00000000, size 0xae0a40
=>print
baudrate=115200
@ -87,7 +91,7 @@ a trace:
stdout=serial
Environment size: 117/8188 bytes
=>host save host 0 trace 0 ${profoffset}
=>host save hostfs - 1000000 trace ${profoffset}
11405888 bytes written in 10 ms (1.1 GiB/s)
=>reset
@ -96,18 +100,107 @@ Then run proftool to convert the trace information to ftrace format
.. code-block:: console
$ ./sandbox/tools/proftool -m sandbox/System.map -p trace dump-ftrace >trace.txt
$ ./sandbox/tools/proftool -m sandbox/System.map -t trace dump-ftrace >trace.dat
Finally run pytimechart to display it
Finally run kernelshark to display it (note it only works with `.dat` files!):
.. code-block:: console
$ pytimechart trace.txt
$ kernelshark trace.dat
Using this tool you can zoom and pan across the trace, with the function
calls on the left and little marks representing the start and end of each
Using this tool you can view the trace records and see the timestamp for each
function.
.. image:: pics/kernelshark.png
:width: 800
:alt: Kernelshark showing function-trace records
To see the records on the console, use trace-cmd:
.. code-block:: console
$ trace-cmd report trace.dat | less
cpus=1
u-boot-1 [000] 3.116364: function: initf_malloc
u-boot-1 [000] 3.116375: function: initf_malloc
u-boot-1 [000] 3.116386: function: initf_bootstage
u-boot-1 [000] 3.116396: function: bootstage_init
u-boot-1 [000] 3.116408: function: malloc
u-boot-1 [000] 3.116418: function: malloc_simple
u-boot-1 [000] 3.116429: function: alloc_simple
u-boot-1 [000] 3.116441: function: alloc_simple
u-boot-1 [000] 3.116449: function: malloc_simple
u-boot-1 [000] 3.116457: function: malloc
Note that `pytimechart` is obsolete so cannot be used anymore.
There is a -f option available to select a function graph:
.. code-block:: console
$ ./sandbox/tools/proftool -m sandbox/System.map -t trace -f funcgraph dump-ftrace >trace.dat
Again, you can use kernelshark or trace-cmd to look at the output. In this case
you will see the time taken by each function shown against its exit record.
.. image:: pics/kernelshark_fg.png
:width: 800
:alt: Kernelshark showing function-graph records
.. code-block:: console
$ trace-cmd report trace.dat | less
cpus=1
u-boot-1 [000] 3.116364: funcgraph_entry: 0.011 us | initf_malloc();
u-boot-1 [000] 3.116386: funcgraph_entry: | initf_bootstage() {
u-boot-1 [000] 3.116396: funcgraph_entry: | bootstage_init() {
u-boot-1 [000] 3.116408: funcgraph_entry: | malloc() {
u-boot-1 [000] 3.116418: funcgraph_entry: | malloc_simple() {
u-boot-1 [000] 3.116429: funcgraph_entry: 0.012 us | alloc_simple();
u-boot-1 [000] 3.116449: funcgraph_exit: 0.031 us | }
u-boot-1 [000] 3.116457: funcgraph_exit: 0.049 us | }
u-boot-1 [000] 3.116466: funcgraph_entry: 0.063 us | memset();
u-boot-1 [000] 3.116539: funcgraph_exit: 0.143 us | }
Flame graph
-----------
Some simple flame graph options are available as well, using the dump-flamegraph
command:
.. code-block:: console
$ ./sandbox/tools/proftool -m sandbox/System.map -t trace dump-flamegraph >trace.fg
$ flamegraph.pl trace.fg >trace.svg
You can load the .svg file into a viewer. If you use Chrome (and some other
programs) you can click around and zoom in and out.
.. image:: pics/flamegraph.png
:width: 800
:alt: Chrome showing the flamegraph.pl output
.. image:: pics/flamegraph_zoom.png
:width: 800
:alt: Chrome showing zooming into the flamegraph.pl output
A timing variant is also available, which gives an idea of how much time is
spend in each call stack:
.. code-block:: console
$ ./sandbox/tools/proftool -m sandbox/System.map -t trace dump-flamegraph -f timing >trace.fg
$ flamegraph.pl trace.fg >trace.svg
Note that trace collection does slow down execution so the timings will be
inflated. They should be used to guide optimisation. For accurate boot timings,
use bootstage.
.. image:: pics/flamegraph_timing.png
:width: 800
:alt: Chrome showing flamegraph.pl output with timing
CONFIG Options
--------------
@ -138,6 +231,11 @@ CONFIG_TRACE_EARLY_SIZE
CONFIG_TRACE_EARLY_ADDR
Address of early trace buffer
CONFIG_TRACE_CALL_DEPTH_LIMIT
Sets the limit on trace call-depth. For a broad view, 10 is typically
sufficient. Setting this too large creates enormous traces and distorts
the overall timing considerable.
Building U-Boot with Tracing Enabled
------------------------------------
@ -148,6 +246,26 @@ instrumenting from the command line instead of having to change board
config files.
Board requirements
------------------
Trace data collection relies on a microsecond timer, accessed through
`timer_get_us()`. So the first thing you should do is make sure that
this produces sensible results for your board. Suitable sources for
this timer include high resolution timers, PWMs or profile timers if
available. Most modern SOCs have a suitable timer for this.
See `add_ftrace()` for where `timer_get_us()` is called. The `notrace`
attribute must be used on each function called by `timer_get_us()` since
recursive calls to `add_ftrace()` will cause a fault::
trace: recursion detected, disabling
You cannot use driver model to obtain the microsecond timer, since tracing
may be enabled before driver model is set up. Instead, provide a low-level
function which accesses the timer, setting it up if needed.
Collecting Trace Data
---------------------
@ -155,21 +273,22 @@ When you run U-Boot on your board it will collect trace data up to the
limit of the trace buffer size you have specified. Once that is exhausted
no more data will be collected.
Collecting trace data has an affect on execution time/performance. You
Collecting trace data affects execution time and performance. You
will notice this particularly with trivial functions - the overhead of
recording their execution may even exceed their normal execution time.
In practice this doesn't matter much so long as you are aware of the
effect. Once you have done your optimizations, turn off tracing before
doing end-to-end timing.
doing end-to-end timing using bootstage.
The best time to start tracing is right at the beginning of U-Boot. The
best time to stop tracing is right at the end. In practice it is hard
to achieve these ideals.
This implementation enables tracing early in board_init_f(). This means
This implementation enables tracing early in `board_init_r()`, or
`board_init_f()` when `TRACE_EARLY` is enabled. This means
that it captures most of the board init process, missing only the
early architecture-specific init. However, it also misses the entire
SPL stage if there is one.
SPL stage if there is one. At present tracing is not supported in SPL.
U-Boot typically ends with a 'bootm' command which loads and runs an
OS. There is useful trace data in the execution of that bootm
@ -179,39 +298,11 @@ the OS. In practical terms, U-Boot runs the 'fakegocmd' environment
variable at this point. This variable should have a short script which
collects the trace data and writes it somewhere.
Trace data collection relies on a microsecond timer, accessed through
timer_get_us(). So the first think you should do is make sure that
this produces sensible results for your board. Suitable sources for
this timer include high resolution timers, PWMs or profile timers if
available. Most modern SOCs have a suitable timer for this. Make sure
that you mark this timer (and anything it calls) with
notrace so that the trace library can
use it without causing an infinite loop.
Controlling the trace
---------------------
Commands
--------
The trace command has variable sub-commands:
stats
Display tracing statistics
pause
Pause tracing
resume
Resume tracing
funclist [<addr> <size>]
Dump a list of functions into the buffer
calls [<addr> <size>]
Dump function call trace into buffer
If the address and size are not given, these are obtained from environment
variables (see below). In any case the environment variables are updated
after the command runs.
U-Boot provides a command-line interface to the trace system for controlling
tracing and accessing the trace data. See :doc:`../usage/cmd/trace`.
Environment Variables
@ -264,39 +355,94 @@ a trace log to address 10000000 and sends it to a host machine using
TFTP. After this, U-Boot will boot the OS normally, albeit a little
later.
For a filesystem you may do something like::
Converting Trace Output Data
----------------------------
trace calls 10000000 1000000;
save mmc 1:1 10000000 /trace ${profoffset}
The trace buffer format is internal to the trace system. It consists of a
header, a call count for each function site, followed by a list of trace
records, once for each function call.
Converting Trace Output Data (proftool)
---------------------------------------
The trace output data is kept in a binary format which is not documented
here. To convert it into something useful, you can use proftool.
here. See the `trace.h` header file if you are interested. To convert it into
something useful, you can use proftool.
This tool must be given the U-Boot map file and the trace data received
from running that U-Boot. It produces a text output file.
from running that U-Boot. It produces a binary output file.
Options
It is also possible to provide a configuration file to indicate which functions
should be included or dropped during conversion. This file consists of lines
like::
include-func <regex>
exclude-func <regex>
where <regex> is a regular expression matched against function names. It
allows some functions to be dropped from the trace when producing ftrace
records.
Options:
-c <config_file>
Specify the optional configuration file, to control which functions are
included in the output.
-f <format>
Specifies the format to use (see below)
-m <map_file>
Specify U-Boot map file
Specify U-Boot map file (`System.map`)
-p <trace_file>
Specify profile/trace file
-o <output file>
Specify the output filename
-t <trace_file>
Specify trace file, the data saved from U-Boot
-v <0-4>
Specify the verbosity, where 0 is the minimum and 4 is for debugging.
Commands:
dump-ftrace
Write a text dump of the file in Linux ftrace format to stdout
dump-ftrace:
Write a binary dump of the file in Linux ftrace format. Two options are
available:
function
write function-call records (caller/callee)
funcgraph
write function entry/exit records (graph)
This format can be used with kernelshark_ and trace_cmd_.
dump-flamegraph
Write a list of stack records useful for producing a flame graph. Two
options are available:
calls
create a flamegraph of stack frames
timing
create a flamegraph of microseconds for each stack frame
This format can be used with flamegraph_pl_.
Viewing the Trace Data
----------------------
You can use pytimechart for this (sudo apt-get pytimechart might work on
your Debian-style machine, and use your favourite search engine to obtain
documentation). It expects the file to have a .txt extension. The program
has terse user interface but is very convenient for viewing U-Boot
profile information.
You can use kernelshark_ for a GUI, but note that version 2.0.x was broken. If
you have that version you could try building it from source.
The file must have a .dat extension or it is ignored. The program has terse
user interface but is very convenient for viewing U-Boot profile information.
Also available is trace_cmd_ which provides a command-line interface.
Workflow Suggestions
--------------------
@ -329,7 +475,9 @@ There are a few parameters in the code that you may want to consider.
There is a function call depth limit (set to 15 by default). When the
stack depth goes above this then no tracing information is recorded.
The maximum depth reached is recorded and displayed by the 'trace stats'
command.
command. While it might be tempting to set the depth limit quite high, this
can dramatically increase the size of the trace output as well as the execution
time.
Future Work
@ -346,5 +494,10 @@ Some other features that might be useful:
- Compression of trace information
Simon Glass <sjg@chromium.org>
April 2013
.. sectionauthor:: Simon Glass <sjg@chromium.org>
.. April 2013
.. Updated January 2023
.. _kernelshark: https://kernelshark.org/
.. _trace_cmd: https://www.trace-cmd.org/
.. _flamegraph_pl: https://github.com/brendangregg/FlameGraph/blob/master/flamegraph.pl

163
doc/usage/cmd/trace.rst Normal file
View file

@ -0,0 +1,163 @@
.. SPDX-License-Identifier: GPL-2.0+:
trace command
=============
Synopis
-------
::
trace stats
trace pause
trace resume
trace funclist [<addr> <size>]
trace calls [<addr> <size>]
Description
-----------
The *trace* command is used to control the U-Boot tracing system. It allows
tracing to be paused and resumed, shows statistics for traces and provides a
way to dump out the trace information.
trace stats
~~~~~~~~~~~
This display tracing statistics, as follows:
function sites
Functions are binned as a way of reducing the amount of space needed to
hold all the function information. This is controlled by FUNC_SITE_SIZE in
the trace.h header file. The usual value is 4, which provides the finest
granularity (assuming a minimum instruction size of 4 bytes) which means
that every function can be resolved individually.
function calls
Total number of function calls, including those which were not traced due
to buffer space. This count does not include functions which exceeded the
depth limit.
untracked function calls
Total number of function calls which did not appear in the U-Boot image.
This can happen if a function is called outside the normal code area.
traced function calls
Total number of function calls which were actually traced, i.e. are included
in the recorded trace data.
dropped due to overflow
If the trace buffer was exhausted then this shows the number of records that
were dropped. Try reducing the depth limit or expanding the buffer size.
maximum observed call depth
Maximum observed call depth while tracing.
calls not traced due to depth
Counts the number of function calls that were not recorded because they
exceeded the maximum call depth.
max function calls
Maximum number of function calls which can be recorded in the trace buffer,
given its size. Once `function calls` hits this value, recording stops.
trace buffer
Address of trace buffer
call records
Address of first trace record. This is near the start of the trace buffer,
after the function-call counts.
trace pause
~~~~~~~~~~~
Pauses tracing, so that no more data is added to the trace buffer.
trace resume
~~~~~~~~~~~~
Resumes tracing, so that new function calls are added to the trace buffer if
there is sufficient space.
trace funclist [<addr> <size>]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dumps a list of functions into the provided buffer. The file uses a format
specific to U-Boot: a header, following by the function offset and call count.
If the address and size are not given, these are obtained from
:ref:`develop/trace:environment variables`. In any case the environment
variables are updated after the command runs.
The resulting data should be written out to the host, e.g. using Ethernet or
a filesystem. There are no tools provided to read this sdata.
trace calls [<addr> <size>]
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dumps a list of function calls into the provided buffer. The file uses a format
specific to U-Boot: a header, following by the list of calls. The proftool
tool can be used to convert this information ready for further analysis.
Example
-------
::
=> trace stats
269,252 function sites
38,025,059 function calls
3 untracked function calls
7,382,690 traced function calls
17 maximum observed call depth
15 call depth limit
68,667,432 calls not traced due to depth
22,190,112 max function calls
trace buffer 6c000000 call records 6c20de78
=> trace resume
=> trace pause
This shows that resuming the trace causes the buffer to overflow::
=> trace stats
269,252 function sites
49,573,694 function calls
3 untracked function calls
22,190,112 traced function calls (8289848 dropped due to overflow)
17 maximum observed call depth
15 call depth limit
68,667,432 calls not traced due to depth
22,190,112 max function calls
trace buffer 6c000000 call records 6c20de78
=> trace funcs 30000000 0x100000
Function trace dumped to 30000000, size 0x1e70
This shows collecting and writing out the result trace data:
::
=> trace calls 20000000 0x10000000
Call list dumped to 20000000, size 0xfdf21a0
=> save mmc 1:1 20000000 /trace ${profoffset}
File System is consistent
file found, deleting
update journal finished
File System is consistent
update journal finished
266281376 bytes written in 18584 ms (13.7 MiB/s)
From here you can use proftool to convert it:
.. code-block:: bash
tools/proftool -m System.map -t trace -o asc.fg dump-ftrace
.. _`ACPI specification`: https://uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf

View file

@ -84,6 +84,7 @@ Shell commands
cmd/sm
cmd/temperature
cmd/tftpput
cmd/trace
cmd/true
cmd/ums
cmd/ut