trace: Update documentation

Revamp the documentation for the new features, including a description of the new features and documentation for the trace command. Signed-off-by: Simon Glass <sjg@chromium.org>
2024-11-28 15:41:40 +00:00 · 2023-01-15 14:16:01 -07:00 · 2023-01-15 14:16:01 -07:00 · e46b72436c
commit e46b72436c
parent 9cea4797ae
9 changed files with 382 additions and 64 deletions
--- a/.gitattributes
+++ b/.gitattributes
@ -4,3 +4,4 @@
 *.bmp binary
 *.ttf binary
 *.gz binary
 *.png binary
--- a/doc/develop/pics/flamegraph.png
+++ b/doc/develop/pics/flamegraph.png
--- a/doc/develop/pics/flamegraph_timing.png
+++ b/doc/develop/pics/flamegraph_timing.png
--- a/doc/develop/pics/flamegraph_zoom.png
+++ b/doc/develop/pics/flamegraph_zoom.png
--- a/doc/develop/pics/kernelshark.png
+++ b/doc/develop/pics/kernelshark.png
--- a/doc/develop/pics/kernelshark_fg.png
+++ b/doc/develop/pics/kernelshark_fg.png
--- a/doc/develop/trace.rst
+++ b/doc/develop/trace.rst
@ -15,12 +15,16 @@ Overview
 The trace feature uses GCC's instrument-functions feature to trace all
 function entry/exit points. These are then recorded in a memory buffer.
 The memory buffer can be saved to the host over a network link using
-tftpput or by writing to an attached memory device such as MMC.
+tftpput or by writing to an attached storage device such as MMC.
 On the host, the file is first converted with a tool called 'proftool',
 which extracts useful information from it. The resulting trace output
 resembles that emitted by Linux's ftrace feature, so can be visually
-displayed by pytimechart.
+displayed by kernelshark (see kernelshark_) and used with
 'trace-cmd report' (see trace_cmd_).
 It is also possible to produce a flame graph for use with flamegraph.pl
 (see flamegraph_pl_).
 Quick-start using Sandbox
@ -30,7 +34,7 @@ Sandbox is a build of U-Boot that can run under Linux so it is a convenient
 way of trying out tracing before you use it on your actual board. To do
 this, follow these steps:
-Add the following to config/sandbox_defconfig
+Add the following to `config/sandbox_defconfig`:
 .. code-block:: c
@ -75,7 +79,7 @@ a trace:
             16 maximum observed call depth
             15 call depth limit
          1,275,767 calls not traced due to depth
-    =>trace calls 0 e00000
+    =>trace calls 1000000 e00000
    Call list dumped to 00000000, size 0xae0a40
    =>print
    baudrate=115200
@ -87,7 +91,7 @@ a trace:
    stdout=serial
    Environment size: 117/8188 bytes
-    =>host save host 0 trace 0 ${profoffset}
+    =>host save hostfs - 1000000 trace ${profoffset}
    11405888 bytes written in 10 ms (1.1 GiB/s)
    =>reset
@ -96,18 +100,107 @@ Then run proftool to convert the trace information to ftrace format
 .. code-block:: console
-    $ ./sandbox/tools/proftool -m sandbox/System.map -p trace dump-ftrace >trace.txt
+    $ ./sandbox/tools/proftool -m sandbox/System.map -t trace dump-ftrace >trace.dat
-Finally run pytimechart to display it
+Finally run kernelshark to display it (note it only works with `.dat` files!):
 .. code-block:: console
-    $ pytimechart trace.txt
+    $ kernelshark trace.dat
-Using this tool you can zoom and pan across the trace, with the function
+Using this tool you can view the trace records and see the timestamp for each
 calls on the left and little marks representing the start and end of each
 function.
 .. image:: pics/kernelshark.png
  :width: 800
  :alt: Kernelshark showing function-trace records
 To see the records on the console, use trace-cmd:
 .. code-block:: console
    $ trace-cmd report trace.dat | less
    cpus=1
          u-boot-1     [000]     3.116364: function:             initf_malloc
          u-boot-1     [000]     3.116375: function:             initf_malloc
          u-boot-1     [000]     3.116386: function:             initf_bootstage
          u-boot-1     [000]     3.116396: function:                bootstage_init
          u-boot-1     [000]     3.116408: function:                   malloc
          u-boot-1     [000]     3.116418: function:                      malloc_simple
          u-boot-1     [000]     3.116429: function:                         alloc_simple
          u-boot-1     [000]     3.116441: function:                         alloc_simple
          u-boot-1     [000]     3.116449: function:                      malloc_simple
          u-boot-1     [000]     3.116457: function:                   malloc
 Note that `pytimechart` is obsolete so cannot be used anymore.
 There is a -f option available to select a function graph:
 .. code-block:: console
    $ ./sandbox/tools/proftool -m sandbox/System.map -t trace -f funcgraph dump-ftrace >trace.dat
 Again, you can use kernelshark or trace-cmd to look at the output. In this case
 you will see the time taken by each function shown against its exit record.
 .. image:: pics/kernelshark_fg.png
  :width: 800
  :alt: Kernelshark showing function-graph records
 .. code-block:: console
    $ trace-cmd report trace.dat | less
    cpus=1
              u-boot-1     [000]     3.116364: funcgraph_entry:        0.011 us   |    initf_malloc();
              u-boot-1     [000]     3.116386: funcgraph_entry:                   |    initf_bootstage() {
              u-boot-1     [000]     3.116396: funcgraph_entry:                   |      bootstage_init() {
              u-boot-1     [000]     3.116408: funcgraph_entry:                   |        malloc() {
              u-boot-1     [000]     3.116418: funcgraph_entry:                   |          malloc_simple() {
              u-boot-1     [000]     3.116429: funcgraph_entry:        0.012 us   |            alloc_simple();
              u-boot-1     [000]     3.116449: funcgraph_exit:         0.031 us   |            }
              u-boot-1     [000]     3.116457: funcgraph_exit:         0.049 us   |          }
              u-boot-1     [000]     3.116466: funcgraph_entry:        0.063 us   |        memset();
              u-boot-1     [000]     3.116539: funcgraph_exit:         0.143 us   |        }
 Flame graph
 -----------
 Some simple flame graph options are available as well, using the dump-flamegraph
 command:
 .. code-block:: console
    $ ./sandbox/tools/proftool -m sandbox/System.map -t trace dump-flamegraph >trace.fg
    $ flamegraph.pl trace.fg >trace.svg
 You can load the .svg file into a viewer. If you use Chrome (and some other
 programs) you can click around and zoom in and out.
 .. image:: pics/flamegraph.png
  :width: 800
  :alt: Chrome showing the flamegraph.pl output
 .. image:: pics/flamegraph_zoom.png
  :width: 800
  :alt: Chrome showing zooming into the flamegraph.pl output
 A timing variant is also available, which gives an idea of how much time is
 spend in each call stack:
 .. code-block:: console
    $ ./sandbox/tools/proftool -m sandbox/System.map -t trace dump-flamegraph -f timing >trace.fg
    $ flamegraph.pl trace.fg >trace.svg
 Note that trace collection does slow down execution so the timings will be
 inflated. They should be used to guide optimisation. For accurate boot timings,
 use bootstage.
 .. image:: pics/flamegraph_timing.png
  :width: 800
  :alt: Chrome showing flamegraph.pl output with timing
 CONFIG Options
 --------------
@ -138,6 +231,11 @@ CONFIG_TRACE_EARLY_SIZE
 CONFIG_TRACE_EARLY_ADDR
    Address of early trace buffer
 CONFIG_TRACE_CALL_DEPTH_LIMIT
    Sets the limit on trace call-depth. For a broad view, 10 is typically
    sufficient. Setting this too large creates enormous traces and distorts
    the overall timing considerable.
 Building U-Boot with Tracing Enabled
 ------------------------------------
@ -148,6 +246,26 @@ instrumenting from the command line instead of having to change board
 config files.
 Board requirements
 ------------------
 Trace data collection relies on a microsecond timer, accessed through
 `timer_get_us()`. So the first thing you should do is make sure that
 this produces sensible results for your board. Suitable sources for
 this timer include high resolution timers, PWMs or profile timers if
 available. Most modern SOCs have a suitable timer for this.
 See `add_ftrace()` for where `timer_get_us()` is called. The `notrace`
 attribute must be used on each function called by `timer_get_us()` since
 recursive calls to `add_ftrace()` will cause a fault::
   trace: recursion detected, disabling
 You cannot use driver model to obtain the microsecond timer, since tracing
 may be enabled before driver model is set up. Instead, provide a low-level
 function which accesses the timer, setting it up if needed.
 Collecting Trace Data
 ---------------------
@ -155,21 +273,22 @@ When you run U-Boot on your board it will collect trace data up to the
 limit of the trace buffer size you have specified. Once that is exhausted
 no more data will be collected.
-Collecting trace data has an affect on execution time/performance. You
+Collecting trace data affects execution time and performance. You
 will notice this particularly with trivial functions - the overhead of
 recording their execution may even exceed their normal execution time.
 In practice this doesn't matter much so long as you are aware of the
 effect. Once you have done your optimizations, turn off tracing before
-doing end-to-end timing.
+doing end-to-end timing using bootstage.
 The best time to start tracing is right at the beginning of U-Boot. The
 best time to stop tracing is right at the end. In practice it is hard
 to achieve these ideals.
-This implementation enables tracing early in board_init_f(). This means
+This implementation enables tracing early in `board_init_r()`, or
 `board_init_f()` when `TRACE_EARLY` is enabled. This means
 that it captures most of the board init process, missing only the
 early architecture-specific init. However, it also misses the entire
-SPL stage if there is one.
+SPL stage if there is one. At present tracing is not supported in SPL.
 U-Boot typically ends with a 'bootm' command which loads and runs an
 OS. There is useful trace data in the execution of that bootm
@ -179,39 +298,11 @@ the OS. In practical terms, U-Boot runs the 'fakegocmd' environment
 variable at this point. This variable should have a short script which
 collects the trace data and writes it somewhere.
-Trace data collection relies on a microsecond timer, accessed through
+Controlling the trace
-timer_get_us(). So the first think you should do is make sure that
+---------------------
 this produces sensible results for your board. Suitable sources for
 this timer include high resolution timers, PWMs or profile timers if
 available. Most modern SOCs have a suitable timer for this. Make sure
 that you mark this timer (and anything it calls) with
 notrace so that the trace library can
 use it without causing an infinite loop.
-
+U-Boot provides a command-line interface to the trace system for controlling
-Commands
+tracing and accessing the trace data. See :doc:`../usage/cmd/trace`.
 --------
 The trace command has variable sub-commands:
 stats
    Display tracing statistics
 pause
    Pause tracing
 resume
    Resume tracing
 funclist [<addr> <size>]
    Dump a list of functions into the buffer
 calls  [<addr> <size>]
    Dump function call trace into buffer
 If the address and size are not given, these are obtained from environment
 variables (see below). In any case the environment variables are updated
 after the command runs.
 Environment Variables
@ -264,39 +355,94 @@ a trace log to address 10000000 and sends it to a host machine using
 TFTP. After this, U-Boot will boot the OS normally, albeit a little
 later.
 For a filesystem you may do something like::
-Converting Trace Output Data
+    trace calls 10000000 1000000;
----------------------------
+    save mmc 1:1 10000000 /trace ${profoffset}
 The trace buffer format is internal to the trace system. It consists of a
 header, a call count for each function site, followed by a list of trace
 records, once for each function call.
 Converting Trace Output Data (proftool)
 ---------------------------------------
 The trace output data is kept in a binary format which is not documented
-here. To convert it into something useful, you can use proftool.
+here. See the `trace.h` header file if you are interested. To convert it into
 something useful, you can use proftool.
 This tool must be given the U-Boot map file and the trace data received
-from running that U-Boot. It produces a text output file.
+from running that U-Boot. It produces a binary output file.
-Options
+It is also possible to provide a configuration file to indicate which functions
 should be included or dropped during conversion. This file consists of lines
 like::
   include-func <regex>
   exclude-func <regex>
 where <regex> is a regular expression matched against function names. It
 allows some functions to be dropped from the trace when producing ftrace
 records.
 Options:
 -c <config_file>
    Specify the optional configuration file, to control which functions are
    included in the output.
 -f <format>
    Specifies the format to use (see below)
 -m <map_file>
-    Specify U-Boot map file
+    Specify U-Boot map file (`System.map`)
-p <trace_file>
+-o <output file>
-    Specify profile/trace file
+    Specify the output filename
 -t <trace_file>
    Specify trace file, the data saved from U-Boot
 -v <0-4>
    Specify the verbosity, where 0 is the minimum and 4 is for debugging.
 Commands:
-dump-ftrace
+dump-ftrace:
-    Write a text dump of the file in Linux ftrace format to stdout
+    Write a binary dump of the file in Linux ftrace format. Two options are
    available:
    function
        write function-call records (caller/callee)
    funcgraph
        write function entry/exit records (graph)
    This format can be used with kernelshark_ and trace_cmd_.
 dump-flamegraph
    Write a list of stack records useful for producing a flame graph. Two
    options are available:
    calls
        create a flamegraph of stack frames
    timing
        create a flamegraph of microseconds for each stack frame
    This format can be used with flamegraph_pl_.
 Viewing the Trace Data
 ----------------------
-You can use pytimechart for this (sudo apt-get pytimechart might work on
+You can use kernelshark_ for a GUI, but note that version 2.0.x was broken. If
-your Debian-style machine, and use your favourite search engine to obtain
+you have that version you could try building it from source.
 documentation). It expects the file to have a .txt extension. The program
 has terse user interface but is very convenient for viewing U-Boot
 profile information.
 The file must have a .dat extension or it is ignored. The program has terse
 user interface but is very convenient for viewing U-Boot profile information.
 Also available is trace_cmd_ which provides a command-line interface.
 Workflow Suggestions
 --------------------
@ -329,7 +475,9 @@ There are a few parameters in the code that you may want to consider.
 There is a function call depth limit (set to 15 by default). When the
 stack depth goes above this then no tracing information is recorded.
 The maximum depth reached is recorded and displayed by the 'trace stats'
-command.
+command. While it might be tempting to set the depth limit quite high, this
 can dramatically increase the size of the trace output as well as the execution
 time.
 Future Work
@ -346,5 +494,10 @@ Some other features that might be useful:
 - Compression of trace information
-Simon Glass <sjg@chromium.org>
+.. sectionauthor:: Simon Glass <sjg@chromium.org>
-April 2013
+.. April 2013
 .. Updated January 2023
 .. _kernelshark: https://kernelshark.org/
 .. _trace_cmd: https://www.trace-cmd.org/
 .. _flamegraph_pl: https://github.com/brendangregg/FlameGraph/blob/master/flamegraph.pl
--- a/doc/usage/cmd/trace.rst
+++ b/doc/usage/cmd/trace.rst
@ -0,0 +1,163 @@
 .. SPDX-License-Identifier: GPL-2.0+:
 trace command
 =============
 Synopis
 -------
 ::
    trace stats
    trace pause
    trace resume
    trace funclist [<addr> <size>]
    trace calls [<addr> <size>]
 Description
 -----------
 The *trace* command is used to control the U-Boot tracing system. It allows
 tracing to be paused and resumed, shows statistics for traces and provides a
 way to dump out the trace information.
 trace stats
 ~~~~~~~~~~~
 This display tracing statistics, as follows:
 function sites
    Functions are binned as a way of reducing the amount of space needed to
    hold all the function information. This is controlled by FUNC_SITE_SIZE in
    the trace.h header file. The usual value is 4, which provides the finest
    granularity (assuming a minimum instruction size of 4 bytes) which means
    that every function can be resolved individually.
 function calls
    Total number of function calls, including those which were not traced due
    to buffer space. This count does not include functions which exceeded the
    depth limit.
 untracked function calls
    Total number of function calls which did not appear in the U-Boot image.
    This can happen if a function is called outside the normal code area.
 traced function calls
    Total number of function calls which were actually traced, i.e. are included
    in the recorded trace data.
 dropped due to overflow
    If the trace buffer was exhausted then this shows the number of records that
    were dropped. Try reducing the depth limit or expanding the buffer size.
 maximum observed call depth
    Maximum observed call depth while tracing.
 calls not traced due to depth
    Counts the number of function calls that were not recorded because they
    exceeded the maximum call depth.
 max function calls
    Maximum number of function calls which can be recorded in the trace buffer,
    given its size. Once `function calls` hits this value, recording stops.
 trace buffer
    Address of trace buffer
 call records
    Address of first trace record. This is near the start of the trace buffer,
    after the function-call counts.
 trace pause
 ~~~~~~~~~~~
 Pauses tracing, so that no more data is added to the trace buffer.
 trace resume
 ~~~~~~~~~~~~
 Resumes tracing, so that new function calls are added to the trace buffer if
 there is sufficient space.
 trace funclist [<addr> <size>]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Dumps a list of functions into the provided buffer. The file uses a format
 specific to U-Boot: a header, following by the function offset and call count.
 If the address and size are not given, these are obtained from
 :ref:`develop/trace:environment variables`. In any case the environment
 variables are updated after the command runs.
 The resulting data should be written out to the host, e.g. using Ethernet or
 a filesystem. There are no tools provided to read this sdata.
 trace calls [<addr> <size>]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Dumps a list of function calls into the provided buffer. The file uses a format
 specific to U-Boot: a header, following by the list of calls. The proftool
 tool can be used to convert this information ready for further analysis.
 Example
 -------
 ::
    => trace stats
            269,252 function sites
         38,025,059 function calls
                  3 untracked function calls
          7,382,690 traced function calls
                 17 maximum observed call depth
                 15 call depth limit
         68,667,432 calls not traced due to depth
         22,190,112 max function calls
    trace buffer 6c000000 call records 6c20de78
    => trace resume
    => trace pause
 This shows that resuming the trace causes the buffer to overflow::
    => trace stats
            269,252 function sites
         49,573,694 function calls
                  3 untracked function calls
         22,190,112 traced function calls (8289848 dropped due to overflow)
                 17 maximum observed call depth
                 15 call depth limit
         68,667,432 calls not traced due to depth
         22,190,112 max function calls
    trace buffer 6c000000 call records 6c20de78
    => trace funcs 30000000 0x100000
    Function trace dumped to 30000000, size 0x1e70
 This shows collecting and writing out the result trace data:
 ::
    => trace calls 20000000 0x10000000
    Call list dumped to 20000000, size 0xfdf21a0
    => save mmc 1:1 20000000 /trace ${profoffset}
    File System is consistent
    file found, deleting
    update journal finished
    File System is consistent
    update journal finished
    266281376 bytes written in 18584 ms (13.7 MiB/s)
 From here you can use proftool to convert it:
 .. code-block:: bash
    tools/proftool -m System.map -t trace -o asc.fg dump-ftrace
 .. _`ACPI specification`: https://uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf
--- a/doc/usage/index.rst
+++ b/doc/usage/index.rst
@ -84,6 +84,7 @@ Shell commands
   cmd/sm
   cmd/temperature
   cmd/tftpput
   cmd/trace
   cmd/true
   cmd/ums
   cmd/ut