# Benchmarking `factor`

<!-- spell-checker:ignore (names) Daniel Lemire * Lemire's ; (misc) nohz -->

The benchmarks for `factor` are located under `tests/benches/factor`
and can be invoked with `cargo bench` in that directory.

They are located outside the `uu_factor` crate, as they are not required to
comply with the project's minimum supported Rust version, *i.e.* they may
require a newer version of `rustc`.

## Microbenchmarking deterministic functions

We currently use [`criterion`] to benchmark deterministic functions,
such as `gcd` and `table::factor`.

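
For illustration, a minimal `criterion` benchmark might look like the sketch
below. The `gcd` shown here is a self-contained stand-in rather than
`uu_factor`'s actual implementation, and in a typical `criterion` setup such a
file is declared as a `[[bench]]` target with `harness = false` in the
benchmark crate's `Cargo.toml`.

```rust
// Sketch of a `criterion` microbenchmark; `gcd` is a placeholder workload so
// the example is self-contained -- real benchmarks import the function under
// test from `uu_factor`.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn gcd(mut a: u64, mut b: u64) -> u64 {
    // Euclid's algorithm, used here only as a stand-in workload.
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    a
}

fn bench_gcd(c: &mut Criterion) {
    // `black_box` keeps the compiler from constant-folding the inputs
    // or discarding the result.
    c.bench_function("gcd", |b| b.iter(|| gcd(black_box(2048), black_box(3072))));
}

criterion_group!(benches, bench_gcd);
criterion_main!(benches);
```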
However, microbenchmarks are by nature unstable: not only are they specific to
the hardware, operating system version, etc., but they are noisy and affected
by other tasks on the system (browser, compile jobs, etc.), which can cause
`criterion` to report spurious performance improvements and regressions.

This can be mitigated by getting as close to [idealized conditions][lemire]
as possible:

- minimize the amount of computation and I/O running concurrently with the
  benchmark, *i.e.* close your browser and IM clients, don't compile at the
  same time, etc.;
- ensure the CPU's [frequency stays constant] during the benchmark;
- [isolate a **physical** core], set it to `nohz_full`, and pin the benchmark
  to it, so it won't be preempted in the middle of a measurement;
- disable ASLR by running `setarch -R cargo bench`, so we can compare results
  across multiple executions.

[`criterion`]: https://bheisler.github.io/criterion.rs/book/index.html
[lemire]: https://lemire.me/blog/2018/01/16/microbenchmarking-calls-for-idealized-conditions/
[isolate a **physical** core]: https://pyperf.readthedocs.io/en/latest/system.html#isolate-cpus-on-linux
[frequency stays constant]: ... <!-- ToDO -->

### Guidance for designing microbenchmarks

*Note:* this guidance is specific to `factor` and takes its application domain
into account; do not expect it to generalize to other projects. It is based
on Daniel Lemire's [*Microbenchmarking calls for idealized conditions*][lemire],
which I recommend reading if you want to add benchmarks to `factor`.

1. Select a small, self-contained, deterministic component
   (`gcd` and `table::factor` are good examples):
   - no I/O or access to external data structures;
   - no calls into other components;
   - behavior is deterministic: no RNG, no concurrency, ...;
   - the test's body is *fast* (~100ns for `gcd`, ~10µs for `table::factor`),
     so each sample takes a very short time, minimizing variability and
     maximizing the number of samples we can take in a given time.

2. Benchmarks are immutable (once merged in `uutils`)

   Modifying a benchmark means previously-collected values can no longer be
   meaningfully compared, silently giving nonsensical results. If you must
   modify an existing benchmark, rename it.

3. Test common cases

   We are interested in overall performance, rather than specific edge-cases;
   use **reproducibly-randomized inputs**, sampling from either all possible
   input values or some subset of interest (see the sketch after this list).

4. Use [`criterion`], `criterion::black_box`, ...

   `criterion` isn't perfect, but it is still much better than ad-hoc
   solutions in each benchmark.

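
To make points 3 and 4 concrete, here is a sketch of a benchmark over
reproducibly-randomized inputs: a fixed seed guarantees the same input
sequence on every run, so results remain comparable across executions. The use
of the `rand` crate and all names below are illustrative assumptions, not
necessarily what the existing benchmarks do.

```rust
// Sketch: reproducibly-randomized inputs (points 3 and 4 above).
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use rand::{rngs::StdRng, Rng, SeedableRng};

// Placeholder for the component under test (e.g. `table::factor`).
fn function_under_test(n: u64) -> u64 {
    n.count_ones() as u64
}

fn bench_random_inputs(c: &mut Criterion) {
    // Fixed seed: the "random" inputs are identical on every run and machine,
    // so previously collected results stay meaningful.
    let mut rng = StdRng::seed_from_u64(42);
    let inputs: Vec<u64> = (0..1_000).map(|_| rng.gen()).collect();

    c.bench_function("function_under_test/random-u64", |b| {
        b.iter(|| {
            for &n in &inputs {
                black_box(function_under_test(black_box(n)));
            }
        })
    });
}

criterion_group!(benches, bench_random_inputs);
criterion_main!(benches);
```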
## Wishlist

### Configurable statistical estimators

`criterion` always uses the arithmetic average as its estimator; in microbenchmarks,
where the code under test is fully deterministic and the measurements are
subject to additive, positive noise, [the minimum is more appropriate][lemire].

### CI & reproducible performance testing

Measuring performance on real hardware is important, as it relates directly
to what users of `factor` experience; however, such measurements are subject
to the constraints of the real world, and aren't perfectly reproducible.
Moreover, the mitigations described above aren't achievable in
virtualized, multi-tenant environments such as CI.

Instead, we could run the microbenchmarks on a simulated CPU with [`cachegrind`],
measure execution “time” in that model (in CI), and use it to detect and report
performance improvements and regressions.

[`iai`] is an implementation of this idea for Rust.

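
As a rough sketch (not something currently wired into our CI), an [`iai`]
benchmark looks much like a `criterion` one, except that the benchmarked
functions take no arguments and the reported numbers are instruction and
cache-event counts from [`cachegrind`] rather than wall-clock time. The `gcd`
below is again a local stand-in for the real code.

```rust
// Hypothetical `iai` benchmark: instruction and cache-event counts replace
// wall-clock time, so results stay stable even on noisy, virtualized hosts.
use iai::black_box;

// Stand-in workload; a real benchmark would call into `uu_factor` so the
// counts reflect the shipped code.
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    a
}

fn iai_gcd() -> u64 {
    gcd(black_box(2048), black_box(3072))
}

iai::main!(iai_gcd);
```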
[`cachegrind`]: https://www.valgrind.org/docs/manual/cg-manual.html
[`iai`]: https://bheisler.github.io/criterion.rs/book/iai/iai.html

### Comparing randomized implementations across multiple inputs

`factor` is a challenging target for system benchmarks as it combines two
characteristics:

1. integer factoring algorithms are randomized, with large variance in
   execution time;

2. different inputs also show large differences in factoring time, and those
   differences do not correspond to any natural, linear ordering of the inputs.

If (1) were untrue (*i.e.* if execution time weren't random), we could faithfully
compare two implementations (two successive versions, or `uutils` and GNU) using
a scatter plot, where each axis corresponds to the performance of one implementation.

Similarly, without (2) we could plot numbers on the X axis and their factoring
time on the Y axis, using multiple lines for various quantiles. The large
differences in factoring times for successive numbers mean that such a plot
would be unreadable.