diff --git a/src/uu/shuf/BENCHMARKING.md b/src/uu/shuf/BENCHMARKING.md
index 7607f04b4..cf5ee40e1 100644
--- a/src/uu/shuf/BENCHMARKING.md
+++ b/src/uu/shuf/BENCHMARKING.md
@@ -4,23 +4,46 @@
 benchmark: with and without repetition.
 
 When benchmarking changes, make sure to always build with the `--release` flag.
-You can compare with another branch by compiling on that branch and than
+You can compare with another branch by compiling on that branch and then
 renaming the executable from `shuf` to `shuf.old`.
 
+## Generate sample data
+
+Sample input can be generated using `/dev/random`:
+
+```shell
+cat /dev/random | base64 | fold | head -n 50000000 > input.txt
+```
+
+To avoid distortions from IO, it is recommended to store input data in tmpfs.
+
 ## Without repetition
 
-By default, `shuf` samples without repetition. To benchmark only the
-randomization and not IO, we can pass the `-i` flag with a range of numbers to
-randomly sample from. An example of a command that works well for testing:
+By default, `shuf` samples without repetition.
+
+To benchmark only the randomization and not IO, we can pass the `-i` flag with
+a range of numbers to randomly sample from. An example of a command that works
+well for testing:
 
 ```shell
 hyperfine --warmup 10 "target/release/shuf -i 0-10000000"
 ```
 
+To measure the time taken by shuffling an input file, the following command can
+be used::
+
+```shell
+hyperfine --warmup 10 "target/release/shuf input.txt > /dev/null"
+```
+
+It is important to discard the output by redirecting it to `/dev/null`, since
+otherwise, a substantial amount of time is added to write the output to the
+filesystem.
+
 ## With repetition
 
 When repetition is allowed, `shuf` works very differently under the hood, so it
-should be benchmarked separately. In this case we have to pass the `-n` flag or
+should be benchmarked separately. In this case, we have to pass the `-n` flag or
 the command will run forever. An example of a hyperfine command is
 
 ```shell