author Sylvestre Ledru <sylvestre@debian.org> 1677865358 +0100
committer Sylvestre Ledru <sylvestre@debian.org> 1677951797 +0100

md: Fix a bunch of warnings in the docs
Sylvestre Ledru 2023-03-03 18:42:38 +01:00
parent 9d5dc500e6
commit 422a27d375
42 changed files with 470 additions and 364 deletions

@ -116,7 +116,7 @@ the community.
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
<https://www.contributor-covenant.org/version/2/0/code_of_conduct.html>.
Community Impact Guidelines were inspired by [Mozilla's code of conduct
enforcement ladder](https://github.com/mozilla/diversity).
@ -124,5 +124,5 @@ enforcement ladder](https://github.com/mozilla/diversity).
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see the FAQ at
https://www.contributor-covenant.org/faq. Translations are available at
https://www.contributor-covenant.org/translations.
<https://www.contributor-covenant.org/faq>. Translations are available at
<https://www.contributor-covenant.org/translations>.

@ -38,20 +38,19 @@ search the issues to make sure no one else is working on it.
## Platforms
We take pride in supporting many operating systems and architectures.
We take pride in supporting many operating systems and architectures.
**Tip:**
For Windows, Microsoft provides some images (VMWare, Hyper-V, VirtualBox and Parallels)
For Windows, Microsoft provides some images (VMWare, Hyper-V, VirtualBox and Parallels)
for development:
https://developer.microsoft.com/windows/downloads/virtual-machines/
<https://developer.microsoft.com/windows/downloads/virtual-machines/>
## Commit messages
To help the project maintainers review pull requests from contributors across
numerous utilities, the team has settled on conventions for commit messages.
From http://git-scm.com/book/ch5-2.html:
From <http://git-scm.com/book/ch5-2.html>:
```
Short (50 chars or less) summary of changes

@ -1,21 +1,19 @@
Documentation
-------------
# Documentation
The source of the documentation is available on:
https://uutils.github.io/dev/coreutils/
<https://uutils.github.io/dev/coreutils/>
The documentation is updated every day on this repository:
https://github.com/uutils/uutils.github.io/
<https://github.com/uutils/uutils.github.io/>
Running GNU tests
-----------------
## Running GNU tests
<!-- spell-checker:ignore gnulib -->
- Check out https://github.com/coreutils/coreutils next to your fork as gnu
- Check out https://github.com/coreutils/gnulib next to your fork as gnulib
- Check out <https://github.com/coreutils/coreutils> next to your fork as gnu
- Check out <https://github.com/coreutils/gnulib> next to your fork as gnulib
- Rename the checkout of your fork to uutils
At the end you should have uutils, gnu and gnulib checked out next to each other.
@ -23,9 +21,7 @@ At the end you should have uutils, gnu and gnulib checked out next to each other
- Run `cd uutils && ./util/build-gnu.sh && cd ..` to get everything ready (this may take a while)
- Finally, you can run tests with `bash uutils/util/run-gnu-test.sh <tests>`. Instead of `<tests>` insert the tests you want to run, e.g. `tests/misc/wc-proc.sh`.
Code Coverage Report Generation
---------------------------------
## Code Coverage Report Generation
<!-- spell-checker:ignore (flags) Ccodegen Coverflow Cpanic Zinstrument Zpanic -->
@ -36,13 +32,13 @@ Code coverage report can be generated using [grcov](https://github.com/mozilla/g
To generate a [gcov-based](https://github.com/mozilla/grcov#example-how-to-generate-gcda-files-for-a-rust-project) coverage report:
```bash
$ export CARGO_INCREMENTAL=0
$ export RUSTFLAGS="-Zprofile -Ccodegen-units=1 -Copt-level=0 -Clink-dead-code -Coverflow-checks=off -Zpanic_abort_tests -Cpanic=abort"
$ export RUSTDOCFLAGS="-Cpanic=abort"
$ cargo build <options...> # e.g., --features feat_os_unix
$ cargo test <options...> # e.g., --features feat_os_unix test_pathchk
$ grcov . -s . --binary-path ./target/debug/ -t html --branch --ignore-not-existing --ignore build.rs --excl-br-line "^\s*((debug_)?assert(_eq|_ne)?\#\[derive\()" -o ./target/debug/coverage/
$ # open target/debug/coverage/index.html in browser
export CARGO_INCREMENTAL=0
export RUSTFLAGS="-Zprofile -Ccodegen-units=1 -Copt-level=0 -Clink-dead-code -Coverflow-checks=off -Zpanic_abort_tests -Cpanic=abort"
export RUSTDOCFLAGS="-Cpanic=abort"
cargo build <options...> # e.g., --features feat_os_unix
cargo test <options...> # e.g., --features feat_os_unix test_pathchk
grcov . -s . --binary-path ./target/debug/ -t html --branch --ignore-not-existing --ignore build.rs --excl-br-line "^\s*((debug_)?assert(_eq|_ne)?\#\[derive\()" -o ./target/debug/coverage/
# open target/debug/coverage/index.html in browser
```
If changes are not reflected in the report, run `cargo clean` and then rerun the commands above.
@ -52,19 +48,17 @@ if changes are not reflected in the report then run `cargo clean` and run the ab
If you are using a stable version of Rust that doesn't enable code coverage instrumentation by default,
add the `-Zinstrument-coverage` flag to the `RUSTFLAGS` environment variable specified above.
pre-commit hooks
----------------
## pre-commit hooks
A configuration for `pre-commit` is provided in the repository. It allows automatically checking every git commit you make to ensure it compiles, and passes `clippy` and `rustfmt` without warnings.
To use the provided hook:
1. [Install `pre-commit`](https://pre-commit.com/#install)
2. Run `pre-commit install` while in the repository directory
1. Run `pre-commit install` while in the repository directory
Your git commits will then automatically be checked. If a check fails, an error message will explain why, and your commit will be canceled. You can then make the suggested changes, and run `git commit ...` again.
### Using Clippy
## Using Clippy
The `msrv` key in the clippy configuration file `clippy.toml` is used to disable lints pertaining to newer features by specifying the minimum supported Rust version (MSRV). However, this key is only supported on `nightly`. To invoke clippy without errors, use `cargo +nightly clippy`. In order to also check tests and non-default crate features, use `cargo +nightly clippy --all-targets --all-features`.

README.md

@ -21,11 +21,12 @@ or different behavior might be experienced.
To install it:
```
$ cargo install coreutils
$ ~/.cargo/bin/coreutils
```bash
cargo install coreutils
~/.cargo/bin/coreutils
```
<!-- markdownlint-disable-next-line MD026 -->
## Why?
uutils aims to work on as many platforms as possible, to be able to use the
@ -35,6 +36,7 @@ chosen not only because it is fast and safe, but is also excellent for
writing cross-platform code.
## Documentation
uutils has both user and developer documentation available:
- [User Manual](https://uutils.github.io/user/)
@ -46,8 +48,8 @@ Both can also be generated locally, the instructions for that can be found in th
<!-- ANCHOR: build (this mark is needed for mdbook) -->
## Requirements
* Rust (`cargo`, `rustc`)
* GNU Make (optional)
- Rust (`cargo`, `rustc`)
- GNU Make (optional)
### Rust Version
@ -65,8 +67,8 @@ or GNU Make.
For either method, we first need to fetch the repository:
```bash
$ git clone https://github.com/uutils/coreutils
$ cd coreutils
git clone https://github.com/uutils/coreutils
cd coreutils
```
### Cargo
@ -75,7 +77,7 @@ Building uutils using Cargo is easy because the process is the same as for
every other Rust program:
```bash
$ cargo build --release
cargo build --release
```
This command builds the most portable common core set of uutils into a multicall
@ -86,11 +88,11 @@ expanded sets of uutils for a platform (on that platform) is as simple as
specifying it as a feature:
```bash
$ cargo build --release --features macos
cargo build --release --features macos
# or ...
$ cargo build --release --features windows
cargo build --release --features windows
# or ...
$ cargo build --release --features unix
cargo build --release --features unix
```
If you don't want to build every utility available on your platform into the
@ -98,7 +100,7 @@ final binary, you can also specify which ones you want to build manually.
For example:
```bash
$ cargo build --features "base32 cat echo rm" --no-default-features
cargo build --features "base32 cat echo rm" --no-default-features
```
If you don't want to build the multicall binary and would prefer to build
@ -108,7 +110,7 @@ is contained in its own package within the main repository, named
specific packages (using the `--package` [aka `-p`] option). For example:
```bash
$ cargo build -p uu_base32 -p uu_cat -p uu_echo -p uu_rm
cargo build -p uu_base32 -p uu_cat -p uu_echo -p uu_rm
```
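As a minimal sketch of the per-package build above, the helper below maps utility names to `-p uu_<name>` arguments and prints the resulting command instead of running it. The function name `package_build_cmd` is hypothetical and not part of the repository; the `uu_` prefix is taken from the example above.

```shell
#!/bin/sh
# Hypothetical helper: turn utility names into `-p uu_<name>` arguments
# and print the resulting cargo command instead of running it.
package_build_cmd() {
    cmd="cargo build"
    for util in "$@"; do
        cmd="$cmd -p uu_$util"
    done
    printf '%s\n' "$cmd"
}

package_build_cmd base32 cat echo rm
```

Printing the command first makes it easy to check the package list before starting a long build.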
### GNU Make
@ -118,29 +120,29 @@ Building using `make` is a simple process as well.
To simply build all available utilities:
```bash
$ make
make
```
To build all but a few of the available utilities:
```bash
$ make SKIP_UTILS='UTILITY_1 UTILITY_2'
make SKIP_UTILS='UTILITY_1 UTILITY_2'
```
To build only a few of the available utilities:
```bash
$ make UTILS='UTILITY_1 UTILITY_2'
make UTILS='UTILITY_1 UTILITY_2'
```
## Installation
### Cargo
### Install with Cargo
Likewise, installing can simply be done using:
```bash
$ cargo install --path .
cargo install --path .
```
This command will install uutils into Cargo's *bin* folder (*e.g.* `$HOME/.cargo/bin`).
@ -148,49 +150,49 @@ This command will install uutils into Cargo's *bin* folder (*e.g.* `$HOME/.cargo
This does not install files necessary for shell completion. For shell completion to work,
use `GNU Make` or see `Manually install shell completions`.
### GNU Make
### Install with GNU Make
To install all available utilities:
```bash
$ make install
make install
```
To install using `sudo`, the `-E` switch must be used:
```bash
$ sudo -E make install
sudo -E make install
```
To install all but a few of the available utilities:
```bash
$ make SKIP_UTILS='UTILITY_1 UTILITY_2' install
make SKIP_UTILS='UTILITY_1 UTILITY_2' install
```
To install only a few of the available utilities:
```bash
$ make UTILS='UTILITY_1 UTILITY_2' install
make UTILS='UTILITY_1 UTILITY_2' install
```
To install every program with a prefix (e.g. uu-echo uu-cat):
```bash
$ make PROG_PREFIX=PREFIX_GOES_HERE install
make PROG_PREFIX=PREFIX_GOES_HERE install
```
To install the multicall binary:
```bash
$ make MULTICALL=y install
make MULTICALL=y install
```
Set install parent directory (default value is /usr/local):
```bash
# DESTDIR is also supported
$ make PREFIX=/my/path install
make PREFIX=/my/path install
```
Installing with `make` installs shell completions for all installed utilities
@ -203,6 +205,7 @@ The `coreutils` binary can generate completions for the `bash`, `elvish`, `fish`
and `zsh` shells. It prints the result to stdout.
The syntax is:
```bash
cargo run completion <utility> <shell>
```
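Because the result goes to stdout, it is normally redirected to a file. The sketch below prints (rather than runs) one such command per shell for a given utility. The function name `completion_cmds` and the output file names are assumptions for illustration, and only the shells named in the text above are listed.

```shell
#!/bin/sh
# Print (not run) one completion command per shell for a given utility.
# Output file names such as `ls.bash` are illustrative only.
completion_cmds() {
    util=$1
    for shell in bash elvish fish zsh; do
        printf 'cargo run completion %s %s > %s.%s\n' \
            "$util" "$shell" "$util" "$shell"
    done
}

completion_cmds ls
```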
@ -220,106 +223,107 @@ Un-installation differs depending on how you have installed uutils. If you used
Cargo to install, use Cargo to uninstall. If you used GNU Make to install, use
Make to uninstall.
### Cargo
### Uninstall with Cargo
To uninstall uutils:
```bash
$ cargo uninstall uutils
cargo uninstall uutils
```
### GNU Make
### Uninstall with GNU Make
To uninstall all utilities:
```bash
$ make uninstall
make uninstall
```
To uninstall every program with a set prefix:
```bash
$ make PROG_PREFIX=PREFIX_GOES_HERE uninstall
make PROG_PREFIX=PREFIX_GOES_HERE uninstall
```
To uninstall the multicall binary:
```bash
$ make MULTICALL=y uninstall
make MULTICALL=y uninstall
```
To uninstall from a custom parent directory:
```bash
# DESTDIR is also supported
$ make PREFIX=/my/path uninstall
make PREFIX=/my/path uninstall
```
<!-- ANCHOR_END: build (this mark is needed for mdbook) -->
## Testing
Testing can be done using either Cargo or `make`.
### Cargo
### Testing with Cargo
Just like with building, we follow the standard procedure for testing using
Cargo:
```bash
$ cargo test
cargo test
```
By default, `cargo test` only runs the common programs. To also run
platform-specific tests, run:
```bash
$ cargo test --features unix
cargo test --features unix
```
If you would prefer to test a select few utilities:
```bash
$ cargo test --features "chmod mv tail" --no-default-features
cargo test --features "chmod mv tail" --no-default-features
```
If you also want to test the core utilities:
```bash
$ cargo test -p uucore -p coreutils
cargo test -p uucore -p coreutils
```
To debug:
```bash
$ gdb --args target/debug/coreutils ls
gdb --args target/debug/coreutils ls
(gdb) b ls.rs:79
(gdb) run
```
### GNU Make
### Testing with GNU Make
To simply test all available utilities:
```bash
$ make test
make test
```
To test all but a few of the available utilities:
```bash
$ make SKIP_UTILS='UTILITY_1 UTILITY_2' test
make SKIP_UTILS='UTILITY_1 UTILITY_2' test
```
To test only a few of the available utilities:
```bash
$ make UTILS='UTILITY_1 UTILITY_2' test
make UTILS='UTILITY_1 UTILITY_2' test
```
To include tests for unimplemented behavior:
```bash
$ make UTILS='UTILITY_1 UTILITY_2' SPEC=y test
make UTILS='UTILITY_1 UTILITY_2' SPEC=y test
```
### Run Busybox Tests
@ -330,19 +334,19 @@ requires `make`.
To run busybox tests for all utilities for which busybox has tests:
```bash
$ make busytest
make busytest
```
To run busybox tests for a few of the available utilities:
```bash
$ make UTILS='UTILITY_1 UTILITY_2' busytest
make UTILS='UTILITY_1 UTILITY_2' busytest
```
To pass an argument like "-v" to the busybox test runtime:
```bash
$ make UTILS='UTILITY_1 UTILITY_2' RUNTEST_ARGS='-v' busytest
make UTILS='UTILITY_1 UTILITY_2' RUNTEST_ARGS='-v' busytest
```
### Comparing with GNU
@ -356,14 +360,14 @@ breakdown of the GNU test results of the main branch can be found
To run locally:
```bash
$ bash util/build-gnu.sh
$ bash util/run-gnu-test.sh
bash util/build-gnu.sh
bash util/run-gnu-test.sh
# To run a single test:
$ bash util/run-gnu-test.sh tests/touch/not-owner.sh # for example
bash util/run-gnu-test.sh tests/touch/not-owner.sh # for example
# To run several tests:
$ bash util/run-gnu-test.sh tests/touch/not-owner.sh tests/rm/no-give-up.sh # for example
bash util/run-gnu-test.sh tests/touch/not-owner.sh tests/rm/no-give-up.sh # for example
# If this is a perl (.pl) test, to run in debug:
$ DEBUG=1 bash util/run-gnu-test.sh tests/misc/sm3sum.pl
DEBUG=1 bash util/run-gnu-test.sh tests/misc/sm3sum.pl
```
Note that it relies on individual utilities (not the multicall binary).
@ -387,7 +391,6 @@ To improve the GNU compatibility, the following process is recommended:
1. Start to modify the Rust implementation to match the expected behavior
1. Add a test to make sure that we don't regress (our test suite is super quick)
## Contributing
To contribute to uutils, please see [CONTRIBUTING](CONTRIBUTING.md).
@ -395,11 +398,12 @@ To contribute to uutils, please see [CONTRIBUTING](CONTRIBUTING.md).
## Utilities
Please note that this is not fully accurate:
* Some new options can be added / removed in the GNU implementation;
* Some error management might be missing;
* Some behaviors might be different.
See https://github.com/uutils/coreutils/issues/3336 for the main meta bugs
- Some new options can be added / removed in the GNU implementation;
- Some error management might be missing;
- Some behaviors might be different.
See <https://github.com/uutils/coreutils/issues/3336> for the main meta bugs
(many are missing).
| Done | WIP |

@ -1,3 +1,3 @@
# Build from source
{{#include ../../README.md:build }}
{{#include ../../README.md:build }}

@ -1 +1,3 @@
{{ #include ../../CONTRIBUTING.md }}
<!-- markdownlint-disable MD041 -->
{{ #include ../../CONTRIBUTING.md }}

@ -1,5 +1,9 @@
<!-- markdownlint-disable MD041 -->
{{#include logo.svg}}
<!-- markdownlint-disable MD033 -->
<style>
/* Make the logo a bit bigger and center */
#logo {

@ -11,6 +11,7 @@ You can also [build uutils from source](/build.md).
<!-- toc -->
## Cargo
[![crates.io package](https://repology.org/badge/version-for-repo/crates_io/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)
```bash
@ -23,6 +24,7 @@ cargo install coreutils --features windows
```
## Linux
### Alpine
[![Alpine Linux Edge package](https://repology.org/badge/version-for-repo/alpine_edge/uutils-coreutils.svg)](https://pkgs.alpinelinux.org/packages?name=uutils-coreutils)
@ -62,6 +64,7 @@ emerge -pv sys-apps/uutils-coreutils
```
### Manjaro
![Manjaro Stable package](https://repology.org/badge/version-for-repo/manjaro_stable/uutils-coreutils.svg)
[![Manjaro Testing package](https://repology.org/badge/version-for-repo/manjaro_testing/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)
[![Manjaro Unstable package](https://repology.org/badge/version-for-repo/manjaro_unstable/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)
@ -73,6 +76,7 @@ pamac install uutils-coreutils
```
### NixOS
[![nixpkgs unstable package](https://repology.org/badge/version-for-repo/nix_unstable/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)
```bash
@ -80,6 +84,7 @@ nix-env -iA nixos.uutils-coreutils
```
### OpenMandriva Lx
[![openmandriva cooker package](https://repology.org/badge/version-for-repo/openmandriva_cooker/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)
```bash
@ -101,6 +106,7 @@ export PATH=/usr/lib/cargo/bin/coreutils:$PATH
## MacOS
### Homebrew
[![Homebrew package](https://repology.org/badge/version-for-repo/homebrew/uutils-coreutils.svg)](https://formulae.brew.sh/formula/uutils-coreutils)
```bash
@ -108,6 +114,7 @@ brew install uutils-coreutils
```
### MacPorts
[![MacPorts package](https://repology.org/badge/version-for-repo/macports/uutils-coreutils.svg)](https://ports.macports.org/port/coreutils-uutils/)
```
@ -115,6 +122,7 @@ port install coreutils-uutils
```
## FreeBSD
[![FreeBSD port](https://repology.org/badge/version-for-repo/freebsd/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)
```sh
@ -124,6 +132,7 @@ pkg install uutils
## Windows
### Scoop
[![Scoop package](https://repology.org/badge/version-for-repo/scoop/uutils-coreutils.svg)](https://scoop.sh/#/apps?q=uutils-coreutils&s=0&d=1&o=true)
```bash
@ -136,4 +145,6 @@ scoop install uutils-coreutils
[![AUR package](https://repology.org/badge/version-for-repo/aur/coreutils-hybrid.svg)](https://aur.archlinux.org/packages/coreutils-hybrid)
A GNU coreutils / uutils coreutils hybrid package. Uses stable uutils programs mixed with GNU counterparts if uutils counterpart is unfinished or buggy.
A GNU coreutils / uutils coreutils hybrid package. Uses stable uutils
programs mixed with GNU counterparts if uutils counterpart is
unfinished or buggy.

@ -1,4 +1,5 @@
# Multi-call binary
# Multi-call binary
uutils includes a multi-call binary from which the utils can be invoked. This
reduces the size of the binary and can be useful for portability.
@ -12,6 +13,7 @@ coreutils [util] [util options]
The `--help` flag will print a list of available utils.
## Example
```
```shell
coreutils ls -l
```
```

@ -1,5 +1,7 @@
# GNU Test Coverage
<!-- markdownlint-disable MD033 -->
uutils is actively tested against the GNU coreutils test suite. The results
below are automatically updated every day.

@ -4,11 +4,8 @@
arch
```
Display machine architecture
## After Help
Determine architecture name for current machine.

@ -7,8 +7,8 @@ base32 [OPTION]... [FILE]
encode/decode data and print to standard output
With no FILE, or when FILE is -, read standard input.
The data are encoded as described for the base32 alphabet in RFC
4648. When decoding, the input may contain newlines in addition
The data are encoded as described for the base32 alphabet in RFC 4648.
When decoding, the input may contain newlines in addition
to the bytes of the formal base32 alphabet. Use --ignore-garbage
to attempt to recover from any other non-alphabet bytes in the
encoded stream.

@ -7,8 +7,8 @@ base64 [OPTION]... [FILE]
encode/decode data and print to standard output
With no FILE, or when FILE is -, read standard input.
The data are encoded as described for the base64 alphabet in RFC
3548. When decoding, the input may contain newlines in addition
The data are encoded as described for the base64 alphabet in RFC 3548.
When decoding, the input may contain newlines in addition
to the bytes of the formal base64 alphabet. Use --ignore-garbage
to attempt to recover from any other non-alphabet bytes in the
encoded stream.

@ -7,5 +7,5 @@ chcon [OPTION]... [-u USER] [-r ROLE] [-l RANGE] [-t TYPE] FILE...
chcon [OPTION]... --reference=RFILE FILE...
```
Change the SELinux security context of each FILE to CONTEXT.
With --reference, change the security context of each FILE to that of RFILE.
Change the SELinux security context of each FILE to CONTEXT.
With --reference, change the security context of each FILE to that of RFILE.

@ -1,18 +1,18 @@
<!-- markdownlint-disable first-line-heading -->
<!-- spell-checker:ignore (markdown) markdownlint -->
## Feature list
# Feature list
<!-- spell-checker:ignore (options) linkgs reflink -->
### To Do
## To Do
- [ ] cli-symbolic-links
- [ ] context
- [ ] copy-contents
- [ ] sparse
### Completed
## Completed
- [x] archive
- [x] attributes-only

@ -1,46 +1,45 @@
## Benchmarking cut
# Benchmarking cut
### Performance profile
## Performance profile
In normal use cases a significant amount of the total execution time of `cut`
is spent performing I/O. When invoked with the `-f` option (cut fields) some
CPU time is spent on detecting fields (in `Searcher::next`). Other than that
some small amount of CPU time is spent on breaking the input stream into lines.
### How to
## How to
When fixing bugs or adding features you might want to compare
performance before and after your code changes.
- `hyperfine` can be used to accurately measure and compare the total
- `hyperfine` can be used to accurately measure and compare the total
execution time of one or more commands.
```
$ cargo build --release --package uu_cut
```shell
cargo build --release --package uu_cut
$ hyperfine -w3 "./target/release/cut -f2-4,8 -d' ' input.txt" "cut -f2-4,8 -d' ' input.txt"
hyperfine -w3 "./target/release/cut -f2-4,8 -d' ' input.txt" "cut -f2-4,8 -d' ' input.txt"
```
You can put those two commands in a shell script to be sure that you don't
forget to build after making any changes.
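One possible shape for such a script is sketched below. The `run` and `bench_cut` function names and the `DRY_RUN` switch are assumptions added for illustration; the two commands themselves are the ones shown above.

```shell
#!/bin/sh
# Build uu_cut in release mode, then benchmark it against the system `cut`.
# With DRY_RUN=1 the commands are printed (quoting flattened) instead of run.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

bench_cut() {
    input=${1:-input.txt}   # assumed input file name, as in the commands above
    # `&&` ensures the benchmark never runs against a failed or stale build.
    run cargo build --release --package uu_cut &&
    run hyperfine -w3 \
        "./target/release/cut -f2-4,8 -d' ' $input" \
        "cut -f2-4,8 -d' ' $input"
}

# Preview the commands; call `bench_cut input.txt` directly to run them.
(DRY_RUN=1; bench_cut input.txt)
```

Chaining the two steps with `&&` is the point of the script: the benchmark always measures a binary built from the current sources.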
When optimizing or fixing performance regressions seeing the number of times a
function is called, and the amount of time it takes can be useful.
- `cargo flamegraph` generates flame graphs from function level metrics it records using `perf` or `dtrace`
- `cargo flamegraph` generates flame graphs from function level metrics it records using `perf` or `dtrace`
```
$ cargo flamegraph --bin cut --package uu_cut -- -f1,3-4 input.txt > /dev/null
```shell
cargo flamegraph --bin cut --package uu_cut -- -f1,3-4 input.txt > /dev/null
```
### What to benchmark
## What to benchmark
There are four different performance paths in `cut` to benchmark.
- Byte ranges `-c`/`--characters` or `-b`/`--bytes` e.g. `cut -c 2,4,6-`
- Byte ranges with output delimiters e.g. `cut -c 4- --output-delimiter=/`
- Fields e.g. `cut -f -4`
- Fields with output delimiters e.g. `cut -f 7-10 --output-delimiter=:`
- Byte ranges `-c`/`--characters` or `-b`/`--bytes` e.g. `cut -c 2,4,6-`
- Byte ranges with output delimiters e.g. `cut -c 4- --output-delimiter=/`
- Fields e.g. `cut -f -4`
- Fields with output delimiters e.g. `cut -f 7-10 --output-delimiter=:`
Choose a test input file with a large number of lines so that program startup time does not significantly affect the benchmark.
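If no suitable file is at hand, one can be generated. The sketch below writes lines of ten space-separated fields; the function name `gen_input`, the field contents, and the default line count are arbitrary assumptions, chosen only so that the `-f` and `-d' '` examples above have something to select.

```shell
#!/bin/sh
# Write N lines of ten space-separated fields to stdout.
# The field contents are arbitrary; only the shape matters for `cut -f`.
gen_input() {
    lines=${1:-1000000}
    awk -v n="$lines" 'BEGIN {
        for (i = 1; i <= n; i++) {
            line = "r" i
            for (f = 2; f <= 10; f++) line = line " f" f
            print line
        }
    }'
}

# Small smoke test: five lines piped through the field selection used above.
gen_input 5 | cut -f2-4,8 -d' '
```

For an actual benchmark, redirect a larger run to a file, for example `gen_input 1000000 > input.txt`.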

@ -45,7 +45,7 @@ be roughly equivalent to the total bytes copied (`blocksize` x `count`).
Some useful invocations for testing would be the following:
```
```shell
hyperfine "./target/release/dd bs=4k count=1000000 < /dev/zero > /dev/null"
hyperfine "./target/release/dd bs=1M count=20000 < /dev/zero > /dev/null"
hyperfine "./target/release/dd bs=1G count=10 < /dev/zero > /dev/null"
@ -57,7 +57,7 @@ Typically you would choose a small blocksize for measuring the performance of
typically does some set amount of work per block which only depends on the size
of the block if conversions are used.
As an example, https://github.com/uutils/coreutils/pull/3600 made a change to
As an example, <https://github.com/uutils/coreutils/pull/3600> made a change to
reuse the same buffer between block copies, avoiding the need to reallocate a
new block of memory for each copy. That change mostly affected
large block size copies because those are the circumstances where the

@ -1,6 +1,7 @@
<!-- spell-checker:ignore convs iseek oseek -->
# dd
<!-- spell-checker:ignore convs iseek oseek -->
```
dd [OPERAND]...
dd OPTION
@ -19,51 +20,53 @@ OPERANDS:
conv=CONVS a comma-separated list of conversion options or
(for legacy reasons) file flags.
count=N stop reading input after N ibs-sized read operations rather
than proceeding until EOF. See iflag=count_bytes if stopping
than proceeding until EOF. See iflag=count_bytes if stopping
after N bytes is preferred
ibs=N the size of buffer used for reads (default: 512)
if=FILE the file used for input. When not specified, stdin is used instead
iflag=FLAGS a comma-separated list of input flags which specify how the input
source is treated. FLAGS may be any of the input-flags or
source is treated. FLAGS may be any of the input-flags or
general-flags specified below.
skip=N (or iseek=N) skip N ibs-sized records into input before beginning
copy/convert operations. See iflag=seek_bytes if seeking N bytes
skip=N (or iseek=N) skip N ibs-sized records into input before beginning
copy/convert operations. See iflag=seek_bytes if seeking N bytes
is preferred.
obs=N the size of buffer used for writes (default: 512)
of=FILE the file used for output. When not specified, stdout is used
of=FILE the file used for output. When not specified, stdout is used
instead
oflag=FLAGS comma separated list of output flags which specify how the output
source is treated. FLAGS may be any of the output flags or
oflag=FLAGS comma separated list of output flags which specify how the output
source is treated. FLAGS may be any of the output flags or
general flags specified below
seek=N (or oseek=N) seeks N obs-sized records into output before
beginning copy/convert operations. See oflag=seek_bytes if
seek=N (or oseek=N) seeks N obs-sized records into output before
beginning copy/convert operations. See oflag=seek_bytes if
seeking N bytes is preferred
status=LEVEL controls whether volume and performance stats are written to
status=LEVEL controls whether volume and performance stats are written to
stderr.
When unspecified, dd will print stats upon completion. An example is below.
When unspecified, dd will print stats upon completion. An
example is below.
6+0 records in
16+0 records out
8192 bytes (8.2 kB, 8.0 KiB) copied, 0.00057009 s, 14.4 MB/s
The first two lines are the 'volume' stats and the final line is
the 'performance' stats.
The volume stats indicate the number of complete and partial
ibs-sized reads, or obs-sized writes that took place during the
copy. The format of the volume stats is
<complete>+<partial>. If records have been truncated (see
conv=block), the volume stats will contain the number of
8192 bytes (8.2 kB, 8.0 KiB) copied, 0.00057009 s,
14.4 MB/s
The first two lines are the 'volume' stats and the final line
is the 'performance' stats.
The volume stats indicate the number of complete and partial
ibs-sized reads, or obs-sized writes that took place during
the copy. The format of the volume stats is
<complete>+<partial>. If records have been truncated (see
conv=block), the volume stats will contain the number of
truncated records.
Possible LEVEL values are:
progress: Print periodic performance stats as the copy
progress: Print periodic performance stats as the copy
proceeds.
noxfer: Print final volume stats, but not performance stats.
none: Do not print any stats.
Printing performance stats is also triggered by the INFO signal
(where supported), or the USR1 signal. Setting the
POSIXLY_CORRECT environment variable to any value (including an
empty value) will cause the USR1 signal to be ignored.
Printing performance stats is also triggered by the INFO signal
(where supported), or the USR1 signal. Setting the
POSIXLY_CORRECT environment variable to any value (including
an empty value) will cause the USR1 signal to be ignored.
CONVERSION OPTIONS:
@ -71,15 +74,15 @@ CONVERSION OPTIONS:
option. Implies conv=unblock.
ebcdic convert from ASCII to EBCDIC. This is the inverse of the 'ascii'
option. Implies conv=block.
ibm convert from ASCII to EBCDIC, applying the conventions for '[', ']'
ibm convert from ASCII to EBCDIC, applying the conventions for '[', ']'
and '~' specified in POSIX. Implies conv=block.
ucase convert from lower-case to upper-case
lcase converts from upper-case to lower-case.
block for each newline less than the size indicated by cbs=BYTES, remove
the newline and pad with spaces up to cbs. Lines longer than cbs are
truncated.
block for each newline less than the size indicated by cbs=BYTES, remove
the newline and pad with spaces up to cbs. Lines longer than cbs
are truncated.
unblock for each block of input of the size indicated by cbs=BYTES, remove
right-trailing spaces and replace with a newline character.
@ -115,7 +118,7 @@ OUTPUT FLAGS:
GENERAL FLAGS:
direct use direct I/O for data.
directory fail unless the given input (if used as an iflag) or output (if used
directory fail unless the given input (if used as an iflag) or output (if used
as an oflag) is a directory.
dsync use synchronized I/O for data.
sync use synchronized I/O for data and metadata.

@ -1,18 +1,18 @@
## How to update the internal database
# How to update the internal database
Create the test fixtures by writing the output of the GNU dircolors commands to the fixtures folder:
```
$ dircolors --print-database > /PATH_TO_COREUTILS/tests/fixtures/dircolors/internal.expected
$ dircolors --print-ls-colors > /PATH_TO_COREUTILS/tests/fixtures/dircolors/ls_colors.expected
$ dircolors -b > /PATH_TO_COREUTILS/tests/fixtures/dircolors/bash_def.expected
$ dircolors -c > /PATH_TO_COREUTILS/tests/fixtures/dircolors/csh_def.expected
```shell
dircolors --print-database > /PATH_TO_COREUTILS/tests/fixtures/dircolors/internal.expected
dircolors --print-ls-colors > /PATH_TO_COREUTILS/tests/fixtures/dircolors/ls_colors.expected
dircolors -b > /PATH_TO_COREUTILS/tests/fixtures/dircolors/bash_def.expected
dircolors -c > /PATH_TO_COREUTILS/tests/fixtures/dircolors/csh_def.expected
```
Run the tests:
```
$ cargo test --features "dircolors" --no-default-features
```shell
cargo test --features "dircolors" --no-default-features
```
Edit `/PATH_TO_COREUTILS/src/uu/dircolors/src/colors.rs` until the tests pass.
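The four fixture commands can be wrapped in a small helper so they are always regenerated together. This is only a sketch: the function name `regen_fixtures` is hypothetical, and it assumes the `dircolors` found on `PATH` is the GNU one.

```shell
#!/bin/sh
# Regenerate the four dircolors fixtures in one go.
# Usage: regen_fixtures /PATH_TO_COREUTILS
regen_fixtures() {
    dir=$1/tests/fixtures/dircolors
    # Stop at the first failure so a partial update is noticed.
    dircolors --print-database  > "$dir/internal.expected" &&
    dircolors --print-ls-colors > "$dir/ls_colors.expected" &&
    dircolors -b                > "$dir/bash_def.expected" &&
    dircolors -c                > "$dir/csh_def.expected"
}

# Run only when a checkout path is given on the command line.
if [ "$#" -gt 0 ]; then
    regen_fixtures "$1"
fi
```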

@ -19,6 +19,6 @@ of 1000).
PATTERN allows some advanced exclusions. For example, the following syntaxes
are supported:
? will match only one character
* will match zero or more characters
{a,b} will match a or b
`?` will match only one character
`*` will match zero or more characters
`{a,b}` will match a or b

@ -10,7 +10,7 @@ Print the value of `EXPRESSION` to standard output
## After help
Print the value of `EXPRESSION` to standard output. A blank line below
separates increasing precedence groups.
separates increasing precedence groups.
`EXPRESSION` may be:
@ -48,11 +48,13 @@ Comparisons are arithmetic if both ARGs are numbers, else lexicographical.
Pattern matches return the string matched between \( and \) or null; if
\( and \) are not used, they return the number of characters matched or 0.
Exit status is `0` if `EXPRESSION` is neither null nor `0`, `1` if `EXPRESSION` is null
or `0`, `2` if `EXPRESSION` is syntactically invalid, and `3` if an error occurred.
Exit status is `0` if `EXPRESSION` is neither null nor `0`, `1` if `EXPRESSION`
is null or `0`, `2` if `EXPRESSION` is syntactically invalid, and `3` if an
error occurred.
Environment variables:
- `EXPR_DEBUG_TOKENS=1`: dump expression's tokens
- `EXPR_DEBUG_RPN=1`: dump expression represented in reverse Polish notation
- `EXPR_DEBUG_SYA_STEP=1`: dump each parser step
- `EXPR_DEBUG_AST=1`: dump expression represented as an abstract syntax tree
- `EXPR_DEBUG_TOKENS=1`: dump expression's tokens
- `EXPR_DEBUG_RPN=1`: dump expression represented in reverse Polish notation
- `EXPR_DEBUG_SYA_STEP=1`: dump each parser step
- `EXPR_DEBUG_AST=1`: dump expression represented as an abstract syntax tree
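
The exit status rules described above can be summarized as a small mapping. This is a sketch under stated assumptions, not the code `expr` uses: the `Outcome` type and `exit_status` function are hypothetical, and only an empty string or the literal `0` is treated as "null or `0`" (spellings such as `00` are not considered here).

```rust
/// Hypothetical result of evaluating an expression (illustration only).
enum Outcome {
    /// The expression evaluated to this string.
    Value(String),
    /// The expression was syntactically invalid.
    SyntaxError,
    /// Some other error occurred during evaluation.
    OtherError,
}

/// Maps an evaluation outcome to the documented exit status.
fn exit_status(outcome: &Outcome) -> i32 {
    match outcome {
        // Null (empty) or `0`: status 1. Simplification: only the literal "0".
        Outcome::Value(v) if v.is_empty() || v.as_str() == "0" => 1,
        // Any other value: status 0.
        Outcome::Value(_) => 0,
        Outcome::SyntaxError => 2,
        Outcome::OtherError => 3,
    }
}
```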

@ -53,19 +53,19 @@ which I recommend reading if you want to add benchmarks to `factor`.
so each sample takes a very short time, minimizing variability and
maximizing the numbers of samples we can take in a given time.
2. Benchmarks are immutable (once merged in `uutils`)
1. Benchmarks are immutable (once merged in `uutils`)
Modifying a benchmark means previously-collected values cannot meaningfully
be compared, silently giving nonsensical results. If you must modify an
existing benchmark, rename it.
3. Test common cases
1. Test common cases
We are interested in overall performance, rather than specific edge-cases;
use **reproducibly-randomized inputs**, sampling from either all possible
input values or some subset of interest.
4. Use [`criterion`], `criterion::black_box`, ...
1. Use [`criterion`], `criterion::black_box`, ...
`criterion` isn't perfect, but it is also much better than ad-hoc
solutions in each benchmark.
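
To illustrate what "reproducibly-randomized inputs" can look like, here is a minimal sketch that derives benchmark inputs from a fixed seed using the SplitMix64 generator and only the standard library. The names `SplitMix64` and `benchmark_inputs` are assumptions made for this example; the existing `factor` benchmarks may generate their inputs differently.

```rust
/// SplitMix64: a tiny deterministic pseudo-random generator.
/// The same seed always yields the same sequence.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        // Advance the state by a fixed odd constant, then mix the bits.
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical helper: produces `count` inputs that depend only on `seed`,
/// so every benchmark run sees exactly the same values.
fn benchmark_inputs(seed: u64, count: usize) -> Vec<u64> {
    let mut rng = SplitMix64::new(seed);
    (0..count).map(|_| rng.next_u64()).collect()
}
```

Keeping the seed fixed in the benchmark source means the inputs stay the same across runs and machines, in line with the rule that merged benchmarks should not change.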
@ -103,7 +103,7 @@ characteristics:
1. integer factoring algorithms are randomized, with large variance in
execution time ;
2. various inputs also have large differences in factoring time, that
1. various inputs also have large differences in factoring time, that
corresponds to no natural, linear ordering of the inputs.
If (1) was untrue (i.e. if execution time wasn't random), we could faithfully

@ -1,9 +1,11 @@
## Benchmarking hashsum
# Benchmarking hashsum
### To bench blake2
## To bench blake2
Taken from: https://github.com/uutils/coreutils/pull/2296
Taken from: <https://github.com/uutils/coreutils/pull/2296>
With a large file:
$ hyperfine "./target/release/coreutils hashsum --b2sum large-file" "b2sum large-file"
```shell
hyperfine "./target/release/coreutils hashsum --b2sum large-file" "b2sum large-file"
```

@ -5,23 +5,31 @@ GNU version of `head`, you can use a benchmarking tool like
[hyperfine][0]. On Ubuntu 18.04 or later, you can install `hyperfine` by
running
sudo apt-get install hyperfine
```shell
sudo apt-get install hyperfine
```
Next, build the `head` binary under the release profile:
cargo build --release -p uu_head
```shell
cargo build --release -p uu_head
```
Now, get a text file to test `head` on. I used the *Complete Works of
William Shakespeare*, which is in the public domain in the United States
and most other parts of the world.
wget -O shakespeare.txt https://www.gutenberg.org/files/100/100-0.txt
```shell
wget -O shakespeare.txt https://www.gutenberg.org/files/100/100-0.txt
```
This particular file has about 170,000 lines, each of which is no longer
than 96 characters:
$ wc -lL shakespeare.txt
170592 96 shakespeare.txt
```shell
$ wc -lL shakespeare.txt
170592 96 shakespeare.txt
```
You could use files of different shapes and sizes to test the
performance of `head` in different situations. For a larger file, you
@ -32,9 +40,11 @@ contains about 130 million lines.
Finally, you can compare the performance of the two versions of `head`
by running, for example,
hyperfine \
"head -n 100000 shakespeare.txt" \
"target/release/head -n 100000 shakespeare.txt"
```shell
hyperfine \
"head -n 100000 shakespeare.txt" \
"target/release/head -n 100000 shakespeare.txt"
```
[0]: https://github.com/sharkdp/hyperfine
[1]: https://www.wikidata.org/wiki/Wikidata:Database_download

@ -17,11 +17,14 @@ A benchmark with `-j` and `-i` shows the following time:
| libc | 25% | I/O and memory allocation. |
More detailed profiles can be obtained via [flame graphs](https://github.com/flamegraph-rs/flamegraph):
```
```shell
cargo flamegraph --bin join --package uu_join -- file1 file2 > /dev/null
```
You may need to add the following lines to the top-level `Cargo.toml` to get full stack traces:
```
```toml
[profile.release]
debug = true
```
@ -34,22 +37,26 @@ in practice many CSV datasets will function well after being sorted.
Like most of the utils, the recommended tool for benchmarking is [hyperfine](https://github.com/sharkdp/hyperfine).
To benchmark your changes:
- checkout the main branch (without your changes), do a `--release` build, and back up the executable produced at `target/release/join`
- checkout your working branch (with your changes), do a `--release` build
- run
```
hyperfine -w 5 "/path/to/main/branch/build/join file1 file2" "/path/to/working/branch/build/join file1 file2"
```
- you'll likely need to add additional options to both commands, such as a field separator, or if you're benchmarking some particular behavior
- you can also optionally benchmark against GNU's join
- checkout the main branch (without your changes), do a `--release` build, and back up the executable produced at `target/release/join`
- checkout your working branch (with your changes), do a `--release` build
- run
```shell
hyperfine -w 5 "/path/to/main/branch/build/join file1 file2" "/path/to/working/branch/build/join file1 file2"
```
- you'll likely need to add additional options to both commands, such as a field separator, or if you're benchmarking some particular behavior
- you can also optionally benchmark against GNU's join
## What to benchmark
The following options can have a non-trivial impact on performance:
- `-a`/`-v` if one of the two files has significantly more lines than the other
- `-j`/`-1`/`-2` cause work to be done to grab the appropriate field
- `-i` adds a call to `to_ascii_lowercase()` that adds some time for allocating and dropping memory for the lowercase key
- `--nocheck-order` causes some calls of `Input::compare` to be skipped
- `-a`/`-v` if one of the two files has significantly more lines than the other
- `-j`/`-1`/`-2` cause work to be done to grab the appropriate field
- `-i` adds a call to `to_ascii_lowercase()` that adds some time for allocating and dropping memory for the lowercase key
- `--nocheck-order` causes some calls of `Input::compare` to be skipped
The content of the files being joined has a very significant impact on the performance.
Things like how long each line is, how many fields there are, how long the key fields are, how many lines there are, how many lines can be joined, and how many lines each line can be joined with all change the behavior of the hotpaths.

@ -9,13 +9,13 @@ Run `cargo build --release` before benchmarking after you make a change!
## Simple recursive ls
- Get a large tree, for example linux kernel source tree.
- Benchmark simple recursive ls with hyperfine: `hyperfine --warmup 2 "target/release/coreutils ls -R tree > /dev/null"`.
- Get a large tree, for example linux kernel source tree.
- Benchmark simple recursive ls with hyperfine: `hyperfine --warmup 2 "target/release/coreutils ls -R tree > /dev/null"`.
## Recursive ls with all and long options
- Same tree as above
- Benchmark recursive ls with -al -R options with hyperfine: `hyperfine --warmup 2 "target/release/coreutils ls -al -R tree > /dev/null"`.
- Same tree as above
- Benchmark recursive ls with -al -R options with hyperfine: `hyperfine --warmup 2 "target/release/coreutils ls -al -R tree > /dev/null"`.
## Comparing with GNU ls
@ -29,6 +29,7 @@ Example: `hyperfine --warmup 2 "target/release/coreutils ls -al -R tree > /dev/n
This can also be used to compare with a version of `ls` built before your changes, to ensure your changes do not cause a performance regression.
Here is a `bash` script for doing this comparison:
```bash
#!/bin/bash
cargo build --no-default-features --features ls --release
@ -46,11 +47,13 @@ hyperfine "ls $args" "target/release/coreutils ls $args"
## Cargo Flamegraph
With Cargo Flamegraph you can easily make a flamegraph of `ls`:
```bash
cargo flamegraph --cmd coreutils -- ls [additional parameters]
```
However, if the `-R` option is given, the output becomes pretty much useless due to recursion. We can fix this by merging all the direct recursive calls with `uniq`. Below is a `bash` script that does this.
```bash
#!/bin/bash
cargo build --release --no-default-features --features ls

@ -1,6 +1,7 @@
<!-- spell-checker:ignore ugoa -->
# mkdir
<!-- spell-checker:ignore ugoa -->
```
mkdir [OPTION]... [USER]
```

@ -1,6 +1,7 @@
<!-- spell-checker:ignore N'th M'th -->
# numfmt
<!-- spell-checker:ignore N'th M'th -->
```
numfmt [OPTION]... [NUMBER]...
```
@ -10,24 +11,25 @@ Convert numbers from/to human-readable strings
## After Help
`UNIT` options:
- `none`: no auto-scaling is done; suffixes will trigger an error
- `auto`: accept optional single/two letter suffix:
1K = 1000, 1Ki = 1024, 1M = 1000000, 1Mi = 1048576,
- `none`: no auto-scaling is done; suffixes will trigger an error
- `auto`: accept optional single/two letter suffix:
- `si`: accept optional single letter suffix:
1K = 1000, 1Ki = 1024, 1M = 1000000, 1Mi = 1048576,
1K = 1000, 1M = 1000000, ...
- `si`: accept optional single letter suffix:
- `iec`: accept optional single letter suffix:
1K = 1000, 1M = 1000000, ...
1K = 1024, 1M = 1048576, ...
- `iec`: accept optional single letter suffix:
1K = 1024, 1M = 1048576, ...
- `iec-i`: accept optional two-letter suffix:
1Ki = 1024, 1Mi = 1048576, ...
1Ki = 1024, 1Mi = 1048576, ...
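
The difference between the `UNIT` modes can be sketched in a few lines. This is an illustration under stated assumptions, not the parser `numfmt` uses: the `Unit` enum and `scale` function are hypothetical, only whole numbers and the suffixes `K`, `M`, `G`, `T` are handled, and rejecting a suffix that does not fit the selected mode is an assumption of this sketch.

```rust
/// Scaling modes from the `UNIT` list (names chosen for this example).
#[derive(Clone, Copy)]
enum Unit {
    NoScaling, // `none`: any suffix is an error
    Auto,      // `auto`: K = 1000, Ki = 1024
    Si,        // `si`: K = 1000
    Iec,       // `iec`: K = 1024
    IecI,      // `iec-i`: Ki = 1024
}

/// Hypothetical helper: converts input such as "1K" or "2Mi" to a plain number.
fn scale(input: &str, unit: Unit) -> Result<u64, String> {
    // Split the input into leading digits and the remaining suffix.
    let split = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    let (digits, suffix) = input.split_at(split);
    let value: u64 = digits
        .parse()
        .map_err(|_| format!("invalid number: {input}"))?;
    if suffix.is_empty() {
        return Ok(value);
    }

    // A trailing `i` marks the two-letter (binary) form, e.g. `Ki`.
    let (letter, has_i) = match suffix.strip_suffix('i') {
        Some(rest) => (rest, true),
        None => (suffix, false),
    };
    let power = match letter {
        "K" => 1,
        "M" => 2,
        "G" => 3,
        "T" => 4,
        _ => return Err(format!("invalid suffix: {suffix}")),
    };
    // Pick the base from the mode and the suffix form.
    let base: u64 = match (unit, has_i) {
        (Unit::Auto, false) | (Unit::Si, false) => 1000,
        (Unit::Auto, true) | (Unit::Iec, false) | (Unit::IecI, true) => 1024,
        // `none` with any suffix, or a suffix form the mode does not accept.
        _ => return Err(format!("suffix {suffix} not accepted in this mode")),
    };
    value
        .checked_mul(base.pow(power))
        .ok_or_else(|| format!("overflow: {input}"))
}
```

With this sketch, `scale("1K", Unit::Auto)` yields `1000` and `scale("1Ki", Unit::Auto)` yields `1024`, matching the `auto` row above.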
`FIELDS` supports `cut(1)` style field ranges:
- `FIELDS` supports `cut(1)` style field ranges:
N N'th field, counted from 1
N- from N'th field, to end of line

@ -4,4 +4,4 @@
realpath [OPTION]... FILE...
```
Print the resolved path
Print the resolved path

@ -5,15 +5,21 @@ GNU version of `seq`, you can use a benchmarking tool like
[hyperfine][0]. On Ubuntu 18.04 or later, you can install `hyperfine` by
running
sudo apt-get install hyperfine
```shell
sudo apt-get install hyperfine
```
Next, build the `seq` binary under the release profile:
cargo build --release -p uu_seq
```shell
cargo build --release -p uu_seq
```
Finally, you can compare the performance of the two versions of `seq`
by running, for example,
hyperfine "seq 1000000" "target/release/seq 1000000"
```shell
hyperfine "seq 1000000" "target/release/seq 1000000"
```
[0]: https://github.com/sharkdp/hyperfine

@ -21,10 +21,10 @@ To avoid distortions from IO, it is recommended to store input data in tmpfs.
## Without repetition
By default, `shuf` samples without repetition.
By default, `shuf` samples without repetition.
To benchmark only the randomization and not IO, we can pass the `-i` flag with
a range of numbers to randomly sample from. An example of a command that works
To benchmark only the randomization and not IO, we can pass the `-i` flag with
a range of numbers to randomly sample from. An example of a command that works
well for testing:
```shell

@ -8,4 +8,4 @@ shuf -i LO-HI [OPTION]...;
Shuffle the input by outputting a random permutation of input lines.
Each output permutation is equally likely.
With no FILE, or when FILE is -, read standard input.
With no FILE, or when FILE is -, read standard input.

@ -13,4 +13,4 @@ Pause for NUMBER seconds. SUFFIX may be 's' for seconds (the default),
'm' for minutes, 'h' for hours or 'd' for days. Unlike most implementations
that require NUMBER be an integer, here NUMBER may be an arbitrary floating
point number. Given two or more arguments, pause for the amount of time
specified by the sum of their values.
specified by the sum of their values.

@ -12,64 +12,59 @@ Run `cargo build --release` before benchmarking after you make a change!
## Sorting a wordlist
- Get a wordlist, for example with [words](<https://en.wikipedia.org/wiki/Words_(Unix)>) on Linux. The exact wordlist
- Get a wordlist, for example with [words](<https://en.wikipedia.org/wiki/Words_(Unix)>) on Linux. The exact wordlist
doesn't matter for performance comparisons. In this example I'm using `/usr/share/dict/american-english` as the wordlist.
- Shuffle the wordlist by running `sort -R /usr/share/dict/american-english > shuffled_wordlist.txt`.
- Benchmark sorting the wordlist with hyperfine: `hyperfine "target/release/coreutils sort shuffled_wordlist.txt -o output.txt"`.
- Shuffle the wordlist by running `sort -R /usr/share/dict/american-english > shuffled_wordlist.txt`.
- Benchmark sorting the wordlist with hyperfine: `hyperfine "target/release/coreutils sort shuffled_wordlist.txt -o output.txt"`.
## Sorting a wordlist with ignore_case
- Same wordlist as above
- Benchmark sorting the wordlist ignoring the case with hyperfine: `hyperfine "target/release/coreutils sort shuffled_wordlist.txt -f -o output.txt"`.
- Same wordlist as above
- Benchmark sorting the wordlist ignoring the case with hyperfine: `hyperfine "target/release/coreutils sort shuffled_wordlist.txt -f -o output.txt"`.
## Sorting numbers
- Generate a list of numbers: `seq 0 100000 | sort -R > shuffled_numbers.txt`.
- Benchmark numeric sorting with hyperfine: `hyperfine "target/release/coreutils sort shuffled_numbers.txt -n -o output.txt"`.
- Generate a list of numbers: `seq 0 100000 | sort -R > shuffled_numbers.txt`.
- Benchmark numeric sorting with hyperfine: `hyperfine "target/release/coreutils sort shuffled_numbers.txt -n -o output.txt"`.
## Sorting numbers with -g
- Same list of numbers as above.
- Benchmark numeric sorting with hyperfine: `hyperfine "target/release/coreutils sort shuffled_numbers.txt -g -o output.txt"`.
- Same list of numbers as above.
- Benchmark numeric sorting with hyperfine: `hyperfine "target/release/coreutils sort shuffled_numbers.txt -g -o output.txt"`.
## Sorting numbers with SI prefixes
- Generate a list of numbers:
<details>
<summary>Rust script</summary>
- Generate a list of numbers:
## Cargo.toml
## Cargo.toml
```toml
[dependencies]
rand = "0.8.3"
```
```toml
[dependencies]
rand = "0.8.3"
```
## main.rs
## main.rs
```rust
use rand::prelude::*;
fn main() {
let suffixes = ['k', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y'];
let mut rng = thread_rng();
for _ in 0..100000 {
println!(
"{}{}",
rng.gen_range(0..1000000),
suffixes.choose(&mut rng).unwrap()
)
}
```rust
use rand::prelude::*;
fn main() {
let suffixes = ['k', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y'];
let mut rng = thread_rng();
for _ in 0..100000 {
println!(
"{}{}",
rng.gen_range(0..1000000),
suffixes.choose(&mut rng).unwrap()
)
}
}
```
```
## running
## running
`cargo run > shuffled_numbers_si.txt`
`cargo run > shuffled_numbers_si.txt`
</details>
- Benchmark numeric sorting with hyperfine: `hyperfine "target/release/coreutils sort shuffled_numbers_si.txt -h -o output.txt"`.
- Benchmark numeric sorting with hyperfine: `hyperfine "target/release/coreutils sort shuffled_numbers_si.txt -h -o output.txt"`.
## External sorting
@ -83,28 +78,28 @@ Example: Run `hyperfine './target/release/coreutils sort shuffled_wordlist.txt -
"Merge" sort merges already sorted files. It is a sub-step of external sorting, so benchmarking it separately may be helpful.
- Splitting `shuffled_wordlist.txt` can be achieved by running `split shuffled_wordlist.txt shuffled_wordlist_slice_ --additional-suffix=.txt`
- Sort each part by running `for f in shuffled_wordlist_slice_*; do sort $f -o $f; done`
- Benchmark merging by running `hyperfine "target/release/coreutils sort -m shuffled_wordlist_slice_*"`
- Splitting `shuffled_wordlist.txt` can be achieved by running `split shuffled_wordlist.txt shuffled_wordlist_slice_ --additional-suffix=.txt`
- Sort each part by running `for f in shuffled_wordlist_slice_*; do sort $f -o $f; done`
- Benchmark merging by running `hyperfine "target/release/coreutils sort -m shuffled_wordlist_slice_*"`
## Check
When invoked with -c, we simply check if the input is already ordered. The input for benchmarking should be an already sorted file.
- Benchmark checking by running `hyperfine "target/release/coreutils sort -c sorted_wordlist.txt"`
- Benchmark checking by running `hyperfine "target/release/coreutils sort -c sorted_wordlist.txt"`
## Stdout and stdin performance
Try to run the above benchmarks by piping the input through stdin (standard input) and redirecting the
output through stdout (standard output):
- Remove the input file from the arguments and add `cat [input_file] | ` at the beginning.
- Remove `-o output.txt` and add `> output.txt` at the end.
- Remove the input file from the arguments and add ```cat [input_file] |``` at the beginning.
- Remove `-o output.txt` and add `> output.txt` at the end.
Example: `hyperfine "target/release/coreutils sort shuffled_numbers.txt -n -o output.txt"` becomes
`hyperfine "cat shuffled_numbers.txt | target/release/coreutils sort -n > output.txt"`
- Check that performance is similar to the original benchmark.
- Check that performance is similar to the original benchmark.
## Comparing with GNU sort
@ -121,37 +116,34 @@ The above benchmarks use hyperfine to measure the speed of sorting. There are ho
resource usage. One way to measure them is the `time` command. This is not to be confused with the `time` that is built in to the bash shell.
You may have to install `time` first, then you have to run it with `/bin/time -v` to give it precedence over the built in `time`.
<details>
<summary>Example output</summary>
Command being timed: "target/release/coreutils sort shuffled_numbers.txt"
User time (seconds): 0.10
System time (seconds): 0.00
Percent of CPU this job got: 365%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.02
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 25360
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 5802
Voluntary context switches: 462
Involuntary context switches: 73
Swaps: 0
File system inputs: 1184
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
</details>
```plain
Command being timed: "target/release/coreutils sort shuffled_numbers.txt"
User time (seconds): 0.10
System time (seconds): 0.00
Percent of CPU this job got: 365%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.02
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 25360
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 5802
Voluntary context switches: 462
Involuntary context switches: 73
Swaps: 0
File system inputs: 1184
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```
Useful metrics to look at could be:
- User time
- Percent of CPU this job got
- Maximum resident set size
- User time
- Percent of CPU this job got
- Maximum resident set size

@ -7,11 +7,15 @@ GNU version of `split`, you can use a benchmarking tool like
[hyperfine][0]. On Ubuntu 18.04 or later, you can install `hyperfine` by
running
sudo apt-get install hyperfine
```
sudo apt-get install hyperfine
```
Next, build the `split` binary under the release profile:
cargo build --release -p uu_split
```
cargo build --release -p uu_split
```
Now, get a text file to test `split` on. The `split` program has three
main modes of operation: chunk by lines, chunk by bytes, and chunk by
@ -21,7 +25,9 @@ operation. For example, to test chunking by bytes on a large input file,
you can create a file named `testfile.txt` containing one million null
bytes like this:
printf "%0.s\0" {1..1000000} > testfile.txt
```
printf "%0.s\0" {1..1000000} > testfile.txt
```
For another example, to test chunking by bytes on a large real-world
input file, you could download a [database dump of Wikidata][1] or some
@ -31,10 +37,12 @@ file][2] contains about 130 million lines.
Finally, you can compare the performance of the two versions of `split`
by running, for example,
cd /tmp && hyperfine \
--prepare 'rm x* || true' \
"split -b 1000 testfile.txt" \
"target/release/split -b 1000 testfile.txt"
```
cd /tmp && hyperfine \
--prepare 'rm x* || true' \
"split -b 1000 testfile.txt" \
"target/release/split -b 1000 testfile.txt"
```
Since `split` creates a lot of files on the filesystem, I recommend
changing to the `/tmp` directory before running the benchmark. The

@ -4,7 +4,8 @@
### Flags
* [ ] `--verbose` - created file printing is implemented, don't know if there is anything else
* [ ] `--verbose` - created file printing is implemented, don't know
if there is anything else
## Possible Optimizations

@ -1,4 +1,4 @@
## Benchmarking `sum`
# Benchmarking `sum`
<!-- spell-checker:ignore wikidatawiki -->
@ -7,17 +7,17 @@ Large sample files can for example be found in the [Wikipedia database dumps](ht
After you have obtained and uncompressed such a file, you need to build `sum` in release mode
```shell
$ cargo build --release --package uu_sum
cargo build --release --package uu_sum
```
and then you can time how long it takes to checksum the file by running
```shell
$ time ./target/release/sum wikidatawiki-20211001-pages-logging.xml
time ./target/release/sum wikidatawiki-20211001-pages-logging.xml
```
For more systematic measurements that include warm-ups, repetitions and comparisons, [Hyperfine](https://github.com/sharkdp/hyperfine) can be helpful. For example, to compare this implementation to the one provided by your distribution run
```shell
$ hyperfine "./target/release/sum wikidatawiki-20211001-pages-logging.xml" "sum wikidatawiki-20211001-pages-logging.xml"
hyperfine "./target/release/sum wikidatawiki-20211001-pages-logging.xml" "sum wikidatawiki-20211001-pages-logging.xml"
```

@ -1,25 +1,36 @@
## Benchmarking `tac`
# Benchmarking `tac`
<!-- spell-checker:ignore wikidatawiki -->
`tac` is often used to process log files in reverse chronological order, i.e. from newer towards older entries. In this case, the performance target is to yield results as fast as possible, i.e. without reading in the whole file that is to be reversed line-by-line. Therefore, a sensible benchmark is to read a large log file containing N lines and measure how long it takes to produce the last K lines from that file.
`tac` is often used to process log files in reverse chronological order, i.e.
from newer towards older entries. In this case, the performance target is to yield
results as fast as possible, i.e. without reading in the whole file that is to
be reversed line-by-line. Therefore, a sensible benchmark is to read a large log
file containing N lines and measure how long it takes to produce the last K
lines from that file.
Large text files can for example be found in the [Wikipedia database dumps](https://dumps.wikimedia.org/wikidatawiki/latest/), usually sized at multiple gigabytes and comprising more than 100M lines.
Large text files can for example be found in the
[Wikipedia database dumps](https://dumps.wikimedia.org/wikidatawiki/latest/),
usually sized at multiple gigabytes and comprising more than 100M lines.
After you have obtained and uncompressed such a file, you need to build `tac` in release mode
After you have obtained and uncompressed such a file, you need to build `tac`
in release mode
```shell
$ cargo build --release --package uu_tac
cargo build --release --package uu_tac
```
and then you can time how long it takes to extract the last 10M lines by running
and then you can time how long it takes to extract the last 10M lines by
running
```shell
$ /usr/bin/time ./target/release/tac wikidatawiki-20211001-pages-logging.xml | head -n10000000 >/dev/null
/usr/bin/time ./target/release/tac wikidatawiki-20211001-pages-logging.xml | head -n10000000 >/dev/null
```
For more systematic measurements that include warm-ups, repetitions and comparisons, [Hyperfine](https://github.com/sharkdp/hyperfine) can be helpful. For example, to compare this implementation to the one provided by your distribution run
For more systematic measurements that include warm-ups, repetitions and comparisons,
[Hyperfine](https://github.com/sharkdp/hyperfine) can be helpful.
For example, to compare this implementation to the one provided by your distribution run
```shell
$ hyperfine "./target/release/tac wikidatawiki-20211001-pages-logging.xml | head -n10000000 >/dev/null" "/usr/bin/tac wikidatawiki-20211001-pages-logging.xml | head -n10000000 >/dev/null"
hyperfine "./target/release/tac wikidatawiki-20211001-pages-logging.xml | head -n10000000 >/dev/null" "/usr/bin/tac wikidatawiki-20211001-pages-logging.xml | head -n10000000 >/dev/null"
```

@ -7,40 +7,59 @@
* `--max-unchanged-stats`
Note:
There's a stub for `--max-unchanged-stats` so GNU test-suite checks using it can run, however this flag has no functionality yet.
There's a stub for `--max-unchanged-stats` so GNU test-suite checks using it
can run, however this flag has no functionality yet.
### Platform support for `--follow` and `--retry`
The `--follow=descriptor`, `--follow=name` and `--retry` flags have very good support on Linux (inotify backend).
They work well enough on macOS/BSD (kqueue backend), with some tests failing due to differences in how kqueue works compared to inotify.
Windows support is there in theory due to ReadDirectoryChanges support by the notify-crate, however these flags are completely untested on Windows.
The `--follow=descriptor`, `--follow=name` and `--retry` flags have very good
support on Linux (inotify backend).
They work well enough on macOS/BSD (kqueue backend), with some tests failing due
to differences in how kqueue works compared to inotify.
Windows support is there in theory due to ReadDirectoryChanges support by the
notify-crate, however these flags are completely untested on Windows.
Note:
The undocumented `---disable-inotify` flag is used to disable the inotify backend to test polling.
However inotify is a Linux only backend and polling is now supported also for the other backends.
Because of this, `disable-inotify` is now an alias to the new and more versatile flag name: `--use-polling`.
The undocumented `---disable-inotify` flag is used to disable the inotify
backend to test polling.
However inotify is a Linux only backend and polling is now supported also
for the other backends.
Because of this, `disable-inotify` is now an alias to the new and more versatile
flag name: `--use-polling`.
## Possible optimizations
* Don't read the whole file if not using `-f` and input is regular file. Read in chunks from the end going backwards, reading each individual chunk forward.
* Don't read the whole file if not using `-f` and input is regular file.
Read in chunks from the end going backwards, reading each individual chunk
forward.
* Reduce number of system calls to e.g. `fstat`
* Improve resource management by adding more system calls to `inotify_rm_watch` when appropriate.
* Improve resource management by adding more system calls to `inotify_rm_watch`
when appropriate.
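
The first optimization above (reading chunks from the end, scanning backwards) can be sketched as follows. This is only an illustration of the idea, not how `tail` is implemented: `last_lines` is a hypothetical helper that assumes a seekable input and uses a caller-chosen `chunk_size`.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical helper: returns the last `count` lines of a seekable input
/// without reading the part of the input that precedes them.
fn last_lines<R: Read + Seek>(
    input: &mut R,
    count: usize,
    chunk_size: usize,
) -> io::Result<Vec<u8>> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let len = input.seek(SeekFrom::End(0))?;
    if count == 0 || len == 0 {
        return Ok(Vec::new());
    }

    let mut buf = vec![0u8; chunk_size];
    let mut pos = len; // start offset of the region already scanned
    let mut seen = 0usize; // line separators found so far
    let mut start = 0u64; // offset where the requested lines begin

    'scan: while pos > 0 {
        // Step one chunk towards the start of the input and read it.
        let size = (chunk_size as u64).min(pos) as usize;
        pos -= size as u64;
        input.seek(SeekFrom::Start(pos))?;
        input.read_exact(&mut buf[..size])?;

        // Scan the chunk backwards for newlines.
        for i in (0..size).rev() {
            let offset = pos + i as u64;
            // A newline at the very end terminates the last line; it does
            // not separate it from a following one.
            if buf[i] == b'\n' && offset != len - 1 {
                seen += 1;
                if seen == count {
                    start = offset + 1;
                    break 'scan;
                }
            }
        }
    }

    // Output everything from the start of the requested lines onwards.
    input.seek(SeekFrom::Start(start))?;
    let mut out = Vec::new();
    input.read_to_end(&mut out)?;
    Ok(out)
}
```

For a quick check, wrap some bytes in `std::io::Cursor` and call `last_lines(&mut cursor, 2, 4096)`; a `std::fs::File` works the same way for regular files.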
# GNU test-suite results (9.1.8-e08752)
The functionality for the test "gnu/tests/tail-2/follow-stdin.sh" is implemented.
It fails because it is provoking closing a file descriptor with `tail -f <&-` and as part of a workaround, Rust's stdlib reopens closed FDs as `/dev/null` which means uu_tail cannot detect this.
See also, e.g. the discussion at: https://github.com/uutils/coreutils/issues/2873
It fails because it is provoking closing a file descriptor with `tail -f <&-`
and as part of a workaround, Rust's stdlib reopens closed FDs as `/dev/null`
which means uu_tail cannot detect this.
See also, e.g. the discussion at:
<https://github.com/uutils/coreutils/issues/2873>
The functionality for the test "gnu/tests/tail-2/inotify-rotate-resources.sh" is implemented.
It fails with an error because it is using `strace` to look for calls to `inotify_add_watch` and `inotify_rm_watch`,
The functionality for the test "gnu/tests/tail-2/inotify-rotate-resources.sh"
is implemented.
It fails with an error because it is using `strace` to look for calls to
`inotify_add_watch` and `inotify_rm_watch`,
however in uu_tail these system calls are invoked from a separate thread.
If the GNU test followed threads, i.e. used `strace -f`, this issue could be resolved.
If the GNU test followed threads, i.e. used `strace -f`, this issue could be
resolved.
There are 5 tests which are fixed but do not (always) pass the test suite if it's run inside the CI.
There are 5 tests which are fixed but do not (always) pass the test suite
if it's run inside the CI.
The reason for this is probably related to load/scheduling on the CI test VM.
The tests in question are:
- [x] `tail-2/F-vs-rename.sh`
- [x] `tail-2/follow-name.sh`
- [x] `tail-2/inotify-rotate.sh`
- [x] `tail-2/overlay-headers.sh`
- [x] `tail-2/retry.sh`
* [x] `tail-2/F-vs-rename.sh`
* [x] `tail-2/follow-name.sh`
* [x] `tail-2/inotify-rotate.sh`
* [x] `tail-2/overlay-headers.sh`
* [x] `tail-2/retry.sh`

@ -1,5 +1,6 @@
# truncate
```
truncate [OPTION]... [FILE]...
```
@ -22,4 +23,4 @@ file based on its current size:
'<' => at most
'>' => at least
'/' => round down to multiple of
'%' => round up to multiple of
'%' => round up to multiple of
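
The four size modifiers shown above can be expressed as simple arithmetic on the current file size. This is a minimal sketch for illustration, not the code `truncate` uses: `target_size` is a hypothetical helper, and returning `None` for a zero divisor, an overflow, or an unknown prefix is an assumption of this example.

```rust
/// Hypothetical helper: computes the new size for a file of `current` bytes
/// given one of the prefixes listed above and the number `n`.
fn target_size(current: u64, prefix: char, n: u64) -> Option<u64> {
    match prefix {
        // '<' => at most: never grow beyond n.
        '<' => Some(current.min(n)),
        // '>' => at least: never shrink below n.
        '>' => Some(current.max(n)),
        // '/' => round down to a multiple of n.
        '/' if n != 0 => Some(current - current % n),
        // '%' => round up to a multiple of n.
        '%' if n != 0 => {
            let rem = current % n;
            if rem == 0 {
                Some(current)
            } else {
                current.checked_add(n - rem)
            }
        }
        // Zero divisor or a prefix not covered by this sketch.
        _ => None,
    }
}
```

For a 10-byte file and `n = 4`, the four modes give 4, 10, 8 and 12 bytes respectively.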

@ -2,45 +2,59 @@
<!-- spell-checker:ignore (words) uuwc uucat largefile somefile Mshortlines moby lwcm cmds tablefmt -->
Much of what makes wc fast is avoiding unnecessary work. It has multiple strategies, depending on which data is requested.
Much of what makes wc fast is avoiding unnecessary work. It has multiple strategies,
depending on which data is requested.
## Strategies
### Counting bytes
In the case of `wc -c` the content of the input doesn't have to be inspected at all, only the size has to be known. That enables a few optimizations.
In the case of `wc -c` the content of the input doesn't have to be inspected at all,
only the size has to be known. That enables a few optimizations.
#### File size
If it can, wc reads the file size directly. This is not interesting to benchmark, except to see if it still works. Try `wc -c largefile`.
If it can, wc reads the file size directly. This is not interesting to benchmark,
except to see if it still works. Try `wc -c largefile`.
#### `splice()`
On Linux `splice()` is used to get the input's length while discarding it directly.
The best way I've found to generate a fast input to test `splice()` is to pipe the output of uutils `cat` into it. Note that GNU `cat` is slower and therefore less suitable, and that if a file is given as its input directly (as in `wc -c < largefile`) the first strategy kicks in. Try `uucat somefile | wc -c`.
The best way I've found to generate a fast input to test `splice()` is to pipe the
output of uutils `cat` into it. Note that GNU `cat` is slower and therefore less
suitable, and that if a file is given as its input directly (as in
`wc -c < largefile`) the first strategy kicks in. Try `uucat somefile | wc -c`.
### Counting lines
In the case of `wc -l` or `wc -cl` the input doesn't have to be decoded. It's read in chunks and the `bytecount` crate is used to count the newlines.
In the case of `wc -l` or `wc -cl` the input doesn't have to be decoded. It's
read in chunks and the `bytecount` crate is used to count the newlines.
It's useful to vary the line length in the input. GNU wc seems particularly bad at short lines.
It's useful to vary the line length in the input. GNU wc seems particularly
bad at short lines.
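
The line-counting strategy described above can be sketched with the standard library alone. This is an illustration only: `count_lines` is a hypothetical helper, and the plain iterator count stands in for the optimized `bytecount` crate mentioned in the text.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads the input in fixed-size chunks and returns
/// `(lines, bytes)` without decoding the contents.
fn count_lines<R: Read>(mut input: R) -> io::Result<(u64, u64)> {
    let mut buf = [0u8; 64 * 1024];
    let mut lines = 0u64;
    let mut bytes = 0u64;
    loop {
        let n = match input.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            // Retry reads that were interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        bytes += n as u64;
        // Naive newline count; `bytecount` would do this step faster.
        lines += buf[..n].iter().filter(|&&b| b == b'\n').count() as u64;
    }
    Ok((lines, bytes))
}
```

Because only the byte `\n` is counted, the chunk boundaries never matter, which is why this path needs no decoding.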
### Processing unicode
This is the most general strategy, and it's necessary for counting words, characters, and line lengths. Individual steps are still switched on and off depending on what must be reported.
This is the most general strategy, and it's necessary for counting words,
characters, and line lengths. Individual steps are still switched on and off
depending on what must be reported.
Try varying which of the `-w`, `-m`, `-l` and `-L` flags are used. (The `-c` flag is unlikely to make a difference.)
Try varying which of the `-w`, `-m`, `-l` and `-L` flags are used.
(The `-c` flag is unlikely to make a difference.)
Passing no flags is equivalent to passing `-wcl`. That case should perhaps be given special attention as it's the default.
Passing no flags is equivalent to passing `-wcl`. That case should perhaps be
given special attention as it's the default.
## Generating files
To generate a file with many very short lines, run `yes | head -c50000000 > 25Mshortlines`.
To generate a file with many very short lines, run
`yes | head -c50000000 > 25Mshortlines`.
To get a file with less artificial contents, download a book from Project Gutenberg and concatenate it a lot of times:
To get a file with less artificial contents, download a book from
Project Gutenberg and concatenate it a lot of times:
```
```shell
wget https://www.gutenberg.org/files/2701/2701-0.txt -O moby.txt
cat moby.txt moby.txt moby.txt moby.txt > moby4.txt
cat moby4.txt moby4.txt moby4.txt moby4.txt > moby16.txt
@ -49,7 +63,7 @@ cat moby16.txt moby16.txt moby16.txt moby16.txt > moby64.txt
And get one with lots of unicode too:
```
```shell
wget https://www.gutenberg.org/files/30613/30613-0.txt -O odyssey.txt
cat odyssey.txt odyssey.txt odyssey.txt odyssey.txt > odyssey4.txt
cat odyssey4.txt odyssey4.txt odyssey4.txt odyssey4.txt > odyssey16.txt
@ -57,11 +71,14 @@ cat odyssey16.txt odyssey16.txt odyssey16.txt odyssey16.txt > odyssey64.txt
cat odyssey64.txt odyssey64.txt odyssey64.txt odyssey64.txt > odyssey256.txt
```
Finally, it's interesting to try a binary file. Look for one with `du -sh /usr/bin/* | sort -h`. On my system `/usr/bin/docker` is a good candidate as it's fairly large.
Finally, it's interesting to try a binary file. Look for one with
`du -sh /usr/bin/* | sort -h`. On my system `/usr/bin/docker` is a good
candidate as it's fairly large.
## Running benchmarks
Use [`hyperfine`](https://github.com/sharkdp/hyperfine) to compare the performance. For example, `hyperfine 'wc somefile' 'uuwc somefile'`.
Use [`hyperfine`](https://github.com/sharkdp/hyperfine) to compare the
performance. For example, `hyperfine 'wc somefile' 'uuwc somefile'`.
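
If you want a single number out of such a comparison, one option is to run `hyperfine --export-json results.json 'wc somefile' 'uuwc somefile'` and divide the mean times. The sketch below assumes the export has a top-level `results` list whose entries carry a `mean` field; the helper name `relative_speedup` and the sample values are invented for illustration and are not real measurements.

```python
import json


def relative_speedup(report):
    """Return mean time of the first command divided by mean time of the second."""
    first, second = report["results"][:2]
    return first["mean"] / second["mean"]


if __name__ == "__main__":
    # Hand-written sample in the shape of hyperfine's JSON export; not real data.
    sample = json.loads(
        '{"results": [{"command": "wc somefile", "mean": 0.5},'
        ' {"command": "uuwc somefile", "mean": 0.25}]}'
    )
    print(relative_speedup(sample))  # prints 2.0
```

A value above 1 means the second command was faster on average. It says nothing about run-to-run variance, so check hyperfine's own output as well.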
If you want to get fancy and exhaustive, generate a table:
@ -77,12 +94,16 @@ If you want to get fancy and exhaustive, generate a table:
| `wc -lwcmL <FILE>` | 1.1687 | 0.9169 | 4.4092 | 2.0663 |
Beware that:
- Results are fuzzy and change from run to run
- You'll often want to check versions of uutils wc against each other instead of against GNU
- You'll often want to check versions of uutils wc against each other instead
of against GNU
- This takes a lot of time to generate
- This only shows the relative speedup, not the absolute time, which may be misleading if the time is very short
- This only shows the relative speedup, not the absolute time, which may be
misleading if the time is very short
Created by the following Python script:
```python
import json
import subprocess
@ -121,4 +142,6 @@ for cmd in cmds:
table.append(row)
print(tabulate(table, [""] + files, tablefmt="github"))
```
(You may have to adjust the `bins` and `files` variables depending on your setup, and please do add other interesting cases to `cmds`.)
(You may have to adjust the `bins` and `files` variables depending on your
setup, and please do add other interesting cases to `cmds`.)

View file

@ -1,5 +1,6 @@
# wc
```
wc [OPTION]... [FILE]...
```