Mirror of https://github.com/uutils/coreutils, synced 2024-11-15 01:17:09 +00:00

Merge pull request #4457 from sylvestre/md-check
Various improvements in the docs

commit e1a98fea44
45 changed files with 670 additions and 557 deletions

8 .github/workflows/CICD.yml (vendored)
@@ -291,7 +291,13 @@ jobs:
 shell: bash
 run: |
 RUSTDOCFLAGS="-Dwarnings" cargo doc ${{ steps.vars.outputs.CARGO_FEATURES_OPTION }} --no-deps --workspace --document-private-items
+- uses: DavidAnson/markdownlint-cli2-action@v9
+with:
+command: fix
+globs: |
+*.md
+docs/src/*.md
+src/uu/*/*.md

 min_version:
 name: MinRustV # Minimum supported rust version (aka, MinSRV or MSRV)


6 .markdownlint.yaml (normal file)
@@ -0,0 +1,6 @@
+# Disable 'Line length'. Doesn't provide much values
+MD013: false
+# Disable 'Fenced code blocks should have a language specified'
+# Doesn't provide much in src/ to enforce it
+MD040: false
+
@@ -116,7 +116,7 @@ the community.

 This Code of Conduct is adapted from the [Contributor Covenant][homepage],
 version 2.0, available at
-https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
+<https://www.contributor-covenant.org/version/2/0/code_of_conduct.html>.

 Community Impact Guidelines were inspired by [Mozilla's code of conduct
 enforcement ladder](https://github.com/mozilla/diversity).
@@ -124,5 +124,5 @@ enforcement ladder](https://github.com/mozilla/diversity).
 [homepage]: https://www.contributor-covenant.org

 For answers to common questions about this code of conduct, see the FAQ at
-https://www.contributor-covenant.org/faq. Translations are available at
-https://www.contributor-covenant.org/translations.
+<https://www.contributor-covenant.org/faq>. Translations are available at
+<https://www.contributor-covenant.org/translations>.
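
The URL edits in this file only wrap bare links in angle brackets, the kind of change markdownlint's MD034 rule (no-bare-urls) asks for. As a rough illustration, the hypothetical `wrap_bare_urls` function below (not part of uutils or markdownlint) applies the same rewrite with `sed -E`. It is a sketch under simple assumptions: a bare URL starts a line or follows whitespace, a trailing `.`, `,`, `:` or `;` stays outside the brackets, and fenced code blocks and link reference definitions such as `[homepage]: ...` are left alone, which matches what this diff does by hand.

```shell
# Hypothetical helper (not part of uutils or markdownlint): wrap bare
# http(s) URLs in <...>, the same rewrite this commit makes by hand.
#   line 1: skip everything from an opening ``` fence to the next ``` line
#   line 2: skip link reference definitions such as "[homepage]: https://..."
#   line 3: a URL that starts the line or follows whitespace gets wrapped;
#           a trailing . , : or ; is kept outside the brackets
wrap_bare_urls() {
  sed -E '
    /^```/,/^```/b
    /^\[[^]]*]:/b
    s#(^|[[:space:]])(https?://[^[:space:]<>]*[^[:space:]<>.,:;])#\1<\2>#g
  '
}
```

It is a plain filter, e.g. `wrap_bare_urls < input.md > output.md` (file names are placeholders). Links already written as `[text](https://...)` are not touched because the URL is preceded by `(` rather than whitespace; markdownlint itself remains the authority on what counts as a bare URL.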
@@ -43,15 +43,14 @@ We take pride in supporting many operating systems and architectures.
 **Tip:**
 For Windows, Microsoft provides some images (VMWare, Hyper-V, VirtualBox and Parallels)
 for development:
-https://developer.microsoft.com/windows/downloads/virtual-machines/
+<https://developer.microsoft.com/windows/downloads/virtual-machines/>

-
 ## Commit messages

 To help the project maintainers review pull requests from contributors across
 numerous utilities, the team has settled on conventions for commit messages.

-From http://git-scm.com/book/ch5-2.html:
+From <http://git-scm.com/book/ch5-2.html>:

 ```
 Short (50 chars or less) summary of changes
@@ -1,21 +1,19 @@
-Documentation
--------------
+# Documentation

 The source of the documentation is available on:

-https://uutils.github.io/dev/coreutils/
+<https://uutils.github.io/dev/coreutils/>

 The documentation is updated everyday on this repository:

-https://github.com/uutils/uutils.github.io/
+<https://github.com/uutils/uutils.github.io/>

-Running GNU tests
------------------
+## Running GNU tests

 <!-- spell-checker:ignore gnulib -->

-- Check out https://github.com/coreutils/coreutils next to your fork as gnu
-- Check out https://github.com/coreutils/gnulib next to your fork as gnulib
+- Check out <https://github.com/coreutils/coreutils> next to your fork as gnu
+- Check out <https://github.com/coreutils/gnulib> next to your fork as gnulib
 - Rename the checkout of your fork to uutils

 At the end you should have uutils, gnu and gnulib checked out next to each other.
@@ -23,9 +21,7 @@ At the end you should have uutils, gnu and gnulib checked out next to each other
 - Run `cd uutils && ./util/build-gnu.sh && cd ..` to get everything ready (this may take a while)
 - Finally, you can run tests with `bash uutils/util/run-gnu-test.sh <tests>`. Instead of `<tests>` insert the tests you want to run, e.g. `tests/misc/wc-proc.sh`.

-
-Code Coverage Report Generation
----------------------------------
+## Code Coverage Report Generation

 <!-- spell-checker:ignore (flags) Ccodegen Coverflow Cpanic Zinstrument Zpanic -->

@@ -35,14 +31,14 @@ Code coverage report can be generated using [grcov](https://github.com/mozilla/g

 To generate [gcov-based](https://github.com/mozilla/grcov#example-how-to-generate-gcda-files-for-a-rust-project) coverage report

-```bash
-$ export CARGO_INCREMENTAL=0
-$ export RUSTFLAGS="-Zprofile -Ccodegen-units=1 -Copt-level=0 -Clink-dead-code -Coverflow-checks=off -Zpanic_abort_tests -Cpanic=abort"
-$ export RUSTDOCFLAGS="-Cpanic=abort"
-$ cargo build <options...> # e.g., --features feat_os_unix
-$ cargo test <options...> # e.g., --features feat_os_unix test_pathchk
-$ grcov . -s . --binary-path ./target/debug/ -t html --branch --ignore-not-existing --ignore build.rs --excl-br-line "^\s*((debug_)?assert(_eq|_ne)?\#\[derive\()" -o ./target/debug/coverage/
-$ # open target/debug/coverage/index.html in browser
+```shell
+export CARGO_INCREMENTAL=0
+export RUSTFLAGS="-Zprofile -Ccodegen-units=1 -Copt-level=0 -Clink-dead-code -Coverflow-checks=off -Zpanic_abort_tests -Cpanic=abort"
+export RUSTDOCFLAGS="-Cpanic=abort"
+cargo build <options...> # e.g., --features feat_os_unix
+cargo test <options...> # e.g., --features feat_os_unix test_pathchk
+grcov . -s . --binary-path ./target/debug/ -t html --branch --ignore-not-existing --ignore build.rs --excl-br-line "^\s*((debug_)?assert(_eq|_ne)?\#\[derive\()" -o ./target/debug/coverage/
+# open target/debug/coverage/index.html in browser
 ```

 if changes are not reflected in the report then run `cargo clean` and run the above commands.
@@ -52,19 +48,21 @@ if changes are not reflected in the report then run `cargo clean` and run the ab
 If you are using stable version of Rust that doesn't enable code coverage instrumentation by default
 then add `-Z-Zinstrument-coverage` flag to `RUSTFLAGS` env variable specified above.

-pre-commit hooks
------------------
+## pre-commit hooks

 A configuration for `pre-commit` is provided in the repository. It allows automatically checking every git commit you make to ensure it compiles, and passes `clippy` and `rustfmt` without warnings.

 To use the provided hook:

 1. [Install `pre-commit`](https://pre-commit.com/#install)
-2. Run `pre-commit install` while in the repository directory
+1. Run `pre-commit install` while in the repository directory

 Your git commits will then automatically be checked. If a check fails, an error message will explain why, and your commit will be canceled. You can then make the suggested changes, and run `git commit ...` again.

-### Using Clippy
+## Using Clippy

 The `msrv` key in the clippy configuration file `clippy.toml` is used to disable lints pertaining to newer features by specifying the minimum supported Rust version (MSRV). However, this key is only supported on `nightly`. To invoke clippy without errors, use `cargo +nightly clippy`. In order to also check tests and non-default crate features, use `cargo +nightly clippy --all-targets --all-features`.

+## Markdown linter
+
+We use <https://github.com/DavidAnson/markdownlint> to lint the Markdown files.
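
In this file the commit replaces setext headings (a title underlined with `-`) with ATX headings (`#`, `##`), which keeps the heading style consistent as markdownlint's MD003 rule (heading-style) expects. The hypothetical `setext_to_atx` awk filter below sketches a mechanical version of that edit: an `=` underline becomes `#` and a `-` underline becomes `##`. It is a sketch, not a reproduction of the commit: it does not skip fenced code blocks, and the commit promoted the first `Documentation` heading to `#` by hand, whereas a mechanical pass would yield `##`.

```shell
# Hypothetical helper (not part of uutils or markdownlint): rewrite setext
# headings as ATX headings. Sketch only: fenced code blocks are not skipped.
setext_to_atx() {
  awk '
    {
      # A "=" or "-" underline right below a non-empty held line turns
      # that line into a level-1 or level-2 ATX heading.
      if (held && prev != "" && $0 ~ /^=+ *$/) { print "# " prev; held = 0; next }
      if (held && prev != "" && $0 ~ /^-+ *$/) { print "## " prev; held = 0; next }
      # Otherwise emit the held line and hold the current one instead.
      if (held) print prev
      prev = $0
      held = 1
    }
    # Flush the last held line.
    END { if (held) print prev }
  '
}
```

A `---` line that follows a blank line is left alone, because there is no non-empty line above it to promote.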

190 README.md
@@ -21,11 +21,12 @@ or different behavior might be experienced.

 To install it:

-```
-$ cargo install coreutils
-$ ~/.cargo/bin/coreutils
+```shell
+cargo install coreutils
+~/.cargo/bin/coreutils
 ```

+<!-- markdownlint-disable-next-line MD026 -->
 ## Why?

 uutils aims to work on as many platforms as possible, to be able to use the
@@ -35,6 +36,7 @@ chosen not only because it is fast and safe, but is also excellent for
 writing cross-platform code.

 ## Documentation

 uutils has both user and developer documentation available:

 - [User Manual](https://uutils.github.io/user/)
@@ -46,8 +48,8 @@ Both can also be generated locally, the instructions for that can be found in th
 <!-- ANCHOR: build (this mark is needed for mdbook) -->
 ## Requirements

-* Rust (`cargo`, `rustc`)
-* GNU Make (optional)
+- Rust (`cargo`, `rustc`)
+- GNU Make (optional)

 ### Rust Version

@@ -64,9 +66,9 @@ or GNU Make.

 For either method, we first need to fetch the repository:

-```bash
-$ git clone https://github.com/uutils/coreutils
-$ cd coreutils
+```shell
+git clone https://github.com/uutils/coreutils
+cd coreutils
 ```

 ### Cargo
@@ -74,8 +76,8 @@ $ cd coreutils
 Building uutils using Cargo is easy because the process is the same as for
 every other Rust program:

-```bash
-$ cargo build --release
+```shell
+cargo build --release
 ```

 This command builds the most portable common core set of uutils into a multicall
@@ -85,20 +87,20 @@ Additional platform-specific uutils are often available. Building these
 expanded sets of uutils for a platform (on that platform) is as simple as
 specifying it as a feature:

-```bash
-$ cargo build --release --features macos
+```shell
+cargo build --release --features macos
 # or ...
-$ cargo build --release --features windows
+cargo build --release --features windows
 # or ...
-$ cargo build --release --features unix
+cargo build --release --features unix
 ```

 If you don't want to build every utility available on your platform into the
 final binary, you can also specify which ones you want to build manually.
 For example:

-```bash
-$ cargo build --features "base32 cat echo rm" --no-default-features
+```shell
+cargo build --features "base32 cat echo rm" --no-default-features
 ```

 If you don't want to build the multicall binary and would prefer to build
@@ -107,8 +109,8 @@ is contained in its own package within the main repository, named
 "uu_UTILNAME". To build individual utilities, use cargo to build just the
 specific packages (using the `--package` [aka `-p`] option). For example:

-```bash
-$ cargo build -p uu_base32 -p uu_cat -p uu_echo -p uu_rm
+```shell
+cargo build -p uu_base32 -p uu_cat -p uu_echo -p uu_rm
 ```

 ### GNU Make
@@ -117,30 +119,30 @@ Building using `make` is a simple process as well.

 To simply build all available utilities:

-```bash
-$ make
+```shell
+make
 ```

 To build all but a few of the available utilities:

-```bash
-$ make SKIP_UTILS='UTILITY_1 UTILITY_2'
+```shell
+make SKIP_UTILS='UTILITY_1 UTILITY_2'
 ```

 To build only a few of the available utilities:

-```bash
-$ make UTILS='UTILITY_1 UTILITY_2'
+```shell
+make UTILS='UTILITY_1 UTILITY_2'
 ```

 ## Installation

-### Cargo
+### Install with Cargo

 Likewise, installing can simply be done using:

-```bash
-$ cargo install --path .
+```shell
+cargo install --path .
 ```

 This command will install uutils into Cargo's *bin* folder (*e.g.* `$HOME/.cargo/bin`).
@@ -148,49 +150,49 @@ This command will install uutils into Cargo's *bin* folder (*e.g.* `$HOME/.cargo
 This does not install files necessary for shell completion or manpages.
 For manpages or shell completion to work, use `GNU Make` or see `Manually install shell completions`/`Manually install manpages`.

-### GNU Make
+### Install with GNU Make

 To install all available utilities:

-```bash
-$ make install
+```shell
+make install
 ```

 To install using `sudo` switch `-E` must be used:

-```bash
-$ sudo -E make install
+```shell
+sudo -E make install
 ```

 To install all but a few of the available utilities:

-```bash
-$ make SKIP_UTILS='UTILITY_1 UTILITY_2' install
+```shell
+make SKIP_UTILS='UTILITY_1 UTILITY_2' install
 ```

 To install only a few of the available utilities:

-```bash
-$ make UTILS='UTILITY_1 UTILITY_2' install
+```shell
+make UTILS='UTILITY_1 UTILITY_2' install
 ```

 To install every program with a prefix (e.g. uu-echo uu-cat):

-```bash
-$ make PROG_PREFIX=PREFIX_GOES_HERE install
+```shell
+make PROG_PREFIX=PREFIX_GOES_HERE install
 ```

 To install the multicall binary:

-```bash
-$ make MULTICALL=y install
+```shell
+make MULTICALL=y install
 ```

 Set install parent directory (default value is /usr/local):

-```bash
+```shell
 # DESTDIR is also supported
-$ make PREFIX=/my/path install
+make PREFIX=/my/path install
 ```

 Installing with `make` installs shell completions for all installed utilities
@@ -203,14 +205,15 @@ The `coreutils` binary can generate completions for the `bash`, `elvish`, `fish`
 and `zsh` shells. It prints the result to stdout.

 The syntax is:
-```bash
+
+```shell
 cargo run completion <utility> <shell>
 ```

 So, to install completions for `ls` on `bash` to `/usr/local/share/bash-completion/completions/ls`,
 run:

-```bash
+```shell
 cargo run completion ls bash > /usr/local/share/bash-completion/completions/ls
 ```

@@ -234,106 +237,107 @@ Un-installation differs depending on how you have installed uutils. If you used
 Cargo to install, use Cargo to uninstall. If you used GNU Make to install, use
 Make to uninstall.

-### Cargo
+### Uninstall with Cargo

 To uninstall uutils:

-```bash
-$ cargo uninstall uutils
+```shell
+cargo uninstall uutils
 ```

-### GNU Make
+### Uninstall with GNU Make

 To uninstall all utilities:

-```bash
-$ make uninstall
+```shell
+make uninstall
 ```

 To uninstall every program with a set prefix:

-```bash
-$ make PROG_PREFIX=PREFIX_GOES_HERE uninstall
+```shell
+make PROG_PREFIX=PREFIX_GOES_HERE uninstall
 ```

 To uninstall the multicall binary:

-```bash
-$ make MULTICALL=y uninstall
+```shell
+make MULTICALL=y uninstall
 ```

 To uninstall from a custom parent directory:

-```bash
+```shell
 # DESTDIR is also supported
-$ make PREFIX=/my/path uninstall
+make PREFIX=/my/path uninstall
 ```

 <!-- ANCHOR_END: build (this mark is needed for mdbook) -->

 ## Testing

 Testing can be done using either Cargo or `make`.

-### Cargo
+### Testing with Cargo

 Just like with building, we follow the standard procedure for testing using
 Cargo:

-```bash
-$ cargo test
+```shell
+cargo test
 ```

 By default, `cargo test` only runs the common programs. To run also platform
 specific tests, run:

-```bash
-$ cargo test --features unix
+```shell
+cargo test --features unix
 ```

 If you would prefer to test a select few utilities:

-```bash
-$ cargo test --features "chmod mv tail" --no-default-features
+```shell
+cargo test --features "chmod mv tail" --no-default-features
 ```

 If you also want to test the core utilities:

-```bash
-$ cargo test -p uucore -p coreutils
+```shell
+cargo test -p uucore -p coreutils
 ```

 To debug:

-```bash
-$ gdb --args target/debug/coreutils ls
+```shell
+gdb --args target/debug/coreutils ls
 (gdb) b ls.rs:79
 (gdb) run
 ```

-### GNU Make
+### Testing with GNU Make

 To simply test all available utilities:

-```bash
-$ make test
+```shell
+make test
 ```

 To test all but a few of the available utilities:

-```bash
-$ make SKIP_UTILS='UTILITY_1 UTILITY_2' test
+```shell
+make SKIP_UTILS='UTILITY_1 UTILITY_2' test
 ```

 To test only a few of the available utilities:

-```bash
-$ make UTILS='UTILITY_1 UTILITY_2' test
+```shell
+make UTILS='UTILITY_1 UTILITY_2' test
 ```

 To include tests for unimplemented behavior:

-```bash
-$ make UTILS='UTILITY_1 UTILITY_2' SPEC=y test
+```shell
+make UTILS='UTILITY_1 UTILITY_2' SPEC=y test
 ```

 ### Run Busybox Tests
@@ -343,20 +347,20 @@ requires `make`.

 To run busybox tests for all utilities for which busybox has tests

-```bash
-$ make busytest
+```shell
+make busytest
 ```

 To run busybox tests for a few of the available utilities

-```bash
-$ make UTILS='UTILITY_1 UTILITY_2' busytest
+```shell
+make UTILS='UTILITY_1 UTILITY_2' busytest
 ```

 To pass an argument like "-v" to the busybox test runtime

-```bash
-$ make UTILS='UTILITY_1 UTILITY_2' RUNTEST_ARGS='-v' busytest
+```shell
+make UTILS='UTILITY_1 UTILITY_2' RUNTEST_ARGS='-v' busytest
 ```

 ### Comparing with GNU
@@ -369,15 +373,15 @@ breakdown of the GNU test results of the main branch can be found

 To run locally:

-```bash
-$ bash util/build-gnu.sh
-$ bash util/run-gnu-test.sh
+```shell
+bash util/build-gnu.sh
+bash util/run-gnu-test.sh
 # To run a single test:
-$ bash util/run-gnu-test.sh tests/touch/not-owner.sh # for example
+bash util/run-gnu-test.sh tests/touch/not-owner.sh # for example
 # To run several tests:
-$ bash util/run-gnu-test.sh tests/touch/not-owner.sh tests/rm/no-give-up.sh # for example
+bash util/run-gnu-test.sh tests/touch/not-owner.sh tests/rm/no-give-up.sh # for example
 # If this is a perl (.pl) test, to run in debug:
-$ DEBUG=1 bash util/run-gnu-test.sh tests/misc/sm3sum.pl
+DEBUG=1 bash util/run-gnu-test.sh tests/misc/sm3sum.pl
 ```

 Note that it relies on individual utilities (not the multicall binary).
@@ -401,7 +405,6 @@ To improve the GNU compatibility, the following process is recommended:
 1. Start to modify the Rust implementation to match the expected behavior
 1. Add a test to make sure that we don't regress (our test suite is super quick)

-
 ## Contributing

 To contribute to uutils, please see [CONTRIBUTING](CONTRIBUTING.md).
@@ -409,11 +412,12 @@ To contribute to uutils, please see [CONTRIBUTING](CONTRIBUTING.md).
 ## Utilities

 Please note that this is not fully accurate:
-* Some new options can be added / removed in the GNU implementation;
-* Some error management might be missing;
-* Some behaviors might be different.

-See https://github.com/uutils/coreutils/issues/3336 for the main meta bugs
+- Some new options can be added / removed in the GNU implementation;
+- Some error management might be missing;
+- Some behaviors might be different.
+
+See <https://github.com/uutils/coreutils/issues/3336> for the main meta bugs
 (many are missing).

 | Done | WIP |
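
Most README hunks above make two edits inside fenced code blocks: the leading `$ ` prompt is dropped (markdownlint's MD014 rule, commands-show-output, flags `$` prompts on commands shown without output) and the `bash` info string becomes `shell`. The hypothetical `strip_prompts` awk filter below sketches both edits. It assumes unindented, non-nested triple-backtick fences, and it leaves fences with no info string untouched, although the commit also changed some of those to `shell` by hand.

```shell
# Hypothetical helper (not part of uutils or markdownlint): inside fenced
# code blocks, drop a leading "$ " prompt and rename a bash info string
# to shell, mirroring the hand edits in this commit.
# Assumes unindented, non-nested triple-backtick fences.
strip_prompts() {
  awk '
    # A fence line toggles the in-fence state; a bash fence becomes shell.
    /^```/ {
      in_fence = !in_fence
      if ($0 == "```bash") $0 = "```shell"
      print
      next
    }
    # Inside a fence, remove one leading "$ " prompt if present.
    in_fence { sub(/^\$ /, "") }
    { print }
  '
}
```

Lines such as `(gdb) b ls.rs:79` in the debugging example keep their text, since only a leading `$ ` is removed, and `$ # open ...` becomes the plain comment `# open ...`, as in the coverage hunk.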
@@ -1 +1,3 @@
+<!-- markdownlint-disable MD041 -->
+
 {{ #include ../../CONTRIBUTING.md }}
@@ -1,5 +1,9 @@
+<!-- markdownlint-disable MD041 -->
+
 {{#include logo.svg}}

+<!-- markdownlint-disable MD033 -->
+
 <style>
 /* Make the logo a bit bigger and center */
 #logo {
@@ -11,9 +11,10 @@ You can also [build uutils from source](/build.md).
 <!-- toc -->

 ## Cargo

 [![crates.io package](https://repology.org/badge/version-for-repo/crates_io/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)

-```bash
+```shell
 # Linux
 cargo install coreutils --features unix
 # MacOs
@@ -23,11 +24,12 @@ cargo install coreutils --features windows
 ```

 ## Linux

 ### Alpine

 [![Alpine Linux Edge package](https://repology.org/badge/version-for-repo/alpine_edge/uutils-coreutils.svg)](https://pkgs.alpinelinux.org/packages?name=uutils-coreutils)

-```bash
+```shell
 apk update uutils-coreutils
 ```

@@ -37,7 +39,7 @@ apk update uutils-coreutils

 [![Arch package](https://repology.org/badge/version-for-repo/arch/uutils-coreutils.svg)](https://archlinux.org/packages/community/x86_64/uutils-coreutils/)

-```bash
+```shell
 pacman -S uutils-coreutils
 ```

@@ -45,7 +47,7 @@ pacman -S uutils-coreutils

 [![Debian package](https://repology.org/badge/version-for-repo/debian_unstable/uutils-coreutils.svg)](https://packages.debian.org/sid/source/rust-coreutils)

-```bash
+```shell
 apt install rust-coreutils
 # To use it:
 export PATH=/usr/lib/cargo/bin/coreutils:$PATH
@@ -57,32 +59,35 @@ export PATH=/usr/lib/cargo/bin/coreutils:$PATH

 [![Gentoo package](https://repology.org/badge/version-for-repo/gentoo/uutils-coreutils.svg)](https://packages.gentoo.org/packages/sys-apps/uutils-coreutils)

-```bash
+```shell
 emerge -pv sys-apps/uutils-coreutils
 ```

 ### Manjaro

 ![Manjaro Stable package](https://repology.org/badge/version-for-repo/manjaro_stable/uutils-coreutils.svg)
 [![Manjaro Testing package](https://repology.org/badge/version-for-repo/manjaro_testing/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)
 [![Manjaro Unstable package](https://repology.org/badge/version-for-repo/manjaro_unstable/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)

-```bash
+```shell
 pacman -S uutils-coreutils
 # or
 pamac install uutils-coreutils
 ```

 ### NixOS

 [![nixpkgs unstable package](https://repology.org/badge/version-for-repo/nix_unstable/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)
|
||||||
|
|
||||||
```bash
|
```shell
|
||||||
nix-env -iA nixos.uutils-coreutils
|
nix-env -iA nixos.uutils-coreutils
|
||||||
```
|
```
|
||||||
|
|
||||||
### OpenMandriva Lx
|
### OpenMandriva Lx
|
||||||
|
|
||||||
[![openmandriva cooker package](https://repology.org/badge/version-for-repo/openmandriva_cooker/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)
|
[![openmandriva cooker package](https://repology.org/badge/version-for-repo/openmandriva_cooker/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)
|
||||||
|
|
||||||
```bash
|
```shell
|
||||||
dnf install uutils-coreutils
|
dnf install uutils-coreutils
|
||||||
```
|
```
|
||||||
|
|
||||||
|
@ -90,7 +95,7 @@ dnf install uutils-coreutils
|
||||||
|
|
||||||
[![Ubuntu package](https://repology.org/badge/version-for-repo/ubuntu_23_04/uutils-coreutils.svg)](https://packages.ubuntu.com/source/lunar/rust-coreutils)
|
[![Ubuntu package](https://repology.org/badge/version-for-repo/ubuntu_23_04/uutils-coreutils.svg)](https://packages.ubuntu.com/source/lunar/rust-coreutils)
|
||||||
|
|
||||||
```bash
|
```shell
|
||||||
apt install rust-coreutils
|
apt install rust-coreutils
|
||||||
# To use it:
|
# To use it:
|
||||||
export PATH=/usr/lib/cargo/bin/coreutils:$PATH
|
export PATH=/usr/lib/cargo/bin/coreutils:$PATH
|
||||||
|
@ -101,13 +106,15 @@ export PATH=/usr/lib/cargo/bin/coreutils:$PATH
|
||||||
## MacOS
|
## MacOS
|
||||||
|
|
||||||
### Homebrew
|
### Homebrew
|
||||||
|
|
||||||
[![Homebrew package](https://repology.org/badge/version-for-repo/homebrew/uutils-coreutils.svg)](https://formulae.brew.sh/formula/uutils-coreutils)
|
[![Homebrew package](https://repology.org/badge/version-for-repo/homebrew/uutils-coreutils.svg)](https://formulae.brew.sh/formula/uutils-coreutils)
|
||||||
|
|
||||||
```bash
|
```shell
|
||||||
brew install uutils-coreutils
|
brew install uutils-coreutils
|
||||||
```
|
```
|
||||||
|
|
||||||
### MacPorts
|
### MacPorts
|
||||||
|
|
||||||
[![MacPorts package](https://repology.org/badge/version-for-repo/macports/uutils-coreutils.svg)](https://ports.macports.org/port/coreutils-uutils/)
|
[![MacPorts package](https://repology.org/badge/version-for-repo/macports/uutils-coreutils.svg)](https://ports.macports.org/port/coreutils-uutils/)
|
||||||
|
|
||||||
```
|
```
|
||||||
|
@ -115,6 +122,7 @@ port install coreutils-uutils
|
||||||
```
|
```
|
||||||
|
|
||||||
## FreeBSD
|
## FreeBSD
|
||||||
|
|
||||||
[![FreeBSD port](https://repology.org/badge/version-for-repo/freebsd/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)
|
[![FreeBSD port](https://repology.org/badge/version-for-repo/freebsd/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
|
@ -124,9 +132,10 @@ pkg install uutils
|
||||||
## Windows
|
## Windows
|
||||||
|
|
||||||
### Scoop
|
### Scoop
|
||||||
|
|
||||||
[![Scoop package](https://repology.org/badge/version-for-repo/scoop/uutils-coreutils.svg)](https://scoop.sh/#/apps?q=uutils-coreutils&s=0&d=1&o=true)
|
[![Scoop package](https://repology.org/badge/version-for-repo/scoop/uutils-coreutils.svg)](https://scoop.sh/#/apps?q=uutils-coreutils&s=0&d=1&o=true)
|
||||||
|
|
||||||
```bash
|
```shell
|
||||||
scoop install uutils-coreutils
|
scoop install uutils-coreutils
|
||||||
```
|
```
|
||||||
|
|
||||||
|
@ -136,4 +145,6 @@ scoop install uutils-coreutils
|
||||||
|
|
||||||
[![AUR package](https://repology.org/badge/version-for-repo/aur/coreutils-hybrid.svg)](https://aur.archlinux.org/packages/coreutils-hybrid)
|
[![AUR package](https://repology.org/badge/version-for-repo/aur/coreutils-hybrid.svg)](https://aur.archlinux.org/packages/coreutils-hybrid)
|
||||||
|
|
||||||
A GNU coreutils / uutils coreutils hybrid package. Uses stable uutils programs mixed with GNU counterparts if uutils counterpart is unfinished or buggy.
|
A GNU coreutils / uutils coreutils hybrid package. Uses stable uutils
|
||||||
|
programs mixed with GNU counterparts if uutils counterpart is
|
||||||
|
unfinished or buggy.
|
||||||
|
|
|
@ -1,4 +1,5 @@
|
||||||
# Multi-call binary
|
# Multi-call binary
|
||||||
|
|
||||||
uutils includes a multi-call binary from which the utils can be invoked. This
|
uutils includes a multi-call binary from which the utils can be invoked. This
|
||||||
reduces the binary size of the binary and can be useful for portability.
|
reduces the total binary size and can be useful for portability.
|
||||||
|
|
||||||
|
@ -12,6 +13,7 @@ coreutils [util] [util options]
|
||||||
The `--help` flag will print a list of available utils.
|
The `--help` flag will print a list of available utils.
|
||||||
|
|
||||||
## Example
|
## Example
|
||||||
```
|
|
||||||
|
```shell
|
||||||
coreutils ls -l
|
coreutils ls -l
|
||||||
```
|
```
|
|
@ -1,5 +1,7 @@
|
||||||
# GNU Test Coverage
|
# GNU Test Coverage
|
||||||
|
|
||||||
|
<!-- markdownlint-disable MD033 -->
|
||||||
|
|
||||||
uutils is actively tested against the GNU coreutils test suite. The results
|
uutils is actively tested against the GNU coreutils test suite. The results
|
||||||
below are automatically updated every day.
|
below are automatically updated every day.
|
||||||
|
|
||||||
|
|
|
@ -4,11 +4,8 @@
|
||||||
arch
|
arch
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
Display machine architecture
|
Display machine architecture
|
||||||
|
|
||||||
|
|
||||||
## After Help
|
## After Help
|
||||||
|
|
||||||
Determine architecture name for current machine.
|
Determine architecture name for current machine.
|
||||||
|
|
||||||
|
|
|
@ -7,8 +7,8 @@ base32 [OPTION]... [FILE]
|
||||||
encode/decode data and print to standard output
|
encode/decode data and print to standard output
|
||||||
With no FILE, or when FILE is -, read standard input.
|
With no FILE, or when FILE is -, read standard input.
|
||||||
|
|
||||||
The data are encoded as described for the base32 alphabet in RFC
|
The data are encoded as described for the base32 alphabet in RFC 4648.
|
||||||
4648. When decoding, the input may contain newlines in addition
|
When decoding, the input may contain newlines in addition
|
||||||
to the bytes of the formal base32 alphabet. Use --ignore-garbage
|
to the bytes of the formal base32 alphabet. Use --ignore-garbage
|
||||||
to attempt to recover from any other non-alphabet bytes in the
|
to attempt to recover from any other non-alphabet bytes in the
|
||||||
encoded stream.
|
encoded stream.
|
||||||
|
|
|
@ -7,8 +7,8 @@ base64 [OPTION]... [FILE]
|
||||||
encode/decode data and print to standard output
|
encode/decode data and print to standard output
|
||||||
With no FILE, or when FILE is -, read standard input.
|
With no FILE, or when FILE is -, read standard input.
|
||||||
|
|
||||||
The data are encoded as described for the base64 alphabet in RFC
|
The data are encoded as described for the base64 alphabet in RFC 3548.
|
||||||
3548. When decoding, the input may contain newlines in addition
|
When decoding, the input may contain newlines in addition
|
||||||
to the bytes of the formal base64 alphabet. Use --ignore-garbage
|
to the bytes of the formal base64 alphabet. Use --ignore-garbage
|
||||||
to attempt to recover from any other non-alphabet bytes in the
|
to attempt to recover from any other non-alphabet bytes in the
|
||||||
encoded stream.
|
encoded stream.
|
||||||
|
|
|
@ -1,18 +1,18 @@
|
||||||
<!-- markdownlint-disable first-line-heading -->
|
<!-- markdownlint-disable first-line-heading -->
|
||||||
<!-- spell-checker:ignore (markdown) markdownlint -->
|
<!-- spell-checker:ignore (markdown) markdownlint -->
|
||||||
|
|
||||||
## Feature list
|
# Feature list
|
||||||
|
|
||||||
<!-- spell-checker:ignore (options) linkgs reflink -->
|
<!-- spell-checker:ignore (options) linkgs reflink -->
|
||||||
|
|
||||||
### To Do
|
## To Do
|
||||||
|
|
||||||
- [ ] cli-symbolic-links
|
- [ ] cli-symbolic-links
|
||||||
- [ ] context
|
- [ ] context
|
||||||
- [ ] copy-contents
|
- [ ] copy-contents
|
||||||
- [ ] sparse
|
- [ ] sparse
|
||||||
|
|
||||||
### Completed
|
## Completed
|
||||||
|
|
||||||
- [x] archive
|
- [x] archive
|
||||||
- [x] attributes-only
|
- [x] attributes-only
|
||||||
|
|
|
@ -1,46 +1,45 @@
|
||||||
## Benchmarking cut
|
# Benchmarking cut
|
||||||
|
|
||||||
### Performance profile
|
## Performance profile
|
||||||
|
|
||||||
In normal use cases a significant amount of the total execution time of `cut`
|
In normal use cases a significant amount of the total execution time of `cut`
|
||||||
is spent performing I/O. When invoked with the `-f` option (cut fields) some
|
is spent performing I/O. When invoked with the `-f` option (cut fields) some
|
||||||
CPU time is spent on detecting fields (in `Searcher::next`). Other than that
|
CPU time is spent on detecting fields (in `Searcher::next`). Other than that
|
||||||
some small amount of CPU time is spent on breaking the input stream into lines.
|
some small amount of CPU time is spent on breaking the input stream into lines.
|
||||||
|
|
||||||
|
## How to
|
||||||
### How to
|
|
||||||
|
|
||||||
When fixing bugs or adding features you might want to compare
|
When fixing bugs or adding features you might want to compare
|
||||||
performance before and after your code changes.
|
performance before and after your code changes.
|
||||||
|
|
||||||
- `hyperfine` can be used to accurately measure and compare the total
|
- `hyperfine` can be used to accurately measure and compare the total
|
||||||
execution time of one or more commands.
|
execution time of one or more commands.
|
||||||
|
|
||||||
```
|
```shell
|
||||||
$ cargo build --release --package uu_cut
|
cargo build --release --package uu_cut
|
||||||
|
|
||||||
$ hyperfine -w3 "./target/release/cut -f2-4,8 -d' ' input.txt" "cut -f2-4,8 -d' ' input.txt"
|
hyperfine -w3 "./target/release/cut -f2-4,8 -d' ' input.txt" "cut -f2-4,8 -d' ' input.txt"
|
||||||
```
|
```
|
||||||
|
|
||||||
You can put those two commands in a shell script to be sure that you don't
|
You can put those two commands in a shell script to be sure that you don't
|
||||||
forget to build after making any changes.
|
forget to build after making any changes.
|
||||||
|
|
||||||
When optimizing or fixing performance regressions seeing the number of times a
|
When optimizing or fixing performance regressions seeing the number of times a
|
||||||
function is called, and the amount of time it takes can be useful.
|
function is called, and the amount of time it takes can be useful.
|
||||||
|
|
||||||
- `cargo flamegraph` generates flame graphs from function level metrics it records using `perf` or `dtrace`
|
- `cargo flamegraph` generates flame graphs from function level metrics it records using `perf` or `dtrace`
|
||||||
|
|
||||||
```
|
```shell
|
||||||
$ cargo flamegraph --bin cut --package uu_cut -- -f1,3-4 input.txt > /dev/null
|
cargo flamegraph --bin cut --package uu_cut -- -f1,3-4 input.txt > /dev/null
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## What to benchmark
|
||||||
### What to benchmark
|
|
||||||
|
|
||||||
There are four different performance paths in `cut` to benchmark.
|
There are four different performance paths in `cut` to benchmark.
|
||||||
|
|
||||||
- Byte ranges `-c`/`--characters` or `-b`/`--bytes` e.g. `cut -c 2,4,6-`
|
- Byte ranges `-c`/`--characters` or `-b`/`--bytes` e.g. `cut -c 2,4,6-`
|
||||||
- Byte ranges with output delimiters e.g. `cut -c 4- --output-delimiter=/`
|
- Byte ranges with output delimiters e.g. `cut -c 4- --output-delimiter=/`
|
||||||
- Fields e.g. `cut -f -4`
|
- Fields e.g. `cut -f -4`
|
||||||
- Fields with output delimiters e.g. `cut -f 7-10 --output-delimiter=:`
|
- Fields with output delimiters e.g. `cut -f 7-10 --output-delimiter=:`
|
||||||
|
|
||||||
Choose a test input file with large number of lines so that program startup time does not significantly affect the benchmark.
|
Choose a test input file with a large number of lines so that program startup time does not significantly affect the benchmark.
|
||||||
|
|
|
@ -45,7 +45,7 @@ be roughly equivalent to the total bytes copied (`blocksize` x `count`).
|
||||||
|
|
||||||
Some useful invocations for testing would be the following:
|
Some useful invocations for testing would be the following:
|
||||||
|
|
||||||
```
|
```shell
|
||||||
hyperfine "./target/release/dd bs=4k count=1000000 < /dev/zero > /dev/null"
|
hyperfine "./target/release/dd bs=4k count=1000000 < /dev/zero > /dev/null"
|
||||||
hyperfine "./target/release/dd bs=1M count=20000 < /dev/zero > /dev/null"
|
hyperfine "./target/release/dd bs=1M count=20000 < /dev/zero > /dev/null"
|
||||||
hyperfine "./target/release/dd bs=1G count=10 < /dev/zero > /dev/null"
|
hyperfine "./target/release/dd bs=1G count=10 < /dev/zero > /dev/null"
|
||||||
|
@ -57,7 +57,7 @@ Typically you would choose a small blocksize for measuring the performance of
|
||||||
typically does some set amount of work per block which only depends on the size
|
typically does some set amount of work per block which only depends on the size
|
||||||
of the block if conversions are used.
|
of the block if conversions are used.
|
||||||
|
|
||||||
As an example, https://github.com/uutils/coreutils/pull/3600 made a change to
|
As an example, <https://github.com/uutils/coreutils/pull/3600> made a change to
|
||||||
reuse the same buffer between block copies, avoiding the need to reallocate a
|
reuse the same buffer between block copies, avoiding the need to reallocate a
|
||||||
new block of memory for each copy. The impact of that change mostly had an
|
new block of memory for each copy. The impact of that change mostly had an
|
||||||
impact on large block size copies because those are the circumstances where the
|
impact on large block size copies because those are the circumstances where the
|
||||||
|
|
200
src/uu/dd/dd.md
|
@ -1,6 +1,7 @@
|
||||||
<!-- spell-checker:ignore convs iseek oseek -->
|
|
||||||
# dd
|
# dd
|
||||||
|
|
||||||
|
<!-- spell-checker:ignore convs iseek oseek -->
|
||||||
|
|
||||||
```
|
```
|
||||||
dd [OPERAND]...
|
dd [OPERAND]...
|
||||||
dd OPTION
|
dd OPTION
|
||||||
|
@ -10,117 +11,116 @@ Copy, and optionally convert, a file system resource
|
||||||
|
|
||||||
## After Help
|
## After Help
|
||||||
|
|
||||||
OPERANDS:
|
### Operands
|
||||||
|
|
||||||
bs=BYTES read and write up to BYTES bytes at a time (default: 512);
|
- `bs=BYTES` : read and write up to BYTES bytes at a time (default: 512);
|
||||||
overwrites ibs and obs.
|
overwrites `ibs` and `obs`.
|
||||||
cbs=BYTES the 'conversion block size' in bytes. Applies to
|
- `cbs=BYTES` : the 'conversion block size' in bytes. Applies to the
|
||||||
the conv=block, and conv=unblock operations.
|
`conv=block`, and `conv=unblock` operations.
|
||||||
conv=CONVS a comma-separated list of conversion options or
|
- `conv=CONVS` : a comma-separated list of conversion options or (for legacy
|
||||||
(for legacy reasons) file flags.
|
reasons) file flags.
|
||||||
count=N stop reading input after N ibs-sized read operations rather
|
- `count=N` : stop reading input after N ibs-sized read operations rather
|
||||||
than proceeding until EOF. See iflag=count_bytes if stopping
|
than proceeding until EOF. See `iflag=count_bytes` if stopping after N bytes
|
||||||
after N bytes is preferred
|
is preferred
|
||||||
ibs=N the size of buffer used for reads (default: 512)
|
- `ibs=N` : the size of buffer used for reads (default: 512)
|
||||||
if=FILE the file used for input. When not specified, stdin is used instead
|
- `if=FILE` : the file used for input. When not specified, stdin is used instead
|
||||||
iflag=FLAGS a comma-separated list of input flags which specify how the input
|
- `iflag=FLAGS` : a comma-separated list of input flags which specify how the
|
||||||
source is treated. FLAGS may be any of the input-flags or
|
input source is treated. FLAGS may be any of the input-flags or general-flags
|
||||||
general-flags specified below.
|
specified below.
|
||||||
skip=N (or iseek=N) skip N ibs-sized records into input before beginning
|
- `skip=N` (or `iseek=N`) : skip N ibs-sized records into input before beginning
|
||||||
copy/convert operations. See iflag=seek_bytes if seeking N bytes
|
copy/convert operations. See `iflag=skip_bytes` if skipping N bytes is preferred.
|
||||||
is preferred.
|
- `obs=N` : the size of buffer used for writes (default: 512)
|
||||||
obs=N the size of buffer used for writes (default: 512)
|
- `of=FILE` : the file used for output. When not specified, stdout is used
|
||||||
of=FILE the file used for output. When not specified, stdout is used
|
instead
|
||||||
instead
|
- `oflag=FLAGS` : a comma-separated list of output flags which specify how the
|
||||||
oflag=FLAGS comma separated list of output flags which specify how the output
|
output source is treated. FLAGS may be any of the output flags or general
|
||||||
source is treated. FLAGS may be any of the output flags or
|
flags specified below
|
||||||
general flags specified below
|
- `seek=N` (or `oseek=N`) : seeks N obs-sized records into output before
|
||||||
seek=N (or oseek=N) seeks N obs-sized records into output before
|
beginning copy/convert operations. See oflag=seek_bytes if seeking N bytes is
|
||||||
beginning copy/convert operations. See oflag=seek_bytes if
|
preferred
|
||||||
seeking N bytes is preferred
|
- `status=LEVEL` : controls whether volume and performance stats are written to
|
||||||
status=LEVEL controls whether volume and performance stats are written to
|
stderr.
|
||||||
stderr.
|
|
||||||
|
|
||||||
When unspecified, dd will print stats upon completion. An example is below.
|
When unspecified, dd will print stats upon completion. An example is below.
|
||||||
6+0 records in
|
|
||||||
16+0 records out
|
|
||||||
8192 bytes (8.2 kB, 8.0 KiB) copied, 0.00057009 s, 14.4 MB/s
|
|
||||||
The first two lines are the 'volume' stats and the final line is
|
|
||||||
the 'performance' stats.
|
|
||||||
The volume stats indicate the number of complete and partial
|
|
||||||
ibs-sized reads, or obs-sized writes that took place during the
|
|
||||||
copy. The format of the volume stats is
|
|
||||||
<complete>+<partial>. If records have been truncated (see
|
|
||||||
conv=block), the volume stats will contain the number of
|
|
||||||
truncated records.
|
|
||||||
|
|
||||||
Possible LEVEL values are:
|
```plain
|
||||||
progress: Print periodic performance stats as the copy
|
16+0 records in
|
||||||
proceeds.
|
16+0 records out
|
||||||
noxfer: Print final volume stats, but not performance stats.
|
8192 bytes (8.2 kB, 8.0 KiB) copied, 0.00057009 s,
|
||||||
none: Do not print any stats.
|
14.4 MB/s
|
||||||
|
```
|
||||||
|
|
||||||
Printing performance stats is also triggered by the INFO signal
|
The first two lines are the 'volume' stats and the final line is the
|
||||||
(where supported), or the USR1 signal. Setting the
|
'performance' stats.
|
||||||
POSIXLY_CORRECT environment variable to any value (including an
|
The volume stats indicate the number of complete and partial ibs-sized reads,
|
||||||
empty value) will cause the USR1 signal to be ignored.
|
or obs-sized writes that took place during the copy. The format of the volume
|
||||||
|
stats is `<complete>+<partial>`. If records have been truncated (see
|
||||||
|
`conv=block`), the volume stats will contain the number of truncated records.
|
||||||
|
|
||||||
CONVERSION OPTIONS:
|
Possible LEVEL values are:
|
||||||
|
- `progress` : Print periodic performance stats as the copy proceeds.
|
||||||
|
- `noxfer` : Print final volume stats, but not performance stats.
|
||||||
|
- `none` : Do not print any stats.
|
||||||
|
|
||||||
ascii convert from EBCDIC to ASCII. This is the inverse of the 'ebcdic'
|
Printing performance stats is also triggered by the INFO signal (where supported),
|
||||||
option. Implies conv=unblock.
|
or the USR1 signal. Setting the POSIXLY_CORRECT environment variable to any value
|
||||||
ebcdic convert from ASCII to EBCDIC. This is the inverse of the 'ascii'
|
(including an empty value) will cause the USR1 signal to be ignored.
|
||||||
option. Implies conv=block.
|
|
||||||
ibm convert from ASCII to EBCDIC, applying the conventions for '[', ']'
|
|
||||||
and '~' specified in POSIX. Implies conv=block.
|
|
||||||
|
|
||||||
ucase convert from lower-case to upper-case
|
### Conversion Options
|
||||||
lcase converts from upper-case to lower-case.
|
|
||||||
|
|
||||||
block for each newline less than the size indicated by cbs=BYTES, remove
|
- `ascii` : convert from EBCDIC to ASCII. This is the inverse of the `ebcdic`
|
||||||
the newline and pad with spaces up to cbs. Lines longer than cbs are
|
option. Implies `conv=unblock`.
|
||||||
truncated.
|
- `ebcdic` : convert from ASCII to EBCDIC. This is the inverse of the `ascii`
|
||||||
unblock for each block of input of the size indicated by cbs=BYTES, remove
|
option. Implies `conv=block`.
|
||||||
right-trailing spaces and replace with a newline character.
|
- `ibm` : convert from ASCII to EBCDIC, applying the conventions for `[`, `]`
|
||||||
|
and `~` specified in POSIX. Implies `conv=block`.
|
||||||
|
|
||||||
sparse attempts to seek the output when an obs-sized block consists of only
|
- `ucase` : convert from lower-case to upper-case.
|
||||||
zeros.
|
- `lcase` : convert from upper-case to lower-case.
|
||||||
swab swaps each adjacent pair of bytes. If an odd number of bytes is
|
|
||||||
present, the final byte is omitted.
|
|
||||||
sync pad each ibs-sided block with zeros. If 'block' or 'unblock' is
|
|
||||||
specified, pad with spaces instead.
|
|
||||||
excl the output file must be created. Fail if the output file is already
|
|
||||||
present.
|
|
||||||
nocreat the output file will not be created. Fail if the output file in not
|
|
||||||
already present.
|
|
||||||
notrunc the output file will not be truncated. If this option is not
|
|
||||||
present, output will be truncated when opened.
|
|
||||||
noerror all read errors will be ignored. If this option is not present, dd
|
|
||||||
will only ignore Error::Interrupted.
|
|
||||||
fdatasync data will be written before finishing.
|
|
||||||
fsync data and metadata will be written before finishing.
|
|
||||||
|
|
||||||
INPUT FLAGS:
|
- `block` : for each newline less than the size indicated by cbs=BYTES, remove
|
||||||
|
the newline and pad with spaces up to cbs. Lines longer than cbs are truncated.
|
||||||
|
- `unblock` : for each block of input of the size indicated by cbs=BYTES, remove
|
||||||
|
right-trailing spaces and replace with a newline character.
|
||||||
|
|
||||||
count_bytes a value to count=N will be interpreted as bytes.
|
- `sparse` : attempts to seek the output when an obs-sized block consists of
|
||||||
skip_bytes a value to skip=N will be interpreted as bytes.
|
only zeros.
|
||||||
fullblock wait for ibs bytes from each read. zero-length reads are still
|
- `swab` : swaps each adjacent pair of bytes. If an odd number of bytes is
|
||||||
considered EOF.
|
present, the final byte is omitted.
|
||||||
|
- `sync` : pad each ibs-sized block with zeros. If `block` or `unblock` is
|
||||||
|
specified, pad with spaces instead.
|
||||||
|
- `excl` : the output file must be created. Fail if the output file is already
|
||||||
|
present.
|
||||||
|
- `nocreat` : the output file will not be created. Fail if the output file is
|
||||||
|
not already present.
|
||||||
|
- `notrunc` : the output file will not be truncated. If this option is not
|
||||||
|
present, output will be truncated when opened.
|
||||||
|
- `noerror` : all read errors will be ignored. If this option is not present,
|
||||||
|
dd will only ignore Error::Interrupted.
|
||||||
|
- `fdatasync` : data will be written before finishing.
|
||||||
|
- `fsync` : data and metadata will be written before finishing.
|
||||||
|
|
||||||
OUTPUT FLAGS:
|
### Input flags
|
||||||
|
|
||||||
append open file in append mode. Consider setting conv=notrunc as well.
|
- `count_bytes` : a value to `count=N` will be interpreted as bytes.
|
||||||
seek_bytes a value to seek=N will be interpreted as bytes.
|
- `skip_bytes` : a value to `skip=N` will be interpreted as bytes.
|
||||||
|
- `fullblock` : wait for ibs bytes from each read. Zero-length reads are still
|
||||||
|
considered EOF.
|
||||||
|
|
||||||
GENERAL FLAGS:
|
### Output flags
|
||||||
|
|
||||||
direct use direct I/O for data.
|
- `append` : open file in append mode. Consider setting conv=notrunc as well.
|
||||||
directory fail unless the given input (if used as an iflag) or output (if used
|
- `seek_bytes` : a value to seek=N will be interpreted as bytes.
|
||||||
as an oflag) is a directory.
|
|
||||||
dsync use synchronized I/O for data.
|
### General Flags
|
||||||
sync use synchronized I/O for data and metadata.
|
|
||||||
nonblock use non-blocking I/O.
|
- `direct` : use direct I/O for data.
|
||||||
noatime do not update access time.
|
- `directory` : fail unless the given input (if used as an iflag) or
|
||||||
nocache request that OS drop cache.
|
output (if used as an oflag) is a directory.
|
||||||
noctty do not assign a controlling tty.
|
- `dsync` : use synchronized I/O for data.
|
||||||
nofollow do not follow system links.
|
- `sync` : use synchronized I/O for data and metadata.
|
||||||
|
- `nonblock` : use non-blocking I/O.
|
||||||
|
- `noatime` : do not update access time.
|
||||||
|
- `nocache` : request that OS drop cache.
|
||||||
|
- `noctty` : do not assign a controlling tty.
|
||||||
|
- `nofollow` : do not follow symbolic links.
|
||||||
|
|
|
@ -1,18 +1,18 @@
|
||||||
## How to update the internal database
|
# How to update the internal database
|
||||||
|
|
||||||
Create the test fixtures by writing the output of the GNU dircolors commands to the fixtures folder:
|
Create the test fixtures by writing the output of the GNU dircolors commands to the fixtures folder:
|
||||||
|
|
||||||
```
|
```shell
|
||||||
$ dircolors --print-database > /PATH_TO_COREUTILS/tests/fixtures/dircolors/internal.expected
|
dircolors --print-database > /PATH_TO_COREUTILS/tests/fixtures/dircolors/internal.expected
|
||||||
$ dircolors --print-ls-colors > /PATH_TO_COREUTILS/tests/fixtures/dircolors/ls_colors.expected
|
dircolors --print-ls-colors > /PATH_TO_COREUTILS/tests/fixtures/dircolors/ls_colors.expected
|
||||||
$ dircolors -b > /PATH_TO_COREUTILS/tests/fixtures/dircolors/bash_def.expected
|
dircolors -b > /PATH_TO_COREUTILS/tests/fixtures/dircolors/bash_def.expected
|
||||||
$ dircolors -c > /PATH_TO_COREUTILS/tests/fixtures/dircolors/csh_def.expected
|
dircolors -c > /PATH_TO_COREUTILS/tests/fixtures/dircolors/csh_def.expected
|
||||||
```
|
```
|
||||||
|
|
||||||
Run the tests:
|
Run the tests:
|
||||||
|
|
||||||
```
|
```shell
|
||||||
$ cargo test --features "dircolors" --no-default-features
|
cargo test --features "dircolors" --no-default-features
|
||||||
```
|
```
|
||||||
|
|
||||||
Edit `/PATH_TO_COREUTILS/src/uu/dircolors/src/colors.rs` until the tests pass.
|
Edit `/PATH_TO_COREUTILS/src/uu/dircolors/src/colors.rs` until the tests pass.
|
||||||
|
|
|
@ -19,6 +19,6 @@ of 1000).
|
||||||
|
|
||||||
PATTERN allows some advanced exclusions. For example, the following syntaxes
|
PATTERN allows some advanced exclusions. For example, the following syntaxes
|
||||||
are supported:
|
are supported:
|
||||||
? will match only one character
|
`?` will match only one character
|
||||||
* will match zero or more characters
|
`*` will match zero or more characters
|
||||||
{a,b} will match a or b
|
`{a,b}` will match a or b
|
||||||
|
|
|
@ -14,45 +14,40 @@ separates increasing precedence groups.
|
||||||
|
|
||||||
`EXPRESSION` may be:
|
`EXPRESSION` may be:
|
||||||
|
|
||||||
-ARG1 | ARG2 ARG1 if it is neither null nor 0, otherwise ARG2
-
-ARG1 & ARG2 ARG1 if neither argument is null or 0, otherwise 0
-
-ARG1 < ARG2 ARG1 is less than ARG2
-ARG1 <= ARG2 ARG1 is less than or equal to ARG2
-ARG1 = ARG2 ARG1 is equal to ARG2
-ARG1 != ARG2 ARG1 is unequal to ARG2
-ARG1 >= ARG2 ARG1 is greater than or equal to ARG2
-ARG1 > ARG2 ARG1 is greater than ARG2
-
-ARG1 + ARG2 arithmetic sum of ARG1 and ARG2
-ARG1 - ARG2 arithmetic difference of ARG1 and ARG2
-
-ARG1 * ARG2 arithmetic product of ARG1 and ARG2
-ARG1 / ARG2 arithmetic quotient of ARG1 divided by ARG2
-ARG1 % ARG2 arithmetic remainder of ARG1 divided by ARG2
-
-STRING : REGEXP anchored pattern match of REGEXP in STRING
-
-match STRING REGEXP same as STRING : REGEXP
-substr STRING POS LENGTH substring of STRING, POS counted from 1
-index STRING CHARS index in STRING where any CHARS is found, or 0
-length STRING length of STRING
-+ TOKEN interpret TOKEN as a string, even if it is a
-keyword like 'match' or an operator like '/'
-
-( EXPRESSION ) value of EXPRESSION
+- `ARG1 | ARG2`: `ARG1` if it is neither null nor 0, otherwise `ARG2`
+- `ARG1 & ARG2`: `ARG1` if neither argument is null or 0, otherwise 0
+- `ARG1 < ARG2`: `ARG1` is less than `ARG2`
+- `ARG1 <= ARG2`: `ARG1` is less than or equal to `ARG2`
+- `ARG1 = ARG2`: `ARG1` is equal to `ARG2`
+- `ARG1 != ARG2`: `ARG1` is unequal to `ARG2`
+- `ARG1 >= ARG2`: `ARG1` is greater than or equal to `ARG2`
+- `ARG1 > ARG2`: `ARG1` is greater than `ARG2`
+- `ARG1 + ARG2`: arithmetic sum of `ARG1` and `ARG2`
+- `ARG1 - ARG2`: arithmetic difference of `ARG1` and `ARG2`
+- `ARG1 * ARG2`: arithmetic product of `ARG1` and `ARG2`
+- `ARG1 / ARG2`: arithmetic quotient of `ARG1` divided by `ARG2`
+- `ARG1 % ARG2`: arithmetic remainder of `ARG1` divided by `ARG2`
+- `STRING : REGEXP`: anchored pattern match of `REGEXP` in `STRING`
+- `match STRING REGEXP`: same as `STRING : REGEXP`
+- `substr STRING POS LENGTH`: substring of `STRING`, `POS` counted from 1
+- `index STRING CHARS`: index in `STRING` where any `CHARS` is found, or 0
+- `length STRING`: length of `STRING`
+- `+ TOKEN`: interpret `TOKEN` as a string, even if it is a keyword like `match`
+or an operator like `/`
+- `( EXPRESSION )`: value of `EXPRESSION`

 Beware that many operators need to be escaped or quoted for shells.
 Comparisons are arithmetic if both ARGs are numbers, else lexicographical.
 Pattern matches return the string matched between \( and \) or null; if
 \( and \) are not used, they return the number of characters matched or 0.

-Exit status is `0` if `EXPRESSION` is neither null nor `0`, `1` if `EXPRESSION` is null
-or `0`, `2` if `EXPRESSION` is syntactically invalid, and `3` if an error occurred.
+Exit status is `0` if `EXPRESSION` is neither null nor `0`, `1` if `EXPRESSION`
+is null or `0`, `2` if `EXPRESSION` is syntactically invalid, and `3` if an
+error occurred.

 Environment variables:
-- `EXPR_DEBUG_TOKENS=1`: dump expression's tokens
-- `EXPR_DEBUG_RPN=1`: dump expression represented in reverse polish notation
-- `EXPR_DEBUG_SYA_STEP=1`: dump each parser step
-- `EXPR_DEBUG_AST=1`: dump expression represented abstract syntax tree
+
+- `EXPR_DEBUG_TOKENS=1`: dump expression's tokens
+- `EXPR_DEBUG_RPN=1`: dump expression represented in reverse polish notation
+- `EXPR_DEBUG_SYA_STEP=1`: dump each parser step
+- `EXPR_DEBUG_AST=1`: dump expression represented abstract syntax tree
@@ -53,19 +53,19 @@ which I recommend reading if you want to add benchmarks to `factor`.
 so each sample takes a very short time, minimizing variability and
 maximizing the numbers of samples we can take in a given time.

-2. Benchmarks are immutable (once merged in `uutils`)
+1. Benchmarks are immutable (once merged in `uutils`)

 Modifying a benchmark means previously-collected values cannot meaningfully
 be compared, silently giving nonsensical results. If you must modify an
 existing benchmark, rename it.

-3. Test common cases
+1. Test common cases

 We are interested in overall performance, rather than specific edge-cases;
 use **reproducibly-randomized inputs**, sampling from either all possible
 input values or some subset of interest.

-4. Use [`criterion`], `criterion::black_box`, ...
+1. Use [`criterion`], `criterion::black_box`, ...

 `criterion` isn't perfect, but it is also much better than ad-hoc
 solutions in each benchmark.
@@ -103,7 +103,7 @@ characteristics:
 1. integer factoring algorithms are randomized, with large variance in
 execution time ;

-2. various inputs also have large differences in factoring time, that
+1. various inputs also have large differences in factoring time, that
 corresponds to no natural, linear ordering of the inputs.

 If (1) was untrue (i.e. if execution time wasn't random), we could faithfully
@@ -1,9 +1,11 @@
-## Benchmarking hashsum
+# Benchmarking hashsum

-### To bench blake2
+## To bench blake2

-Taken from: https://github.com/uutils/coreutils/pull/2296
+Taken from: <https://github.com/uutils/coreutils/pull/2296>

 With a large file:
-$ hyperfine "./target/release/coreutils hashsum --b2sum large-file" "b2sum large-file"

+```shell
+hyperfine "./target/release/coreutils hashsum --b2sum large-file" "b2sum large-file"
+```
@@ -5,23 +5,31 @@ GNU version of `head`, you can use a benchmarking tool like
 [hyperfine][0]. On Ubuntu 18.04 or later, you can install `hyperfine` by
 running

-sudo apt-get install hyperfine
+```shell
+sudo apt-get install hyperfine
+```

 Next, build the `head` binary under the release profile:

-cargo build --release -p uu_head
+```shell
+cargo build --release -p uu_head
+```

 Now, get a text file to test `head` on. I used the *Complete Works of
 William Shakespeare*, which is in the public domain in the United States
 and most other parts of the world.

-wget -O shakespeare.txt https://www.gutenberg.org/files/100/100-0.txt
+```shell
+wget -O shakespeare.txt https://www.gutenberg.org/files/100/100-0.txt
+```

 This particular file has about 170,000 lines, each of which is no longer
 than 96 characters:

-$ wc -lL shakespeare.txt
-170592 96 shakespeare.txt
+```shell
+$ wc -lL shakespeare.txt
+170592 96 shakespeare.txt
+```

 You could use files of different shapes and sizes to test the
 performance of `head` in different situations. For a larger file, you
@@ -32,9 +40,11 @@ contains about 130 million lines.
 Finally, you can compare the performance of the two versions of `head`
 by running, for example,

-hyperfine \
-"head -n 100000 shakespeare.txt" \
-"target/release/head -n 100000 shakespeare.txt"
+```shell
+hyperfine \
+"head -n 100000 shakespeare.txt" \
+"target/release/head -n 100000 shakespeare.txt"
+```

 [0]: https://github.com/sharkdp/hyperfine
 [1]: https://www.wikidata.org/wiki/Wikidata:Database_download
@@ -17,11 +17,14 @@ A benchmark with `-j` and `-i` shows the following time:
 | libc | 25% | I/O and memory allocation. |

 More detailed profiles can be obtained via [flame graphs](https://github.com/flamegraph-rs/flamegraph):
-```
+
+```shell
 cargo flamegraph --bin join --package uu_join -- file1 file2 > /dev/null
 ```
+
 You may need to add the following lines to the top-level `Cargo.toml` to get full stack traces:
-```
+
+```toml
 [profile.release]
 debug = true
 ```
@@ -34,22 +37,26 @@ in practice many CSV datasets will function well after being sorted.

 Like most of the utils, the recommended tool for benchmarking is [hyperfine](https://github.com/sharkdp/hyperfine).
 To benchmark your changes:
-- checkout the main branch (without your changes), do a `--release` build, and back up the executable produced at `target/release/join`
-- checkout your working branch (with your changes), do a `--release` build
-- run
-```
-hyperfine -w 5 "/path/to/main/branch/build/join file1 file2" "/path/to/working/branch/build/join file1 file2"
-```
-- you'll likely need to add additional options to both commands, such as a field separator, or if you're benchmarking some particular behavior
-- you can also optionally benchmark against GNU's join
+
+- checkout the main branch (without your changes), do a `--release` build, and back up the executable produced at `target/release/join`
+- checkout your working branch (with your changes), do a `--release` build
+- run
+
+```shell
+hyperfine -w 5 "/path/to/main/branch/build/join file1 file2" "/path/to/working/branch/build/join file1 file2"
+```

+- you'll likely need to add additional options to both commands, such as a field separator, or if you're benchmarking some particular behavior
+- you can also optionally benchmark against GNU's join
+
 ## What to benchmark

 The following options can have a non-trivial impact on performance:
-- `-a`/`-v` if one of the two files has significantly more lines than the other
-- `-j`/`-1`/`-2` cause work to be done to grab the appropriate field
-- `-i` adds a call to `to_ascii_lowercase()` that adds some time for allocating and dropping memory for the lowercase key
-- `--nocheck-order` causes some calls of `Input::compare` to be skipped
+
+- `-a`/`-v` if one of the two files has significantly more lines than the other
+- `-j`/`-1`/`-2` cause work to be done to grab the appropriate field
+- `-i` adds a call to `to_ascii_lowercase()` that adds some time for allocating and dropping memory for the lowercase key
+- `--nocheck-order` causes some calls of `Input::compare` to be skipped

 The content of the files being joined has a very significant impact on the performance.
 Things like how long each line is, how many fields there are, how long the key fields are, how many lines there are, how many lines can be joined, and how many lines each line can be joined with all change the behavior of the hotpaths.
@@ -9,13 +9,13 @@ Run `cargo build --release` before benchmarking after you make a change!

 ## Simple recursive ls

 - Get a large tree, for example linux kernel source tree.
 - Benchmark simple recursive ls with hyperfine: `hyperfine --warmup 2 "target/release/coreutils ls -R tree > /dev/null"`.

 ## Recursive ls with all and long options

 - Same tree as above
 - Benchmark recursive ls with -al -R options with hyperfine: `hyperfine --warmup 2 "target/release/coreutils ls -al -R tree > /dev/null"`.

 ## Comparing with GNU ls

@@ -29,7 +29,8 @@ Example: `hyperfine --warmup 2 "target/release/coreutils ls -al -R tree > /dev/n
 This can also be used to compare with version of ls built before your changes to ensure your change does not regress this.

 Here is a `bash` script for doing this comparison:
-```bash
+
+```shell
 #!/bin/bash
 cargo build --no-default-features --features ls --release
 args="$@"
@@ -46,12 +47,14 @@ hyperfine "ls $args" "target/release/coreutils ls $args"
 ## Cargo Flamegraph

 With Cargo Flamegraph you can easily make a flamegraph of `ls`:
-```bash
+
+```shell
 cargo flamegraph --cmd coreutils -- ls [additional parameters]
 ```

 However, if the `-R` option is given, the output becomes pretty much useless due to recursion. We can fix this by merging all the direct recursive calls with `uniq`, below is a `bash` script that does this.
-```bash
+
+```shell
 #!/bin/bash
 cargo build --release --no-default-features --features ls
 perf record target/release/coreutils ls "$@"
@@ -1,6 +1,7 @@
-<!-- spell-checker:ignore ugoa -->
 # mkdir

+<!-- spell-checker:ignore ugoa -->
+
 ```
 mkdir [OPTION]... [USER]
 ```
@@ -1,6 +1,7 @@
-<!-- spell-checker:ignore N'th M'th -->
 # numfmt

+<!-- spell-checker:ignore N'th M'th -->
+
 ```
 numfmt [OPTION]... [NUMBER]...
 ```
@@ -10,24 +11,25 @@ Convert numbers from/to human-readable strings
 ## After Help

 `UNIT` options:
-- `none`: no auto-scaling is done; suffixes will trigger an error
-- `auto`: accept optional single/two letter suffix:

-1K = 1000, 1Ki = 1024, 1M = 1000000, 1Mi = 1048576,
+- `none`: no auto-scaling is done; suffixes will trigger an error
+- `auto`: accept optional single/two letter suffix:

-- `si`: accept optional single letter suffix:
+1K = 1000, 1Ki = 1024, 1M = 1000000, 1Mi = 1048576,

-1K = 1000, 1M = 1000000, ...
+- `si`: accept optional single letter suffix:

-- `iec`: accept optional single letter suffix:
+1K = 1000, 1M = 1000000, ...

-1K = 1024, 1M = 1048576, ...
+- `iec`: accept optional single letter suffix:

+1K = 1024, 1M = 1048576, ...
+
 - `iec-i`: accept optional two-letter suffix:

 1Ki = 1024, 1Mi = 1048576, ...

-`FIELDS` supports `cut(1)` style field ranges:
+- `FIELDS` supports `cut(1)` style field ranges:

 N N'th field, counted from 1
 N- from N'th field, to end of line
@@ -5,15 +5,21 @@ GNU version of `seq`, you can use a benchmarking tool like
 [hyperfine][0]. On Ubuntu 18.04 or later, you can install `hyperfine` by
 running

-sudo apt-get install hyperfine
+```shell
+sudo apt-get install hyperfine
+```

 Next, build the `seq` binary under the release profile:

-cargo build --release -p uu_seq
+```shell
+cargo build --release -p uu_seq
+```

 Finally, you can compare the performance of the two versions of `head`
 by running, for example,

-hyperfine "seq 1000000" "target/release/seq 1000000"
+```shell
+hyperfine "seq 1000000" "target/release/seq 1000000"
+```

 [0]: https://github.com/sharkdp/hyperfine
@@ -12,64 +12,59 @@ Run `cargo build --release` before benchmarking after you make a change!

 ## Sorting a wordlist

 - Get a wordlist, for example with [words](<https://en.wikipedia.org/wiki/Words_(Unix)>) on Linux. The exact wordlist
 doesn't matter for performance comparisons. In this example I'm using `/usr/share/dict/american-english` as the wordlist.
 - Shuffle the wordlist by running `sort -R /usr/share/dict/american-english > shuffled_wordlist.txt`.
 - Benchmark sorting the wordlist with hyperfine: `hyperfine "target/release/coreutils sort shuffled_wordlist.txt -o output.txt"`.

 ## Sorting a wordlist with ignore_case

 - Same wordlist as above
 - Benchmark sorting the wordlist ignoring the case with hyperfine: `hyperfine "target/release/coreutils sort shuffled_wordlist.txt -f -o output.txt"`.

 ## Sorting numbers

 - Generate a list of numbers: `seq 0 100000 | sort -R > shuffled_numbers.txt`.
 - Benchmark numeric sorting with hyperfine: `hyperfine "target/release/coreutils sort shuffled_numbers.txt -n -o output.txt"`.

 ## Sorting numbers with -g

 - Same list of numbers as above.
 - Benchmark numeric sorting with hyperfine: `hyperfine "target/release/coreutils sort shuffled_numbers.txt -g -o output.txt"`.

 ## Sorting numbers with SI prefixes

 - Generate a list of numbers:
-<details>
-<summary>Rust script</summary>

 ## Cargo.toml

 ```toml
 [dependencies]
 rand = "0.8.3"
 ```

 ## main.rs

 ```rust
 use rand::prelude::*;
 fn main() {
 let suffixes = ['k', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y'];
 let mut rng = thread_rng();
 for _ in 0..100000 {
 println!(
 "{}{}",
 rng.gen_range(0..1000000),
 suffixes.choose(&mut rng).unwrap()
 )
-}
 }
+}
+```

-```
+## running

-## running
+`cargo run > shuffled_numbers_si.txt`

-`cargo run > shuffled_numbers_si.txt`
-
-</details>
-
-- Benchmark numeric sorting with hyperfine: `hyperfine "target/release/coreutils sort shuffled_numbers_si.txt -h -o output.txt"`.
+- Benchmark numeric sorting with hyperfine: `hyperfine "target/release/coreutils sort shuffled_numbers_si.txt -h -o output.txt"`.

 ## External sorting

@@ -83,28 +78,28 @@ Example: Run `hyperfine './target/release/coreutils sort shuffled_wordlist.txt -

 "Merge" sort merges already sorted files. It is a sub-step of external sorting, so benchmarking it separately may be helpful.

 - Splitting `shuffled_wordlist.txt` can be achieved by running `split shuffled_wordlist.txt shuffled_wordlist_slice_ --additional-suffix=.txt`
 - Sort each part by running `for f in shuffled_wordlist_slice_*; do sort $f -o $f; done`
 - Benchmark merging by running `hyperfine "target/release/coreutils sort -m shuffled_wordlist_slice_*"`

 ## Check

 When invoked with -c, we simply check if the input is already ordered. The input for benchmarking should be an already sorted file.

 - Benchmark checking by running `hyperfine "target/release/coreutils sort -c sorted_wordlist.txt"`

 ## Stdout and stdin performance

 Try to run the above benchmarks by piping the input through stdin (standard input) and redirect the
 output through stdout (standard output):

-- Remove the input file from the arguments and add `cat [input_file] | ` at the beginning.
+- Remove the input file from the arguments and add ```cat [input_file] |``` at the beginning.
 - Remove `-o output.txt` and add `> output.txt` at the end.

 Example: `hyperfine "target/release/coreutils sort shuffled_numbers.txt -n -o output.txt"` becomes
 `hyperfine "cat shuffled_numbers.txt | target/release/coreutils sort -n > output.txt`

 - Check that performance is similar to the original benchmark.

 ## Comparing with GNU sort

@@ -121,37 +116,34 @@ The above benchmarks use hyperfine to measure the speed of sorting. There are ho
 resource usage. One way to measure them is the `time` command. This is not to be confused with the `time` that is built in to the bash shell.
 You may have to install `time` first, then you have to run it with `/bin/time -v` to give it precedence over the built in `time`.

-<details>
-<summary>Example output</summary>
-
-Command being timed: "target/release/coreutils sort shuffled_numbers.txt"
-User time (seconds): 0.10
-System time (seconds): 0.00
-Percent of CPU this job got: 365%
-Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.02
-Average shared text size (kbytes): 0
-Average unshared data size (kbytes): 0
-Average stack size (kbytes): 0
-Average total size (kbytes): 0
-Maximum resident set size (kbytes): 25360
-Average resident set size (kbytes): 0
-Major (requiring I/O) page faults: 0
-Minor (reclaiming a frame) page faults: 5802
-Voluntary context switches: 462
-Involuntary context switches: 73
-Swaps: 0
-File system inputs: 1184
-File system outputs: 0
-Socket messages sent: 0
-Socket messages received: 0
-Signals delivered: 0
-Page size (bytes): 4096
-Exit status: 0
-
-</details>
+```plain
+Command being timed: "target/release/coreutils sort shuffled_numbers.txt"
+User time (seconds): 0.10
+System time (seconds): 0.00
+Percent of CPU this job got: 365%
+Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.02
+Average shared text size (kbytes): 0
+Average unshared data size (kbytes): 0
+Average stack size (kbytes): 0
+Average total size (kbytes): 0
+Maximum resident set size (kbytes): 25360
+Average resident set size (kbytes): 0
+Major (requiring I/O) page faults: 0
+Minor (reclaiming a frame) page faults: 5802
+Voluntary context switches: 462
+Involuntary context switches: 73
+Swaps: 0
+File system inputs: 1184
+File system outputs: 0
+Socket messages sent: 0
+Socket messages received: 0
+Signals delivered: 0
+Page size (bytes): 4096
+Exit status: 0
+```

 Useful metrics to look at could be:

 - User time
 - Percent of CPU this job got
 - Maximum resident set size
@@ -7,11 +7,15 @@ GNU version of `split`, you can use a benchmarking tool like
 [hyperfine][0]. On Ubuntu 18.04 or later, you can install `hyperfine` by
 running

-sudo apt-get install hyperfine
+```
+sudo apt-get install hyperfine
+```

 Next, build the `split` binary under the release profile:

-cargo build --release -p uu_split
+```
+cargo build --release -p uu_split
+```

 Now, get a text file to test `split` on. The `split` program has three
 main modes of operation: chunk by lines, chunk by bytes, and chunk by
@@ -21,7 +25,9 @@ operation. For example, to test chunking by bytes on a large input file,
 you can create a file named `testfile.txt` containing one million null
 bytes like this:

-printf "%0.s\0" {1..1000000} > testfile.txt
+```
+printf "%0.s\0" {1..1000000} > testfile.txt
+```

 For another example, to test chunking by bytes on a large real-world
 input file, you could download a [database dump of Wikidata][1] or some
@@ -31,10 +37,12 @@ file][2] contains about 130 million lines.
 Finally, you can compare the performance of the two versions of `split`
 by running, for example,

-cd /tmp && hyperfine \
---prepare 'rm x* || true' \
-"split -b 1000 testfile.txt" \
-"target/release/split -b 1000 testfile.txt"
+```
+cd /tmp && hyperfine \
+--prepare 'rm x* || true' \
+"split -b 1000 testfile.txt" \
+"target/release/split -b 1000 testfile.txt"
+```

 Since `split` creates a lot of files on the filesystem, I recommend
 changing to the `/tmp` directory before running the benchmark. The
@@ -4,7 +4,8 @@

 ### Flags

-* [ ] `--verbose` - created file printing is implemented, don't know if there is anything else
+* [ ] `--verbose` - created file printing is implemented, don't know
+if there is anything else

 ## Possible Optimizations

@@ -8,53 +8,53 @@ Display file or file system status.
 
 ## Long Usage
 
-The valid format sequences for files (without `--file-system`):
+Valid format sequences for files (without `--file-system`):
 
-%a access rights in octal (note '#' and '0' printf flags)
-%A access rights in human readable form
-%b number of blocks allocated (see %B)
-%B the size in bytes of each block reported by %b
-%C SELinux security context string
-%d device number in decimal
-%D device number in hex
-%f raw mode in hex
-%F file type
-%g group ID of owner
-%G group name of owner
-%h number of hard links
-%i inode number
-%m mount point
-%n file name
-%N quoted file name with dereference if symbolic link
-%o optimal I/O transfer size hint
-%s total size, in bytes
-%t major device type in hex, for character/block device special files
-%T minor device type in hex, for character/block device special files
-%u user ID of owner
-%U user name of owner
-%w time of file birth, human-readable; - if unknown
-%W time of file birth, seconds since Epoch; 0 if unknown
-%x time of last access, human-readable
-%X time of last access, seconds since Epoch
-%y time of last data modification, human-readable
-%Y time of last data modification, seconds since Epoch
-%z time of last status change, human-readable
-%Z time of last status change, seconds since Epoch
+- `%a`: access rights in octal (note '#' and '0' printf flags)
+- `%A`: access rights in human readable form
+- `%b`: number of blocks allocated (see %B)
+- `%B`: the size in bytes of each block reported by %b
+- `%C`: SELinux security context string
+- `%d`: device number in decimal
+- `%D`: device number in hex
+- `%f`: raw mode in hex
+- `%F`: file type
+- `%g`: group ID of owner
+- `%G`: group name of owner
+- `%h`: number of hard links
+- `%i`: inode number
+- `%m`: mount point
+- `%n`: file name
+- `%N`: quoted file name with dereference if symbolic link
+- `%o`: optimal I/O transfer size hint
+- `%s`: total size, in bytes
+- `%t`: major device type in hex, for character/block device special files
+- `%T`: minor device type in hex, for character/block device special files
+- `%u`: user ID of owner
+- `%U`: user name of owner
+- `%w`: time of file birth, human-readable; - if unknown
+- `%W`: time of file birth, seconds since Epoch; 0 if unknown
+- `%x`: time of last access, human-readable
+- `%X`: time of last access, seconds since Epoch
+- `%y`: time of last data modification, human-readable
+- `%Y`: time of last data modification, seconds since Epoch
+- `%z`: time of last status change, human-readable
+- `%Z`: time of last status change, seconds since Epoch
 
 Valid format sequences for file systems:
 
-%a free blocks available to non-superuser
-%b total data blocks in file system
-%c total file nodes in file system
-%d free file nodes in file system
-%f free blocks in file system
-%i file system ID in hex
-%l maximum length of filenames
-%n file name
-%s block size (for faster transfers)
-%S fundamental block size (for block counts)
-%t file system type in hex
-%T file system type in human readable form
+- `%a`: free blocks available to non-superuser
+- `%b`: total data blocks in file system
+- `%c`: total file nodes in file system
+- `%d`: free file nodes in file system
+- `%f`: free blocks in file system
+- `%i`: file system ID in hex
+- `%l`: maximum length of filenames
+- `%n`: file name
+- `%s`: block size (for faster transfers)
+- `%S`: fundamental block size (for block counts)
+- `%t`: file system type in hex
+- `%T`: file system type in human readable form
 
 NOTE: your shell may have its own version of stat, which usually supersedes
 the version described here. Please refer to your shell's documentation
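
Many of the format sequences listed above map directly onto fields that the operating system returns from `stat(2)` and `statvfs(3)`. As a rough cross-check, the sketch below fills in a subset of them from `os.lstat()` and `os.statvfs()`. It assumes a POSIX system and Python 3. It is not how uutils `stat` is implemented, and it has these limits:

- `file_fields` and `fs_fields` are hypothetical names.
- `%F` only distinguishes a few common file types.
- The birth-time sequences (`%w`, `%W`) are omitted because not every platform exposes them.

```python
import os
import stat


def file_fields(path):
    """Approximate a few `stat --format` file sequences via os.lstat()."""
    st = os.lstat(path)  # like stat without -L: do not follow symlinks
    if stat.S_ISREG(st.st_mode):
        ftype = "regular empty file" if st.st_size == 0 else "regular file"
    elif stat.S_ISDIR(st.st_mode):
        ftype = "directory"
    elif stat.S_ISLNK(st.st_mode):
        ftype = "symbolic link"
    else:
        ftype = "other"  # placeholder, not GNU's wording
    return {
        "%a": format(stat.S_IMODE(st.st_mode), "o"),  # access rights in octal
        "%f": format(st.st_mode, "x"),                # raw mode in hex
        "%F": ftype,
        "%h": st.st_nlink,          # number of hard links
        "%i": st.st_ino,            # inode number
        "%n": path,                 # file name
        "%s": st.st_size,           # total size, in bytes
        "%u": st.st_uid,            # user ID of owner
        "%g": st.st_gid,            # group ID of owner
        "%X": int(st.st_atime),     # last access, seconds since Epoch
        "%Y": int(st.st_mtime),     # last modification, seconds since Epoch
        "%Z": int(st.st_ctime),     # last status change, seconds since Epoch
    }


def fs_fields(path):
    """Approximate a few `stat --file-system --format` sequences (POSIX only)."""
    vfs = os.statvfs(path)
    return {
        "%a": vfs.f_bavail,   # free blocks available to non-superuser
        "%b": vfs.f_blocks,   # total data blocks in file system
        "%c": vfs.f_files,    # total file nodes in file system
        "%d": vfs.f_ffree,    # free file nodes in file system
        "%f": vfs.f_bfree,    # free blocks in file system
        "%l": vfs.f_namemax,  # maximum length of filenames
        "%s": vfs.f_bsize,    # block size (for faster transfers)
        "%S": vfs.f_frsize,   # fundamental block size (for block counts)
    }
```

On Linux, `file_fields("README.md")["%a"]` should give the same octal string as `stat --format=%a README.md`.
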
@@ -1,4 +1,4 @@
-## Benchmarking `sum`
+# Benchmarking `sum`
 
 <!-- spell-checker:ignore wikidatawiki -->
 
@@ -7,17 +7,17 @@ Large sample files can for example be found in the [Wikipedia database dumps](ht
 After you have obtained and uncompressed such a file, you need to build `sum` in release mode
 
 ```shell
-$ cargo build --release --package uu_sum
+cargo build --release --package uu_sum
 ```
 
 and then you can time how it long it takes to checksum the file by running
 
 ```shell
-$ time ./target/release/sum wikidatawiki-20211001-pages-logging.xml
+time ./target/release/sum wikidatawiki-20211001-pages-logging.xml
 ```
 
 For more systematic measurements that include warm-ups, repetitions and comparisons, [Hyperfine](https://github.com/sharkdp/hyperfine) can be helpful. For example, to compare this implementation to the one provided by your distribution run
 
 ```shell
-$ hyperfine "./target/release/sum wikidatawiki-20211001-pages-logging.xml" "sum wikidatawiki-20211001-pages-logging.xml"
+hyperfine "./target/release/sum wikidatawiki-20211001-pages-logging.xml" "sum wikidatawiki-20211001-pages-logging.xml"
 ```
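
The commands above time how long `sum` takes to checksum a large file. Assuming the default BSD algorithm (the one `sum -r` selects), the sketch below shows what that checksum is:

- a 16-bit value that is rotated right by one bit before each input byte is added;
- plus a count of 1024-byte blocks, rounded up.

`bsd_sum` is a hypothetical helper for illustration only. A pure-Python byte loop is far too slow to serve as a baseline for the benchmark itself.

```python
def bsd_sum(path, chunk_size=64 * 1024):
    """Return (checksum, blocks) for `path` using the BSD sum algorithm."""
    checksum = 0
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
            for byte in chunk:
                # Rotate the 16-bit checksum right by one bit...
                checksum = (checksum >> 1) | ((checksum & 1) << 15)
                # ...then add the byte, keeping only 16 bits.
                checksum = (checksum + byte) & 0xFFFF
    # Block count in 1024-byte units, rounded up.
    return checksum, (total + 1023) // 1024
```

Formatting the result as `f"{checksum:05d} {blocks:5d}"` makes it easy to compare by eye with `sum -r` on a small file.
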
@@ -1,25 +1,36 @@
-## Benchmarking `tac`
+# Benchmarking `tac`
 
 <!-- spell-checker:ignore wikidatawiki -->
 
-`tac` is often used to process log files in reverse chronological order, i.e. from newer towards older entries. In this case, the performance target to yield results as fast as possible, i.e. without reading in the whole file that is to be reversed line-by-line. Therefore, a sensible benchmark is to read a large log file containing N lines and measure how long it takes to produce the last K lines from that file.
+`tac` is often used to process log files in reverse chronological order, i.e.
+from newer towards older entries. In this case, the performance target to yield
+results as fast as possible, i.e. without reading in the whole file that is to
+be reversed line-by-line. Therefore, a sensible benchmark is to read a large log
+file containing N lines and measure how long it takes to produce the last K
+lines from that file.
 
-Large text files can for example be found in the [Wikipedia database dumps](https://dumps.wikimedia.org/wikidatawiki/latest/), usually sized at multiple gigabytes and comprising more than 100M lines.
+Large text files can for example be found in the
+[Wikipedia database dumps](https://dumps.wikimedia.org/wikidatawiki/latest/),
+usually sized at multiple gigabytes and comprising more than 100M lines.
 
-After you have obtained and uncompressed such a file, you need to build `tac` in release mode
+After you have obtained and uncompressed such a file, you need to build `tac`
+in release mode
 
 ```shell
-$ cargo build --release --package uu_tac
+cargo build --release --package uu_tac
 ```
 
-and then you can time how it long it takes to extract the last 10M lines by running
+and then you can time how it long it takes to extract the last 10M lines by
+running
 
 ```shell
-$ /usr/bin/time ./target/release/tac wikidatawiki-20211001-pages-logging.xml | head -n10000000 >/dev/null
+/usr/bin/time ./target/release/tac wikidatawiki-20211001-pages-logging.xml | head -n10000000 >/dev/null
 ```
 
-For more systematic measurements that include warm-ups, repetitions and comparisons, [Hyperfine](https://github.com/sharkdp/hyperfine) can be helpful. For example, to compare this implementation to the one provided by your distribution run
+For more systematic measurements that include warm-ups, repetitions and comparisons,
+[Hyperfine](https://github.com/sharkdp/hyperfine) can be helpful.
+For example, to compare this implementation to the one provided by your distribution run
 
 ```shell
-$ hyperfine "./target/release/tac wikidatawiki-20211001-pages-logging.xml | head -n10000000 >/dev/null" "/usr/bin/tac wikidatawiki-20211001-pages-logging.xml | head -n10000000 >/dev/null"
+hyperfine "./target/release/tac wikidatawiki-20211001-pages-logging.xml | head -n10000000 >/dev/null" "/usr/bin/tac wikidatawiki-20211001-pages-logging.xml | head -n10000000 >/dev/null"
 ```
@@ -7,40 +7,59 @@
 * `--max-unchanged-stats`
 
 Note:
-There's a stub for `--max-unchanged-stats` so GNU test-suite checks using it can run, however this flag has no functionality yet.
+There's a stub for `--max-unchanged-stats` so GNU test-suite checks using it
+can run, however this flag has no functionality yet.
 
 ### Platform support for `--follow` and `--retry`
-The `--follow=descriptor`, `--follow=name` and `--retry` flags have very good support on Linux (inotify backend).
-They work good enough on macOS/BSD (kqueue backend) with some tests failing due to differences of how kqueue works compared to inotify.
-Windows support is there in theory due to ReadDirectoryChanges support by the notify-crate, however these flags are completely untested on Windows.
+
+The `--follow=descriptor`, `--follow=name` and `--retry` flags have very good
+support on Linux (inotify backend).
+They work good enough on macOS/BSD (kqueue backend) with some tests failing due
+to differences of how kqueue works compared to inotify.
+Windows support is there in theory due to ReadDirectoryChanges support by the
+notify-crate, however these flags are completely untested on Windows.
 
 Note:
-The undocumented `---disable-inotify` flag is used to disable the inotify backend to test polling.
-However inotify is a Linux only backend and polling is now supported also for the other backends.
-Because of this, `disable-inotify` is now an alias to the new and more versatile flag name: `--use-polling`.
+The undocumented `---disable-inotify` flag is used to disable the inotify
+backend to test polling.
+However inotify is a Linux only backend and polling is now supported also
+for the other backends.
+Because of this, `disable-inotify` is now an alias to the new and more versatile
+flag name: `--use-polling`.
 
 ## Possible optimizations
 
-* Don't read the whole file if not using `-f` and input is regular file. Read in chunks from the end going backwards, reading each individual chunk forward.
+* Don't read the whole file if not using `-f` and input is regular file.
+Read in chunks from the end going backwards, reading each individual chunk
+forward.
 * Reduce number of system calls to e.g. `fstat`
-* Improve resource management by adding more system calls to `inotify_rm_watch` when appropriate.
+* Improve resource management by adding more system calls to `inotify_rm_watch`
+when appropriate.
 
 # GNU test-suite results (9.1.8-e08752)
 
 The functionality for the test "gnu/tests/tail-2/follow-stdin.sh" is implemented.
-It fails because it is provoking closing a file descriptor with `tail -f <&-` and as part of a workaround, Rust's stdlib reopens closed FDs as `/dev/null` which means uu_tail cannot detect this.
-See also, e.g. the discussion at: https://github.com/uutils/coreutils/issues/2873
+It fails because it is provoking closing a file descriptor with `tail -f <&-`
+and as part of a workaround, Rust's stdlib reopens closed FDs as `/dev/null`
+which means uu_tail cannot detect this.
+See also, e.g. the discussion at:
+<https://github.com/uutils/coreutils/issues/2873>
 
-The functionality for the test "gnu/tests/tail-2/inotify-rotate-resources.sh" is implemented.
-It fails with an error because it is using `strace` to look for calls to `inotify_add_watch` and `inotify_rm_watch`,
+The functionality for the test "gnu/tests/tail-2/inotify-rotate-resources.sh"
+is implemented.
+It fails with an error because it is using `strace` to look for calls to
+`inotify_add_watch` and `inotify_rm_watch`,
 however in uu_tail these system calls are invoked from a separate thread.
-If the GNU test would follow threads, i.e. use `strace -f`, this issue could be resolved.
+If the GNU test would follow threads, i.e. use `strace -f`, this issue could be
+resolved.
 
-There are 5 tests which are fixed but do not (always) pass the test suite if it's run inside the CI.
+There are 5 tests which are fixed but do not (always) pass the test suite
+if it's run inside the CI.
 The reason for this is probably related to load/scheduling on the CI test VM.
 The tests in question are:
-- [x] `tail-2/F-vs-rename.sh`
-- [x] `tail-2/follow-name.sh`
-- [x] `tail-2/inotify-rotate.sh`
-- [x] `tail-2/overlay-headers.sh`
-- [x] `tail-2/retry.sh`
+
+* [x] `tail-2/F-vs-rename.sh`
+* [x] `tail-2/follow-name.sh`
+* [x] `tail-2/inotify-rotate.sh`
+* [x] `tail-2/overlay-headers.sh`
+* [x] `tail-2/retry.sh`
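
The first item under "Possible optimizations" (read chunks from the end going backwards instead of reading the whole file) could look roughly like this Python sketch of `tail -n N` on a regular file. It makes these assumptions:

- the input is seekable and uses `\n` line endings;
- `last_lines` is a hypothetical helper, not uu_tail code;
- `-f` is ignored entirely.

The sketch scans chunks backwards only to find where the last N lines start, then reads that tail forwards.

```python
import os


def last_lines(path, n, chunk_size=64 * 1024):
    """Return the last `n` lines of a regular file as bytes."""
    if n <= 0:
        return b""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)
        if size == 0:
            return b""
        f.seek(size - 1)
        # A trailing newline ends the last line; it does not start a new one,
        # so exclude it from the backwards scan.
        pos = size - 1 if f.read(1) == b"\n" else size
        start = 0  # if fewer than n newlines are found, print the whole file
        found = 0
        while pos > 0 and found < n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            chunk = f.read(step)
            end = len(chunk)
            # Walk this chunk's newlines from right to left.
            while found < n:
                end = chunk.rfind(b"\n", 0, end)
                if end == -1:
                    break
                found += 1
                if found == n:
                    start = pos + end + 1  # first byte after that newline
        f.seek(start)
        return f.read()  # read the tail forwards in one go
```

For a short `-n` on a large log, this typically touches only the last chunk of the file. A production version would also stream the tail in bounded pieces rather than returning it from a single `read()`.
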
@@ -1,5 +1,6 @@
 # truncate
 
+```
 truncate [OPTION]... [FILE]...
 ```
 
@@ -2,45 +2,59 @@
 
 <!-- spell-checker:ignore (words) uuwc uucat largefile somefile Mshortlines moby lwcm cmds tablefmt -->
 
-Much of what makes wc fast is avoiding unnecessary work. It has multiple strategies, depending on which data is requested.
+Much of what makes wc fast is avoiding unnecessary work. It has multiple strategies,
+depending on which data is requested.
 
 ## Strategies
 
 ### Counting bytes
 
-In the case of `wc -c` the content of the input doesn't have to be inspected at all, only the size has to be known. That enables a few optimizations.
+In the case of `wc -c` the content of the input doesn't have to be inspected at all,
+only the size has to be known. That enables a few optimizations.
 
 #### File size
 
-If it can, wc reads the file size directly. This is not interesting to benchmark, except to see if it still works. Try `wc -c largefile`.
+If it can, wc reads the file size directly. This is not interesting to benchmark,
+except to see if it still works. Try `wc -c largefile`.
 
 #### `splice()`
 
 On Linux `splice()` is used to get the input's length while discarding it directly.
 
-The best way I've found to generate a fast input to test `splice()` is to pipe the output of uutils `cat` into it. Note that GNU `cat` is slower and therefore less suitable, and that if a file is given as its input directly (as in `wc -c < largefile`) the first strategy kicks in. Try `uucat somefile | wc -c`.
+The best way I've found to generate a fast input to test `splice()` is to pipe the
+output of uutils `cat` into it. Note that GNU `cat` is slower and therefore less
+suitable, and that if a file is given as its input directly (as in
+`wc -c < largefile`) the first strategy kicks in. Try `uucat somefile | wc -c`.
 
 ### Counting lines
 
-In the case of `wc -l` or `wc -cl` the input doesn't have to be decoded. It's read in chunks and the `bytecount` crate is used to count the newlines.
+In the case of `wc -l` or `wc -cl` the input doesn't have to be decoded. It's
+read in chunks and the `bytecount` crate is used to count the newlines.
 
-It's useful to vary the line length in the input. GNU wc seems particularly bad at short lines.
+It's useful to vary the line length in the input. GNU wc seems particularly
+bad at short lines.
 
 ### Processing unicode
 
-This is the most general strategy, and it's necessary for counting words, characters, and line lengths. Individual steps are still switched on and off depending on what must be reported.
+This is the most general strategy, and it's necessary for counting words,
+characters, and line lengths. Individual steps are still switched on and off
+depending on what must be reported.
 
-Try varying which of the `-w`, `-m`, `-l` and `-L` flags are used. (The `-c` flag is unlikely to make a difference.)
+Try varying which of the `-w`, `-m`, `-l` and `-L` flags are used.
+(The `-c` flag is unlikely to make a difference.)
 
-Passing no flags is equivalent to passing `-wcl`. That case should perhaps be given special attention as it's the default.
+Passing no flags is equivalent to passing `-wcl`. That case should perhaps be
+given special attention as it's the default.
 
 ## Generating files
 
-To generate a file with many very short lines, run `yes | head -c50000000 > 25Mshortlines`.
+To generate a file with many very short lines, run
+`yes | head -c50000000 > 25Mshortlines`.
 
-To get a file with less artificial contents, download a book from Project Gutenberg and concatenate it a lot of times:
+To get a file with less artificial contents, download a book from
+Project Gutenberg and concatenate it a lot of times:
 
-```
+```shell
 wget https://www.gutenberg.org/files/2701/2701-0.txt -O moby.txt
 cat moby.txt moby.txt moby.txt moby.txt > moby4.txt
 cat moby4.txt moby4.txt moby4.txt moby4.txt > moby16.txt
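
The byte- and line-counting strategies described above fit in a few lines of Python, which may help when reasoning about what each benchmark exercises. This is a sketch under stated assumptions, not the uutils code:

- `count_bytes` and `count_lines` are hypothetical helpers that take a raw file descriptor.
- `os.splice` only exists on Linux with Python 3.10 or newer; the sketch falls back to plain reads otherwise.
- `bytes.count` stands in for the `bytecount` crate.

```python
import os
import stat

CHUNK_SIZE = 64 * 1024


def count_bytes(fd):
    """Sketch of the `wc -c` strategies, cheapest first."""
    st = os.fstat(fd)
    if stat.S_ISREG(st.st_mode):
        # Regular file: the size is in the metadata. Subtract the current
        # offset in case part of the input was already consumed.
        return max(st.st_size - os.lseek(fd, 0, os.SEEK_CUR), 0)
    total = 0
    splice = getattr(os, "splice", None)  # Linux-only, Python >= 3.10
    if splice is not None:
        devnull = os.open(os.devnull, os.O_WRONLY)
        try:
            # Move pipe data straight into /dev/null without copying it
            # through user space; each call reports how many bytes moved.
            while n := splice(fd, devnull, CHUNK_SIZE):
                total += n
            return total
        except OSError:
            pass  # e.g. the input is not a pipe: fall back to plain reads
        finally:
            os.close(devnull)
    while data := os.read(fd, CHUNK_SIZE):
        total += len(data)
    return total


def count_lines(fd):
    """Sketch of `wc -l`: no decoding, just count b"\\n" in each chunk."""
    lines = 0
    while data := os.read(fd, CHUNK_SIZE):
        lines += data.count(b"\n")
    return lines
```

With `uucat somefile | python3 script.py`, descriptor 0 is a pipe, so `count_bytes(0)` takes the splice path. With `< largefile` it is a regular file, and the size comes straight from `fstat`. This mirrors the distinction the text draws between the two strategies.
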
@@ -49,7 +63,7 @@ cat moby16.txt moby16.txt moby16.txt moby16.txt > moby64.txt
 
 And get one with lots of unicode too:
 
-```
+```shell
 wget https://www.gutenberg.org/files/30613/30613-0.txt -O odyssey.txt
 cat odyssey.txt odyssey.txt odyssey.txt odyssey.txt > odyssey4.txt
 cat odyssey4.txt odyssey4.txt odyssey4.txt odyssey4.txt > odyssey16.txt
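
Files with lots of unicode exercise the general "Processing unicode" strategy. Below is a minimal Python sketch of the `-l`, `-w` and `-m` counts over a stream of byte chunks. It works under these assumptions:

- the input is UTF-8;
- a word is a maximal run of characters for which `str.isspace()` is false;
- invalid bytes are replaced with U+FFFD;
- `-L` is left out, because it needs display widths.

`count_text` is a hypothetical helper. GNU and uutils `wc` may classify some characters or invalid sequences differently.

```python
import codecs


def count_text(chunks):
    """Return (lines, words, chars) for an iterable of UTF-8 byte chunks."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    lines = words = chars = 0
    in_word = False

    def consume(text):
        nonlocal lines, words, chars, in_word
        chars += len(text)
        lines += text.count("\n")
        for ch in text:
            if ch.isspace():
                in_word = False
            elif not in_word:
                in_word = True  # a new word starts here
                words += 1

    for chunk in chunks:
        # The incremental decoder buffers a multi-byte sequence that is
        # split across chunk boundaries instead of mangling it.
        consume(decoder.decode(chunk))
    consume(decoder.decode(b"", final=True))
    return lines, words, chars
```

To run it over one of the generated files in 64 KiB chunks: `with open("odyssey256.txt", "rb") as f: print(count_text(iter(lambda: f.read(1 << 16), b"")))`.
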
@@ -57,11 +71,14 @@ cat odyssey16.txt odyssey16.txt odyssey16.txt odyssey16.txt > odyssey64.txt
 cat odyssey64.txt odyssey64.txt odyssey64.txt odyssey64.txt > odyssey256.txt
 ```
 
-Finally, it's interesting to try a binary file. Look for one with `du -sh /usr/bin/* | sort -h`. On my system `/usr/bin/docker` is a good candidate as it's fairly large.
+Finally, it's interesting to try a binary file. Look for one with
+`du -sh /usr/bin/* | sort -h`. On my system `/usr/bin/docker` is a good
+candidate as it's fairly large.
 
 ## Running benchmarks
 
-Use [`hyperfine`](https://github.com/sharkdp/hyperfine) to compare the performance. For example, `hyperfine 'wc somefile' 'uuwc somefile'`.
+Use [`hyperfine`](https://github.com/sharkdp/hyperfine) to compare the
+performance. For example, `hyperfine 'wc somefile' 'uuwc somefile'`.
 
 If you want to get fancy and exhaustive, generate a table:
 
@@ -69,6 +86,7 @@ If you want to get fancy and exhaustive, generate a table:
 |------------------------|--------------|------------------|-----------------|-------------------|
 | `wc <FILE>` | 1.3965 | 1.6182 | 5.2967 | 2.2294 |
 | `wc -c <FILE>` | 0.8134 | 1.2774 | 0.7732 | 0.9106 |
+<!-- markdownlint-disable-next-line MD033 -->
 | `uucat <FILE> | wc -c` | 2.7760 | 2.5565 | 2.3769 | 2.3982 |
 | `wc -l <FILE>` | 1.1441 | 1.2854 | 2.9681 | 1.1493 |
 | `wc -L <FILE>` | 2.1087 | 1.2551 | 5.4577 | 2.1490 |
@@ -77,12 +95,16 @@ If you want to get fancy and exhaustive, generate a table:
 | `wc -lwcmL <FILE>` | 1.1687 | 0.9169 | 4.4092 | 2.0663 |
 
 Beware that:
+
 - Results are fuzzy and change from run to run
-- You'll often want to check versions of uutils wc against each other instead of against GNU
+- You'll often want to check versions of uutils wc against each other instead
+of against GNU
 - This takes a lot of time to generate
-- This only shows the relative speedup, not the absolute time, which may be misleading if the time is very short
+- This only shows the relative speedup, not the absolute time, which may be
+misleading if the time is very short
+
 Created by the following Python script:
 
 ```python
 import json
 import subprocess
@@ -121,4 +143,6 @@ for cmd in cmds:
 table.append(row)
 print(tabulate(table, [""] + files, tablefmt="github"))
 ```
-(You may have to adjust the `bins` and `files` variables depending on your setup, and please do add other interesting cases to `cmds`.)
+
+(You may have to adjust the `bins` and `files` variables depending on your
+setup, and please do add other interesting cases to `cmds`.)
@@ -1,5 +1,6 @@
 # wc
 
+```
 wc [OPTION]... [FILE]...
 ```
 