mirror of https://github.com/uutils/coreutils
synced 2024-12-04 02:19:54 +00:00
commit 422a27d375
parent 9d5dc500e6
author Sylvestre Ledru <sylvestre@debian.org> 1677865358 +0100
committer Sylvestre Ledru <sylvestre@debian.org> 1677951797 +0100

md: Fix a bunch of warnings in the docs

42 changed files with 470 additions and 364 deletions
|
@ -116,7 +116,7 @@ the community.
|
|||
|
||||
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
|
||||
version 2.0, available at
|
||||
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
|
||||
<https://www.contributor-covenant.org/version/2/0/code_of_conduct.html>.
|
||||
|
||||
Community Impact Guidelines were inspired by [Mozilla's code of conduct
|
||||
enforcement ladder](https://github.com/mozilla/diversity).
|
||||
|
@ -124,5 +124,5 @@ enforcement ladder](https://github.com/mozilla/diversity).
|
|||
[homepage]: https://www.contributor-covenant.org
|
||||
|
||||
For answers to common questions about this code of conduct, see the FAQ at
|
||||
https://www.contributor-covenant.org/faq. Translations are available at
|
||||
https://www.contributor-covenant.org/translations.
|
||||
<https://www.contributor-covenant.org/faq>. Translations are available at
|
||||
<https://www.contributor-covenant.org/translations>.
|
||||
|
|
|
@ -38,20 +38,19 @@ search the issues to make sure no one else is working on it.
|
|||
|
||||
## Platforms
|
||||
|
||||
We take pride in supporting many operating systems and architectures.
|
||||
We take pride in supporting many operating systems and architectures.
|
||||
|
||||
**Tip:**
|
||||
For Windows, Microsoft provides some images (VMWare, Hyper-V, VirtualBox and Parallels)
|
||||
For Windows, Microsoft provides some images (VMWare, Hyper-V, VirtualBox and Parallels)
|
||||
for development:
|
||||
https://developer.microsoft.com/windows/downloads/virtual-machines/
|
||||
|
||||
<https://developer.microsoft.com/windows/downloads/virtual-machines/>
|
||||
|
||||
## Commit messages
|
||||
|
||||
To help the project maintainers review pull requests from contributors across
|
||||
numerous utilities, the team has settled on conventions for commit messages.
|
||||
|
||||
From http://git-scm.com/book/ch5-2.html:
|
||||
From <http://git-scm.com/book/ch5-2.html>:
|
||||
|
||||
```
|
||||
Short (50 chars or less) summary of changes
|
||||
|
|
|
@ -1,21 +1,19 @@
|
|||
Documentation
|
||||
-------------
|
||||
# Documentation
|
||||
|
||||
The source of the documentation is available on:
|
||||
|
||||
https://uutils.github.io/dev/coreutils/
|
||||
<https://uutils.github.io/dev/coreutils/>
|
||||
|
||||
The documentation is updated every day on this repository:
|
||||
|
||||
https://github.com/uutils/uutils.github.io/
|
||||
<https://github.com/uutils/uutils.github.io/>
|
||||
|
||||
Running GNU tests
|
||||
-----------------
|
||||
## Running GNU tests
|
||||
|
||||
<!-- spell-checker:ignore gnulib -->
|
||||
|
||||
- Check out https://github.com/coreutils/coreutils next to your fork as gnu
|
||||
- Check out https://github.com/coreutils/gnulib next to your fork as gnulib
|
||||
- Check out <https://github.com/coreutils/coreutils> next to your fork as gnu
|
||||
- Check out <https://github.com/coreutils/gnulib> next to your fork as gnulib
|
||||
- Rename the checkout of your fork to uutils
|
||||
|
||||
At the end you should have uutils, gnu and gnulib checked out next to each other.
|
||||
|
@ -23,9 +21,7 @@ At the end you should have uutils, gnu and gnulib checked out next to each other
|
|||
- Run `cd uutils && ./util/build-gnu.sh && cd ..` to get everything ready (this may take a while)
|
||||
- Finally, you can run tests with `bash uutils/util/run-gnu-test.sh <tests>`. Instead of `<tests>` insert the tests you want to run, e.g. `tests/misc/wc-proc.sh`.
|
||||
|
||||
|
||||
Code Coverage Report Generation
|
||||
---------------------------------
|
||||
## Code Coverage Report Generation
|
||||
|
||||
<!-- spell-checker:ignore (flags) Ccodegen Coverflow Cpanic Zinstrument Zpanic -->
|
||||
|
||||
|
@ -36,13 +32,13 @@ Code coverage report can be generated using [grcov](https://github.com/mozilla/g
|
|||
To generate [gcov-based](https://github.com/mozilla/grcov#example-how-to-generate-gcda-files-for-a-rust-project) coverage report
|
||||
|
||||
```bash
|
||||
$ export CARGO_INCREMENTAL=0
|
||||
$ export RUSTFLAGS="-Zprofile -Ccodegen-units=1 -Copt-level=0 -Clink-dead-code -Coverflow-checks=off -Zpanic_abort_tests -Cpanic=abort"
|
||||
$ export RUSTDOCFLAGS="-Cpanic=abort"
|
||||
$ cargo build <options...> # e.g., --features feat_os_unix
|
||||
$ cargo test <options...> # e.g., --features feat_os_unix test_pathchk
|
||||
$ grcov . -s . --binary-path ./target/debug/ -t html --branch --ignore-not-existing --ignore build.rs --excl-br-line "^\s*((debug_)?assert(_eq|_ne)?\#\[derive\()" -o ./target/debug/coverage/
|
||||
$ # open target/debug/coverage/index.html in browser
|
||||
export CARGO_INCREMENTAL=0
|
||||
export RUSTFLAGS="-Zprofile -Ccodegen-units=1 -Copt-level=0 -Clink-dead-code -Coverflow-checks=off -Zpanic_abort_tests -Cpanic=abort"
|
||||
export RUSTDOCFLAGS="-Cpanic=abort"
|
||||
cargo build <options...> # e.g., --features feat_os_unix
|
||||
cargo test <options...> # e.g., --features feat_os_unix test_pathchk
|
||||
grcov . -s . --binary-path ./target/debug/ -t html --branch --ignore-not-existing --ignore build.rs --excl-br-line "^\s*((debug_)?assert(_eq|_ne)?\#\[derive\()" -o ./target/debug/coverage/
|
||||
# open target/debug/coverage/index.html in browser
|
||||
```
|
||||
|
||||
If changes are not reflected in the report, run `cargo clean` and then rerun the commands above.
|
||||
|
@ -52,19 +48,17 @@ if changes are not reflected in the report then run `cargo clean` and run the ab
|
|||
If you are using a stable version of Rust that doesn't enable code coverage instrumentation by default,
then add the `-Zinstrument-coverage` flag to the `RUSTFLAGS` env variable specified above.
|
||||
|
||||
|
||||
pre-commit hooks
|
||||
----------------
|
||||
## pre-commit hooks
|
||||
|
||||
A configuration for `pre-commit` is provided in the repository. It allows automatically checking every git commit you make to ensure it compiles, and passes `clippy` and `rustfmt` without warnings.
|
||||
|
||||
To use the provided hook:
|
||||
|
||||
1. [Install `pre-commit`](https://pre-commit.com/#install)
|
||||
2. Run `pre-commit install` while in the repository directory
|
||||
1. Run `pre-commit install` while in the repository directory
|
||||
|
||||
Your git commits will then automatically be checked. If a check fails, an error message will explain why, and your commit will be canceled. You can then make the suggested changes, and run `git commit ...` again.
|
||||
|
||||
### Using Clippy
|
||||
## Using Clippy
|
||||
|
||||
The `msrv` key in the clippy configuration file `clippy.toml` is used to disable lints pertaining to newer features by specifying the minimum supported Rust version (MSRV). However, this key is only supported on `nightly`. To invoke clippy without errors, use `cargo +nightly clippy`. In order to also check tests and non-default crate features, use `cargo +nightly clippy --all-targets --all-features`.
|
||||
|
|
118
README.md
|
@ -21,11 +21,12 @@ or different behavior might be experienced.
|
|||
|
||||
To install it:
|
||||
|
||||
```
|
||||
$ cargo install coreutils
|
||||
$ ~/.cargo/bin/coreutils
|
||||
```bash
|
||||
cargo install coreutils
|
||||
~/.cargo/bin/coreutils
|
||||
```
|
||||
|
||||
<!-- markdownlint-disable-next-line MD026 -->
|
||||
## Why?
|
||||
|
||||
uutils aims to work on as many platforms as possible, to be able to use the
|
||||
|
@ -35,6 +36,7 @@ chosen not only because it is fast and safe, but is also excellent for
|
|||
writing cross-platform code.
|
||||
|
||||
## Documentation
|
||||
|
||||
uutils has both user and developer documentation available:
|
||||
|
||||
- [User Manual](https://uutils.github.io/user/)
|
||||
|
@ -46,8 +48,8 @@ Both can also be generated locally, the instructions for that can be found in th
|
|||
<!-- ANCHOR: build (this mark is needed for mdbook) -->
|
||||
## Requirements
|
||||
|
||||
* Rust (`cargo`, `rustc`)
|
||||
* GNU Make (optional)
|
||||
- Rust (`cargo`, `rustc`)
|
||||
- GNU Make (optional)
|
||||
|
||||
### Rust Version
|
||||
|
||||
|
@ -65,8 +67,8 @@ or GNU Make.
|
|||
For either method, we first need to fetch the repository:
|
||||
|
||||
```bash
|
||||
$ git clone https://github.com/uutils/coreutils
|
||||
$ cd coreutils
|
||||
git clone https://github.com/uutils/coreutils
|
||||
cd coreutils
|
||||
```
|
||||
|
||||
### Cargo
|
||||
|
@ -75,7 +77,7 @@ Building uutils using Cargo is easy because the process is the same as for
|
|||
every other Rust program:
|
||||
|
||||
```bash
|
||||
$ cargo build --release
|
||||
cargo build --release
|
||||
```
|
||||
|
||||
This command builds the most portable common core set of uutils into a multicall
|
||||
|
@ -86,11 +88,11 @@ expanded sets of uutils for a platform (on that platform) is as simple as
|
|||
specifying it as a feature:
|
||||
|
||||
```bash
|
||||
$ cargo build --release --features macos
|
||||
cargo build --release --features macos
|
||||
# or ...
|
||||
$ cargo build --release --features windows
|
||||
cargo build --release --features windows
|
||||
# or ...
|
||||
$ cargo build --release --features unix
|
||||
cargo build --release --features unix
|
||||
```
|
||||
|
||||
If you don't want to build every utility available on your platform into the
|
||||
|
@ -98,7 +100,7 @@ final binary, you can also specify which ones you want to build manually.
|
|||
For example:
|
||||
|
||||
```bash
|
||||
$ cargo build --features "base32 cat echo rm" --no-default-features
|
||||
cargo build --features "base32 cat echo rm" --no-default-features
|
||||
```
|
||||
|
||||
If you don't want to build the multicall binary and would prefer to build
|
||||
|
@ -108,7 +110,7 @@ is contained in its own package within the main repository, named
|
|||
specific packages (using the `--package` [aka `-p`] option). For example:
|
||||
|
||||
```bash
|
||||
$ cargo build -p uu_base32 -p uu_cat -p uu_echo -p uu_rm
|
||||
cargo build -p uu_base32 -p uu_cat -p uu_echo -p uu_rm
|
||||
```
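
The `-p` list above can also be generated from plain utility names. This is a minimal sketch, assuming the `uu_` prefix seen in the example holds for every utility you pass in; `package_args` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: turn utility names into `-p uu_<name>` arguments.
# Assumes each utility's package is named uu_<name>, as in the example above.
package_args() {
    local args="" util
    for util in "$@"; do
        args="$args -p uu_$util"
    done
    # Strip the single leading space left by the loop.
    printf '%s\n' "${args# }"
}
```

It could then be used as `cargo build $(package_args base32 cat echo rm)`, leaving the substitution unquoted so the shell splits it into separate arguments.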
|
||||
|
||||
### GNU Make
|
||||
|
@ -118,29 +120,29 @@ Building using `make` is a simple process as well.
|
|||
To simply build all available utilities:
|
||||
|
||||
```bash
|
||||
$ make
|
||||
make
|
||||
```
|
||||
|
||||
To build all but a few of the available utilities:
|
||||
|
||||
```bash
|
||||
$ make SKIP_UTILS='UTILITY_1 UTILITY_2'
|
||||
make SKIP_UTILS='UTILITY_1 UTILITY_2'
|
||||
```
|
||||
|
||||
To build only a few of the available utilities:
|
||||
|
||||
```bash
|
||||
$ make UTILS='UTILITY_1 UTILITY_2'
|
||||
make UTILS='UTILITY_1 UTILITY_2'
|
||||
```
|
||||
|
||||
## Installation
|
||||
|
||||
### Cargo
|
||||
### Install with Cargo
|
||||
|
||||
Likewise, installing can simply be done using:
|
||||
|
||||
```bash
|
||||
$ cargo install --path .
|
||||
cargo install --path .
|
||||
```
|
||||
|
||||
This command will install uutils into Cargo's *bin* folder (*e.g.* `$HOME/.cargo/bin`).
|
||||
|
@ -148,49 +150,49 @@ This command will install uutils into Cargo's *bin* folder (*e.g.* `$HOME/.cargo
|
|||
This does not install files necessary for shell completion. For shell completion to work,
|
||||
use `GNU Make` or see `Manually install shell completions`.
|
||||
|
||||
### GNU Make
|
||||
### Install with GNU Make
|
||||
|
||||
To install all available utilities:
|
||||
|
||||
```bash
|
||||
$ make install
|
||||
make install
|
||||
```
|
||||
|
||||
To install using `sudo`, the `-E` switch must be used:
|
||||
|
||||
```bash
|
||||
$ sudo -E make install
|
||||
sudo -E make install
|
||||
```
|
||||
|
||||
To install all but a few of the available utilities:
|
||||
|
||||
```bash
|
||||
$ make SKIP_UTILS='UTILITY_1 UTILITY_2' install
|
||||
make SKIP_UTILS='UTILITY_1 UTILITY_2' install
|
||||
```
|
||||
|
||||
To install only a few of the available utilities:
|
||||
|
||||
```bash
|
||||
$ make UTILS='UTILITY_1 UTILITY_2' install
|
||||
make UTILS='UTILITY_1 UTILITY_2' install
|
||||
```
|
||||
|
||||
To install every program with a prefix (e.g. uu-echo uu-cat):
|
||||
|
||||
```bash
|
||||
$ make PROG_PREFIX=PREFIX_GOES_HERE install
|
||||
make PROG_PREFIX=PREFIX_GOES_HERE install
|
||||
```
|
||||
|
||||
To install the multicall binary:
|
||||
|
||||
```bash
|
||||
$ make MULTICALL=y install
|
||||
make MULTICALL=y install
|
||||
```
|
||||
|
||||
Set install parent directory (default value is /usr/local):
|
||||
|
||||
```bash
|
||||
# DESTDIR is also supported
|
||||
$ make PREFIX=/my/path install
|
||||
make PREFIX=/my/path install
|
||||
```
|
||||
|
||||
Installing with `make` installs shell completions for all installed utilities
|
||||
|
@ -203,6 +205,7 @@ The `coreutils` binary can generate completions for the `bash`, `elvish`, `fish`
|
|||
and `zsh` shells. It prints the result to stdout.
|
||||
|
||||
The syntax is:
|
||||
|
||||
```bash
|
||||
cargo run completion <utility> <shell>
|
||||
```
|
||||
|
@ -220,106 +223,107 @@ Un-installation differs depending on how you have installed uutils. If you used
|
|||
Cargo to install, use Cargo to uninstall. If you used GNU Make to install, use
|
||||
Make to uninstall.
|
||||
|
||||
### Cargo
|
||||
### Uninstall with Cargo
|
||||
|
||||
To uninstall uutils:
|
||||
|
||||
```bash
|
||||
$ cargo uninstall uutils
|
||||
cargo uninstall uutils
|
||||
```
|
||||
|
||||
### GNU Make
|
||||
### Uninstall with GNU Make
|
||||
|
||||
To uninstall all utilities:
|
||||
|
||||
```bash
|
||||
$ make uninstall
|
||||
make uninstall
|
||||
```
|
||||
|
||||
To uninstall every program with a set prefix:
|
||||
|
||||
```bash
|
||||
$ make PROG_PREFIX=PREFIX_GOES_HERE uninstall
|
||||
make PROG_PREFIX=PREFIX_GOES_HERE uninstall
|
||||
```
|
||||
|
||||
To uninstall the multicall binary:
|
||||
|
||||
```bash
|
||||
$ make MULTICALL=y uninstall
|
||||
make MULTICALL=y uninstall
|
||||
```
|
||||
|
||||
To uninstall from a custom parent directory:
|
||||
|
||||
```bash
|
||||
# DESTDIR is also supported
|
||||
$ make PREFIX=/my/path uninstall
|
||||
make PREFIX=/my/path uninstall
|
||||
```
|
||||
|
||||
<!-- ANCHOR_END: build (this mark is needed for mdbook) -->
|
||||
|
||||
## Testing
|
||||
|
||||
Testing can be done using either Cargo or `make`.
|
||||
|
||||
### Cargo
|
||||
### Testing with Cargo
|
||||
|
||||
Just like with building, we follow the standard procedure for testing using
|
||||
Cargo:
|
||||
|
||||
```bash
|
||||
$ cargo test
|
||||
cargo test
|
||||
```
|
||||
|
||||
By default, `cargo test` only runs the common programs. To also run
platform-specific tests, run:
|
||||
|
||||
```bash
|
||||
$ cargo test --features unix
|
||||
cargo test --features unix
|
||||
```
|
||||
|
||||
If you would prefer to test a select few utilities:
|
||||
|
||||
```bash
|
||||
$ cargo test --features "chmod mv tail" --no-default-features
|
||||
cargo test --features "chmod mv tail" --no-default-features
|
||||
```
|
||||
|
||||
If you also want to test the core utilities:
|
||||
|
||||
```bash
|
||||
$ cargo test -p uucore -p coreutils
|
||||
cargo test -p uucore -p coreutils
|
||||
```
|
||||
|
||||
To debug:
|
||||
|
||||
```bash
|
||||
$ gdb --args target/debug/coreutils ls
|
||||
gdb --args target/debug/coreutils ls
|
||||
(gdb) b ls.rs:79
|
||||
(gdb) run
|
||||
```
|
||||
|
||||
### GNU Make
|
||||
### Testing with GNU Make
|
||||
|
||||
To simply test all available utilities:
|
||||
|
||||
```bash
|
||||
$ make test
|
||||
make test
|
||||
```
|
||||
|
||||
To test all but a few of the available utilities:
|
||||
|
||||
```bash
|
||||
$ make SKIP_UTILS='UTILITY_1 UTILITY_2' test
|
||||
make SKIP_UTILS='UTILITY_1 UTILITY_2' test
|
||||
```
|
||||
|
||||
To test only a few of the available utilities:
|
||||
|
||||
```bash
|
||||
$ make UTILS='UTILITY_1 UTILITY_2' test
|
||||
make UTILS='UTILITY_1 UTILITY_2' test
|
||||
```
|
||||
|
||||
To include tests for unimplemented behavior:
|
||||
|
||||
```bash
|
||||
$ make UTILS='UTILITY_1 UTILITY_2' SPEC=y test
|
||||
make UTILS='UTILITY_1 UTILITY_2' SPEC=y test
|
||||
```
|
||||
|
||||
### Run Busybox Tests
|
||||
|
@ -330,19 +334,19 @@ requires `make`.
|
|||
To run busybox tests for all utilities for which busybox has tests
|
||||
|
||||
```bash
|
||||
$ make busytest
|
||||
make busytest
|
||||
```
|
||||
|
||||
To run busybox tests for a few of the available utilities
|
||||
|
||||
```bash
|
||||
$ make UTILS='UTILITY_1 UTILITY_2' busytest
|
||||
make UTILS='UTILITY_1 UTILITY_2' busytest
|
||||
```
|
||||
|
||||
To pass an argument like "-v" to the busybox test runtime
|
||||
|
||||
```bash
|
||||
$ make UTILS='UTILITY_1 UTILITY_2' RUNTEST_ARGS='-v' busytest
|
||||
make UTILS='UTILITY_1 UTILITY_2' RUNTEST_ARGS='-v' busytest
|
||||
```
|
||||
|
||||
### Comparing with GNU
|
||||
|
@ -356,14 +360,14 @@ breakdown of the GNU test results of the main branch can be found
|
|||
To run locally:
|
||||
|
||||
```bash
|
||||
$ bash util/build-gnu.sh
|
||||
$ bash util/run-gnu-test.sh
|
||||
bash util/build-gnu.sh
|
||||
bash util/run-gnu-test.sh
|
||||
# To run a single test:
|
||||
$ bash util/run-gnu-test.sh tests/touch/not-owner.sh # for example
|
||||
bash util/run-gnu-test.sh tests/touch/not-owner.sh # for example
|
||||
# To run several tests:
|
||||
$ bash util/run-gnu-test.sh tests/touch/not-owner.sh tests/rm/no-give-up.sh # for example
|
||||
bash util/run-gnu-test.sh tests/touch/not-owner.sh tests/rm/no-give-up.sh # for example
|
||||
# If this is a perl (.pl) test, to run in debug:
|
||||
$ DEBUG=1 bash util/run-gnu-test.sh tests/misc/sm3sum.pl
|
||||
DEBUG=1 bash util/run-gnu-test.sh tests/misc/sm3sum.pl
|
||||
```
|
||||
|
||||
Note that it relies on individual utilities (not the multicall binary).
|
||||
|
@ -387,7 +391,6 @@ To improve the GNU compatibility, the following process is recommended:
|
|||
1. Start to modify the Rust implementation to match the expected behavior
|
||||
1. Add a test to make sure that we don't regress (our test suite is super quick)
|
||||
|
||||
|
||||
## Contributing
|
||||
|
||||
To contribute to uutils, please see [CONTRIBUTING](CONTRIBUTING.md).
|
||||
|
@ -395,11 +398,12 @@ To contribute to uutils, please see [CONTRIBUTING](CONTRIBUTING.md).
|
|||
## Utilities
|
||||
|
||||
Please note that this is not fully accurate:
|
||||
* Some new options can be added / removed in the GNU implementation;
|
||||
* Some error management might be missing;
|
||||
* Some behaviors might be different.
|
||||
|
||||
See https://github.com/uutils/coreutils/issues/3336 for the main meta bugs
|
||||
- Some new options can be added / removed in the GNU implementation;
|
||||
- Some error management might be missing;
|
||||
- Some behaviors might be different.
|
||||
|
||||
See <https://github.com/uutils/coreutils/issues/3336> for the main meta bugs
|
||||
(many are missing).
|
||||
|
||||
| Done | WIP |
|
||||
|
|
|
@ -1,3 +1,3 @@
|
|||
# Build from source
|
||||
|
||||
{{#include ../../README.md:build }}
|
||||
{{#include ../../README.md:build }}
|
||||
|
|
|
@ -1 +1,3 @@
|
|||
{{ #include ../../CONTRIBUTING.md }}
|
||||
<!-- markdownlint-disable MD041 -->
|
||||
|
||||
{{ #include ../../CONTRIBUTING.md }}
|
||||
|
|
|
@ -1,5 +1,9 @@
|
|||
<!-- markdownlint-disable MD041 -->
|
||||
|
||||
{{#include logo.svg}}
|
||||
|
||||
<!-- markdownlint-disable MD033 -->
|
||||
|
||||
<style>
|
||||
/* Make the logo a bit bigger and center */
|
||||
#logo {
|
||||
|
|
|
@ -11,6 +11,7 @@ You can also [build uutils from source](/build.md).
|
|||
<!-- toc -->
|
||||
|
||||
## Cargo
|
||||
|
||||
[![crates.io package](https://repology.org/badge/version-for-repo/crates_io/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)
|
||||
|
||||
```bash
|
||||
|
@ -23,6 +24,7 @@ cargo install coreutils --features windows
|
|||
```
|
||||
|
||||
## Linux
|
||||
|
||||
### Alpine
|
||||
|
||||
[![Alpine Linux Edge package](https://repology.org/badge/version-for-repo/alpine_edge/uutils-coreutils.svg)](https://pkgs.alpinelinux.org/packages?name=uutils-coreutils)
|
||||
|
@ -62,6 +64,7 @@ emerge -pv sys-apps/uutils-coreutils
|
|||
```
|
||||
|
||||
### Manjaro
|
||||
|
||||
![Manjaro Stable package](https://repology.org/badge/version-for-repo/manjaro_stable/uutils-coreutils.svg)
|
||||
[![Manjaro Testing package](https://repology.org/badge/version-for-repo/manjaro_testing/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)
|
||||
[![Manjaro Unstable package](https://repology.org/badge/version-for-repo/manjaro_unstable/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)
|
||||
|
@ -73,6 +76,7 @@ pamac install uutils-coreutils
|
|||
```
|
||||
|
||||
### NixOS
|
||||
|
||||
[![nixpkgs unstable package](https://repology.org/badge/version-for-repo/nix_unstable/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)
|
||||
|
||||
```bash
|
||||
|
@ -80,6 +84,7 @@ nix-env -iA nixos.uutils-coreutils
|
|||
```
|
||||
|
||||
### OpenMandriva Lx
|
||||
|
||||
[![openmandriva cooker package](https://repology.org/badge/version-for-repo/openmandriva_cooker/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)
|
||||
|
||||
```bash
|
||||
|
@ -101,6 +106,7 @@ export PATH=/usr/lib/cargo/bin/coreutils:$PATH
|
|||
## MacOS
|
||||
|
||||
### Homebrew
|
||||
|
||||
[![Homebrew package](https://repology.org/badge/version-for-repo/homebrew/uutils-coreutils.svg)](https://formulae.brew.sh/formula/uutils-coreutils)
|
||||
|
||||
```bash
|
||||
|
@ -108,6 +114,7 @@ brew install uutils-coreutils
|
|||
```
|
||||
|
||||
### MacPorts
|
||||
|
||||
[![MacPorts package](https://repology.org/badge/version-for-repo/macports/uutils-coreutils.svg)](https://ports.macports.org/port/coreutils-uutils/)
|
||||
|
||||
```
|
||||
|
@ -115,6 +122,7 @@ port install coreutils-uutils
|
|||
```
|
||||
|
||||
## FreeBSD
|
||||
|
||||
[![FreeBSD port](https://repology.org/badge/version-for-repo/freebsd/uutils-coreutils.svg)](https://repology.org/project/uutils-coreutils/versions)
|
||||
|
||||
```sh
|
||||
|
@ -124,6 +132,7 @@ pkg install uutils
|
|||
## Windows
|
||||
|
||||
### Scoop
|
||||
|
||||
[![Scoop package](https://repology.org/badge/version-for-repo/scoop/uutils-coreutils.svg)](https://scoop.sh/#/apps?q=uutils-coreutils&s=0&d=1&o=true)
|
||||
|
||||
```bash
|
||||
|
@ -136,4 +145,6 @@ scoop install uutils-coreutils
|
|||
|
||||
[![AUR package](https://repology.org/badge/version-for-repo/aur/coreutils-hybrid.svg)](https://aur.archlinux.org/packages/coreutils-hybrid)
|
||||
|
||||
A GNU coreutils / uutils coreutils hybrid package. Uses stable uutils programs mixed with GNU counterparts if uutils counterpart is unfinished or buggy.
|
||||
A GNU coreutils / uutils coreutils hybrid package. Uses stable uutils
|
||||
programs mixed with GNU counterparts if uutils counterpart is
|
||||
unfinished or buggy.
|
||||
|
|
|
@ -1,4 +1,5 @@
|
|||
# Multi-call binary
|
||||
# Multi-call binary
|
||||
|
||||
uutils includes a multi-call binary from which the utils can be invoked. This
reduces the total binary size and can be useful for portability.
|
||||
|
||||
|
@ -12,6 +13,7 @@ coreutils [util] [util options]
|
|||
The `--help` flag will print a list of available utils.
|
||||
|
||||
## Example
|
||||
```
|
||||
|
||||
```shell
|
||||
coreutils ls -l
|
||||
```
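
To illustrate how a multi-call binary can decide which util to run, here is a rough sketch of the dispatch logic in shell. It is not the actual implementation: `resolve_util` is a hypothetical name, and the symlink-style branch is an assumption about how such binaries are commonly invoked.

```shell
# Hypothetical sketch: work out which util a multi-call binary should run.
# $1 is the name the binary was invoked as; the rest are its arguments.
resolve_util() {
    local invoked_as
    invoked_as="$(basename "$1")"
    shift
    if [ "$invoked_as" = "coreutils" ]; then
        # `coreutils ls -l`: the util name is the first argument.
        printf '%s\n' "${1:-}"
    else
        # Assumed link-style invocation, e.g. a link named `ls`.
        printf '%s\n' "$invoked_as"
    fi
}
```

For example, `resolve_util /usr/bin/coreutils ls -l` and `resolve_util /usr/local/bin/ls -l` both print `ls`.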
|
||||
```
|
||||
|
|
|
@ -1,5 +1,7 @@
|
|||
# GNU Test Coverage
|
||||
|
||||
<!-- markdownlint-disable MD033 -->
|
||||
|
||||
uutils is actively tested against the GNU coreutils test suite. The results
|
||||
below are automatically updated every day.
|
||||
|
||||
|
|
|
@ -4,11 +4,8 @@
|
|||
arch
|
||||
```
|
||||
|
||||
|
||||
Display machine architecture
|
||||
|
||||
|
||||
## After Help
|
||||
|
||||
Determine architecture name for current machine.
|
||||
|
||||
|
|
|
@ -7,8 +7,8 @@ base32 [OPTION]... [FILE]
|
|||
encode/decode data and print to standard output
|
||||
With no FILE, or when FILE is -, read standard input.
|
||||
|
||||
The data are encoded as described for the base32 alphabet in RFC
|
||||
4648. When decoding, the input may contain newlines in addition
|
||||
The data are encoded as described for the base32 alphabet in RFC 4648.
|
||||
When decoding, the input may contain newlines in addition
|
||||
to the bytes of the formal base32 alphabet. Use --ignore-garbage
|
||||
to attempt to recover from any other non-alphabet bytes in the
|
||||
encoded stream.
|
||||
|
|
|
@ -7,8 +7,8 @@ base64 [OPTION]... [FILE]
|
|||
encode/decode data and print to standard output
|
||||
With no FILE, or when FILE is -, read standard input.
|
||||
|
||||
The data are encoded as described for the base64 alphabet in RFC
|
||||
3548. When decoding, the input may contain newlines in addition
|
||||
The data are encoded as described for the base64 alphabet in RFC 3548.
|
||||
When decoding, the input may contain newlines in addition
|
||||
to the bytes of the formal base64 alphabet. Use --ignore-garbage
|
||||
to attempt to recover from any other non-alphabet bytes in the
|
||||
encoded stream.
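
The behavior described above can be checked with a short round trip. This is a small sketch, assuming a `base64` on `PATH` that accepts `-d` for decoding (as the GNU and uutils versions do); the sample strings are illustrative.

```shell
# Encode a short string; 5 input bytes yield 8 characters ending in one '='.
encoded="$(printf 'hello' | base64)"
printf '%s\n' "$encoded"

# Decode input that has a newline in the middle of the encoded stream.
decoded="$(printf 'aGVs\nbG8=\n' | base64 -d)"
printf '%s\n' "$decoded"
```

The second command shows that a newline inside the encoded data does not break decoding.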
|
||||
|
|
|
@ -7,5 +7,5 @@ chcon [OPTION]... [-u USER] [-r ROLE] [-l RANGE] [-t TYPE] FILE...
|
|||
chcon [OPTION]... --reference=RFILE FILE...
|
||||
```
|
||||
|
||||
Change the SELinux security context of each FILE to CONTEXT.
|
||||
With --reference, change the security context of each FILE to that of RFILE.
|
||||
Change the SELinux security context of each FILE to CONTEXT.
|
||||
With --reference, change the security context of each FILE to that of RFILE.
|
||||
|
|
|
@ -1,18 +1,18 @@
|
|||
<!-- markdownlint-disable first-line-heading -->
|
||||
<!-- spell-checker:ignore (markdown) markdownlint -->
|
||||
|
||||
## Feature list
|
||||
# Feature list
|
||||
|
||||
<!-- spell-checker:ignore (options) linkgs reflink -->
|
||||
|
||||
### To Do
|
||||
## To Do
|
||||
|
||||
- [ ] cli-symbolic-links
|
||||
- [ ] context
|
||||
- [ ] copy-contents
|
||||
- [ ] sparse
|
||||
|
||||
### Completed
|
||||
## Completed
|
||||
|
||||
- [x] archive
|
||||
- [x] attributes-only
|
||||
|
|
|
@ -1,46 +1,45 @@
|
|||
## Benchmarking cut
|
||||
# Benchmarking cut
|
||||
|
||||
### Performance profile
|
||||
## Performance profile
|
||||
|
||||
In normal use cases a significant amount of the total execution time of `cut`
|
||||
is spent performing I/O. When invoked with the `-f` option (cut fields) some
|
||||
CPU time is spent on detecting fields (in `Searcher::next`). Other than that
|
||||
some small amount of CPU time is spent on breaking the input stream into lines.
|
||||
|
||||
|
||||
### How to
|
||||
## How to
|
||||
|
||||
When fixing bugs or adding features you might want to compare
|
||||
performance before and after your code changes.
|
||||
|
||||
- `hyperfine` can be used to accurately measure and compare the total
|
||||
- `hyperfine` can be used to accurately measure and compare the total
|
||||
execution time of one or more commands.
|
||||
|
||||
```
|
||||
$ cargo build --release --package uu_cut
|
||||
```shell
|
||||
cargo build --release --package uu_cut
|
||||
|
||||
$ hyperfine -w3 "./target/release/cut -f2-4,8 -d' ' input.txt" "cut -f2-4,8 -d' ' input.txt"
|
||||
hyperfine -w3 "./target/release/cut -f2-4,8 -d' ' input.txt" "cut -f2-4,8 -d' ' input.txt"
|
||||
```
|
||||
|
||||
You can put those two commands in a shell script to be sure that you don't
|
||||
forget to build after making any changes.
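
Such a script might look like the sketch below. The function names and the layout of the generated input are assumptions made for illustration; only the `cargo build` and `hyperfine` invocations come from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper script; names and input layout are illustrative.

# Write LINES space-delimited rows of eight fields each to OUT.
make_input() {
    local lines="$1" out="$2"
    awk -v n="$lines" 'BEGIN { for (i = 1; i <= n; i++) print i, "a", "b", "c", "d", "e", "f", "g" }' > "$out"
}

# Rebuild uu_cut, then time the fresh build against the system cut.
bench_cut() {
    local input="$1"
    cargo build --release --package uu_cut &&
        hyperfine -w3 "./target/release/cut -f2-4,8 -d' ' $input" "cut -f2-4,8 -d' ' $input"
}
```

A run could then be `make_input 1000000 input.txt && bench_cut input.txt`, which always rebuilds before measuring.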
|
||||
|
||||
When optimizing or fixing performance regressions, seeing the number of times a
function is called and the amount of time it takes can be useful.
|
||||
|
||||
- `cargo flamegraph` generates flame graphs from function level metrics it records using `perf` or `dtrace`
|
||||
- `cargo flamegraph` generates flame graphs from function level metrics it records using `perf` or `dtrace`
|
||||
|
||||
```
|
||||
$ cargo flamegraph --bin cut --package uu_cut -- -f1,3-4 input.txt > /dev/null
|
||||
```shell
|
||||
cargo flamegraph --bin cut --package uu_cut -- -f1,3-4 input.txt > /dev/null
|
||||
```
|
||||
|
||||
|
||||
### What to benchmark
|
||||
## What to benchmark
|
||||
|
||||
There are four different performance paths in `cut` to benchmark.
|
||||
|
||||
- Byte ranges `-c`/`--characters` or `-b`/`--bytes` e.g. `cut -c 2,4,6-`
|
||||
- Byte ranges with output delimiters e.g. `cut -c 4- --output-delimiter=/`
|
||||
- Fields e.g. `cut -f -4`
|
||||
- Fields with output delimiters e.g. `cut -f 7-10 --output-delimiter=:`
|
||||
- Byte ranges `-c`/`--characters` or `-b`/`--bytes` e.g. `cut -c 2,4,6-`
|
||||
- Byte ranges with output delimiters e.g. `cut -c 4- --output-delimiter=/`
|
||||
- Fields e.g. `cut -f -4`
|
||||
- Fields with output delimiters e.g. `cut -f 7-10 --output-delimiter=:`
|
||||
|
||||
Choose a test input file with a large number of lines so that program startup time does not significantly affect the benchmark.
|
||||
|
|
|
@ -45,7 +45,7 @@ be roughly equivalent to the total bytes copied (`blocksize` x `count`).
|
|||
|
||||
Some useful invocations for testing would be the following:
|
||||
|
||||
```
|
||||
```shell
|
||||
hyperfine "./target/release/dd bs=4k count=1000000 < /dev/zero > /dev/null"
|
||||
hyperfine "./target/release/dd bs=1M count=20000 < /dev/zero > /dev/null"
|
||||
hyperfine "./target/release/dd bs=1G count=10 < /dev/zero > /dev/null"
|
||||
|
@ -57,7 +57,7 @@ Typically you would choose a small blocksize for measuring the performance of
|
|||
typically does some set amount of work per block which only depends on the size
|
||||
of the block if conversions are used.
|
||||
|
||||
As an example, https://github.com/uutils/coreutils/pull/3600 made a change to
|
||||
As an example, <https://github.com/uutils/coreutils/pull/3600> made a change to
|
||||
reuse the same buffer between block copies, avoiding the need to reallocate a
|
||||
new block of memory for each copy. The impact of that change mostly had an
|
||||
impact on large block size copies because those are the circumstances where the
|
||||
|
|
|
@ -1,6 +1,7 @@
|
|||
<!-- spell-checker:ignore convs iseek oseek -->
|
||||
# dd
|
||||
|
||||
<!-- spell-checker:ignore convs iseek oseek -->
|
||||
|
||||
```
|
||||
dd [OPERAND]...
|
||||
dd OPTION
|
||||
|
@ -19,51 +20,53 @@ OPERANDS:
|
|||
conv=CONVS a comma-separated list of conversion options or
|
||||
(for legacy reasons) file flags.
|
||||
count=N stop reading input after N ibs-sized read operations rather
|
||||
than proceeding until EOF. See iflag=count_bytes if stopping
|
||||
than proceeding until EOF. See iflag=count_bytes if stopping
|
||||
after N bytes is preferred
|
||||
ibs=N the size of buffer used for reads (default: 512)
|
||||
if=FILE the file used for input. When not specified, stdin is used instead
|
||||
iflag=FLAGS a comma-separated list of input flags which specify how the input
|
||||
source is treated. FLAGS may be any of the input-flags or
|
||||
source is treated. FLAGS may be any of the input-flags or
|
||||
general-flags specified below.
|
||||
skip=N (or iseek=N) skip N ibs-sized records into input before beginning
|
||||
copy/convert operations. See iflag=seek_bytes if seeking N bytes
|
||||
skip=N (or iseek=N) skip N ibs-sized records into input before beginning
|
||||
copy/convert operations. See iflag=seek_bytes if seeking N bytes
|
||||
is preferred.
|
||||
obs=N the size of buffer used for writes (default: 512)
|
||||
of=FILE the file used for output. When not specified, stdout is used
|
||||
of=FILE the file used for output. When not specified, stdout is used
|
||||
instead
|
||||
oflag=FLAGS comma separated list of output flags which specify how the output
|
||||
source is treated. FLAGS may be any of the output flags or
|
||||
oflag=FLAGS comma separated list of output flags which specify how the output
|
||||
source is treated. FLAGS may be any of the output flags or
|
||||
general flags specified below
|
||||
seek=N (or oseek=N) seeks N obs-sized records into output before
|
||||
beginning copy/convert operations. See oflag=seek_bytes if
|
||||
seek=N (or oseek=N) seeks N obs-sized records into output before
|
||||
beginning copy/convert operations. See oflag=seek_bytes if
|
||||
seeking N bytes is preferred
|
||||
status=LEVEL controls whether volume and performance stats are written to
|
||||
status=LEVEL controls whether volume and performance stats are written to
|
||||
stderr.
|
||||
|
||||
When unspecified, dd will print stats upon completion. An example is below.
|
||||
|
||||
When unspecified, dd will print stats upon completion. An
|
||||
example is below.
|
||||
6+0 records in
|
||||
16+0 records out
|
||||
8192 bytes (8.2 kB, 8.0 KiB) copied, 0.00057009 s, 14.4 MB/s
|
||||
The first two lines are the 'volume' stats and the final line is
|
||||
the 'performance' stats.
|
||||
The volume stats indicate the number of complete and partial
|
||||
ibs-sized reads, or obs-sized writes that took place during the
|
||||
copy. The format of the volume stats is
|
||||
<complete>+<partial>. If records have been truncated (see
|
||||
conv=block), the volume stats will contain the number of
|
||||
8192 bytes (8.2 kB, 8.0 KiB) copied, 0.00057009 s,
|
||||
14.4 MB/s
|
||||
The first two lines are the 'volume' stats and the final line
|
||||
is the 'performance' stats.
|
||||
The volume stats indicate the number of complete and partial
|
||||
ibs-sized reads, or obs-sized writes that took place during
|
||||
the copy. The format of the volume stats is
|
||||
<complete>+<partial>. If records have been truncated (see
|
||||
conv=block), the volume stats will contain the number of
|
||||
truncated records.
|
||||
|
||||
|
||||
Possible LEVEL values are:
|
||||
progress: Print periodic performance stats as the copy
|
||||
progress: Print periodic performance stats as the copy
|
||||
proceeds.
|
||||
noxfer: Print final volume stats, but not performance stats.
|
||||
none: Do not print any stats.
|
||||
|
||||
Printing performance stats is also triggered by the INFO signal
|
||||
(where supported), or the USR1 signal. Setting the
|
||||
POSIXLY_CORRECT environment variable to any value (including an
|
||||
empty value) will cause the USR1 signal to be ignored.
|
||||
Printing performance stats is also triggered by the INFO signal
|
||||
(where supported), or the USR1 signal. Setting the
|
||||
POSIXLY_CORRECT environment variable to any value (including
|
||||
an empty value) will cause the USR1 signal to be ignored.
|
||||
|
||||
CONVERSION OPTIONS:
|
||||
|
||||
|
@ -71,15 +74,15 @@ CONVERSION OPTIONS:
|
|||
option. Implies conv=unblock.
|
||||
ebcdic convert from ASCII to EBCDIC. This is the inverse of the 'ascii'
|
||||
option. Implies conv=block.
|
||||
ibm convert from ASCII to EBCDIC, applying the conventions for '[', ']'
|
||||
ibm convert from ASCII to EBCDIC, applying the conventions for '[', ']'
|
||||
and '~' specified in POSIX. Implies conv=block.
|
||||
|
||||
ucase convert from lower-case to upper-case
|
||||
lcase converts from upper-case to lower-case.
|
||||
|
||||
block for each newline less than the size indicated by cbs=BYTES, remove
|
||||
the newline and pad with spaces up to cbs. Lines longer than cbs are
|
||||
truncated.
|
||||
block for each newline less than the size indicated by cbs=BYTES, remove
|
||||
the newline and pad with spaces up to cbs. Lines longer than cbs
|
||||
are truncated.
|
||||
unblock for each block of input of the size indicated by cbs=BYTES, remove
|
||||
right-trailing spaces and replace with a newline character.
|
||||
|
||||
|
@ -115,7 +118,7 @@ OUTPUT FLAGS:
|
|||
GENERAL FLAGS:
|
||||
|
||||
direct use direct I/O for data.
|
||||
directory fail unless the given input (if used as an iflag) or output (if used
|
||||
directory fail unless the given input (if used as an iflag) or output (if used
|
||||
as an oflag) is a directory.
|
||||
dsync use synchronized I/O for data.
|
||||
sync use synchronized I/O for data and metadata.
|
||||
|
|
|
@ -1,18 +1,18 @@
|
|||
## How to update the internal database
|
||||
# How to update the internal database
|
||||
|
||||
Create the test fixtures by writing the output of the GNU dircolors commands to the fixtures folder:
|
||||
|
||||
```
|
||||
$ dircolors --print-database > /PATH_TO_COREUTILS/tests/fixtures/dircolors/internal.expected
|
||||
$ dircolors --print-ls-colors > /PATH_TO_COREUTILS/tests/fixtures/dircolors/ls_colors.expected
|
||||
$ dircolors -b > /PATH_TO_COREUTILS/tests/fixtures/dircolors/bash_def.expected
|
||||
$ dircolors -c > /PATH_TO_COREUTILS/tests/fixtures/dircolors/csh_def.expected
|
||||
```shell
|
||||
dircolors --print-database > /PATH_TO_COREUTILS/tests/fixtures/dircolors/internal.expected
|
||||
dircolors --print-ls-colors > /PATH_TO_COREUTILS/tests/fixtures/dircolors/ls_colors.expected
|
||||
dircolors -b > /PATH_TO_COREUTILS/tests/fixtures/dircolors/bash_def.expected
|
||||
dircolors -c > /PATH_TO_COREUTILS/tests/fixtures/dircolors/csh_def.expected
|
||||
```
|
||||
|
||||
Run the tests:
|
||||
|
||||
```
|
||||
$ cargo test --features "dircolors" --no-default-features
|
||||
```shell
|
||||
cargo test --features "dircolors" --no-default-features
|
||||
```
|
||||
|
||||
Edit `/PATH_TO_COREUTILS/src/uu/dircolors/src/colors.rs` until the tests pass.
|
||||
|
|
|
@ -19,6 +19,6 @@ of 1000).
|
|||
|
||||
PATTERN allows some advanced exclusions. For example, the following syntaxes
|
||||
are supported:
|
||||
? will match only one character
|
||||
* will match zero or more characters
|
||||
{a,b} will match a or b
|
||||
`?` will match only one character
|
||||
`*` will match zero or more characters
|
||||
`{a,b}` will match a or b
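The first two wildcards listed above can be illustrated with a small matcher. This is only a sketch of the matching rules as described, not the code `du` uses; the function names `wildcard_match` and `matches_from` are made up for the example, and `{a,b}` alternation is deliberately left out.

```rust
/// Returns true if `text` matches `pattern`, where `?` matches exactly one
/// character and `*` matches zero or more characters.
/// Hypothetical helper for illustration; `{a,b}` is not handled here.
fn wildcard_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    matches_from(&p, &t)
}

fn matches_from(p: &[char], t: &[char]) -> bool {
    match p.split_first() {
        // An exhausted pattern only matches an exhausted text.
        None => t.is_empty(),
        // `*`: try letting it consume 0, 1, 2, ... characters.
        Some((&'*', rest)) => (0..=t.len()).any(|n| matches_from(rest, &t[n..])),
        // `?`: consume exactly one character, whatever it is.
        Some((&'?', rest)) => !t.is_empty() && matches_from(rest, &t[1..]),
        // Any other character must match literally.
        Some((&c, rest)) => t.first() == Some(&c) && matches_from(rest, &t[1..]),
    }
}

fn main() {
    assert!(wildcard_match("*.log", "build.log"));
    assert!(wildcard_match("a?c", "abc"));
    assert!(!wildcard_match("a?c", "ac"));
    println!("all wildcard checks passed");
}
```

The recursive form is easy to read but can be slow for patterns with many `*`; a production matcher would normally use a dedicated glob library instead.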
|
||||
|
|
|
@ -10,7 +10,7 @@ Print the value of `EXPRESSION` to standard output
|
|||
## After help
|
||||
|
||||
Print the value of `EXPRESSION` to standard output. A blank line below
|
||||
separates increasing precedence groups.
|
||||
separates increasing precedence groups.
|
||||
|
||||
`EXPRESSION` may be:
|
||||
|
||||
|
@ -48,11 +48,13 @@ Comparisons are arithmetic if both ARGs are numbers, else lexicographical.
|
|||
Pattern matches return the string matched between \( and \) or null; if
|
||||
\( and \) are not used, they return the number of characters matched or 0.
|
||||
|
||||
Exit status is `0` if `EXPRESSION` is neither null nor `0`, `1` if `EXPRESSION` is null
|
||||
or `0`, `2` if `EXPRESSION` is syntactically invalid, and `3` if an error occurred.
|
||||
Exit status is `0` if `EXPRESSION` is neither null nor `0`, `1` if `EXPRESSION`
|
||||
is null or `0`, `2` if `EXPRESSION` is syntactically invalid, and `3` if an
|
||||
error occurred.
|
||||
|
||||
Environment variables:
|
||||
- `EXPR_DEBUG_TOKENS=1`: dump expression's tokens
|
||||
- `EXPR_DEBUG_RPN=1`: dump expression represented in reverse polish notation
|
||||
- `EXPR_DEBUG_SYA_STEP=1`: dump each parser step
|
||||
- `EXPR_DEBUG_AST=1`: dump expression represented as an abstract syntax tree
|
||||
|
||||
- `EXPR_DEBUG_TOKENS=1`: dump expression's tokens
|
||||
- `EXPR_DEBUG_RPN=1`: dump expression represented in reverse polish notation
|
||||
- `EXPR_DEBUG_SYA_STEP=1`: dump each parser step
|
||||
- `EXPR_DEBUG_AST=1`: dump expression represented as an abstract syntax tree
|
||||
|
|
|
@ -53,19 +53,19 @@ which I recommend reading if you want to add benchmarks to `factor`.
|
|||
so each sample takes a very short time, minimizing variability and
|
||||
maximizing the number of samples we can take in a given time.
|
||||
|
||||
2. Benchmarks are immutable (once merged in `uutils`)
|
||||
1. Benchmarks are immutable (once merged in `uutils`)
|
||||
|
||||
Modifying a benchmark means previously-collected values cannot meaningfully
|
||||
be compared, silently giving nonsensical results. If you must modify an
|
||||
existing benchmark, rename it.
|
||||
|
||||
3. Test common cases
|
||||
1. Test common cases
|
||||
|
||||
We are interested in overall performance, rather than specific edge-cases;
|
||||
use **reproducibly-randomized inputs**, sampling from either all possible
|
||||
input values or some subset of interest.
|
||||
|
||||
4. Use [`criterion`], `criterion::black_box`, ...
|
||||
1. Use [`criterion`], `criterion::black_box`, ...
|
||||
|
||||
`criterion` isn't perfect, but it is also much better than ad-hoc
|
||||
solutions in each benchmark.
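As a rough illustration of "reproducibly-randomized inputs", the sketch below derives benchmark inputs from a fixed seed, so every run (and every contributor) sees the same values. It uses a hand-written SplitMix64 generator only to stay dependency-free; the names `SplitMix64` and `benchmark_inputs` are assumptions for this example, and a real benchmark would more likely use a seeded generator from an established crate together with `criterion`.

```rust
/// SplitMix64: a tiny deterministic pseudo-random generator.
/// Used here only as a dependency-free stand-in for a seeded RNG.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical helper: the same `seed` always yields the same inputs.
fn benchmark_inputs(seed: u64, count: usize) -> Vec<u64> {
    let mut rng = SplitMix64(seed);
    (0..count).map(|_| rng.next_u64()).collect()
}

fn main() {
    let first_run = benchmark_inputs(42, 5);
    let second_run = benchmark_inputs(42, 5);
    // Same seed, same inputs: results stay comparable across runs.
    assert_eq!(first_run, second_run);
    println!("{:?}", first_run);
}
```

Keeping the seed fixed inside the benchmark fits the immutability rule above: changing the seed changes the inputs, which amounts to defining a new benchmark.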
|
||||
|
@ -103,7 +103,7 @@ characteristics:
|
|||
1. integer factoring algorithms are randomized, with large variance in
|
||||
execution time;
|
||||
|
||||
2. various inputs also have large differences in factoring time, that
|
||||
1. various inputs also have large differences in factoring time, which
|
||||
corresponds to no natural, linear ordering of the inputs.
|
||||
|
||||
If (1) was untrue (i.e. if execution time wasn't random), we could faithfully
|
||||
|
|
|
@ -1,9 +1,11 @@
|
|||
## Benchmarking hashsum
|
||||
# Benchmarking hashsum
|
||||
|
||||
### To bench blake2
|
||||
## To bench blake2
|
||||
|
||||
Taken from: https://github.com/uutils/coreutils/pull/2296
|
||||
Taken from: <https://github.com/uutils/coreutils/pull/2296>
|
||||
|
||||
With a large file:
|
||||
$ hyperfine "./target/release/coreutils hashsum --b2sum large-file" "b2sum large-file"
|
||||
|
||||
```shell
|
||||
hyperfine "./target/release/coreutils hashsum --b2sum large-file" "b2sum large-file"
|
||||
```
|
||||
|
|
|
@ -5,23 +5,31 @@ GNU version of `head`, you can use a benchmarking tool like
|
|||
[hyperfine][0]. On Ubuntu 18.04 or later, you can install `hyperfine` by
|
||||
running
|
||||
|
||||
sudo apt-get install hyperfine
|
||||
```shell
|
||||
sudo apt-get install hyperfine
|
||||
```
|
||||
|
||||
Next, build the `head` binary under the release profile:
|
||||
|
||||
cargo build --release -p uu_head
|
||||
```shell
|
||||
cargo build --release -p uu_head
|
||||
```
|
||||
|
||||
Now, get a text file to test `head` on. I used the *Complete Works of
|
||||
William Shakespeare*, which is in the public domain in the United States
|
||||
and most other parts of the world.
|
||||
|
||||
wget -O shakespeare.txt https://www.gutenberg.org/files/100/100-0.txt
|
||||
```shell
|
||||
wget -O shakespeare.txt https://www.gutenberg.org/files/100/100-0.txt
|
||||
```
|
||||
|
||||
This particular file has about 170,000 lines, each of which is no longer
|
||||
than 96 characters:
|
||||
|
||||
$ wc -lL shakespeare.txt
|
||||
170592 96 shakespeare.txt
|
||||
```shell
|
||||
$ wc -lL shakespeare.txt
|
||||
170592 96 shakespeare.txt
|
||||
```
|
||||
|
||||
You could use files of different shapes and sizes to test the
|
||||
performance of `head` in different situations. For a larger file, you
|
||||
|
@ -32,9 +40,11 @@ contains about 130 million lines.
|
|||
Finally, you can compare the performance of the two versions of `head`
|
||||
by running, for example,
|
||||
|
||||
hyperfine \
|
||||
"head -n 100000 shakespeare.txt" \
|
||||
"target/release/head -n 100000 shakespeare.txt"
|
||||
```shell
|
||||
hyperfine \
|
||||
"head -n 100000 shakespeare.txt" \
|
||||
"target/release/head -n 100000 shakespeare.txt"
|
||||
```
|
||||
|
||||
[0]: https://github.com/sharkdp/hyperfine
|
||||
[1]: https://www.wikidata.org/wiki/Wikidata:Database_download
|
||||
|
|
|
@ -17,11 +17,14 @@ A benchmark with `-j` and `-i` shows the following time:
|
|||
| libc | 25% | I/O and memory allocation. |
|
||||
|
||||
More detailed profiles can be obtained via [flame graphs](https://github.com/flamegraph-rs/flamegraph):
|
||||
```
|
||||
|
||||
```shell
|
||||
cargo flamegraph --bin join --package uu_join -- file1 file2 > /dev/null
|
||||
```
|
||||
|
||||
You may need to add the following lines to the top-level `Cargo.toml` to get full stack traces:
|
||||
```
|
||||
|
||||
```toml
|
||||
[profile.release]
|
||||
debug = true
|
||||
```
|
||||
|
@ -34,22 +37,26 @@ in practice many CSV datasets will function well after being sorted.
|
|||
|
||||
Like most of the utils, the recommended tool for benchmarking is [hyperfine](https://github.com/sharkdp/hyperfine).
|
||||
To benchmark your changes:
|
||||
- checkout the main branch (without your changes), do a `--release` build, and back up the executable produced at `target/release/join`
|
||||
- checkout your working branch (with your changes), do a `--release` build
|
||||
- run
|
||||
```
|
||||
hyperfine -w 5 "/path/to/main/branch/build/join file1 file2" "/path/to/working/branch/build/join file1 file2"
|
||||
```
|
||||
- you'll likely need to add additional options to both commands, such as a field separator, or if you're benchmarking some particular behavior
|
||||
- you can also optionally benchmark against GNU's join
|
||||
|
||||
- checkout the main branch (without your changes), do a `--release` build, and back up the executable produced at `target/release/join`
|
||||
- checkout your working branch (with your changes), do a `--release` build
|
||||
- run
|
||||
|
||||
```shell
|
||||
hyperfine -w 5 "/path/to/main/branch/build/join file1 file2" "/path/to/working/branch/build/join file1 file2"
|
||||
```
|
||||
|
||||
- you'll likely need to add options to both commands, such as a field separator, or whichever options exercise the particular behavior you're benchmarking
|
||||
- you can also optionally benchmark against GNU's join
|
||||
|
||||
## What to benchmark
|
||||
|
||||
The following options can have a non-trivial impact on performance:
|
||||
- `-a`/`-v` if one of the two files has significantly more lines than the other
|
||||
- `-j`/`-1`/`-2` cause work to be done to grab the appropriate field
|
||||
- `-i` adds a call to `to_ascii_lowercase()` that adds some time for allocating and dropping memory for the lowercase key
|
||||
- `--nocheck-order` causes some calls of `Input::compare` to be skipped
|
||||
|
||||
- `-a`/`-v` if one of the two files has significantly more lines than the other
|
||||
- `-j`/`-1`/`-2` cause work to be done to grab the appropriate field
|
||||
- `-i` adds a call to `to_ascii_lowercase()` that adds some time for allocating and dropping memory for the lowercase key
|
||||
- `--nocheck-order` causes some calls of `Input::compare` to be skipped
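To make the `-i` cost more concrete, here is a small sketch contrasting a comparison that builds lowercase copies with one that compares in place. It is not taken from the `join` source: the function names are hypothetical, and it only tests equality, whereas `join` also needs an ordering between keys.

```rust
/// Compare keys by building lowercase copies first.
/// Each call allocates (and later drops) two temporary buffers.
fn keys_equal_allocating(a: &[u8], b: &[u8]) -> bool {
    a.to_ascii_lowercase() == b.to_ascii_lowercase()
}

/// Compare keys in place, ignoring ASCII case, without allocating.
fn keys_equal_in_place(a: &[u8], b: &[u8]) -> bool {
    a.eq_ignore_ascii_case(b)
}

fn main() {
    let left = b"Key-42";
    let right = b"kEY-42";
    // Both strategies agree on the result; they differ in allocation cost.
    assert_eq!(
        keys_equal_allocating(left, right),
        keys_equal_in_place(left, right)
    );
    println!("keys match: {}", keys_equal_in_place(left, right));
}
```

When benchmarking `-i`, inputs with long key fields should make the allocation overhead of the first strategy more visible.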
|
||||
|
||||
The content of the files being joined has a very significant impact on the performance.
|
||||
Things like how long each line is, how many fields there are, how long the key fields are, how many lines there are, how many lines can be joined, and how many lines each line can be joined with all change the behavior of the hotpaths.
|
||||
|
|
|
@ -9,13 +9,13 @@ Run `cargo build --release` before benchmarking after you make a change!
|
|||
|
||||
## Simple recursive ls
|
||||
|
||||
- Get a large tree, for example linux kernel source tree.
|
||||
- Benchmark simple recursive ls with hyperfine: `hyperfine --warmup 2 "target/release/coreutils ls -R tree > /dev/null"`.
|
||||
- Get a large tree, for example linux kernel source tree.
|
||||
- Benchmark simple recursive ls with hyperfine: `hyperfine --warmup 2 "target/release/coreutils ls -R tree > /dev/null"`.
|
||||
|
||||
## Recursive ls with all and long options
|
||||
|
||||
- Same tree as above
|
||||
- Benchmark recursive ls with -al -R options with hyperfine: `hyperfine --warmup 2 "target/release/coreutils ls -al -R tree > /dev/null"`.
|
||||
- Same tree as above
|
||||
- Benchmark recursive ls with -al -R options with hyperfine: `hyperfine --warmup 2 "target/release/coreutils ls -al -R tree > /dev/null"`.
|
||||
|
||||
## Comparing with GNU ls
|
||||
|
||||
|
@ -29,6 +29,7 @@ Example: `hyperfine --warmup 2 "target/release/coreutils ls -al -R tree > /dev/n
|
|||
This can also be used to compare with a version of `ls` built before your changes, to ensure your change does not regress performance.
|
||||
|
||||
Here is a `bash` script for doing this comparison:
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
cargo build --no-default-features --features ls --release
|
||||
|
@ -46,11 +47,13 @@ hyperfine "ls $args" "target/release/coreutils ls $args"
|
|||
## Cargo Flamegraph
|
||||
|
||||
With Cargo Flamegraph you can easily make a flamegraph of `ls`:
|
||||
|
||||
```bash
|
||||
cargo flamegraph --cmd coreutils -- ls [additional parameters]
|
||||
```
|
||||
|
||||
However, if the `-R` option is given, the output becomes pretty much useless due to recursion. We can fix this by merging all the direct recursive calls with `uniq`; below is a `bash` script that does this.
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
cargo build --release --no-default-features --features ls
|
||||
|
|
|
@ -1,6 +1,7 @@
|
|||
<!-- spell-checker:ignore ugoa -->
|
||||
# mkdir
|
||||
|
||||
<!-- spell-checker:ignore ugoa -->
|
||||
|
||||
```
|
||||
mkdir [OPTION]... DIRECTORY...
|
||||
```
|
||||
|
|
|
@ -1,6 +1,7 @@
|
|||
<!-- spell-checker:ignore N'th M'th -->
|
||||
# numfmt
|
||||
|
||||
<!-- spell-checker:ignore N'th M'th -->
|
||||
|
||||
```
|
||||
numfmt [OPTION]... [NUMBER]...
|
||||
```
|
||||
|
@ -10,24 +11,25 @@ Convert numbers from/to human-readable strings
|
|||
## After Help
|
||||
|
||||
`UNIT` options:
|
||||
- `none`: no auto-scaling is done; suffixes will trigger an error
|
||||
- `auto`: accept optional single/two letter suffix:
|
||||
|
||||
1K = 1000, 1Ki = 1024, 1M = 1000000, 1Mi = 1048576,
|
||||
- `none`: no auto-scaling is done; suffixes will trigger an error
|
||||
- `auto`: accept optional single/two letter suffix:
|
||||
|
||||
- `si`: accept optional single letter suffix:
|
||||
1K = 1000, 1Ki = 1024, 1M = 1000000, 1Mi = 1048576,
|
||||
|
||||
1K = 1000, 1M = 1000000, ...
|
||||
- `si`: accept optional single letter suffix:
|
||||
|
||||
- `iec`: accept optional single letter suffix:
|
||||
1K = 1000, 1M = 1000000, ...
|
||||
|
||||
1K = 1024, 1M = 1048576, ...
|
||||
- `iec`: accept optional single letter suffix:
|
||||
|
||||
1K = 1024, 1M = 1048576, ...
|
||||
|
||||
- `iec-i`: accept optional two-letter suffix:
|
||||
|
||||
1Ki = 1024, 1Mi = 1048576, ...
|
||||
1Ki = 1024, 1Mi = 1048576, ...
|
||||
|
||||
`FIELDS` supports `cut(1)` style field ranges:
|
||||
- `FIELDS` supports `cut(1)` style field ranges:
|
||||
|
||||
N N'th field, counted from 1
|
||||
N- from N'th field, to end of line
|
||||
|
|
|
@ -4,4 +4,4 @@
|
|||
realpath [OPTION]... FILE...
|
||||
```
|
||||
|
||||
Print the resolved path
|
||||
Print the resolved path
|
||||
|
|
|
@ -5,15 +5,21 @@ GNU version of `seq`, you can use a benchmarking tool like
|
|||
[hyperfine][0]. On Ubuntu 18.04 or later, you can install `hyperfine` by
|
||||
running
|
||||
|
||||
sudo apt-get install hyperfine
|
||||
```shell
|
||||
sudo apt-get install hyperfine
|
||||
```
|
||||
|
||||
Next, build the `seq` binary under the release profile:
|
||||
|
||||
cargo build --release -p uu_seq
|
||||
```shell
|
||||
cargo build --release -p uu_seq
|
||||
```
|
||||
|
||||
Finally, you can compare the performance of the two versions of `seq`
|
||||
by running, for example,
|
||||
|
||||
hyperfine "seq 1000000" "target/release/seq 1000000"
|
||||
```shell
|
||||
hyperfine "seq 1000000" "target/release/seq 1000000"
|
||||
```
|
||||
|
||||
[0]: https://github.com/sharkdp/hyperfine
|
||||
|
|
|
@ -21,10 +21,10 @@ To avoid distortions from IO, it is recommended to store input data in tmpfs.
|
|||
|
||||
## Without repetition
|
||||
|
||||
By default, `shuf` samples without repetition.
|
||||
By default, `shuf` samples without repetition.
|
||||
|
||||
To benchmark only the randomization and not IO, we can pass the `-i` flag with
|
||||
a range of numbers to randomly sample from. An example of a command that works
|
||||
To benchmark only the randomization and not IO, we can pass the `-i` flag with
|
||||
a range of numbers to randomly sample from. An example of a command that works
|
||||
well for testing:
|
||||
|
||||
```shell
|
||||
|
|
|
@ -8,4 +8,4 @@ shuf -i LO-HI [OPTION]...;
|
|||
|
||||
Shuffle the input by outputting a random permutation of input lines.
|
||||
Each output permutation is equally likely.
|
||||
With no FILE, or when FILE is -, read standard input.
|
||||
With no FILE, or when FILE is -, read standard input.
|
||||
|
|
|
@ -13,4 +13,4 @@ Pause for NUMBER seconds. SUFFIX may be 's' for seconds (the default),
|
|||
'm' for minutes, 'h' for hours or 'd' for days. Unlike most implementations
|
||||
that require NUMBER be an integer, here NUMBER may be an arbitrary floating
|
||||
point number. Given two or more arguments, pause for the amount of time
|
||||
specified by the sum of their values.
|
||||
specified by the sum of their values.
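The suffix and summation rules described above can be sketched as follows. This is an illustrative reading of the help text, not the actual `sleep` implementation; `parse_interval` and `total_sleep` are hypothetical names, and edge cases such as negative or non-finite numbers are simply rejected here.

```rust
use std::time::Duration;

/// Parse one argument such as "2", "1.5m" or "0.25h" into a duration.
/// Hypothetical helper: returns None for anything it cannot interpret.
fn parse_interval(arg: &str) -> Option<Duration> {
    let (number, multiplier) = match arg.chars().last()? {
        's' => (&arg[..arg.len() - 1], 1.0),
        'm' => (&arg[..arg.len() - 1], 60.0),
        'h' => (&arg[..arg.len() - 1], 3600.0),
        'd' => (&arg[..arg.len() - 1], 86400.0),
        // No suffix: seconds are the default.
        _ => (arg, 1.0),
    };
    let value: f64 = number.parse().ok()?;
    // Rejects negative, NaN, infinite and overflowing values.
    Duration::try_from_secs_f64(value * multiplier).ok()
}

/// Sum the durations of all arguments; any invalid argument yields None.
fn total_sleep(args: &[&str]) -> Option<Duration> {
    args.iter().try_fold(Duration::ZERO, |total, arg| {
        total.checked_add(parse_interval(arg)?)
    })
}

fn main() {
    // 1.5 minutes plus 30 seconds is two minutes in total.
    assert_eq!(total_sleep(&["1.5m", "30"]), Some(Duration::from_secs(120)));
    println!("{:?}", total_sleep(&["1.5m", "30"]));
}
```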
|
||||
|
|
|
@ -12,64 +12,59 @@ Run `cargo build --release` before benchmarking after you make a change!
|
|||
|
||||
## Sorting a wordlist
|
||||
|
||||
- Get a wordlist, for example with [words](<https://en.wikipedia.org/wiki/Words_(Unix)>) on Linux. The exact wordlist
|
||||
- Get a wordlist, for example with [words](<https://en.wikipedia.org/wiki/Words_(Unix)>) on Linux. The exact wordlist
|
||||
doesn't matter for performance comparisons. In this example I'm using `/usr/share/dict/american-english` as the wordlist.
|
||||
- Shuffle the wordlist by running `sort -R /usr/share/dict/american-english > shuffled_wordlist.txt`.
|
||||
- Benchmark sorting the wordlist with hyperfine: `hyperfine "target/release/coreutils sort shuffled_wordlist.txt -o output.txt"`.
|
||||
- Shuffle the wordlist by running `sort -R /usr/share/dict/american-english > shuffled_wordlist.txt`.
|
||||
- Benchmark sorting the wordlist with hyperfine: `hyperfine "target/release/coreutils sort shuffled_wordlist.txt -o output.txt"`.
|
||||
|
||||
## Sorting a wordlist with ignore_case
|
||||
|
||||
- Same wordlist as above
|
||||
- Benchmark sorting the wordlist ignoring the case with hyperfine: `hyperfine "target/release/coreutils sort shuffled_wordlist.txt -f -o output.txt"`.
|
||||
- Same wordlist as above
|
||||
- Benchmark sorting the wordlist ignoring the case with hyperfine: `hyperfine "target/release/coreutils sort shuffled_wordlist.txt -f -o output.txt"`.
|
||||
|
||||
## Sorting numbers
|
||||
|
||||
- Generate a list of numbers: `seq 0 100000 | sort -R > shuffled_numbers.txt`.
|
||||
- Benchmark numeric sorting with hyperfine: `hyperfine "target/release/coreutils sort shuffled_numbers.txt -n -o output.txt"`.
|
||||
- Generate a list of numbers: `seq 0 100000 | sort -R > shuffled_numbers.txt`.
|
||||
- Benchmark numeric sorting with hyperfine: `hyperfine "target/release/coreutils sort shuffled_numbers.txt -n -o output.txt"`.
|
||||
|
||||
## Sorting numbers with -g
|
||||
|
||||
- Same list of numbers as above.
|
||||
- Benchmark numeric sorting with hyperfine: `hyperfine "target/release/coreutils sort shuffled_numbers.txt -g -o output.txt"`.
|
||||
- Same list of numbers as above.
|
||||
- Benchmark numeric sorting with hyperfine: `hyperfine "target/release/coreutils sort shuffled_numbers.txt -g -o output.txt"`.
|
||||
|
||||
## Sorting numbers with SI prefixes
|
||||
|
||||
- Generate a list of numbers:
|
||||
<details>
|
||||
<summary>Rust script</summary>
|
||||
- Generate a list of numbers:
|
||||
|
||||
## Cargo.toml
|
||||
## Cargo.toml
|
||||
|
||||
```toml
|
||||
[dependencies]
|
||||
rand = "0.8.3"
|
||||
```
|
||||
```toml
|
||||
[dependencies]
|
||||
rand = "0.8.3"
|
||||
```
|
||||
|
||||
## main.rs
|
||||
## main.rs
|
||||
|
||||
```rust
|
||||
use rand::prelude::*;
|
||||
fn main() {
|
||||
let suffixes = ['k', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y'];
|
||||
let mut rng = thread_rng();
|
||||
for _ in 0..100000 {
|
||||
println!(
|
||||
"{}{}",
|
||||
rng.gen_range(0..1000000),
|
||||
suffixes.choose(&mut rng).unwrap()
|
||||
)
|
||||
}
|
||||
```rust
|
||||
use rand::prelude::*;
|
||||
fn main() {
|
||||
let suffixes = ['k', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y'];
|
||||
let mut rng = thread_rng();
|
||||
for _ in 0..100000 {
|
||||
println!(
|
||||
"{}{}",
|
||||
rng.gen_range(0..1000000),
|
||||
suffixes.choose(&mut rng).unwrap()
|
||||
)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
```
|
||||
## running
|
||||
|
||||
## running
|
||||
`cargo run > shuffled_numbers_si.txt`
|
||||
|
||||
`cargo run > shuffled_numbers_si.txt`
|
||||
|
||||
</details>
|
||||
|
||||
- Benchmark numeric sorting with hyperfine: `hyperfine "target/release/coreutils sort shuffled_numbers_si.txt -h -o output.txt"`.
|
||||
- Benchmark numeric sorting with hyperfine: `hyperfine "target/release/coreutils sort shuffled_numbers_si.txt -h -o output.txt"`.
|
||||
|
||||
## External sorting
|
||||
|
||||
|
@ -83,28 +78,28 @@ Example: Run `hyperfine './target/release/coreutils sort shuffled_wordlist.txt -
|
|||
|
||||
"Merge" sort merges already sorted files. It is a sub-step of external sorting, so benchmarking it separately may be helpful.
|
||||
|
||||
- Splitting `shuffled_wordlist.txt` can be achieved by running `split shuffled_wordlist.txt shuffled_wordlist_slice_ --additional-suffix=.txt`
|
||||
- Sort each part by running `for f in shuffled_wordlist_slice_*; do sort $f -o $f; done`
|
||||
- Benchmark merging by running `hyperfine "target/release/coreutils sort -m shuffled_wordlist_slice_*"`
|
||||
- Splitting `shuffled_wordlist.txt` can be achieved by running `split shuffled_wordlist.txt shuffled_wordlist_slice_ --additional-suffix=.txt`
|
||||
- Sort each part by running `for f in shuffled_wordlist_slice_*; do sort $f -o $f; done`
|
||||
- Benchmark merging by running `hyperfine "target/release/coreutils sort -m shuffled_wordlist_slice_*"`
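For readers unfamiliar with the merge step, the following sketch merges several already sorted inputs with a min-heap. It is a generic k-way merge written for illustration, under the assumption of plain byte-wise ordering; it is not the code `sort -m` uses, and `merge_sorted` is a made-up name.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge already sorted inputs into one sorted output (byte-wise order).
/// Hypothetical helper; each heap entry is (line, input index, position).
fn merge_sorted(inputs: &[Vec<&str>]) -> Vec<String> {
    let mut heap = BinaryHeap::new();
    // Seed the heap with the first line of every non-empty input.
    for (idx, input) in inputs.iter().enumerate() {
        if let Some(first) = input.first() {
            heap.push(Reverse((*first, idx, 0usize)));
        }
    }
    let mut merged = Vec::new();
    // Repeatedly take the smallest pending line and refill from its input.
    while let Some(Reverse((line, idx, pos))) = heap.pop() {
        merged.push(line.to_string());
        if let Some(next) = inputs[idx].get(pos + 1) {
            heap.push(Reverse((*next, idx, pos + 1)));
        }
    }
    merged
}

fn main() {
    let slices = vec![vec!["apple", "cherry"], vec!["banana", "date"]];
    let merged = merge_sorted(&slices);
    assert_eq!(merged, vec!["apple", "banana", "cherry", "date"]);
    println!("{:?}", merged);
}
```

Only one line per input is pending at any time, which is why merging can be benchmarked separately from the in-memory sorting of each slice.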
|
||||
|
||||
## Check
|
||||
|
||||
When invoked with -c, we simply check if the input is already ordered. The input for benchmarking should be an already sorted file.
|
||||
|
||||
- Benchmark checking by running `hyperfine "target/release/coreutils sort -c sorted_wordlist.txt"`
|
||||
- Benchmark checking by running `hyperfine "target/release/coreutils sort -c sorted_wordlist.txt"`
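A minimal sketch of such an order check is shown below, assuming byte-wise comparison; `is_sorted_bytewise` is a hypothetical name and not the function `sort -c` actually calls. Because a check like this can stop at the first out-of-order pair, an already sorted file is the input that forces it to read everything.

```rust
/// Returns true if every line is less than or equal to the next one.
/// Stops at the first out-of-order pair it finds.
fn is_sorted_bytewise(lines: &[&str]) -> bool {
    lines.windows(2).all(|pair| pair[0] <= pair[1])
}

fn main() {
    assert!(is_sorted_bytewise(&["ant", "bee", "bee", "cat"]));
    assert!(!is_sorted_bytewise(&["bee", "ant"]));
    println!("order checks passed");
}
```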
|
||||
|
||||
## Stdout and stdin performance
|
||||
|
||||
Try to run the above benchmarks by piping the input through stdin (standard input) and redirecting the
|
||||
output through stdout (standard output):
|
||||
|
||||
- Remove the input file from the arguments and add `cat [input_file] | ` at the beginning.
|
||||
- Remove `-o output.txt` and add `> output.txt` at the end.
|
||||
- Remove the input file from the arguments and add ```cat [input_file] |``` at the beginning.
|
||||
- Remove `-o output.txt` and add `> output.txt` at the end.
|
||||
|
||||
Example: `hyperfine "target/release/coreutils sort shuffled_numbers.txt -n -o output.txt"` becomes
|
||||
`hyperfine "cat shuffled_numbers.txt | target/release/coreutils sort -n > output.txt"`
|
||||
|
||||
- Check that performance is similar to the original benchmark.
|
||||
- Check that performance is similar to the original benchmark.
|
||||
|
||||
## Comparing with GNU sort
|
||||
|
||||
|
@ -121,37 +116,34 @@ The above benchmarks use hyperfine to measure the speed of sorting. There are ho
|
|||
resource usage. One way to measure them is the `time` command. This is not to be confused with the `time` that is built in to the bash shell.
|
||||
You may have to install `time` first, then you have to run it with `/bin/time -v` to give it precedence over the built in `time`.
|
||||
|
||||
<details>
|
||||
<summary>Example output</summary>
|
||||
|
||||
Command being timed: "target/release/coreutils sort shuffled_numbers.txt"
|
||||
User time (seconds): 0.10
|
||||
System time (seconds): 0.00
|
||||
Percent of CPU this job got: 365%
|
||||
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.02
|
||||
Average shared text size (kbytes): 0
|
||||
Average unshared data size (kbytes): 0
|
||||
Average stack size (kbytes): 0
|
||||
Average total size (kbytes): 0
|
||||
Maximum resident set size (kbytes): 25360
|
||||
Average resident set size (kbytes): 0
|
||||
Major (requiring I/O) page faults: 0
|
||||
Minor (reclaiming a frame) page faults: 5802
|
||||
Voluntary context switches: 462
|
||||
Involuntary context switches: 73
|
||||
Swaps: 0
|
||||
File system inputs: 1184
|
||||
File system outputs: 0
|
||||
Socket messages sent: 0
|
||||
Socket messages received: 0
|
||||
Signals delivered: 0
|
||||
Page size (bytes): 4096
|
||||
Exit status: 0
|
||||
|
||||
</details>
|
||||
```plain
|
||||
Command being timed: "target/release/coreutils sort shuffled_numbers.txt"
|
||||
User time (seconds): 0.10
|
||||
System time (seconds): 0.00
|
||||
Percent of CPU this job got: 365%
|
||||
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.02
|
||||
Average shared text size (kbytes): 0
|
||||
Average unshared data size (kbytes): 0
|
||||
Average stack size (kbytes): 0
|
||||
Average total size (kbytes): 0
|
||||
Maximum resident set size (kbytes): 25360
|
||||
Average resident set size (kbytes): 0
|
||||
Major (requiring I/O) page faults: 0
|
||||
Minor (reclaiming a frame) page faults: 5802
|
||||
Voluntary context switches: 462
|
||||
Involuntary context switches: 73
|
||||
Swaps: 0
|
||||
File system inputs: 1184
|
||||
File system outputs: 0
|
||||
Socket messages sent: 0
|
||||
Socket messages received: 0
|
||||
Signals delivered: 0
|
||||
Page size (bytes): 4096
|
||||
Exit status: 0
|
||||
```
|
||||
|
||||
Useful metrics to look at could be:
|
||||
|
||||
- User time
|
||||
- Percent of CPU this job got
|
||||
- Maximum resident set size
|
||||
- User time
|
||||
- Percent of CPU this job got
|
||||
- Maximum resident set size
|
||||
|
|
|
@ -7,11 +7,15 @@ GNU version of `split`, you can use a benchmarking tool like
|
|||
[hyperfine][0]. On Ubuntu 18.04 or later, you can install `hyperfine` by
|
||||
running
|
||||
|
||||
sudo apt-get install hyperfine
|
||||
```
|
||||
sudo apt-get install hyperfine
|
||||
```
|
||||
|
||||
Next, build the `split` binary under the release profile:
|
||||
|
||||
cargo build --release -p uu_split
|
||||
```
|
||||
cargo build --release -p uu_split
|
||||
```
|
||||
|
||||
Now, get a text file to test `split` on. The `split` program has three
|
||||
main modes of operation: chunk by lines, chunk by bytes, and chunk by
|
||||
|
@ -21,7 +25,9 @@ operation. For example, to test chunking by bytes on a large input file,
|
|||
you can create a file named `testfile.txt` containing one million null
|
||||
bytes like this:
|
||||
|
||||
printf "%0.s\0" {1..1000000} > testfile.txt
|
||||
```
|
||||
printf "%0.s\0" {1..1000000} > testfile.txt
|
||||
```
|
||||
|
||||
For another example, to test chunking by bytes on a large real-world
|
||||
input file, you could download a [database dump of Wikidata][1] or some
|
||||
|
@ -31,10 +37,12 @@ file][2] contains about 130 million lines.
|
|||
Finally, you can compare the performance of the two versions of `split`
|
||||
by running, for example,
|
||||
|
||||
cd /tmp && hyperfine \
|
||||
--prepare 'rm x* || true' \
|
||||
"split -b 1000 testfile.txt" \
|
||||
"target/release/split -b 1000 testfile.txt"
|
||||
```
|
||||
cd /tmp && hyperfine \
|
||||
--prepare 'rm x* || true' \
|
||||
"split -b 1000 testfile.txt" \
|
||||
"target/release/split -b 1000 testfile.txt"
|
||||
```
|
||||
|
||||
Since `split` creates a lot of files on the filesystem, I recommend
|
||||
changing to the `/tmp` directory before running the benchmark. The
|
||||
|
|
|
@ -4,7 +4,8 @@
|
|||
|
||||
### Flags
|
||||
|
||||
* [ ] `--verbose` - created file printing is implemented, don't know if there is anything else
|
||||
* [ ] `--verbose` - created file printing is implemented, don't know
|
||||
if there is anything else
|
||||
|
||||
## Possible Optimizations
|
||||
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
## Benchmarking `sum`
|
||||
# Benchmarking `sum`
|
||||
|
||||
<!-- spell-checker:ignore wikidatawiki -->
|
||||
|
||||
|
@ -7,17 +7,17 @@ Large sample files can for example be found in the [Wikipedia database dumps](ht
|
|||
After you have obtained and uncompressed such a file, you need to build `sum` in release mode
|
||||
|
||||
```shell
|
||||
$ cargo build --release --package uu_sum
|
||||
cargo build --release --package uu_sum
|
||||
```
|
||||
|
||||
and then you can time how long it takes to checksum the file by running
|
||||
|
||||
```shell
|
||||
$ time ./target/release/sum wikidatawiki-20211001-pages-logging.xml
|
||||
time ./target/release/sum wikidatawiki-20211001-pages-logging.xml
|
||||
```
|
||||
|
||||
For more systematic measurements that include warm-ups, repetitions and comparisons, [Hyperfine](https://github.com/sharkdp/hyperfine) can be helpful. For example, to compare this implementation to the one provided by your distribution run
|
||||
|
||||
```shell
|
||||
$ hyperfine "./target/release/sum wikidatawiki-20211001-pages-logging.xml" "sum wikidatawiki-20211001-pages-logging.xml"
|
||||
hyperfine "./target/release/sum wikidatawiki-20211001-pages-logging.xml" "sum wikidatawiki-20211001-pages-logging.xml"
|
||||
```
|
||||
|
|
|
@ -1,25 +1,36 @@
|
|||
## Benchmarking `tac`
|
||||
# Benchmarking `tac`
|
||||
|
||||
<!-- spell-checker:ignore wikidatawiki -->
|
||||
|
||||
`tac` is often used to process log files in reverse chronological order, i.e. from newer towards older entries. In this case, the performance target is to yield results as fast as possible, i.e. without reading in the whole file that is to be reversed line-by-line. Therefore, a sensible benchmark is to read a large log file containing N lines and measure how long it takes to produce the last K lines from that file.
|
||||
`tac` is often used to process log files in reverse chronological order, i.e.
|
||||
from newer towards older entries. In this case, the performance target is to yield
|
||||
results as fast as possible, i.e. without reading in the whole file that is to
|
||||
be reversed line-by-line. Therefore, a sensible benchmark is to read a large log
|
||||
file containing N lines and measure how long it takes to produce the last K
|
||||
lines from that file.
|
||||
|
||||
Large text files can for example be found in the [Wikipedia database dumps](https://dumps.wikimedia.org/wikidatawiki/latest/), usually sized at multiple gigabytes and comprising more than 100M lines.
|
||||
Large text files can for example be found in the
|
||||
[Wikipedia database dumps](https://dumps.wikimedia.org/wikidatawiki/latest/),
|
||||
usually sized at multiple gigabytes and comprising more than 100M lines.
|
||||
|
||||
After you have obtained and uncompressed such a file, you need to build `tac` in release mode
|
||||
After you have obtained and uncompressed such a file, you need to build `tac`
|
||||
in release mode
|
||||
|
||||
```shell
|
||||
$ cargo build --release --package uu_tac
|
||||
cargo build --release --package uu_tac
|
||||
```
|
||||
|
||||
and then you can time how long it takes to extract the last 10M lines by running
|
||||
and then you can time how long it takes to extract the last 10M lines by
|
||||
running
|
||||
|
||||
```shell
|
||||
$ /usr/bin/time ./target/release/tac wikidatawiki-20211001-pages-logging.xml | head -n10000000 >/dev/null
|
||||
/usr/bin/time ./target/release/tac wikidatawiki-20211001-pages-logging.xml | head -n10000000 >/dev/null
|
||||
```
|
||||
|
||||
For more systematic measurements that include warm-ups, repetitions and comparisons, [Hyperfine](https://github.com/sharkdp/hyperfine) can be helpful. For example, to compare this implementation to the one provided by your distribution run
|
||||
For more systematic measurements that include warm-ups, repetitions and comparisons,
|
||||
[Hyperfine](https://github.com/sharkdp/hyperfine) can be helpful.
|
||||
For example, to compare this implementation to the one provided by your distribution run
|
||||
|
||||
```shell
|
||||
$ hyperfine "./target/release/tac wikidatawiki-20211001-pages-logging.xml | head -n10000000 >/dev/null" "/usr/bin/tac wikidatawiki-20211001-pages-logging.xml | head -n10000000 >/dev/null"
|
||||
hyperfine "./target/release/tac wikidatawiki-20211001-pages-logging.xml | head -n10000000 >/dev/null" "/usr/bin/tac wikidatawiki-20211001-pages-logging.xml | head -n10000000 >/dev/null"
|
||||
```
|
||||
|
|
|
@ -7,40 +7,59 @@
|
|||
* `--max-unchanged-stats`
|
||||
|
||||
Note:
|
||||
There's a stub for `--max-unchanged-stats` so GNU test-suite checks using it can run, however this flag has no functionality yet.
|
||||
There's a stub for `--max-unchanged-stats` so GNU test-suite checks using it
|
||||
can run, however this flag has no functionality yet.
|
||||
|
||||
### Platform support for `--follow` and `--retry`
|
||||
The `--follow=descriptor`, `--follow=name` and `--retry` flags have very good support on Linux (inotify backend).
|
||||
They work well enough on macOS/BSD (kqueue backend) with some tests failing due to differences in how kqueue works compared to inotify.
|
||||
Windows support is there in theory due to ReadDirectoryChanges support by the notify-crate, however these flags are completely untested on Windows.
|
||||
|
||||
The `--follow=descriptor`, `--follow=name` and `--retry` flags have very good
|
||||
support on Linux (inotify backend).
|
||||
They work well enough on macOS/BSD (kqueue backend) with some tests failing due
|
||||
to differences in how kqueue works compared to inotify.
|
||||
Windows support is there in theory due to ReadDirectoryChanges support by the
|
||||
notify-crate, however these flags are completely untested on Windows.
|
||||
|
||||
Note:
|
||||
The undocumented `---disable-inotify` flag is used to disable the inotify backend to test polling.
|
||||
However inotify is a Linux only backend and polling is now supported also for the other backends.
|
||||
Because of this, `disable-inotify` is now an alias to the new and more versatile flag name: `--use-polling`.
|
||||
The undocumented `---disable-inotify` flag is used to disable the inotify
|
||||
backend to test polling.
|
||||
However inotify is a Linux only backend and polling is now supported also
|
||||
for the other backends.
|
||||
Because of this, `disable-inotify` is now an alias to the new and more versatile
|
||||
flag name: `--use-polling`.
|
||||
|
||||
## Possible optimizations
|
||||
|
||||
* Don't read the whole file if not using `-f` and input is regular file. Read in chunks from the end going backwards, reading each individual chunk forward.
|
||||
* Don't read the whole file if not using `-f` and input is regular file.
|
||||
Read in chunks from the end going backwards, reading each individual chunk
|
||||
forward.
|
||||
* Reduce number of system calls to e.g. `fstat`
|
||||
* Improve resource management by adding more system calls to `inotify_rm_watch` when appropriate.
|
||||
* Improve resource management by adding more system calls to `inotify_rm_watch`
|
||||
when appropriate.
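The first optimization above (reading chunks from the end, going backwards) can be sketched as follows. This is a minimal Python illustration of the idea, not the uu_tail implementation; the function name `last_lines` and the `chunk_size` default are assumptions chosen for the example.

```python
import os


def last_lines(path, count, chunk_size=64 * 1024):
    """Return the last `count` lines of a regular file as a list of bytes.

    Chunks are visited from the end of the file towards the start, but
    each chunk is read forward. Reading stops once enough newlines have
    been seen, so the head of a large file is never touched.
    """
    if count <= 0:
        return []
    chunks = []
    newlines = 0
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        # count + 1 newlines guarantee that the first wanted line is
        # complete even when the file ends with a newline.
        while pos > 0 and newlines <= count:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            chunk = f.read(step)
            newlines += chunk.count(b"\n")
            chunks.append(chunk)
    data = b"".join(reversed(chunks))
    lines = data.split(b"\n")
    # A trailing newline leaves an empty final element; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines[-count:]
```

The sketch only applies to seekable regular files; pipes and `-f` would still need the streaming path.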
|
||||
|
||||
# GNU test-suite results (9.1.8-e08752)
|
||||
|
||||
The functionality for the test "gnu/tests/tail-2/follow-stdin.sh" is implemented.
|
||||
It fails because it is provoking closing a file descriptor with `tail -f <&-` and as part of a workaround, Rust's stdlib reopens closed FDs as `/dev/null` which means uu_tail cannot detect this.
|
||||
See also, e.g. the discussion at: https://github.com/uutils/coreutils/issues/2873
|
||||
It fails because it is provoking closing a file descriptor with `tail -f <&-`
|
||||
and as part of a workaround, Rust's stdlib reopens closed FDs as `/dev/null`
|
||||
which means uu_tail cannot detect this.
|
||||
See also, e.g. the discussion at:
|
||||
<https://github.com/uutils/coreutils/issues/2873>
|
||||
|
||||
The functionality for the test "gnu/tests/tail-2/inotify-rotate-resources.sh" is implemented.
|
||||
It fails with an error because it is using `strace` to look for calls to `inotify_add_watch` and `inotify_rm_watch`,
|
||||
The functionality for the test "gnu/tests/tail-2/inotify-rotate-resources.sh"
|
||||
is implemented.
|
||||
It fails with an error because it is using `strace` to look for calls to
|
||||
`inotify_add_watch` and `inotify_rm_watch`,
|
||||
however in uu_tail these system calls are invoked from a separate thread.
|
||||
If the GNU test followed threads, i.e. used `strace -f`, this issue could be resolved.
|
||||
If the GNU test followed threads, i.e. used `strace -f`, this issue could be
|
||||
resolved.
|
||||
|
||||
There are 5 tests which are fixed but do not (always) pass the test suite if it's run inside the CI.
|
||||
There are 5 tests which are fixed but do not (always) pass the test suite
|
||||
if it's run inside the CI.
|
||||
The reason for this is probably related to load/scheduling on the CI test VM.
|
||||
The tests in question are:
|
||||
- [x] `tail-2/F-vs-rename.sh`
|
||||
- [x] `tail-2/follow-name.sh`
|
||||
- [x] `tail-2/inotify-rotate.sh`
|
||||
- [x] `tail-2/overlay-headers.sh`
|
||||
- [x] `tail-2/retry.sh`
|
||||
|
||||
* [x] `tail-2/F-vs-rename.sh`
|
||||
* [x] `tail-2/follow-name.sh`
|
||||
* [x] `tail-2/inotify-rotate.sh`
|
||||
* [x] `tail-2/overlay-headers.sh`
|
||||
* [x] `tail-2/retry.sh`
|
||||
|
|
|
@ -1,5 +1,6 @@
|
|||
# truncate
|
||||
|
||||
```
|
||||
truncate [OPTION]... [FILE]...
|
||||
```
|
||||
|
||||
|
@ -22,4 +23,4 @@ file based on its current size:
|
|||
'<' => at most
|
||||
'>' => at least
|
||||
'/' => round down to multiple of
|
||||
'%' => round up to multiple of
|
||||
'%' => round up to multiple of
|
||||
|
|
|
@ -2,45 +2,59 @@
|
|||
|
||||
<!-- spell-checker:ignore (words) uuwc uucat largefile somefile Mshortlines moby lwcm cmds tablefmt -->
|
||||
|
||||
Much of what makes wc fast is avoiding unnecessary work. It has multiple strategies, depending on which data is requested.
|
||||
Much of what makes wc fast is avoiding unnecessary work. It has multiple strategies,
|
||||
depending on which data is requested.
|
||||
|
||||
## Strategies
|
||||
|
||||
### Counting bytes
|
||||
|
||||
In the case of `wc -c` the content of the input doesn't have to be inspected at all, only the size has to be known. That enables a few optimizations.
|
||||
In the case of `wc -c` the content of the input doesn't have to be inspected at all,
|
||||
only the size has to be known. That enables a few optimizations.
|
||||
|
||||
#### File size
|
||||
|
||||
If it can, wc reads the file size directly. This is not interesting to benchmark, except to see if it still works. Try `wc -c largefile`.
|
||||
If it can, wc reads the file size directly. This is not interesting to benchmark,
|
||||
except to see if it still works. Try `wc -c largefile`.
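To illustrate the file-size strategy, here is a hedged Python sketch that answers a byte count from file metadata alone. It is an analogy for the idea, not the code in uutils `wc`, which may handle more cases (for example files whose reported size is unreliable); `size_without_reading` is a hypothetical name.

```python
import os
import stat


def size_without_reading(fd):
    """Return the size in bytes of the file behind `fd`, or None.

    Only regular files report a meaningful size in their metadata;
    for pipes and similar inputs the caller has to fall back to
    actually consuming the data.
    """
    st = os.fstat(fd)
    if stat.S_ISREG(st.st_mode):
        return st.st_size
    return None
```

A `None` result corresponds to the situation where another strategy, such as the `splice()` path, is needed.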
|
||||
|
||||
#### `splice()`
|
||||
|
||||
On Linux `splice()` is used to get the input's length while discarding it directly.
|
||||
|
||||
The best way I've found to generate a fast input to test `splice()` is to pipe the output of uutils `cat` into it. Note that GNU `cat` is slower and therefore less suitable, and that if a file is given as its input directly (as in `wc -c < largefile`) the first strategy kicks in. Try `uucat somefile | wc -c`.
|
||||
The best way I've found to generate a fast input to test `splice()` is to pipe the
|
||||
output of uutils `cat` into it. Note that GNU `cat` is slower and therefore less
|
||||
suitable, and that if a file is given as its input directly (as in
|
||||
`wc -c < largefile`) the first strategy kicks in. Try `uucat somefile | wc -c`.
|
||||
|
||||
### Counting lines
|
||||
|
||||
In the case of `wc -l` or `wc -cl` the input doesn't have to be decoded. It's read in chunks and the `bytecount` crate is used to count the newlines.
|
||||
In the case of `wc -l` or `wc -cl` the input doesn't have to be decoded. It's
|
||||
read in chunks and the `bytecount` crate is used to count the newlines.
|
||||
|
||||
It's useful to vary the line length in the input. GNU wc seems particularly bad at short lines.
|
||||
It's useful to vary the line length in the input. GNU wc seems particularly
|
||||
bad at short lines.
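The chunked newline count can be sketched in a few lines. This Python version only mirrors the approach described above; the real implementation is in Rust and uses the `bytecount` crate, and the names `count_lines_and_bytes` and `chunk_size` are assumptions for illustration.

```python
import io


def count_lines_and_bytes(stream, chunk_size=64 * 1024):
    """Count newline bytes and total bytes in a binary stream.

    The input is never decoded: each chunk is scanned for b"\n" only,
    which is all that `-l` and `-cl` require.
    """
    lines = 0
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        lines += chunk.count(b"\n")
    return lines, total


# Example input resembling the output of `yes`: many very short lines.
short_lines = io.BytesIO(b"y\n" * 1000)
result = count_lines_and_bytes(short_lines, chunk_size=64)
```

Varying `chunk_size` and the line length of the input is a cheap way to see how much of the cost is per-chunk overhead versus per-byte scanning.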
|
||||
|
||||
### Processing unicode
|
||||
|
||||
This is the most general strategy, and it's necessary for counting words, characters, and line lengths. Individual steps are still switched on and off depending on what must be reported.
|
||||
This is the most general strategy, and it's necessary for counting words,
|
||||
characters, and line lengths. Individual steps are still switched on and off
|
||||
depending on what must be reported.
|
||||
|
||||
Try varying which of the `-w`, `-m`, `-l` and `-L` flags are used. (The `-c` flag is unlikely to make a difference.)
|
||||
Try varying which of the `-w`, `-m`, `-l` and `-L` flags are used.
|
||||
(The `-c` flag is unlikely to make a difference.)
|
||||
|
||||
Passing no flags is equivalent to passing `-wcl`. That case should perhaps be given special attention as it's the default.
|
||||
Passing no flags is equivalent to passing `-wcl`. That case should perhaps be
|
||||
given special attention as it's the default.
|
||||
|
||||
## Generating files
|
||||
|
||||
To generate a file with many very short lines, run `yes | head -c50000000 > 25Mshortlines`.
|
||||
To generate a file with many very short lines, run
|
||||
`yes | head -c50000000 > 25Mshortlines`.
|
||||
|
||||
To get a file with less artificial contents, download a book from Project Gutenberg and concatenate it a lot of times:
|
||||
To get a file with less artificial contents, download a book from
|
||||
Project Gutenberg and concatenate it a lot of times:
|
||||
|
||||
```
|
||||
```shell
|
||||
wget https://www.gutenberg.org/files/2701/2701-0.txt -O moby.txt
|
||||
cat moby.txt moby.txt moby.txt moby.txt > moby4.txt
|
||||
cat moby4.txt moby4.txt moby4.txt moby4.txt > moby16.txt
|
||||
|
@ -49,7 +63,7 @@ cat moby16.txt moby16.txt moby16.txt moby16.txt > moby64.txt
|
|||
|
||||
And get one with lots of unicode too:
|
||||
|
||||
```
|
||||
```shell
|
||||
wget https://www.gutenberg.org/files/30613/30613-0.txt -O odyssey.txt
|
||||
cat odyssey.txt odyssey.txt odyssey.txt odyssey.txt > odyssey4.txt
|
||||
cat odyssey4.txt odyssey4.txt odyssey4.txt odyssey4.txt > odyssey16.txt
|
||||
|
@ -57,11 +71,14 @@ cat odyssey16.txt odyssey16.txt odyssey16.txt odyssey16.txt > odyssey64.txt
|
|||
cat odyssey64.txt odyssey64.txt odyssey64.txt odyssey64.txt > odyssey256.txt
|
||||
```
|
||||
|
||||
Finally, it's interesting to try a binary file. Look for one with `du -sh /usr/bin/* | sort -h`. On my system `/usr/bin/docker` is a good candidate as it's fairly large.
|
||||
Finally, it's interesting to try a binary file. Look for one with
|
||||
`du -sh /usr/bin/* | sort -h`. On my system `/usr/bin/docker` is a good
|
||||
candidate as it's fairly large.
|
||||
|
||||
## Running benchmarks
|
||||
|
||||
Use [`hyperfine`](https://github.com/sharkdp/hyperfine) to compare the performance. For example, `hyperfine 'wc somefile' 'uuwc somefile'`.
|
||||
Use [`hyperfine`](https://github.com/sharkdp/hyperfine) to compare the
|
||||
performance. For example, `hyperfine 'wc somefile' 'uuwc somefile'`.
|
||||
|
||||
If you want to get fancy and exhaustive, generate a table:
|
||||
|
||||
|
@ -77,12 +94,16 @@ If you want to get fancy and exhaustive, generate a table:
|
|||
| `wc -lwcmL <FILE>` | 1.1687 | 0.9169 | 4.4092 | 2.0663 |
|
||||
|
||||
Beware that:
|
||||
|
||||
- Results are fuzzy and change from run to run
|
||||
- You'll often want to check versions of uutils wc against each other instead of against GNU
|
||||
- You'll often want to check versions of uutils wc against each other instead
|
||||
of against GNU
|
||||
- This takes a lot of time to generate
|
||||
- This only shows the relative speedup, not the absolute time, which may be misleading if the time is very short
|
||||
- This only shows the relative speedup, not the absolute time, which may be
|
||||
misleading if the time is very short
|
||||
|
||||
Created by the following Python script:
|
||||
|
||||
```python
|
||||
import json
|
||||
import subprocess
|
||||
|
@ -121,4 +142,6 @@ for cmd in cmds:
|
|||
table.append(row)
|
||||
print(tabulate(table, [""] + files, tablefmt="github"))
|
||||
```
|
||||
(You may have to adjust the `bins` and `files` variables depending on your setup, and please do add other interesting cases to `cmds`.)
|
||||
|
||||
(You may have to adjust the `bins` and `files` variables depending on your
|
||||
setup, and please do add other interesting cases to `cmds`.)
|
||||
|
|
|
@ -1,5 +1,6 @@
|
|||
# wc
|
||||
|
||||
```
|
||||
wc [OPTION]... [FILE]...
|
||||
```
|
||||
|
||||
|
|