* ls: Implement --zero flag. (#2929)
This flag can be used to provide a easy machine parseable output from
ls, as discussed in the GNU bug report
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=49716.
There are some peculiarities with this flag:
- Current implementation of GNU ls of the `--zero` flag implies some
other flags. Those can be overridden by setting those flags after
`--zero` in the command line.
- This flag is not compatible with `--dired`. This patch is not 100%
compliant with GNU ls: GNU ls `--zero` will fail if `--dired` and
`-l` are set, while with this patch only `--dired` is needed for the
command to fail.
We also add `--dired` flag to the parser, with no additional behaviour
change.
Testing done:
```
$ bash util/build-gnu.sh
[...]
$ bash util/run-gnu-test.sh tests/ls/zero-option.sh
[...]
PASS: tests/ls/zero-option.sh
============================================================================
Testsuite summary for GNU coreutils 9.1.36-8ec11
============================================================================
# TOTAL: 1
# PASS: 1
# SKIP: 0
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
============================================================================
```
* Use the US way to spell Behavior
* Fix formatting with cargo fmt -- tests/by-util/test_ls.rs
* Simplify --zero flag overriding logic by using index_of
Also, allow multiple --zero flags, as this is possible with GNU ls
command. Only the last one is taken into account.
Co-authored-by: Sylvestre Ledru <sledru@mozilla.com>
* Speed up sum by using reasonable read buffer sizes.
Use a 4K read buffer for each of the checksum functions, which seems
reasonable. This improves the performance of BSD checksums on
odyssey1024.txt from 399ms to 325ms on my laptop, and of SysV
checksums from 242ms to 67ms.
* Add BENCHMARKING.md for `sum`.
* Add comment regarding block sizes.
* Improve portability of BENCHMARKING.md
* Make `div_ceil` const and enhance comment.
Byte, character, and line counting can all be done on the raw bytes
of the incoming stream without decoding the Unicode characters. This
fact was previously exploited in specific fast paths for counting
characters and counting lines. This change unifies those fast paths into
a single shared fast paths, using const generics to specialize the
function for each use case. This has the benefit of making sure that all
combinations of these Unicode-oblivious fast paths benefit from the same
optimization.
On my laptop, this speeds up `wc -clm odyssey1024.txt` from 840ms to
120ms. I experimented with using a filter loop for line counting, but
continuing to use the bytecount crate came out ahead by a significant
margin.
When wc is invoked with only the -m flag, we only need to count the
number of Unicode characters in the input. In order to do so, we don't
actually need to decode the input bytes into characters. Rather, we can
simply count the number of non-continuation bytes in the UTF-8 stream,
since every character will contain exactly one non-continuation byte.
On my laptop, this speeds up `wc -m odyssey1024.txt` from 745ms to
109ms.
* wc: specialize scanning loop on settings.
The primary computational loop in wc (iterating over all the
characters and computing word lengths, etc) is configured by a
number of boolean options that control the text-scanning behavior.
If we monomorphize the code loop for each possible combination of
scanning configurations, the rustc is able to generate better code
for each instantiation, at the least by removing the conditional
checks on each iteration, and possibly by allowing things like
vectorization.
On my computer (aarch64/macos), I am seeing at least a 5% performance
improvement in release builds on all wc flag configurations
(other than those that were already specialized) against
odyssey1024.txt, with wc -l showing the greatest improvement at 15%.
* Reduce the size of the wc dispatch table by half.
By extracting the handling of hand-written fast-paths to the
same dispatch as the automatic specializations, we can avoid
needing to pass `show_bytes` as a const generic to
`word_count_from_reader_specialized`. Eliminating this parameter
halves the number of arms in the dispatch.