* wc: specialize scanning loop on settings.
The primary computational loop in wc (iterating over all the
characters and computing word lengths, etc) is configured by a
number of boolean options that control the text-scanning behavior.
If we monomorphize the loop for each possible combination of
scanning settings, rustc is able to generate better code
for each instantiation, at the very least by removing the conditional
checks on each iteration, and possibly by enabling optimizations such as
vectorization.
On my computer (aarch64/macos), I am seeing at least a 5% performance
improvement in release builds on all wc flag configurations
(other than those that were already specialized) against
odyssey1024.txt, with wc -l showing the greatest improvement at 15%.
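As a minimal sketch of the idea (not the actual `wc` code), the settings can be passed as const generic booleans and selected once by a `match` outside the loop. The names `Counts`, `scan`, and `scan_with_settings` are hypothetical, and only two settings are modeled here:

```rust
/// Counts produced by one scan. The field names are illustrative only.
#[derive(Debug, Default, PartialEq)]
pub struct Counts {
    pub lines: usize,
    pub words: usize,
}

/// The scanning loop, instantiated once per combination of settings.
/// The settings are const generics, so each `if` on them is resolved at
/// compile time and can be removed from the generated loop body.
fn scan<const COUNT_LINES: bool, const COUNT_WORDS: bool>(input: &[u8]) -> Counts {
    let mut counts = Counts::default();
    let mut in_word = false;
    for &byte in input {
        if COUNT_LINES && byte == b'\n' {
            counts.lines += 1;
        }
        if COUNT_WORDS {
            let is_space = byte.is_ascii_whitespace();
            if !is_space && !in_word {
                counts.words += 1;
            }
            in_word = !is_space;
        }
    }
    counts
}

/// Runtime dispatch: choose the specialized instantiation once, before scanning.
pub fn scan_with_settings(input: &[u8], count_lines: bool, count_words: bool) -> Counts {
    match (count_lines, count_words) {
        (false, false) => scan::<false, false>(input),
        (false, true) => scan::<false, true>(input),
        (true, false) => scan::<true, false>(input),
        (true, true) => scan::<true, true>(input),
    }
}
```

Each additional boolean setting doubles the number of `match` arms, which is why removing one const parameter halves the dispatch table.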
* Reduce the size of the wc dispatch table by half.
By extracting the handling of hand-written fast-paths to the
same dispatch as the automatic specializations, we can avoid
needing to pass `show_bytes` as a const generic to
`word_count_from_reader_specialized`. Eliminating this parameter
halves the number of arms in the dispatch.
Previously, individual file sizes were used to compute the number width, which
caused misalignment when the total had more digits than any single file, and
differed from the behavior of GNU wc:
```
$ ./target/debug/wc -w -l -m -c -L deny.toml GNUmakefile
95 422 3110 3110 85 deny.toml
349 865 6996 6996 196 GNUmakefile
444 1287 10106 10106 196 total
$ wc -w -l -m -c -L deny.toml GNUmakefile
95 422 3110 3110 85 deny.toml
349 865 6996 6996 196 GNUmakefile
444 1287 10106 10106 196 total
```
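The width rule described above can be sketched as follows. This is a simplified illustration that derives one shared width from the largest total; the helper names `digits` and `format_table` are hypothetical, and the real implementation may derive the width differently (for example from file sizes rather than from the counts).

```rust
/// Number of decimal digits needed to print `n`.
fn digits(mut n: u64) -> usize {
    let mut count = 1;
    while n >= 10 {
        n /= 10;
        count += 1;
    }
    count
}

/// Format each `(lines, bytes, name)` row plus a total row, using a single
/// width computed from the totals so that every column stays aligned.
pub fn format_table(rows: &[(u64, u64, &str)]) -> Vec<String> {
    let total_lines: u64 = rows.iter().map(|row| row.0).sum();
    let total_bytes: u64 = rows.iter().map(|row| row.1).sum();
    // The totals are at least as large as any single row, so they set the width.
    let width = digits(total_lines.max(total_bytes));

    let mut out: Vec<String> = rows
        .iter()
        .map(|&(lines, bytes, name)| {
            format!("{:>w$} {:>w$} {}", lines, bytes, name, w = width)
        })
        .collect();
    out.push(format!("{:>w$} {:>w$} total", total_lines, total_bytes, w = width));
    out
}
```

With the line and byte counts from the example above, the total of 10106 bytes has five digits, so every row is padded to width 5 even though no single file needs more than four.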
When this option is present, the files argument is not processed. This option reads the file list from the provided file, splitting it on the ASCII NUL (\0) character. When files0-from is '-', the file list is read from stdin.
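A minimal sketch of this parsing, using only the standard library. The function names `open_files0_source` and `read_files0` are hypothetical and not taken from the `wc` source; names are kept as raw bytes because file names need not be valid UTF-8.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Open the source of the file list: stdin when the argument is "-",
/// otherwise the named file.
pub fn open_files0_source(arg: &str) -> io::Result<Box<dyn BufRead>> {
    if arg == "-" {
        Ok(Box::new(BufReader::new(io::stdin())))
    } else {
        Ok(Box::new(BufReader::new(File::open(arg)?)))
    }
}

/// Read NUL-separated file names from `reader`.
/// `BufRead::split` strips the delimiter and does not yield a trailing
/// empty entry when the input ends with NUL.
pub fn read_files0<R: BufRead>(reader: R) -> io::Result<Vec<Vec<u8>>> {
    let mut names = Vec::new();
    for chunk in reader.split(b'\0') {
        names.push(chunk?);
    }
    Ok(names)
}
```

A caller would combine the two, for example `read_files0(open_files0_source("-")?)`, and then count each returned name as if it had been given on the command line.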
Errors are now always shown with the corresponding filename.
Errors are no longer converted into warnings. Previously `wc < .`
would cause a loop.
Checking whether something is a directory is no longer done in
advance. This removes race conditions and the edge case where stdin is
a directory.
The custom error type is removed because io::Error is now enough.
Fix two issues with the string formatting width for counts displayed
by `wc`.
First, the output was previously not using the default minimum width
(seven characters) when reading from `stdin`. This commit corrects
this behavior to match GNU `wc`. For example,
$ cat alice_in_wonderland.txt | wc
5 57 302
Second, if at least 10^7 bytes were read from `stdin` *after* reading
from a smaller regular file, then every output row would have width
8. This disagrees with GNU `wc`, in which only the `stdin` row and the
total row would have width 8. This commit corrects this behavior to
match GNU `wc`. For example,
$ printf "%.0s0" {1..10000000} | wc emptyfile.txt -
0 0 0 emptyfile.txt
0 1 10000000
0 1 10000000 total
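Both behaviors follow from treating the width as a minimum rather than a fixed size. The sketch below illustrates this with Rust's formatting width; `MINIMUM_WIDTH` and `format_row` are hypothetical names, and the code is an illustration of the rule rather than the code in this commit.

```rust
/// Default minimum field width (seven characters, as described above).
pub const MINIMUM_WIDTH: usize = 7;

/// Format one output row. `width` is only a minimum: a count with more
/// digits than `width` is printed in full and widens just its own row.
pub fn format_row(counts: &[u64], width: usize, name: Option<&str>) -> String {
    let mut fields: Vec<String> = counts
        .iter()
        .map(|count| format!("{:>w$}", count, w = width))
        .collect();
    if let Some(name) = name {
        fields.push(name.to_string());
    }
    fields.join(" ")
}
```

For the second example above, `format_row(&[0, 0, 0], MINIMUM_WIDTH, Some("emptyfile.txt"))` keeps every field at seven characters, while `format_row(&[0, 1, 10000000], MINIMUM_WIDTH, None)` lets only the eight-digit byte count exceed the minimum.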
Fixes #2186.
Change the error messages that get printed to `stderr` for compatibility
with GNU `wc` when an input is a directory and when an input does not
exist.
Fixes #2211.
* cat: Unrevert splice patch
* cat: Add fifo test
* cat: Add tests for error cases
* cat: Add tests for character devices
* wc: Make sure we handle short splice writes
* cat: Fix tests for 1.40.0 compiler
* cat: Run rustfmt on test_cat.rs
* Run 'cargo +1.40.0 update'
* wc: Don't read() if we only need to count number of bytes
* Resolve a few code review comments
* Use write macros instead of print
* Fix wc tests in case only one thing is printed
* wc: Fix style
* wc: Use return value of first splice rather than second
* wc: Make main loop more readable
* wc: Don't unwrap on failed write to stdout
* wc: Increment error count when stats fail to print
* Re-add Cargo.lock
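
The "don't read() if we only need to count number of bytes" item above can be illustrated with a small sketch. It assumes that the size reported by the file's metadata is trustworthy for regular files and falls back to reading and discarding the data otherwise; `count_bytes` is a hypothetical name, and the actual patch may use a different fallback (the list above mentions splice) and handle more edge cases.

```rust
use std::fs::File;
use std::io;

/// Count the bytes in `file` without reading it when possible.
pub fn count_bytes(file: &mut File) -> io::Result<u64> {
    let metadata = file.metadata()?;
    if metadata.is_file() {
        // Regular file: the size is already known, so no read() is needed.
        // Assumption: the file offset is at the start and the reported
        // size is accurate, which is not true for every filesystem.
        return Ok(metadata.len());
    }
    // Pipes, devices, and similar inputs: read everything and discard it,
    // keeping only the number of bytes copied.
    io::copy(file, &mut io::sink())
}
```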