Commit graph

28 commits

Author SHA1 Message Date
Owen Anderson
0e7f1c4c0d Add missing testcases for wc. 2022-07-24 21:56:07 -07:00
Owen Anderson
735db78b3d
wc: specialize scanning loop on settings. (#3708)
* wc: specialize scanning loop on settings.

The primary computational loop in wc (iterating over all the
characters and computing word lengths, etc) is configured by a
number of boolean options that control the text-scanning behavior.
If we monomorphize the code loop for each possible combination of
scanning configurations, the rustc is able to generate better code
for each instantiation, at the least by removing the conditional
checks on each iteration, and possibly by allowing things like
vectorization.

On my computer (aarch64/macos), I am seeing at least a 5% performance
improvement in release builds on all wc flag configurations
(other than those that were already specialized) against
odyssey1024.txt, with wc -l showing the greatest improvement at 15%.

* Reduce the size of the wc dispatch table by half.

By extracting the handling of hand-written fast-paths to the
same dispatch as the automatic specializations, we can avoid
needing to pass `show_bytes` as a const generic to
`word_count_from_reader_specialized`. Eliminating this parameter
halves the number of arms in the dispatch.
2022-07-18 12:16:52 +02:00
Gergely Kalas
bd96f25ec8 Add new test to verify GNU compatible quotation.
Co-authored-by: anastygnome <noreplygitemail@protonmail.com>
2022-07-01 16:43:09 +02:00
Justin Tracey
2a0d58d060 get android builds to compile and pass tests 2022-04-20 08:44:49 +02:00
Terts Diepraam
18369dc0be all: use array intoiterator 2022-04-05 10:39:31 +02:00
Ackerley Tng
e9131e2b7f wc: compute number widths using total file sizes
Previously, individual file sizes were used to compute the number width, which
would cause misalignment when the total has a greater number of digits, and is
different from the behavior of GNU wc

```
$ ./target/debug/wc -w -l -m -c -L deny.toml GNUmakefile
  95  422 3110 3110   85 deny.toml
 349  865 6996 6996  196 GNUmakefile
 444 1287 10106 10106  196 total
$ wc -w -l -m -c -L deny.toml GNUmakefile
   95   422  3110  3110    85 deny.toml
  349   865  6996  6996   196 GNUmakefile
  444  1287 10106 10106   196 total
```
2022-03-28 18:56:34 +02:00
Allan Silva
6a6875012e wc: implement files0-from option
When this option is present, the files argument is not processed. This option processes the file list from provided file, splitting them by the ascii NUL (\0) character. When files0-from is '-', the file list is processed from stdin.
2022-02-04 10:12:08 -03:00
Terts Diepraam
dd311b294b wc: fix counting files from pseudo-filesystem 2022-01-28 19:08:44 +01:00
Jan Verbeek
f50b004860 wc: Adjust expected report for directories for Windows 2021-08-26 01:38:16 +02:00
Jan Verbeek
4943517327 wc: Adjust expected error messages for Windows 2021-08-26 01:38:16 +02:00
Jan Verbeek
0f1e79fe29 wc: Fix linters 2021-08-26 01:38:16 +02:00
Jan Verbeek
c16e492cd0 wc: Make output width more consistent with GNU 2021-08-26 01:38:16 +02:00
Jan Verbeek
d0c0564947 wc: Swap order of characters and bytes in output
This way it matches GNU and busybox.
2021-08-26 01:38:16 +02:00
Jan Verbeek
657a04f706 wc: Stricter simpler error handling
Errors are now always shown with the corresponding filename.

Errors are no longer converted into warnings. Previously `wc < .`
would cause a loop.

Checking whether something is a directory is no longer done in
advance. This removes race conditions and the edge case where stdin is
a directory.

The custom error type is removed because io::Error is now enough.
2021-08-26 01:38:16 +02:00
Jan Verbeek
6f7d740592 wc: Do a chunked read with proper UTF-8 handling
This brings the results mostly in line with GNU wc and solves nasty
behavior with long lines.
2021-08-26 01:38:16 +02:00
Roy Ivy III
4e20dedf58 tests ~ refactor/polish spelling (comments, names, and exceptions) 2021-05-31 08:23:57 -05:00
Jeffrey Finkelstein
97a49c7c95 wc: compute min width to format counts up front
Fix two issues with the string formatting width for counts displayed
by `wc`.

First, the output was previously not using the default minimum width
(seven characters) when reading from `stdin`. This commit corrects
this behavior to match GNU `wc`. For example,

    $ cat alice_in_wonderland.txt | wc
          5      57     302

Second, if at least 10^7 bytes were read from `stdin` *after* reading
from a smaller regular file, then every output row would have width
8. This disagrees with GNU `wc`, in which only the `stdin` row and the
total row would have width 8. This commit corrects this behavior to
match GNU `wc`. For example,

    $ printf "%.0s0" {1..10000000} | wc emptyfile.txt -
	  0       0       0 emptyfile.txt
	  0       1 10000000
	  0       1 10000000 total

Fixes #2186.
2021-05-15 21:41:47 -04:00
Jeffrey Finkelstein
e8d911d9d5 wc: correct some error messages for invalid inputs
Change the error messages that get printed to `stderr` for compatibility
with GNU `wc` when an input is a directory and when an input does not
exist.

Fixes #2211.
2021-05-15 10:35:21 -04:00
Nicolas Thery
112b042769 wc: emit '-' in ouput when set on command-line
When stdin is explicitly specified on the command-line with '-', emit it
in the output stats to match GNU wc output.

Fixes #2188.
2021-05-09 15:47:05 +02:00
Jeffrey Finkelstein
525f71bada wc: rm leading space when printing multiple counts
Remove the leading space from the output of `wc` when printing two or
more types of counts.

Fixes #2173.
2021-05-08 13:11:09 +02:00
Jeffrey Finkelstein
0cafe2b70d wc: add tests for edge cases for wc on files 2021-05-03 21:07:32 -04:00
Árni Dagur
eb4971e6f4
cat: Unrevert splice patch (#2020)
* cat: Unrevert splice patch

* cat: Add fifo test

* cat: Add tests for error cases

* cat: Add tests for character devices

* wc: Make sure we handle short splice writes

* cat: Fix tests for 1.40.0 compiler

* cat: Run rustfmt on test_cat.rs

* Run 'cargo +1.40.0 update'
2021-04-10 22:19:53 +02:00
Mikadore
a57f17ac5b
Expand CmdResult's API (#1977) 2021-03-31 02:25:23 -07:00
Árni Dagur
698dab12a6
wc: Don't read() if we only need to count number of bytes (Version 2) (#1851)
* wc: Don't read() if we only need to count number of bytes

* Resolve a few code review comments

* Use write macros instead of print

* Fix wc tests in case only one thing is printed

* wc: Fix style

* wc: Use return value of first splice rather than second

* wc: Make main loop more readable

* wc: Don't unwrap on failed write to stdout

* wc: Increment error count when stats fail to print

* Re-add Cargo.lock
2021-03-30 20:53:02 +02:00
Sylvestre Ledru
1d271991af Rustfmt new tests 2021-03-18 10:24:30 +01:00
Chad Brewbaker
05d8cc59c4
bug(wc): Add a test for unexpected behavior (#1723) 2021-02-16 13:36:49 +01:00
Chad Brewbaker
6c2bca110d
Fixed wc -L no end of line LF bug (#1714) 2021-02-08 21:54:48 +01:00
Roy Ivy III
de0375f909 tests ~ reorganize tests 2020-06-01 18:30:04 -05:00
Renamed from tests/test_wc.rs (Browse further)