Commit graph

183 commits

Author SHA1 Message Date
Alexander Kunde
2bae3dd4f2 wc: move help strings to markdown file 2023-03-02 16:29:43 +01:00
Terts Diepraam
ae27c82020 Use workspace inheritance for dependencies 2023-02-11 18:54:46 +01:00
Terts Diepraam
357001dabc fix double dependency of memoffset by upgrading nix, libc and ctrlc 2023-02-09 14:02:40 +01:00
Daniel Hofstetter
f6b646e4e5 clippy: fix warnings introduced with Rust 1.67.0 2023-01-27 17:37:56 +01:00
Terts Diepraam
4d3dc78686 Version 0.0.17 2023-01-21 10:38:18 +01:00
Miles Liu
d505df5369
uu: use normal use declarations to import macros 2022-11-17 11:49:23 +08:00
Terts Diepraam
92c4b32eeb wc: update to clap 4 2022-10-13 17:50:43 +02:00
Terts Diepraam
f15c4f2d3e Version 0.0.16 2022-10-11 23:03:39 +02:00
Sylvestre Ledru
7257adb53b wc: document the long match 2022-10-03 00:57:48 -10:00
Daniel Hofstetter
9e8daf92dd Replace deprecated value_of() with get_one() 2022-09-26 16:42:42 +02:00
Terts Diepraam
975a1d170d change remaining usage codes of 2 to 1 for GNU compat 2022-09-10 20:24:24 +02:00
Daniel Hofstetter
747ed592d9 Replace allow_invalid_utf8() with value_parser() 2022-08-25 15:21:50 +02:00
Terts Diepraam
15180249fc Version 0.0.15 2022-08-20 13:13:22 +02:00
Daniel Hofstetter
62b1b7cfb2 Replace deprecated values_of_os() with get_many() 2022-08-20 08:19:11 +02:00
Niyaz Nigmatullin
9cd898b885 remove nix 0.24.2 dependency 2022-08-17 13:13:27 +03:00
Sylvestre Ledru
9f1219005d fix the significant_drop_in_scrutinee clippy warning 2022-08-10 21:37:48 +02:00
Daniel Hofstetter
7c3116330e Replace deprecated is_present() with contains_id() 2022-08-02 15:21:39 +02:00
Daniel Hofstetter
fc4544c42b bump clap from 3.1.18 to 3.2.15 2022-07-29 14:05:02 +02:00
Owen Anderson
d5f59f23fa Implement wc fast paths that skip Unicode decoding.
Byte, character, and line counting can all be done on the raw bytes
of the incoming stream without decoding the Unicode characters. This
fact was previously exploited in specific fast paths for counting
characters and counting lines. This change unifies those fast paths into
a single shared fast paths, using const generics to specialize the
function for each use case. This has the benefit of making sure that all
combinations of these Unicode-oblivious fast paths benefit from the same
optimization.

On my laptop, this speeds up `wc -clm odyssey1024.txt` from 840ms to
120ms. I experimented with using a filter loop for line counting, but
continuing to use the bytecount crate came out ahead by a significant
margin.
2022-07-23 10:45:26 -07:00
Owen Anderson
417ad0e384 Add rustdoc comment. 2022-07-20 23:32:50 -07:00
Owen Anderson
13762cae05 Implement a fast path for character counting in wc.
When wc is invoked with only the -m flag, we only need to count the
number of Unicode characters in the input. In order to do so, we don't
actually need to decode the input bytes into characters. Rather, we can
simply count the number of non-continuation bytes in the UTF-8 stream,
since every character will contain exactly one non-continuation byte.

On my laptop, this speeds up `wc -m odyssey1024.txt` from 745ms to
109ms.
2022-07-20 22:35:40 -07:00
Andrew Baptist
cc08e1cc3a Update to handle all the latest cargo warnings 2022-07-18 13:20:49 -04:00
Owen Anderson
735db78b3d
wc: specialize scanning loop on settings. (#3708)
* wc: specialize scanning loop on settings.

The primary computational loop in wc (iterating over all the
characters and computing word lengths, etc) is configured by a
number of boolean options that control the text-scanning behavior.
If we monomorphize the code loop for each possible combination of
scanning configurations, the rustc is able to generate better code
for each instantiation, at the least by removing the conditional
checks on each iteration, and possibly by allowing things like
vectorization.

On my computer (aarch64/macos), I am seeing at least a 5% performance
improvement in release builds on all wc flag configurations
(other than those that were already specialized) against
odyssey1024.txt, with wc -l showing the greatest improvement at 15%.

* Reduce the size of the wc dispatch table by half.

By extracting the handling of hand-written fast-paths to the
same dispatch as the automatic specializations, we can avoid
needing to pass `show_bytes` as a const generic to
`word_count_from_reader_specialized`. Eliminating this parameter
halves the number of arms in the dispatch.
2022-07-18 12:16:52 +02:00
dependabot[bot]
d15b95533e
build(deps): bump nix from 0.24.1 to 0.24.2
Bumps [nix](https://github.com/nix-rust/nix) from 0.24.1 to 0.24.2.
- [Release notes](https://github.com/nix-rust/nix/releases)
- [Changelog](https://github.com/nix-rust/nix/blob/v0.24.2/CHANGELOG.md)
- [Commits](https://github.com/nix-rust/nix/compare/v0.24.1...v0.24.2)

---
updated-dependencies:
- dependency-name: nix
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-18 06:41:18 +00:00
Gergely Kalas
4f043ff57f Fix 'wc' gnu test-suite compatibility #3678
This change will extract a utility already present in ls to uucore.
This utility is used by dir and vdir too, which are adjusted to
look it up in uucode. No further changes to ls, dir or dirv intended.

The change here largely fiddles with the output of uu_wc to match
that of GNU wc. This is the case to the extent to make unit tests
pass, however, there are differences remaining. One specific
difference I did not tackle is that GNU wc will not align the
output columns (compute_number_width() -> 1) in the specific case
of the input for --files0-from=- being a named pipe, not real stdin.
This difference can be triggered using the following two invocations.
  - wc --files0-from=- < files0 # use a named pipe, GNU does align
  - cat files0- | wc --files0-from=- # use real stdin, GNU does not
    align.
2022-07-01 16:43:09 +02:00
dependabot[bot]
82e81da967 build(deps): bump bytecount from 0.6.2 to 0.6.3
Bumps [bytecount](https://github.com/llogiq/bytecount) from 0.6.2 to 0.6.3.
- [Release notes](https://github.com/llogiq/bytecount/releases)
- [Commits](https://github.com/llogiq/bytecount/commits)

---
updated-dependencies:
- dependency-name: bytecount
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-06-06 10:07:55 +02:00
Terts Diepraam
eae07adfb1
Version 0.0.14 (#3553)
Version 0.0.14
2022-05-22 19:57:19 +02:00
Terts Diepraam
0acfa07d77 all: add value hints 2022-05-13 16:15:50 +02:00
Ryan Zoeller
363a2a5611 Upgrade nix to 0.24.1, ctrlc to 3.2.2
Limit nix features, which should help compile times slightly.

Replace usage of deprecated nix functionality with std equivalent.
2022-04-25 08:13:20 +02:00
Justin Tracey
1f025c19af address libc weirdness on 32 bit android 2022-04-20 08:44:49 +02:00
Justin Tracey
2a0d58d060 get android builds to compile and pass tests 2022-04-20 08:44:49 +02:00
Terts Diepraam
af9f718936 Change edition to 2021 2022-04-05 10:39:31 +02:00
Terts Diepraam
b7809bd889 version 0.0.13 2022-04-02 11:04:27 +02:00
Ackerley Tng
e9131e2b7f wc: compute number widths using total file sizes
Previously, individual file sizes were used to compute the number width, which
would cause misalignment when the total has a greater number of digits, and is
different from the behavior of GNU wc

```
$ ./target/debug/wc -w -l -m -c -L deny.toml GNUmakefile
  95  422 3110 3110   85 deny.toml
 349  865 6996 6996  196 GNUmakefile
 444 1287 10106 10106  196 total
$ wc -w -l -m -c -L deny.toml GNUmakefile
   95   422  3110  3110    85 deny.toml
  349   865  6996  6996   196 GNUmakefile
  444  1287 10106 10106   196 total
```
2022-03-28 18:56:34 +02:00
Terts Diepraam
20212be4c8 fix clippy errors related to clap upgrade from 3.0.10 to 3.1.6 2022-03-17 22:46:56 +01:00
dependabot[bot]
59440d35c0
build(deps): bump clap from 3.0.10 to 3.1.6
Bumps [clap](https://github.com/clap-rs/clap) from 3.0.10 to 3.1.6.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v3.0.10...v3.1.6)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-17 13:06:29 +00:00
Sylvestre Ledru
54a10e955a Update of the cargo.lock url to point to the right branch 2022-03-06 22:13:17 +01:00
Davide Cavalca
19af43222b Include license text in all published crates 2022-03-05 21:21:46 +01:00
Terts Diepraam
53070141c1
all: add format_usage function (#3139)
This should correct the usage strings in both the `--help` and user documentation. Previously, sometimes the name of the utils did not show up correctly.
2022-02-21 17:14:03 +01:00
Allan Silva
e6c94c1cd7 wc: Fix clippy error 2022-02-07 10:20:52 -03:00
Allan Silva
6a6875012e wc: implement files0-from option
When this option is present, the files argument is not processed. This option processes the file list from provided file, splitting them by the ascii NUL (\0) character. When files0-from is '-', the file list is processed from stdin.
2022-02-04 10:12:08 -03:00
Daniel Eades
41e2197188 squash some repeated match blocks 2022-01-30 18:32:09 +01:00
Daniel Eades
ba45fe312a use 'Self' and derive 'Default' where possible 2022-01-30 15:08:26 +01:00
Daniel Eades
784f2e2ea1 use semicolons if nothing returned 2022-01-30 15:08:26 +01:00
Daniel Eades
a2d5f06be4 remove needless pass by value 2022-01-30 15:08:26 +01:00
Terts Diepraam
eb82015b23 all: change macros
- Change the main! proc_macro to a bin! macro_rules macro.
- Reexport uucore_procs from uucore
- Make utils to not import uucore_procs directly
- Remove the `syn` dependency and don't parse proc_macro input (hopefully for faster compile times)
2022-01-29 15:26:32 +01:00
Terts Diepraam
9c8e865b55 all: enable infer long arguments in clap 2022-01-29 02:06:29 +01:00
Sylvestre Ledru
fed5ca4ba9
Merge pull request #2935 from tertsdiepraam/wc-unusual-files
wc: fix counting files from pseudo-filesystem
2022-01-29 01:09:02 +01:00
Sylvestre Ledru
7f79fef2cd fix various doc warnings 2022-01-29 00:09:09 +01:00
Terts Diepraam
dd311b294b wc: fix counting files from pseudo-filesystem 2022-01-28 19:08:44 +01:00
Terts Diepraam
55a47f6fc0
Merge pull request #2863 from tertsdiepraam/clap-3
Clap 3
2022-01-20 23:14:52 +01:00
Roy Ivy III
2e251f91f1 0.0.12 2022-01-19 05:35:00 -06:00
Terts Diepraam
8872485922 Merge branch 'main' into clap-3 2022-01-17 13:25:51 +01:00
Sylvestre Ledru
1fbda8003c coreutils 0.0.8 => 0.0.9, uucore_procs 0.0.7 => 0.0.8, uucore 0.0.10 => 0.0.11 2022-01-16 17:05:48 +01:00
Terts Diepraam
e9e5768591 wc: clap 3 2022-01-11 19:16:48 +01:00
Roy Ivy III
774e72551b change ~ relax 'nix' version and remove 'nix' patch
- code coverage compilation on MacOS latest (MacOS-11+) now works with newer 'nix' versions
2022-01-09 18:57:25 -06:00
Jeffrey Finkelstein
9caf15c44f fixup! wc: return UResult from uumain() function 2022-01-02 19:40:22 -05:00
Jeffrey Finkelstein
e060ac53f2 wc: return UResult from uumain() function 2022-01-02 11:15:30 -05:00
Roy Ivy III
03e0cbb020 update 'nix' within workspace to force patched version 2021-11-19 17:55:03 -06:00
Roy Ivy III
f20aa49821 maint/CICD ~ (GHA) fix cargo-udeps false positives (add 'ignore' exceptions to sub-crates) 2021-11-19 17:55:02 -06:00
Sylvestre Ledru
f28da04fc5
Merge pull request #2722 from sylvestre/version
Prepare version 0.0.8
2021-10-23 20:56:27 +02:00
Sylvestre Ledru
59e9870c56 Prepare version 0.0.8 2021-10-23 19:21:50 +02:00
Sylvestre Ledru
4c8b74797f Silent question_mark clippy warnings 2021-10-23 19:19:36 +02:00
Sylvestre Ledru
7eaae75bfc add a github action job to identify unused deps 2021-09-15 12:06:50 +02:00
Jeffrey Finkelstein
7ec6452265 wc: remove mutable min_width parameter
Remove mutability from the `min_width` parameter to the `print_stats()`
function in `wc`.
2021-09-12 12:25:52 -04:00
Jan Verbeek
c1079e0b1c Move common pipe and splice functions into uucore
This cuts down on repetitive unsafe code and repetitive code in
general.
2021-09-10 21:24:34 +02:00
Jan Verbeek
ffe63945b7 wc: Update path display 2021-08-31 22:07:24 +02:00
Jan Verbeek
94e33c97f3 wc: Add benchmarking documentation 2021-08-26 01:38:16 +02:00
Jan Verbeek
1358aeecdd wc: Avoid unnecessary work in general loop
This gives a big speedup if e.g. only characters are being counted.
2021-08-26 01:38:16 +02:00
Jan Verbeek
0f1e79fe29 wc: Fix linters 2021-08-26 01:38:16 +02:00
Jan Verbeek
9972cd1327 wc: Report counts and failures correctly
If reading fails midway through then a count should be reported for
what could be read.

If opening a file fails then no count should be reported.

The exit code shouldn't report the number of failures, that's fragile
in case of many failures.
2021-08-26 01:38:16 +02:00
Jan Verbeek
c16e492cd0 wc: Make output width more consistent with GNU 2021-08-26 01:38:16 +02:00
Jan Verbeek
d0c0564947 wc: Swap order of characters and bytes in output
This way it matches GNU and busybox.
2021-08-26 01:38:16 +02:00
Jan Verbeek
657a04f706 wc: Stricter simpler error handling
Errors are now always shown with the corresponding filename.

Errors are no longer converted into warnings. Previously `wc < .`
would cause a loop.

Checking whether something is a directory is no longer done in
advance. This removes race conditions and the edge case where stdin is
a directory.

The custom error type is removed because io::Error is now enough.
2021-08-26 01:38:16 +02:00
Jan Verbeek
35793fc260 wc: Accept badly-encoded filenames 2021-08-26 01:38:16 +02:00
Jan Verbeek
6f7d740592 wc: Do a chunked read with proper UTF-8 handling
This brings the results mostly in line with GNU wc and solves nasty
behavior with long lines.
2021-08-26 01:38:16 +02:00
Jan Verbeek
48437fc49d wc: Optimize, improve correctness
- Reuse allocations for read lines
- Increase splice size
- Check if /dev/null was opened correctly
- Do not discard read bytes after I/O error
- Add fast line counting with bytecount
2021-08-26 01:38:16 +02:00
Jan Verbeek
516c5311f7
Close file descriptors of pipes after use (#2591)
* Close file descriptors of pipes after use

* cat: Test file descriptor exhaustion

* fixup! cat: Test file descriptor exhaustion
2021-08-24 12:00:07 +02:00
Michael Debertol
1eb7193ee8 wc: fix clippy lint 2021-08-20 00:00:48 +02:00
Michael Debertol
252220e9eb refactor/uucore ~ make util_name and execution_phrase functions
Since util_name and execution_phrase no longer rely on features that are
only available to macros, they may as well be plain functions.
2021-08-14 17:55:18 +02:00
Roy Ivy III
c0854000d1 refactor ~ use execution_phrase!() for usage messaging 2021-08-14 14:01:33 +02:00
Roy Ivy III
23b68d80ba refactor ~ usage() instead of get_usage() 2021-08-14 13:58:43 +02:00
Roy Ivy III
c5792c2a0f refactor ~ use util_name!() as clap::app::App name argument for all utils 2021-08-14 13:53:13 +02:00
sagu
9702aa6414
Revert "silent buggy clippy warning" 2021-07-25 18:06:41 +02:00
Sylvestre Ledru
26a882551b update the dep to uucore_procs 0.0.6 2021-07-11 21:04:11 +02:00
Sylvestre Ledru
1d8a66b7d3 Update to version 0.0.7 2021-07-11 18:04:56 +02:00
Sylvestre Ledru
f2e12fee0a Silent buggy clippy warnings
Fails with:
```
error: use of irregular braces for `write!` macro
  --> src/uucore/src/lib/features/encoding.rs:19:17
   |
19 | #[derive(Debug, Error)]
   |                 ^^^^^
   |
   = note: `-D clippy::nonstandard-macro-braces` implied by `-D warnings`
help: consider writing `Error`
  --> src/uucore/src/lib/features/encoding.rs:19:17
   |
19 | #[derive(Debug, Error)]
   |                 ^^^^^
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#nonstandard_macro_braces
   = note: this error originates in the derive macro `Error` (in Nightly builds, run with -Z macro-backtrace for more info)
```
2021-07-04 19:06:37 +02:00
Michael Debertol
2ebca384c6 all utils: enable wrap_help
This makes clap wrap the help text according to the terminal width,
which improves readability for terminal widths < 120 chars,
because clap defaults to a width of 120 chars without this feature.
2021-06-27 16:17:10 +02:00
Michael Debertol
0531153fa6 uutils: move clap::App creation to separate functions 2021-06-25 21:23:45 +02:00
Roy Ivy III
8f0d42da39 refactor/wc ~ fix cargo clippy complaint (clippy::needless_borrow) 2021-06-06 19:28:24 -05:00
Sylvestre Ledru
d8c06dd6bb use clap::crate_version macro instead of the env variable 2021-06-02 19:00:19 +02:00
Roy Ivy III
dff33a0edb refactor/wc ~ polish spelling (comments, names, and exceptions) 2021-05-31 08:23:57 -05:00
Roy Ivy III
9c0c8eb59f change ~ remove 'main.rs' spell-checker exceptions 2021-05-31 08:11:31 -05:00
Jeffrey Finkelstein
4521aa2659 wc: print counts for each file as soon as computed
Change the behavior of `wc` to print the counts for a file as soon as
it is computed, instead of waiting to compute the counts for all files
before writing any output to `stdout`. The new behavior matches the
behavior of GNU `wc`.

The old behavior looked like this (the word "hello" is entered on
`stdin`):

    $ wc emptyfile.txt -
    hello
	  0       0       0 emptyfile.txt
	  1       1       6
	  1       1       6 total

The new behavior looks like this:

    $ wc emptyfile.txt -
	  0       0       0 emptyfile.txt
    hello
	  1       1       6
	  1       1       6 total
2021-05-22 14:27:37 -04:00
Jeffrey Finkelstein
97a49c7c95 wc: compute min width to format counts up front
Fix two issues with the string formatting width for counts displayed
by `wc`.

First, the output was previously not using the default minimum width
(seven characters) when reading from `stdin`. This commit corrects
this behavior to match GNU `wc`. For example,

    $ cat alice_in_wonderland.txt | wc
          5      57     302

Second, if at least 10^7 bytes were read from `stdin` *after* reading
from a smaller regular file, then every output row would have width
8. This disagrees with GNU `wc`, in which only the `stdin` row and the
total row would have width 8. This commit corrects this behavior to
match GNU `wc`. For example,

    $ printf "%.0s0" {1..10000000} | wc emptyfile.txt -
	  0       0       0 emptyfile.txt
	  0       1 10000000
	  0       1 10000000 total

Fixes #2186.
2021-05-15 21:41:47 -04:00
Jeffrey Finkelstein
e8d911d9d5 wc: correct some error messages for invalid inputs
Change the error messages that get printed to `stderr` for compatibility
with GNU `wc` when an input is a directory and when an input does not
exist.

Fixes #2211.
2021-05-15 10:35:21 -04:00
Nicolas Thery
112b042769 wc: emit '-' in ouput when set on command-line
When stdin is explicitly specified on the command-line with '-', emit it
in the output stats to match GNU wc output.

Fixes #2188.
2021-05-09 15:47:05 +02:00
Jeffrey Finkelstein
ba8f4ea670 wc: move counting code into WordCount::from_line()
Refactor the counting code from the inner loop of the `wc` program
into the `WordCount::from_line()` associated function. This commit
also splits that function up into other helper functions that
encapsulate decoding characters and finding word boundaries from raw
bytes.

This commit also implements the `Sum` trait for the `WordCount`
struct, so that we can simply call `sum()` on an iterator that yields
`WordCount` instances.
2021-05-08 14:24:07 +02:00
Jeffrey Finkelstein
50f4941d49 wc: refactor WordCount into its own module
Move the `WordCount` struct and its implementations into the
`wordcount.rs`.
2021-05-08 14:24:07 +02:00
Jeffrey Finkelstein
ee43655bdb fixup! wc: rm leading space when printing multiple counts 2021-05-08 13:11:09 +02:00