Commit graph

160 commits

Author SHA1 Message Date
Sylvestre Ledru
9844f1f07d 0.0.20 => 0.0.21 2023-09-03 14:42:54 +02:00
Terts Diepraam
c3f9e19a3b all: normalize license notice in all *.rs files 2023-08-24 12:21:09 +02:00
Daniel Hofstetter
774180bb09 Remove the author copyright notices
from files missed by #5184
2023-08-23 10:54:00 +02:00
Sylvestre Ledru
0b9c829bce Merge pull request #5182 from sylvestre/clippy_
Fix some of the recent clippy warnings
2023-08-21 16:21:02 -04:00
Sylvestre Ledru
bfca6bf70f Add license headers on all files 2023-08-21 10:49:27 +02:00
Sylvestre Ledru
7c9f4ba92a Fix some clippy warnings 2023-08-21 08:41:40 +02:00
Sylvestre Ledru
74530c0f51 Update the version to 0.0.20 2023-07-14 13:04:17 +02:00
Sylvestre Ledru
6ecef3a0e3 Reformat TOML files with taplo
npx --yes @taplo/cli fmt
2023-06-08 09:07:19 +02:00
Sylvestre Ledru
830b7d5ce1 New release 2023-06-04 09:46:59 +02:00
Jed Denlea
e5b46ea3eb wc: more tests and fixes
My previous commits meant to bring our wc's output and behavior in line
with GNU's. There should be tests that check for these changes!

I found a stupid bug in my own changes: I was not adding 1 to the
indexes produced by .enumerate() when printing errors.
2023-05-22 01:02:35 -07:00
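The off-by-one described in e5b46ea3eb can be illustrated with a small sketch. It is not the code from the commit: `empty_name_positions` and its input are hypothetical, and the only point is that `.enumerate()` yields 0-based indexes while positions reported to a user are usually 1-based.

```rust
/// Returns the 1-based positions of empty names in `names`.
/// (Hypothetical helper, not taken from the wc sources.)
fn empty_name_positions(names: &[&str]) -> Vec<usize> {
    names
        .iter()
        .enumerate()
        // `.enumerate()` starts counting at 0 ...
        .filter(|(_, name)| name.is_empty())
        // ... so add 1 before showing the position to a user.
        .map(|(index, _)| index + 1)
        .collect()
}
```

Calling `empty_name_positions(&["a", "", "b", ""])` reports the second and fourth entries rather than the indexes 1 and 3.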
Jed Denlea
c4b53a44b5 wc: make --files0-from work with streams 2023-05-21 23:59:32 -07:00
Jed Denlea
d58ee5a28a wc: skip String to measure number length
Sadly ilog10 isn't available until we use 1.67, but we can get close in
the meantime.
2023-05-20 23:29:48 -07:00
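A minimal sketch of the idea in d58ee5a28a, assuming the goal is to find how many decimal digits a count needs without formatting it into a String first. The function name `decimal_width` is hypothetical and the loop below is only one way to "get close" to `ilog10`; the commit's actual approach may differ.

```rust
/// Number of decimal digits needed to print `n`, computed without
/// allocating a String. On Rust 1.67 or later, `n.ilog10() + 1` gives the
/// same result for non-zero `n`.
fn decimal_width(mut n: u64) -> usize {
    // Zero still needs one digit, so start at 1.
    let mut width = 1;
    while n >= 10 {
        n /= 10;
        width += 1;
    }
    width
}
```

The result can then be used as the column width when aligning several counts.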
Jed Denlea
36b45e2249 wc: touch up errors, quote more like GNU
WcError should not hold Strings which are clones of `&'static str`s.
thiserror provides a convenient API to declare errors with their Display
messages.

wc quotes the --files0-from source slightly differently when reporting
errors about empty file names.
2023-05-20 23:29:48 -07:00
Jed Denlea
38b4825e7f wc: avoid excess String allocations
print_stats will now take advantage of the buffer built into
io::stdout().

We can also waste fewer lines on show! by making a helper macro.
2023-05-20 23:08:26 -07:00
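The allocation-saving idea in 38b4825e7f can be sketched as follows. This is an illustration under assumptions, not the real `print_stats`: the name `write_counts`, its parameters, and the output layout are hypothetical. It writes each field directly into a writer with `write!` instead of building intermediate Strings with `format!`.

```rust
use std::io::{self, Write};

/// Writes right-aligned counts followed by a name straight into `out`.
/// (Hypothetical stand-in for wc's `print_stats`.)
fn write_counts<W: Write>(
    out: &mut W,
    counts: &[u64],
    width: usize,
    name: &str,
) -> io::Result<()> {
    for (i, count) in counts.iter().enumerate() {
        if i > 0 {
            out.write_all(b" ")?;
        }
        // Formats directly into the writer; no temporary String is built.
        write!(out, "{count:>width$}")?;
    }
    writeln!(out, " {name}")
}
```

Passing `&mut io::stdout().lock()` as `out` lets the writes land in the buffer that the standard library already keeps for stdout.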
Jed Denlea
c3b06e10a6 wc: clean up of settings
The Settings object did not need a QuotingStyle member, it was basically
a static.
2023-05-20 22:30:14 -07:00
Sylvestre Ledru
f8d7bebed3 ignore some cognitive_complexity for now 2023-05-06 14:50:55 +02:00
Daniel Hofstetter
98fb8fa144 test,wc: use vars directly in format! strings 2023-04-21 08:33:56 +02:00
publicmatt
084510e499 Copy the UTF8 crate in the tree and remove utf8 dependency. (#4460) 2023-04-14 21:31:11 +02:00
Sylvestre Ledru
3247e1b5e1 Merge pull request #4639 from sylvestre/version
0.0.17 => 0.0.18
2023-04-02 11:31:24 +02:00
Daniel Hofstetter
0fa08757fa wc: implement --total 2023-03-31 08:16:46 +02:00
Sylvestre Ledru
af0a263191 0.0.17 => 0.0.18 2023-03-29 08:11:25 +02:00
Sylvestre Ledru
fe2517041f md: ignore a warning 2023-03-04 18:44:17 +01:00
Sylvestre Ledru
422a27d375 parent 9d5dc500e6
author Sylvestre Ledru <sylvestre@debian.org> 1677865358 +0100
committer Sylvestre Ledru <sylvestre@debian.org> 1677951797 +0100

md: Fix a bunch of warnings in the docs
2023-03-04 18:43:40 +01:00
Sylvestre Ledru
2085d4d4ab remove the sh in the syntax 2023-03-03 13:52:47 +01:00
Alexander Kunde
aead80efdb cargo fmt 2023-03-02 18:55:11 +01:00
Alexander Kunde
4bae3e0cd9 remove redundant line 2023-03-02 18:14:10 +01:00
Alexander Kunde
4276111222 change ABOUT from static to const 2023-03-02 18:10:53 +01:00
Alexander Kunde
2bae3dd4f2 wc: move help strings to markdown file 2023-03-02 16:29:43 +01:00
Terts Diepraam
ae27c82020 Use workspace inheritance for dependencies 2023-02-11 18:54:46 +01:00
Terts Diepraam
357001dabc fix double dependency of memoffset by upgrading nix, libc and ctrlc 2023-02-09 14:02:40 +01:00
Daniel Hofstetter
f6b646e4e5 clippy: fix warnings introduced with Rust 1.67.0 2023-01-27 17:37:56 +01:00
Terts Diepraam
4d3dc78686 Version 0.0.17 2023-01-21 10:38:18 +01:00
Miles Liu
d505df5369 uu: use normal use declarations to import macros 2022-11-17 11:49:23 +08:00
Terts Diepraam
92c4b32eeb wc: update to clap 4 2022-10-13 17:50:43 +02:00
Terts Diepraam
f15c4f2d3e Version 0.0.16 2022-10-11 23:03:39 +02:00
Sylvestre Ledru
7257adb53b wc: document the long match 2022-10-03 00:57:48 -10:00
Daniel Hofstetter
9e8daf92dd Replace deprecated value_of() with get_one() 2022-09-26 16:42:42 +02:00
Terts Diepraam
975a1d170d change remaining usage codes of 2 to 1 for GNU compat 2022-09-10 20:24:24 +02:00
Daniel Hofstetter
747ed592d9 Replace allow_invalid_utf8() with value_parser() 2022-08-25 15:21:50 +02:00
Terts Diepraam
15180249fc Version 0.0.15 2022-08-20 13:13:22 +02:00
Daniel Hofstetter
62b1b7cfb2 Replace deprecated values_of_os() with get_many() 2022-08-20 08:19:11 +02:00
Niyaz Nigmatullin
9cd898b885 remove nix 0.24.2 dependency 2022-08-17 13:13:27 +03:00
Sylvestre Ledru
9f1219005d fix the significant_drop_in_scrutinee clippy warning 2022-08-10 21:37:48 +02:00
Daniel Hofstetter
7c3116330e Replace deprecated is_present() with contains_id() 2022-08-02 15:21:39 +02:00
Daniel Hofstetter
fc4544c42b bump clap from 3.1.18 to 3.2.15 2022-07-29 14:05:02 +02:00
Owen Anderson
d5f59f23fa Implement wc fast paths that skip Unicode decoding.
Byte, character, and line counting can all be done on the raw bytes
of the incoming stream without decoding the Unicode characters. This
fact was previously exploited in specific fast paths for counting
characters and counting lines. This change unifies those fast paths into
a single shared fast path, using const generics to specialize the
function for each use case. This has the benefit of making sure that all
combinations of these Unicode-oblivious fast paths benefit from the same
optimization.

On my laptop, this speeds up `wc -clm odyssey1024.txt` from 840ms to
120ms. I experimented with using a filter loop for line counting, but
continuing to use the bytecount crate came out ahead by a significant
margin.
2022-07-23 10:45:26 -07:00
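The unified fast path from d5f59f23fa might look roughly like the sketch below. The names `Counts` and `count_fast` are hypothetical, and a plain loop stands in for the bytecount crate that the commit says it kept for line counting. The point is only that `const` boolean generics let the compiler emit one specialized copy of the function per flag combination, none of which decode UTF-8.

```rust
/// Results of a raw-byte scan. (Hypothetical type for this sketch.)
#[derive(Debug, Default, PartialEq)]
struct Counts {
    bytes: usize,
    chars: usize,
    lines: usize,
}

/// Counts bytes, characters, and lines on raw bytes, without decoding.
/// Each `const` flag is known at compile time, so disabled branches are
/// removed from the corresponding instantiation.
fn count_fast<const BYTES: bool, const CHARS: bool, const LINES: bool>(data: &[u8]) -> Counts {
    let mut counts = Counts::default();
    if BYTES {
        counts.bytes = data.len();
    }
    for &b in data {
        // A character starts at every byte that is not a continuation byte.
        if CHARS && (b & 0b1100_0000) != 0b1000_0000 {
            counts.chars += 1;
        }
        if LINES && b == b'\n' {
            counts.lines += 1;
        }
    }
    counts
}
```

For example, `count_fast::<true, true, true>(data)` corresponds to `wc -clm`, while `count_fast::<false, false, true>(data)` only counts newlines.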
Owen Anderson
417ad0e384 Add rustdoc comment. 2022-07-20 23:32:50 -07:00
Owen Anderson
13762cae05 Implement a fast path for character counting in wc.
When wc is invoked with only the -m flag, we only need to count the
number of Unicode characters in the input. In order to do so, we don't
actually need to decode the input bytes into characters. Rather, we can
simply count the number of non-continuation bytes in the UTF-8 stream,
since every character will contain exactly one non-continuation byte.

On my laptop, this speeds up `wc -m odyssey1024.txt` from 745ms to
109ms.
2022-07-20 22:35:40 -07:00
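The counting trick described in 13762cae05 can be sketched in a few lines. The function name `count_chars` is hypothetical and this is not the code from the commit. It assumes the input is valid UTF-8; on malformed input the result can differ from what a decoding counter would report.

```rust
/// Counts UTF-8 encoded characters without decoding them: every character
/// contains exactly one byte that is not a continuation byte (0b10xxxxxx).
fn count_chars(bytes: &[u8]) -> usize {
    bytes
        .iter()
        .filter(|&&b| (b & 0b1100_0000) != 0b1000_0000)
        .count()
}
```

For valid UTF-8 text `s`, `count_chars(s.as_bytes())` agrees with `s.chars().count()`.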
Andrew Baptist
cc08e1cc3a Update to handle all the latest cargo warnings 2022-07-18 13:20:49 -04:00
Owen Anderson
735db78b3d wc: specialize scanning loop on settings. (#3708)
* wc: specialize scanning loop on settings.

The primary computational loop in wc (iterating over all the
characters and computing word lengths, etc) is configured by a
number of boolean options that control the text-scanning behavior.
If we monomorphize the code loop for each possible combination of
scanning configurations, rustc is able to generate better code
for each instantiation, at the least by removing the conditional
checks on each iteration, and possibly by allowing things like
vectorization.

On my computer (aarch64/macos), I am seeing at least a 5% performance
improvement in release builds on all wc flag configurations
(other than those that were already specialized) against
odyssey1024.txt, with wc -l showing the greatest improvement at 15%.

* Reduce the size of the wc dispatch table by half.

By extracting the handling of hand-written fast-paths to the
same dispatch as the automatic specializations, we can avoid
needing to pass `show_bytes` as a const generic to
`word_count_from_reader_specialized`. Eliminating this parameter
halves the number of arms in the dispatch.
2022-07-18 12:16:52 +02:00
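The specialization and dispatch described in 735db78b3d can be sketched as below. All names here (`Settings`, `WordCount`, `scan`, `scan_with`) are hypothetical, only two flags are modeled, and word splitting is simplified to ASCII whitespace. The real `word_count_from_reader_specialized` takes more parameters and works on a reader.

```rust
/// Runtime options. (Hypothetical, reduced to two flags for the sketch.)
#[derive(Clone, Copy)]
struct Settings {
    show_lines: bool,
    show_words: bool,
}

#[derive(Debug, Default, PartialEq)]
struct WordCount {
    lines: usize,
    words: usize,
}

/// Scanning loop monomorphized on its flags: each instantiation contains
/// only the checks it needs, with no per-iteration tests of the settings.
fn scan<const LINES: bool, const WORDS: bool>(data: &[u8]) -> WordCount {
    let mut wc = WordCount::default();
    let mut in_word = false;
    for &b in data {
        if LINES && b == b'\n' {
            wc.lines += 1;
        }
        if WORDS {
            let is_space = b.is_ascii_whitespace();
            if !is_space && !in_word {
                wc.words += 1;
            }
            in_word = !is_space;
        }
    }
    wc
}

/// Dispatch table: maps runtime settings to a compile-time instantiation.
/// Every extra boolean flag doubles the number of arms.
fn scan_with(settings: Settings, data: &[u8]) -> WordCount {
    match (settings.show_lines, settings.show_words) {
        (false, false) => scan::<false, false>(data),
        (false, true) => scan::<false, true>(data),
        (true, false) => scan::<true, false>(data),
        (true, true) => scan::<true, true>(data),
    }
}
```

Because the arm count is two to the power of the number of flags, dropping one const generic parameter, as the second bullet of the commit does with `show_bytes`, halves the table.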