4745 commits

Author SHA1 Message Date
Sylvestre Ledru
2fa4d6a2bb
Merge pull request #3740 from resistor/main
Implement wc fast paths that skip Unicode decoding.
2022-07-25 21:15:13 +02:00
Daniel Hofstetter
34b4853890 numfmt: don't round floats if --from is "none" 2022-07-24 16:25:52 +02:00
Sylvestre Ledru
62305e67d1
Merge pull request #3719 from andrewbaptist/main
split: Don't overwrite files
2022-07-23 22:54:57 +02:00
Owen Anderson
d5f59f23fa Implement wc fast paths that skip Unicode decoding.
Byte, character, and line counting can all be done on the raw bytes
of the incoming stream without decoding the Unicode characters. This
fact was previously exploited in specific fast paths for counting
characters and counting lines. This change unifies those fast paths into
a single shared fast path, using const generics to specialize the
function for each use case. This has the benefit of making sure that all
combinations of these Unicode-oblivious fast paths benefit from the same
optimization.

On my laptop, this speeds up `wc -clm odyssey1024.txt` from 840ms to
120ms. I experimented with using a filter loop for line counting, but
continuing to use the bytecount crate came out ahead by a significant
margin.
2022-07-23 10:45:26 -07:00
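
A minimal sketch of the idea described in the commit above: one scanning function whose behaviour is selected by const generic flags, so each flag combination gets its own specialized instantiation. The names `Counts` and `count_fast` are hypothetical, and the line count uses a plain iterator filter where the commit keeps the bytecount crate. This is not the actual uutils code.

```rust
/// Counts gathered by the hypothetical fast path.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Counts {
    pub bytes: usize,
    pub chars: usize,
    pub lines: usize,
}

/// One function, specialized at compile time by three const flags.
/// Branches guarded by a `false` flag can be removed by the compiler
/// in that instantiation.
pub fn count_fast<const BYTES: bool, const CHARS: bool, const LINES: bool>(
    buf: &[u8],
) -> Counts {
    let mut counts = Counts::default();
    if BYTES {
        counts.bytes = buf.len();
    }
    if CHARS {
        // UTF-8 continuation bytes have the bit pattern 10xxxxxx;
        // every other byte starts a character.
        counts.chars = buf.iter().filter(|&&b| (b & 0xC0) != 0x80).count();
    }
    if LINES {
        // Assumption: a simple filter stands in for the bytecount crate.
        counts.lines = buf.iter().filter(|&&b| b == b'\n').count();
    }
    counts
}
```

A call such as `count_fast::<true, true, true>(data)` would correspond to `wc -clm`, while `count_fast::<false, false, true>(data)` only counts lines.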
Sylvestre Ledru
ec9130a4d7
Merge pull request #3735 from resistor/main
Implement a fast path for character counting in wc.
2022-07-22 13:29:47 +02:00
Sylvestre Ledru
f82ada645e
Merge pull request #3731 from sylvestre/fs-doc
document some common fs functions
2022-07-22 13:29:15 +02:00
Niyaz Nigmatullin
5f3f1112d1
Basename arguments simple format (#3736)
* basename: support simple format

* tests/basename: add tests for simple format

* basename: follow clippy advice
2022-07-22 13:28:54 +02:00
John Eckersberg
282b368b28 nice: Move call to Errno::clear() outside of unsafe block
Minor nitpick (of my own previous patch!), Errno::clear() is a safe
function, it should not be inside of the unsafe block.
2022-07-21 15:57:21 -04:00
Andrew Baptist
f2cfc15a70 split: Don't overwrite files
Check that a file exists by calling create_new and changing the
interface of instantiate_current_writer to return a Result rather
than calling unwrap.
2022-07-21 12:06:13 -04:00
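
The `create_new` approach mentioned in the commit above can be sketched with the standard library alone. The helper name `open_new_output` is hypothetical and is not the signature of `instantiate_current_writer`; the point is only that the open fails with an error the caller can propagate instead of truncating an existing file.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::Path;

/// Opens `path` for writing only if it does not exist yet.
/// With `create_new(true)` the open fails with `ErrorKind::AlreadyExists`
/// rather than overwriting an existing file, and the caller gets a
/// `Result` to handle instead of a panic from `unwrap`.
pub fn open_new_output(path: &Path) -> io::Result<File> {
    OpenOptions::new().write(true).create_new(true).open(path)
}
```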
Owen Anderson
417ad0e384 Add rustdoc comment. 2022-07-20 23:32:50 -07:00
Owen Anderson
13762cae05 Implement a fast path for character counting in wc.
When wc is invoked with only the -m flag, we only need to count the
number of Unicode characters in the input. In order to do so, we don't
actually need to decode the input bytes into characters. Rather, we can
simply count the number of non-continuation bytes in the UTF-8 stream,
since every character will contain exactly one non-continuation byte.

On my laptop, this speeds up `wc -m odyssey1024.txt` from 745ms to
109ms.
2022-07-20 22:35:40 -07:00
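
The counting trick from the commit above fits in a few lines. This is a sketch assuming valid UTF-8 input; the function name `count_chars_utf8` is hypothetical and the real implementation may differ.

```rust
/// Counts characters in valid UTF-8 without decoding them.
/// Every character contributes exactly one byte that is not a
/// continuation byte, and continuation bytes always look like 10xxxxxx.
pub fn count_chars_utf8(buf: &[u8]) -> usize {
    buf.iter()
        .filter(|&&b| (b & 0b1100_0000) != 0b1000_0000)
        .count()
}
```

For any `&str` value `s`, the result for `s.as_bytes()` should agree with `s.chars().count()`, which decodes each character.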
Sylvestre Ledru
ba24565b60
Merge pull request #3732 from cakebaker/numfmt_auto_suf_si_i
numfmt: show "invalid suffix" error for "i" suffix
2022-07-20 22:51:14 +02:00
Sylvestre Ledru
ba44fd0e2b document some common fs functions 2022-07-20 17:51:53 +02:00
Daniel Hofstetter
74bd9a26d6 numfmt: show "invalid suffix" error for "i" suffix 2022-07-20 17:51:36 +02:00
Niyaz Nigmatullin
9f2a9fa6ff uucore/fs: make function more generic 2022-07-19 17:34:52 +03:00
Niyaz Nigmatullin
b76c53c090 ln: fix windows non-compiling code 2022-07-19 17:34:52 +03:00
Niyaz Nigmatullin
80ff3b3b40 ln: change error messages, extract common code 2022-07-19 17:34:52 +03:00
Andrew Baptist
cc08e1cc3a Update to handle all the latest cargo warnings 2022-07-18 13:20:49 -04:00
Owen Anderson
735db78b3d
wc: specialize scanning loop on settings. (#3708)
* wc: specialize scanning loop on settings.

The primary computational loop in wc (iterating over all the
characters and computing word lengths, etc) is configured by a
number of boolean options that control the text-scanning behavior.
If we monomorphize the code loop for each possible combination of
scanning configurations, rustc is able to generate better code
for each instantiation, at the least by removing the conditional
checks on each iteration, and possibly by allowing things like
vectorization.

On my computer (aarch64/macos), I am seeing at least a 5% performance
improvement in release builds on all wc flag configurations
(other than those that were already specialized) against
odyssey1024.txt, with wc -l showing the greatest improvement at 15%.

* Reduce the size of the wc dispatch table by half.

By extracting the handling of hand-written fast-paths to the
same dispatch as the automatic specializations, we can avoid
needing to pass `show_bytes` as a const generic to
`word_count_from_reader_specialized`. Eliminating this parameter
halves the number of arms in the dispatch.
2022-07-18 12:16:52 +02:00
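
A rough sketch of the dispatch pattern the commit above describes: runtime booleans are matched once, outside the loop, and each arm calls a separately monomorphized copy of the scanning function. The names `scan` and `scan_dispatch` are hypothetical, only two settings are modelled, and word splitting is ASCII-only here for brevity.

```rust
/// Scanning loop specialized on two hypothetical settings.
/// Inside each instantiation the `if CONST` checks are known at
/// compile time, so no per-iteration configuration test is needed.
fn scan<const COUNT_LINES: bool, const COUNT_WORDS: bool>(buf: &[u8]) -> (usize, usize) {
    let (mut lines, mut words) = (0, 0);
    let mut in_word = false;
    for &b in buf {
        if COUNT_LINES && b == b'\n' {
            lines += 1;
        }
        if COUNT_WORDS {
            // Simplification: only ASCII whitespace separates words.
            let is_space = b.is_ascii_whitespace();
            if !is_space && !in_word {
                words += 1;
            }
            in_word = !is_space;
        }
    }
    (lines, words)
}

/// Converts runtime settings into compile-time ones with one match.
/// Each additional boolean const parameter doubles the number of arms.
pub fn scan_dispatch(count_lines: bool, count_words: bool, buf: &[u8]) -> (usize, usize) {
    match (count_lines, count_words) {
        (false, false) => scan::<false, false>(buf),
        (false, true) => scan::<false, true>(buf),
        (true, false) => scan::<true, false>(buf),
        (true, true) => scan::<true, true>(buf),
    }
}
```

Because every boolean const parameter doubles the arm count, dropping one parameter from the specialized function halves the dispatch table, as the second bullet of the commit notes.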
dependabot[bot]
d15b95533e
build(deps): bump nix from 0.24.1 to 0.24.2
Bumps [nix](https://github.com/nix-rust/nix) from 0.24.1 to 0.24.2.
- [Release notes](https://github.com/nix-rust/nix/releases)
- [Changelog](https://github.com/nix-rust/nix/blob/v0.24.2/CHANGELOG.md)
- [Commits](https://github.com/nix-rust/nix/compare/v0.24.1...v0.24.2)

---
updated-dependencies:
- dependency-name: nix
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-18 06:41:18 +00:00
Niyaz Nigmatullin
4db08273b3 ln: error on --force when src=dst and dst is regular file 2022-07-18 08:17:37 +03:00
Niyaz Nigmatullin
0ea3a735ca
readlink: symlink loop handling (#3717)
readlink: fix symlink loop handling
2022-07-14 22:32:55 +02:00
Sylvestre Ledru
882cd527ff
Merge pull request #3704 from Sciencentistguy/once_cell
Replace lazy_static with once_cell
2022-07-13 16:15:47 +02:00
Daniel Hofstetter
9e44acf307 numfmt: reject suffix if unit is "none" 2022-07-13 10:47:59 +02:00
Sylvestre Ledru
db2e5fc6ec
Merge pull request #3714 from niyaznigmatullin/canonicalize_windows_symlink_loop_looking
canonicalize: Loop looking in windows
2022-07-13 10:36:18 +02:00
Niyaz Nigmatullin
c829ecfd1d uucore/fs: get back to loop looking in windows as FileInformation
for directory is fixed
2022-07-12 17:15:16 +03:00
Jamie Quigley
1a270361c0
Replace lazy_static with once_cell 2022-07-12 14:08:30 +01:00
Daniel Hofstetter
aef24db90f numfmt: show error if "i" suffix is missing 2022-07-12 10:58:07 +02:00
Niyaz Nigmatullin
de65d4d649
Realpath relative options (#3710)
* realpath: introduce relative options, make correct exit codes, make pass
GNU test misc/realpath.sh
2022-07-12 08:29:20 +02:00
Terts Diepraam
6b00aec48e
Merge pull request #3602 from lendandgit/main
df: better error message when executed in a chroot without /proc #3601
2022-07-11 23:02:39 +02:00
Sylvestre Ledru
e239ed9417
Merge pull request #3692 from jfinkels/cp-preserve-perm-link
cp: correctly copy attributes of a dangling symbolic link
2022-07-11 22:50:24 +02:00
Sylvestre Ledru
8074020a8b
Merge pull request #3705 from cakebaker/numfmt_unit
numfmt: implement "--to-unit" & "--from-unit"
2022-07-11 22:46:56 +02:00
Niyaz Nigmatullin
da5808d4ac
ls: add already listed message (#3707)
* ls: handle looping symlinks infinite printing

* ls: better coloring and printing symlinks when dereferenced

* tests/ls: add dereferencing and symlink loop tests

* ls: reformat changed using rustfmt

* ls: follow clippy advice for cleaner code

* uucore/fs: fix FileInformation to open directory handles in Windows as
well
2022-07-11 17:18:58 +02:00
Niyaz Nigmatullin
9d285e953d
Realpath symlinks handling, solves issue #3669 (#3703) 2022-07-10 16:49:25 +02:00
Daniel Hofstetter
1f292dd834 numfmt: implement "--to-unit" & "--from-unit" 2022-07-09 08:01:27 +02:00
Sylvestre Ledru
05823dd619
Merge pull request #3656 from eds-collabora/eds/tee_p
Implement tee -p
2022-07-08 18:41:34 +02:00
leon
de4cfdbea6 stat: improved error message 2022-07-07 15:24:00 +02:00
leon
97998a64dd df: removed unused import 2022-07-07 15:24:00 +02:00
leon
388e14f208 df: error handling cleanup 2022-07-07 15:24:00 +02:00
leon
72b0ba0b05 df: fixed clippy warning 2022-07-07 15:24:00 +02:00
leon
9d554751ca df: better error message when executed in a chroot without /proc #3601 2022-07-07 15:24:00 +02:00
Ed Smith
607bf3ca4d Terminate on elimination of all writers in tee
tee is supposed to exit when there is nothing left to write to. For
finite inputs, it can be hard to determine whether this functions
correctly, but for tee of infinite streams, it is very important to
exit when there is nothing more to write to.
2022-07-07 15:23:50 +02:00
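
The termination rule from the commit above can be illustrated with a small copy loop. This is a sketch under stated assumptions, not the uutils implementation: the name `tee_copy` is hypothetical, and any write error simply removes that writer.

```rust
use std::io::{self, Read, Write};

/// Copies `input` to every writer, dropping writers that fail.
/// Returns the number of bytes consumed from `input`. The loop stops at
/// end of input or as soon as no writer is left, which is what lets it
/// terminate on an infinite input stream.
pub fn tee_copy<R: Read>(mut input: R, writers: &mut Vec<Box<dyn Write>>) -> io::Result<u64> {
    let mut buf = [0u8; 8192];
    let mut total = 0u64;
    while !writers.is_empty() {
        let n = input.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        total += n as u64;
        // Keep only the writers that accepted the whole chunk.
        writers.retain_mut(|w| w.write_all(&buf[..n]).is_ok());
    }
    Ok(total)
}
```

With an endless source such as `std::io::repeat(b'y')`, the function returns once the last writer has failed instead of reading forever.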
Ed Smith
5c13e88f8b Do not trap pipe errors in yes
This is part of fixing the tee tests. 'yes' is used by the GNU test
suite to identify what the SIGPIPE exit code is on the target
platform. By trapping SIGPIPE, it creates a requirement that other
utilities also trap SIGPIPE (and exit 0 after SIGPIPE). This is
sometimes at odds with their desired behaviour.
2022-07-07 15:23:50 +02:00
Ed Smith
7a961a94a5 Preserve signal exit statuses in timeout
When the monitored process exits, the GNU version of timeout will
preserve its exit status, including the signal state.

This is a partial fix for timeout to enable the tee tests to pass.  It
removes the default Rust trap for SIGPIPE, and kills itself with the
same signal its child exited with to preserve the signal state.
2022-07-07 15:23:50 +02:00
Ed Smith
a360504574 Implement tee -p and --output-error
This has the following behaviours. On Unix:

- The default is to exit on pipe errors, and warn on other errors.

- "--output-error=warn" means to warn on all errors

- "--output-error", "--output-error=warn-nopipe" and "-p" all mean
  that pipe errors are suppressed, all other errors warn.

- "--output-error=exit" means to warn and exit on all errors.

- "--output-error=exit-nopipe" means to suppress pipe errors, and to
  warn and exit on all other errors.

On non-Unix platforms, all pipe behaviours are ignored, so the default
is effectively "--output-error=warn" and "warn-nopipe" is identical.
The only meaningful option is "--output-error=exit" which is identical
to "--output-error=exit-nopipe" on these platforms.

Note that warnings give a non-zero exit code, but do not halt writing
to non-erroring targets.
2022-07-07 15:23:50 +02:00
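
The Unix behaviours listed in the commit above can be summarized as a decision table. The sketch below uses hypothetical names (`OutputErrorMode`, `Action`, `on_write_error`) and assumes a pipe error shows up as `std::io::ErrorKind::BrokenPipe`. It encodes only the rules stated in the commit message, not the actual tee code.

```rust
use std::io::ErrorKind;

/// Hypothetical model of the `--output-error` settings.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OutputErrorMode {
    Default,    // option not given
    Warn,       // --output-error=warn
    WarnNoPipe, // --output-error, --output-error=warn-nopipe, -p
    Exit,       // --output-error=exit
    ExitNoPipe, // --output-error=exit-nopipe
}

/// What to do after a write to one target fails.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Action {
    Suppress,    // ignore the error
    Warn,        // report it, keep writing to the other targets
    Exit,        // stop
    WarnAndExit, // report it, then stop
}

/// Unix rules as described in the commit message.
/// Assumption: pipe errors are detected via `ErrorKind::BrokenPipe`.
pub fn on_write_error(mode: OutputErrorMode, kind: ErrorKind) -> Action {
    let is_pipe = kind == ErrorKind::BrokenPipe;
    match (mode, is_pipe) {
        (OutputErrorMode::Default, true) => Action::Exit,
        (OutputErrorMode::Default, false) => Action::Warn,
        (OutputErrorMode::Warn, _) => Action::Warn,
        (OutputErrorMode::WarnNoPipe, true) => Action::Suppress,
        (OutputErrorMode::WarnNoPipe, false) => Action::Warn,
        (OutputErrorMode::Exit, _) => Action::WarnAndExit,
        (OutputErrorMode::ExitNoPipe, true) => Action::Suppress,
        (OutputErrorMode::ExitNoPipe, false) => Action::WarnAndExit,
    }
}
```

On non-Unix platforms, where the commit says pipe behaviours are ignored, the `is_pipe` input would always be `false` in this model, which makes `Default`, `Warn`, and `WarnNoPipe` coincide, as well as `Exit` and `ExitNoPipe`.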
Sylvestre Ledru
922afa29ff
Merge branch 'main' into cp-preserve-perm-link 2022-07-07 15:22:57 +02:00
dependabot[bot]
ea503bf633 build(deps): bump regex from 1.5.6 to 1.6.0
Bumps [regex](https://github.com/rust-lang/regex) from 1.5.6 to 1.6.0.
- [Release notes](https://github.com/rust-lang/regex/releases)
- [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/regex/compare/1.5.6...1.6.0)

---
updated-dependencies:
- dependency-name: regex
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-07 15:22:48 +02:00
Daniel Hofstetter
ac35a1b985 comm: use NUL if delimiter is empty 2022-07-06 13:50:23 +02:00
Sylvestre Ledru
450bd3b597
Remove the is_symlink function 2022-07-06 11:18:31 +02:00
Sylvestre Ledru
38f5a47f76
Merge pull request #3698 from uutils/dependabot/cargo/once_cell-1.13.0
build(deps): bump once_cell from 1.12.0 to 1.13.0
2022-07-06 08:56:12 +02:00