The previous encoding handling was unnecessarily complex. This commit removes the enum that specifies the handling and instead has two separate methods to collect the strings either with lossy conversion or by ignoring invalidly encoded strings.
Outside of tests, only `accept_any` was used, meaning that this unnecessarily complicated the code. The behaviour of `accept_any` is now the default (and only) option.
* Fix a bug in split where chunking would be skipped when the chunk size
happened to be an exact divisor of the buffer size used to read the
input stream.
The issue here was that file was being split byte-wise in chunks of 1G.
The input stream was being read in chunks of 8KB, which evenly divides
the chunk size. Because the check to allocate the next output chunk was
done at the bottom of the loop previously, it would never occur because
the current input chunk was fully consumed at that point. By moving the
check to the top of the loop (but still late enough that we know we have
bytes to write) we resolve this issue.
This scenario is unfortunately hard to write a test for, since we don't
explicitly control the input chunk size.
Fixes https://github.com/uutils/coreutils/issues/3790
* tail: fix race condition (fix#3765)
There exists a race condition (RC) that can occur if changes to a path
happen after the initial print loop in `uu_tail()`, but before the
path is added to the notify-Watcher thread in `follow()`.
To minimize the window where the RC can occur, this moves starting the
Watcher thread and adding paths to it from `follow()` to the initial
print loop in `uu_tail()`.
Additionally, to make sure the RC cannot happen in
"gnu/tests/tail-2/F-headers.sh", the error message that is used as a trigger
in this test, is delayed until the path is added to the Watcher thread.
* build-gnu: remove workarounds for tail
Remove workarounds for "tests/tail-2/F-headers.sh" which are
(presumably) no longer needed because of the race condition fix.
* build-gnu: remove workarounds for tail
Remove workarounds for "tests/tail-2/F-headers.sh" which are
(presumably) no longer needed because of the race condition fix.
* tail: refactor to minimize chances of RC
Move "adding paths to Watcher thread" to its own loop and run this loop
before the initial tail-print-loop in order to minimize the window for
race conditions.
* cp: Refactor `reflink`/`sparse` handling to enable `--sparse` flag
`--sparse` and `--reflink` options have a lot of similarities:
- They have similar options (`always`, `never`, `auto`)
- Both need OS specific handling
- They can be mutually exclusive
Prior to this change, `sparse` was defined as `CopyMode`, but `reflink`
wasn't. Given the similarities, it makes sense to handle them similarly.
The idea behind this change is to move all OS specific file copy
handling in the `copy_on_write_*` functions. Those function then
dispatch to the correct logic depending on the arguments (at the moment,
the tuple `(reflink, sparse)`).
Also, move the handling of `--reflink=never` from `copy_file` to the
`copy_on_write_*` functions, at the cost of a bit of code duplication,
to allow `copy_on_write_*` to handle all cases (and later handle
`--reflink=never` with `--sparse`).
* cp: Implement `--sparse` flag
This begins to address #3362
At the moment, only the `--sparse=always` logic matches the requirement
form GNU cp info page, i.e. always make holes in destination when
possible.
Sparse copy is done by copying the source to the destination block by
block (blocks being of the destination's fs block size). If the block
only holds NUL bytes, we don't write to the destination.
About `--sparse=auto`: according to GNU cp info page, the destination
file will be made sparse if the source file is sparse as well. The next
step are likely to use `lseek` with `SEEK_HOLE` detect if the source
file has holes. Currently, this has the same behaviour as
`--sparse=never`. This `SEEK_HOLE` logic can also be applied to
`--sparse=always` to improve performance when copying sparse files.
About `--sparse=never`: from my understanding, it is not guaranteed that
Rust's `fs::copy` will always produce a file with no holes, as
["platform-specific behavior may change in the
future"](https://doc.rust-lang.org/std/fs/fn.copy.html#platform-specific-behavior)
About other platforms:
- `macos`: The solution may be to use `fcntl` command `F_PUNCHHOLE`.
- `windows`: I only see `FSCTL_SET_SPARSE`.
This should pass the following GNU tests:
- `tests/cp/sparse.sh`
- `tests/cp/sparse-2.sh`
- `tests/cp/sparse-extents.sh`
- `tests/cp/sparse-extents-2.sh`
`sparse-perf.sh` needs `--sparse=auto`, and in particular a way to skip
holes in the source file.
Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>
* ls: Implement --zero flag. (#2929)
This flag can be used to provide a easy machine parseable output from
ls, as discussed in the GNU bug report
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=49716.
There are some peculiarities with this flag:
- Current implementation of GNU ls of the `--zero` flag implies some
other flags. Those can be overridden by setting those flags after
`--zero` in the command line.
- This flag is not compatible with `--dired`. This patch is not 100%
compliant with GNU ls: GNU ls `--zero` will fail if `--dired` and
`-l` are set, while with this patch only `--dired` is needed for the
command to fail.
We also add `--dired` flag to the parser, with no additional behaviour
change.
Testing done:
```
$ bash util/build-gnu.sh
[...]
$ bash util/run-gnu-test.sh tests/ls/zero-option.sh
[...]
PASS: tests/ls/zero-option.sh
============================================================================
Testsuite summary for GNU coreutils 9.1.36-8ec11
============================================================================
# TOTAL: 1
# PASS: 1
# SKIP: 0
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
============================================================================
```
* Use the US way to spell Behavior
* Fix formatting with cargo fmt -- tests/by-util/test_ls.rs
* Simplify --zero flag overriding logic by using index_of
Also, allow multiple --zero flags, as this is possible with GNU ls
command. Only the last one is taken into account.
Co-authored-by: Sylvestre Ledru <sledru@mozilla.com>
* Speed up sum by using reasonable read buffer sizes.
Use a 4K read buffer for each of the checksum functions, which seems
reasonable. This improves the performance of BSD checksums on
odyssey1024.txt from 399ms to 325ms on my laptop, and of SysV
checksums from 242ms to 67ms.
* Add BENCHMARKING.md for `sum`.
* Add comment regarding block sizes.
* Improve portability of BENCHMARKING.md
* Make `div_ceil` const and enhance comment.
Byte, character, and line counting can all be done on the raw bytes
of the incoming stream without decoding the Unicode characters. This
fact was previously exploited in specific fast paths for counting
characters and counting lines. This change unifies those fast paths into
a single shared fast paths, using const generics to specialize the
function for each use case. This has the benefit of making sure that all
combinations of these Unicode-oblivious fast paths benefit from the same
optimization.
On my laptop, this speeds up `wc -clm odyssey1024.txt` from 840ms to
120ms. I experimented with using a filter loop for line counting, but
continuing to use the bytecount crate came out ahead by a significant
margin.
When wc is invoked with only the -m flag, we only need to count the
number of Unicode characters in the input. In order to do so, we don't
actually need to decode the input bytes into characters. Rather, we can
simply count the number of non-continuation bytes in the UTF-8 stream,
since every character will contain exactly one non-continuation byte.
On my laptop, this speeds up `wc -m odyssey1024.txt` from 745ms to
109ms.
* wc: specialize scanning loop on settings.
The primary computational loop in wc (iterating over all the
characters and computing word lengths, etc) is configured by a
number of boolean options that control the text-scanning behavior.
If we monomorphize the code loop for each possible combination of
scanning configurations, the rustc is able to generate better code
for each instantiation, at the least by removing the conditional
checks on each iteration, and possibly by allowing things like
vectorization.
On my computer (aarch64/macos), I am seeing at least a 5% performance
improvement in release builds on all wc flag configurations
(other than those that were already specialized) against
odyssey1024.txt, with wc -l showing the greatest improvement at 15%.
* Reduce the size of the wc dispatch table by half.
By extracting the handling of hand-written fast-paths to the
same dispatch as the automatic specializations, we can avoid
needing to pass `show_bytes` as a const generic to
`word_count_from_reader_specialized`. Eliminating this parameter
halves the number of arms in the dispatch.
* ls: handle looping symlinks infinite printing
* ls: better coloring and printing symlinks when dereferenced
* tests/ls: add dereferencing and symlink loop tests
* ls: reformat changed using rustfmt
* ls: follow clippy advice for cleaner code
* uucore/fs: fix FileInformation to open directory handles in Windows as
well
tee is supposed to exit when there is nothing left to write to. For
finite inputs, it can be hard to determine whether this functions
correctly, but for tee of infinite streams, it is very important to
exit when there is nothing more to write to.
This is part of fixing the tee tests. 'yes' is used by the GNU test
suite to identify what the SIGPIPE exit code is on the target
platform. By trapping SIGPIPE, it creates a requirement that other
utilities also trap SIGPIPE (and exit 0 after SIGPIPE). This is
sometimes at odds with their desired behaviour.
When the monitored process exits, the GNU version of timeout will
preserve its exit status, including the signal state.
This is a partial fix for timeout to enable the tee tests to pass. It
removes the default Rust trap for SIGPIPE, and kill itself with the
same signal as its child exited with to preserve the signal state.
This has the following behaviours. On Unix:
- The default is to exit on pipe errors, and warn on other errors.
- "--output-error=warn" means to warn on all errors
- "--output-error", "--output-error=warn-nopipe" and "-p" all mean
that pipe errors are suppressed, all other errors warn.
- "--output-error=exit" means to warn and exit on all errors.
- "--output-error=exit-nopipe" means to suppress pipe errors, and to
warn and exit on all other errors.
On non-Unix platforms, all pipe behaviours are ignored, so the default
is effectively "--output-error=warn" and "warn-nopipe" is identical.
The only meaningful option is "--output-error=exit" which is identical
to "--output-error=exit-nopipe" on these platforms.
Note that warnings give a non-zero exit code, but do not halt writing
to non-erroring targets.
Add a `uucore::fs::is_symlink()` function that takes in a
`std::path::Path` and decides whether the given path is a symbolic
link. This is essentially a backport of the `Path::is_symlink()`
function that appears in Rust version 1.58.0. This commit also
replaces some now-duplicate code in `chmod`, `cp`, `ln`, and `rmdir`
that checks whether a path is a symbolic link with a call to
`is_symlink()`.
Technically, this commit slightly changes the behavior of
`cp`. Previously, there was a line of code like this
if fs::symlink_metadata(&source)?.file_type().is_symlink() {
where the `?` operator propagates an error from `symlink_metadata()`
to the caller. Now the line of code is
if is_symlink(source) {
in which any error from `symlink_metadata()` has been converted to
just be a `false` value. I believe this is a satisfactory tradeoff to
make, since an error in accessing the file will likely cause an error
later in the same code path.
Fix a bug in which `cp` incorrectly exited with an error when
attempting to copy the attributes of a dangling symbolic link (that
is, when running `cp -P -p`).
Fixes#3531.
Refactor common code used in several places into a convenience
function `is_symlink()` that behaves like `Path::is_symlink()` added
in Rust 1.58.0. (We support earlier versions of Rust so we cannot use
the standard library version of this function.)
This change will extract a utility already present in ls to uucore.
This utility is used by dir and vdir too, which are adjusted to
look it up in uucode. No further changes to ls, dir or dirv intended.
The change here largely fiddles with the output of uu_wc to match
that of GNU wc. This is the case to the extent to make unit tests
pass, however, there are differences remaining. One specific
difference I did not tackle is that GNU wc will not align the
output columns (compute_number_width() -> 1) in the specific case
of the input for --files0-from=- being a named pipe, not real stdin.
This difference can be triggered using the following two invocations.
- wc --files0-from=- < files0 # use a named pipe, GNU does align
- cat files0- | wc --files0-from=- # use real stdin, GNU does not
align.
* tail: reduce CPU load for polling
This reduces the CPU load for polling drastically (from ~80% down to ~5%)
by removing/fixing several previous workarounds related to polling,
while still passing all related GNU test-suite checks.
* set Notify::PollWatcher delay to: sleep_sec/10 instead of
sleep_sec/100
* set recv_timeout to sleep_sec instead of sleep_sec/100
* remove the manual polling of watched files
Bugs:
* fix an issue with headers to consistently pass
"test_follow_name_retry_headers" and "gnu/tests/tail-2/overlay-headers.sh"
Code clean-up and refactor
* make fields of struct FileHandling private (and add getters/setters)
to ensure that the paths are absolute and match the paths returned by
Notify::Events
* replace calls to "crash!" with "return USimpleError"
* clean-up formatting
When `--backup` is supplied, `cp` will take a backup of *destination* before *source* is copied. When `--backup=simple` is supplied, it is possible for the backup path for *destination* to equal the path for *source*, destroying source before the copy is made. This change prevents this by returning an error instead.
This fixes https://github.com/uutils/coreutils/issues/3629
Update `dd` to only print a concise form of the number of bytes with
an SI prefix (like "1 MB" or "2 GB") if the number is at least
1000. Similarly, only print the concise form with an IEC prefix (like
"1 MiB" or "2 GiB") if the number is at least 1024. For example,
$ head -c 999 /dev/zero | dd > /dev/null
1+1 records in
1+1 records out
999 bytes copied, 0.0 s, 999.0 KB/s
$ head -c 1000 /dev/zero | dd > /dev/null
1+1 records in
1+1 records out
1000 bytes (1000 B) copied, 0.0 s, 1000.0 KB/s
$ head -c 1024 /dev/zero | dd > /dev/null
2+0 records in
2+0 records out
1024 bytes (1 KB, 1024 B) copied, 0.0 s, 1.0 MB/s
unnecessary trailing space was being added. because we were padding for alignment,
which is not required with -m
fixes#3608
Signed-off-by: anastygnome <noreplygitemail@protonmail.com>
Several binaries have been added to `hashsum` that have never been part
of GNU Coreutils:
- `sha3*sum` (uutils/coreutils#869)
- `shake*sum` (uutils/coreutils#987)
- `b3sum` (uutils/coreutils#3108 and uutils/coreutils#3164)
In particular, the `--bits` option, and the `--no-names` option added in
uutils/coreutils#3361, are not valid for any GNU Coreutils `*sum` binary
(as of Coreutils 9.0).
This commit refactors the argument parsing so that `--bits` and
`--no-names` become invalid options for the binaries intended to match
the GNU Coreutils API, instead of being ignored options. It also
refactors the custom binary name handling to distinguish between
binaries intended to match the GNU Coreutils API, and binaries that
don't have that constraint.
Part of uutils/coreutils#2930.
Make `mktemp` exit with an error if the `--suffix` option is the empty
string and the template argument does not end in an "X". Previously,
the program succeeded.
Before this commit,
$ mktemp --suffix= aXXXb
apBEb
After this commit,
$ mktemp --suffix= aXXXb
mktemp: with --suffix, template 'aXXXb' must end in X
This PR reuses the defined --sync flag for df, which was previously a
no-op, assigns its default value and uses the fact that
get_all_filesystems takes CLI options as a parameter to perform a sync
operation before any fs call.
Fixes#3564
Signed-off-by: anastygnome <noreplygitemail@protonmail.com>
Fix a bug in which `mktemp` would replace everything in the template
argument from the first 'X' to the last 'X' with random bytes, instead
of just replacing the last contiguous block of 'X's.
Before this commit,
$ mktemp XXX_XXX
2meCpfM
After this commit,
$ mktemp XXX_XXX
XXX_Rp5
This fixes test cases `suffix2f` and `suffix2d` in
`tests/misc/mktemp.pl` in the GNU coreutils test suite.
Simplify the logic of computing the file path parameters (the
directory, prefix, suffix, and number of random characters) for the
temporary file created by `mktemp`. This commits adds an `Options`
struct as a layer of indirection between the application logic and
`clap`, and a `Params` struct whose associated function is responsible
for determining the file path parameters from the `Options`. This is
an improvement because the previous code had some logic for
determining file path parameters in one place and some in another
place.
Show the "total" label in the "source" column or in the "target"
column if the "source" column is not visible.
Before this commit,
$ df --total --output=target .
Mounted on
/
-
After this commit,
$ df --total --output=target .
Mounted on
/
total
Include the suffix in the error message produced by `mktemp` when
there are too few Xs in the template. Before this commit,
$ mktemp --suffix=X aXX
mktemp: too few X's in template 'aXX'
After this commit,
$ mktemp --suffix=X aXX
mktemp: too few X's in template 'aXXX'
This matches the behavior of GNU `mktemp`.
Correct the error message when the template argument contains a path
separator in its suffix. Before this commit:
$ mktemp aXXX/b
mktemp: too few X's in template 'b'
After this commit:
$ mktemp aXXX/b
mktemp: invalid suffix '/b', contains directory separator
This error message is more appropriate and matches the behavior of GNU
mktemp.
On macOS path.is_dir() can be false for directories
if it was a redirect, e.g. ` tail < DIR`
* fix some tests for macOS
Cleanup:
* fix clippy/spell-checker
* fix build for windows by refactoring stdin_is_pipe_or_fifo()
Correct the error message produced by `mktemp` when `--tmpdir` is
given and the template is an absolute path:
$ mktemp --tmpdir=a /XXX
mktemp: invalid template, '/XXX'; with --tmpdir, it may not be absolute
This is WIP or even WONT-FIX because there's a workaround in Rust's
stdlib which prevents us from detecting a closed FD.
see also the discussion at:
https://github.com/uutils/coreutils/issues/2873
Update `chown` to allow setting the owner of a file to a numeric user
ID regardless of whether a corresponding username exists on the
system.
For example,
$ touch f && sudo chown 12345 f
succeeds even though there is no named user with ID 12345.
Fixes#3380.
Correct the error that arises from a path separator in the prefix
portion of a template argument provided to `mktemp`. Before this
commit, the error message was incorrect:
$ mktemp -t a/bXXX
mktemp: failed to create file via template 'a/bXXX': No such file or directory (os error 2) at path "/tmp/a/bege"
After this commit, the error message is correct:
$ mktemp -t a/bXXX
mktemp: invalid template, 'a/bXXX', contains directory separator
The code was failing to check for a path separator in the prefix
portion of the template.
Change the return type of the `parse_template()` helper function in
the `mktemp` program so that it returns `Result<..., MkTempError>`
instead of `UResult<...>`. This separates the lower level helper
function from the higher level `UResult` abstraction and will make it
easier to refactor this code in future commits.
The refactoring consists of three parts:
1) Introduction of SizeFormat & HumanReadable enums
2) Addition of a size_format field to the options struct
3) Movement of header logic from BlockSize to Header
Fixes an issue with cp not copying symlinks in spite of the -P (no dereference option)
Fix issue #3364
Performance improvements
Avoid useless read from metadata and reuse previous dest information
Signed-off-by: anastygnome <noreplygit@protonmail.com>
Fix a bug in `mktemp` where it was not respecting the path given by
the positional argument. Previously, it would place the temporary file
whose name is induced by a given template in the `/tmp` directory,
like this:
$ mktemp XXX
/tmp/LJr
$ mktemp d/XXX
/tmp/d/IhS
After this commit, it respects the directory given in the template
argument:
$ mktemp XXX
LJr
$ mktemp d/XXX
d/IhS
Fixes#3440.
* Fix a timing related bug with polling (---disable-inotify) where some
Events weren't delivered fast enough by `Notify::PollWatcher` to pass all
of tests/tail-2/retry.sh and test_tail::{test_retry4, retry7}.
* uu_tail now reverts to polling automatically if inotify backend reports
too many open files (this mimics the behavior of GNU's tail).
This makes uu_tail pass the "gnu/tests/tail-2/inotify-only-regular" test
again by adding support for charater devices.
test_tail:
* add test_follow_inotify_only_regular
* add clippy fixes for windows
The code for creating a Passwd from the fields of the raw syscall result
assumed that the syscall would return valid C strings in all non-error
cases. This is not true, and at least one platform (Android) will
populate the fields with null pointers where they are not supported.
To fix this and prevent the error from happening again, this commit
changes `cstr2string(ptr)` to check for a null pointer, and return an
`Option<String>`, with `None` being the null pointer case. While
arguably it should be the caller's job to check for a null pointer
before calling (since the safety precondition is that the pointer is to
a valid C string), relying on the type checker to force remembering this
edge case is safer in the long run.
```
error: writing `&mut Vec` instead of `&mut [_]` involves a new object where a slice will do
--> src\uu\mkdir\src/mkdir.rs:72:33
|
72 | fn strip_minus_from_mode(_args: &mut Vec<String>) -> bool {
| ^^^^^^^^^^^^^^^^ help: change this to: `&mut [String]`
|
= note: `-D clippy::ptr-arg` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg
```
* hashsum: add --no-names option from official b3sum tool
The official b3sum tool has a --no-names option for only printing the
hashes, omitting the filenames. This is quite handy when used from
scripts because it spares the postprocessing with "cut" or "awk".
Since the installed b3sum symlink would also serve as a drop-in for the
official tool, the --no-names option is expected to exist for
compatibility.
Add a --no-names option not only for b3sum but for hashsum in general
(and maybe GNU coreutils will also feel inspired to add this option).
Closes https://github.com/uutils/coreutils/issues/3360
Change formula from: "Used/Size * 100" to "Used/(Used + Avail) * 100".
This formula also works if "Used" and "Avail" do not add up to "Size",
which is the case if there are reserved disk blocks.
When doing
ln b b~
ln -f --b=simple a b
First, we create a backup of b
Then, we force the override of a => b but we make sure that the backup is
done.
So, we had a bug in the ordering of the actions.
we were first removing b. Therefore, losing the capability to do a backup of this.
Change formula from: "Used/Size * 100" to "Used/(Used + Avail) * 100".
This formula also works if "Used" and "Avail" do not add up to "Size",
which is the case if there are reserved disk blocks.
Previously, individual file sizes were used to compute the number width, which
would cause misalignment when the total has a greater number of digits, and is
different from the behavior of GNU wc
```
$ ./target/debug/wc -w -l -m -c -L deny.toml GNUmakefile
95 422 3110 3110 85 deny.toml
349 865 6996 6996 196 GNUmakefile
444 1287 10106 10106 196 total
$ wc -w -l -m -c -L deny.toml GNUmakefile
95 422 3110 3110 85 deny.toml
349 865 6996 6996 196 GNUmakefile
444 1287 10106 10106 196 total
```
"df --output ." was treated as "df --output=." and hence "." was
interpreted as a column name. With this commit, "." is treated as
an argument on its own.
Fixes#3324
Print a usage error when duplicat column names are specified to the
`--output` command-line argument. For example,
$ df --output=source,source
df: option --output: field ‘source’ used more than once
Try 'df --help' for more information.
Implement the "File" column in the `df` output table. Before this
commit, a blank entry appeared in the "File" column for each
row. After this commit, a "-" entry appears when `df` is run with no
positional arguments and the filename appears when run with positional
arguments. For example:
$ touch a b c && df --output=target,file a b c
Mounted on File
/ a
/ b
/ c
Move the code for parsing the `ConversionMode` to use up to the
`parseargs` module. This location makes more sense for it because the
conversion mode can be determined entirely from the command-line
arguments at the time of parsing just like the other parameters. Using
an enum for this purpose also eliminates the amount of code we need
later on.
Replace a cascading `if/else if` chain in `conv_block_unblock_helper()`
with a match statement on a new enum, `ConversionMode`, that enumerates
the various modes in which `dd` can operate.
Produce a usage error on an invalid signal argument. For example,
$ timeout --signal=invalid 1 sleep 0
timeout: 'invalid': invalid signal
Try 'timeout --help' for more information.
Fix a bug in the behavior of `split -e -n NUM` when the input file is
empty. Previously, it would panic due to overflow when subtracting 1
from 0. After this change, it will terminate successfully and produce
no output chunks.
These are the first half of changes needed to pass the dd/bytes.sh tests:
- Add iseek and oseek options (additive with skip and seek options)
- Implement tests for the new flags, matching those from dd/bytes.sh
Split the code for getting a list of `Filesystem` objects into two
separate functions: one for getting the list of all filesystems (when
running `df` with no arguments) and one for getting only named
filesystems (when running `df` with one or more arguments). This does
not change the behavior of `df` only the organization of the code.
Add the `Filesystem::from_path()` function which produces the
filesystem on which a given path is mounted. The `Filesystem` is taken
from a list of candidate `MountInfo` objects.
Replace a loop over `Row` objects with a loop over `Filesystem`
objects when it comes time to display each row of the output
table. This helps with code structure, since the `Filesystem` is the
primary object of interest in the `df` program. This makes it easier
for us to independently vary how the list of filesystems is produced
and how the list of filesystems is displayed.
Implement distributing lines of a file in a round-robin manner to a
specified number of chunks. For example,
$ (seq 1 10 | split -n r/3) && head -v xa[abc]
==> xaa <==
1
4
7
10
==> xab <==
2
5
8
==> xac <==
3
6
9
Previously, given 'cp -P a b', where 'a' and 'b' were both symlinks, cp
would end up replacing the target of 'b'.
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
Remove the use of the `FsSelector` struct when deciding whether a
filesystem type should be included or excluded from the output
table. Instead, just maintain optional `Vec`s of filesystem types to
exclude and include, and check whether the filesystem type is
contained in one of those. This reduces the amount of code required to
implement these checks, and since the number of types given in the
`include` or `exclude` lists is likely to be small, there should not
be much of a difference in performance.
Fix a bug where `timeout --preserve-status` was not correctly
preserving the status code of the child process if it timed out. When
that happens, the status code of the child process is considered to be
the signal number (in this case, `SIGTERM`). The exit status of
`timeout` is then 128 plus the numeric code associated with `SIGTERM`.
Fix unit tests for the `is_included()` and `filter_mount_list()`
internal helper functions. The function signatures changed but the
tests did not get updated to match.
Factor a helper function `wait_or_kill_process()` out of the main
`timeout()` function. This helper function contains the code to wait
for a child process and send the `SIGKILL` signal if it does not
terminate within a specified amount of time. This does not change the
function of `timeout`, just the organization of the code.
* Adds support for mount path prefix matching and input path
canonicalization
- Sorts mount paths in reverse lexicographical order
- Canonicalize all paths and clear invalid paths
- Checking of mount path prefix matches input path
Implement the `--line-bytes` option to `split`. In this mode, the
program tries to write as many lines of the input as possible to each
chunk of output without exceeding a specified byte limit. The new
`LineBytesChunkWriter` struct represents this functionality.
Implement the `--output` command-line argument, which allows
specifying an exact sequence of columns to display in the output
table. For example,
$ df --output=source,fstype | head -n3
Filesystem Type
udev devtmpfs
tmpfs tmpfs
(The spacing does not exactly match the spacing of GNU `df` yet.)
Fixes#3057.
Replace the `Options.show_fs_type` and `Options.show_inode_instead`
fields with the more general `Options.columns`, a `Vec` of `Column`
variants representing the exact sequence of columns to display in the
output. This makes `Options` able to represent arbitrary output column
display configurations.
Correct the column header printed by `df` when the `--block-size`
argument has a value that is a multiple of 1024. After this commit,
the header looks like "1K" or "4M" or "117G", etc., depending on the
particular value of the block size. For example:
$ df --block-size=1024 | head -n1
Filesystem 1K-blocks Used Available Use% Mounted on
$ df --block-size=2048 | head -n1
Filesystem 2K-blocks Used Available Use% Mounted on
$ df --block-size=3072 | head -n1
Filesystem 3K-blocks Used Available Use% Mounted on
$ df --block-size=4096 | head -n1
Filesystem 4K-blocks Used Available Use% Mounted on
Add support for the `--total` option to `df`, which displays the total
of each numeric column. For example,
$ df --total
Filesystem 1K-blocks Used Available Use% Mounted on
udev 3858016 0 3858016 0% /dev
...
/dev/loop14 63488 63488 0 100% /snap/core20/1361
total 258775268 98099712 148220200 40% -
Implement `-n l/k/N` option, where the `k`th chunk of the input file
is written to stdout. For example,
$ seq -w 0 99 > f; split -n l/3/10 f
20
21
22
23
24
25
26
27
28
29
Change `df` so that it correctly scales numbers of bytes by the
default block size, 1024, when neither -h nor -H are specified on the
command-line. Previously, it was not scaling the number of bytes in
this case.
Fixes#3058.
Add support for `split -n l/NUM`. Previously, `split` only supported
`-n NUM`, which splits a file into `NUM` chunks by byte. The `-n
l/NUM` strategy splits a file into `NUM` chunks without splitting
lines across chunks.
Make the `Strategy::Number` enumeration value more general by
replacing the number parameter with a `NumberType` enum parameter.
This allows a future commit to update `split` to support the various
sub-strategies for the `-n`. (This commit does not add support for the
other sub-strategies.)
https://github.com/uutils/coreutils/pull/3084 (2a333ab391) had some
missing coverage and was merged before I had a chance to fix it.
This PR adds some coverage / improved error messages that were missing
from that previous PR.
If `conv=block,sync` command-line arguments are given and there is at
least one partial record read from the input (for example, if the
length of the input is not divisible by the value of the `ibs`
argument), then output an extra block of `cbs` spaces.
For example, no extra spaces are printed in this example because the
input is of length 10, a multiple of `ibs`:
$ printf "012\nabcde\n" \
> | dd ibs=5 cbs=5 conv=block,sync status=noxfer \
> && echo $
012 abcde$
2+0 records in
0+1 records out
But in this example, 5 extra spaces are printed because the length of
the input is not a multiple of `ibs`:
$ printf "012\nabcdefg\n" \
> | dd ibs=5 cbs=5 conv=block,sync status=noxfer \
> && echo $
012 abcde $
2+1 records in
0+1 records out
1 truncated record
The number of spaces printed is the size of the conversion block,
given by `cbs`.
This should correct the usage strings in both the `--help` and user documentation. Previously, sometimes the name of the utils did not show up correctly.
Create a new module `blocks.rs` to contain the block-related helper
functions. This commit only moves the location of the code and related
tests, it does not change the functionality of `dd`.
Collect structs, implementations, and functions that have to do with
reporting number of blocks read and written into their own new module,
`progress.rs`. This commit also adds docstrings for everything and
unit tests for the significant methods. This commit does not change
the behavior of `dd`, just the organization of the code to make it
more maintainable and testable.
Prevent `dd` from terminating with an error when given the
command-line argument `of=/dev/null`. This commit allows the call to
`File::set_len()` to result in an error without causing the process to
terminate prematurely.
- Configured clap to take crate version, so version is now visible in docs.
- Added ABOUT string from expr help output, so about section is getting rendered in docs.
- Added USAGE section.
- Added HELP section for each args.
Place the "truncated records" line below the "records out" line in the
status report produced by `dd` and properly handle the singularization
of the word "record" in the case of 1 truncated record. This matches
the behavior of GNU `dd`.
For example
$ printf "ab" | dd cbs=1 conv=block status=noxfer > /dev/null
0+1 records in
0+1 records out
1 truncated record
$ printf "ab\ncd\n" | dd cbs=1 conv=block status=noxfer > /dev/null
0+1 records in
0+1 records out
2 truncated records
Add the `-e` flag, which indicates whether to elide (that is, remove)
empty files that would have been created by the `-n` option.
The `-n` command-line argument gives a specific number of chunks into
which the input files will be split. If the number of chunks is
greater than the number of bytes, then empty files will be created for
the excess chunks. But if `-e` is given, then empty files will not be
created.
For example, contrast
$ printf 'a\n' > f && split -e -n 3 f && cat xaa xab xac
a
cat: xac: No such file or directory
with
$ printf 'a\n' > f && split -n 3 f && cat xaa xab xac
a
Add support for the `-x` command-line option to `split`. This option
causes `split` to produce filenames with hexadecimal suffixes instead
of the default alphabetic suffixes.
Add a `NumericHexadecimal` member to the `SuffixType` enum so that a
future commit can add support for hexadecimal filename suffixes to the
`split` program.
Refactor the code to use a `SuffixType` enumeration with two members,
`Alphabetic` and `NumericDecimal`, representing the two currently
supported ways of producing filename suffixes. This prepares the code
to more easily support other formats, like numeric hexadecimal.
Clean up unit tests in the `dd` crate to make them easier to
manage. This commit does a few things.
* move test cases that test the complete functionality of the `dd`
program from the `dd_unit_tests` module up to the
`tests/by-util/test_dd.rs` module so that they can take advantage of
the testing framework and common testing tools provided by uutils,
* move test cases that test internal functions of the `dd`
implementation into the `tests` module within `dd.rs` so that they
live closer to the code they are testing,
* replace test cases defined by macros with test cases defined by
plain old functions to make the test cases easier to read at a
glance.
* include io-blksize parameter
* format changes for including io-blksize
Co-authored-by: DevSabb <devsabb@local>
Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>
This change is needed to fix missing USAGE section for `od` in user docs.
With reference to this issue
https://github.com/uutils/coreutils/issues/2991, and missing USAGE
section from `od docs` at
https://uutils.github.io/coreutils-docs/user/utils/od.html, it was
found that the USAGE for od app was starts with an empty line and uudoc
only takes 1st line for using in USAGE section in docs.
This resulted in empty line in usage section for `od`
Add support for the `-x` command-line option to `split`. This option
causes `split` to produce filenames with hexadecimal suffixes instead
of the default alphabetic suffixes.
Add a `NumericHexadecimal` member to the `SuffixType` enum so that a
future commit can add support for hexadecimal filename suffixes to the
`split` program.
Refactor the code to use a `SuffixType` enumeration with two members,
`Alphabetic` and `NumericDecimal`, representing the two currently
supported ways of producing filename suffixes. This prepares the code
to more easily support other formats, like numeric hexadecimal.
Correct the accounting for partial records written by `dd` to the
output file. After this commit, if fewer than `obs` bytes are written,
then that is counted as a partial record. For example,
$ printf 'abc' | dd bs=2 status=noxfer > /dev/null
1+1 records in
1+1 records out
That is, one complete record and one partial record are read from the
input, one complete record and one partial record are written to the
output. Previously, `dd` reported two complete records and zero
partial records written to the output in this case.
Change the `filter_mount_list()` function so that it always produces
the same order of `MountInfo` objects. This change ultimately results
in `df` printing its table of filesystems in the same order on each
execution. Previously, the table was in an arbitrary order because the
`MountInfo` objects were read from a `HashMap`.
Fixes#3086.
* ls: add new optional arguments to --classify flag
The --classify flag in ls now takes an option when argument
that may have the values always, auto and none.
Modified clap argument to allow an optional parameter and
changed the classify flag value parsing logic to account for
this change.
* ls: add test for indicator-style, ind and classify with value none
* ls: require option paramter to --classify to use a = to specify flag value
* ls: account for all the undocumented possible values for the --classify flag
Added the other values for the --classify flag along with modifications to tests.
Also documented the inconsistency between GNU coreutils because we accept the
flag value even for the short version of the flag.
The iflag, oflag and conv cli arguments take a list of values
and the correct behavior is to collect all values from multiple
occurences of theme.
For example if we call `dd --iflag=directory --iflag=skip_bytes` this should
collect the two values, `directory` and `skip_bytes` for iflag.
The unittest was added for this case.
Replace `ByteSplitter` and `LineSplitter` with `ByteChunkWriter` and
`LineChunkWriter` respectively. This results in a more maintainable
design and an increase in the speed of splitting by lines.
Add the `ByteChunkWriter` and `LineChunkWriter` structs and
implementations, but don't use them yet. This structs offer an
alternative approach to writing chunks of output (contrasted with
`ByteSplitter` and `LineSplitter`). The main difference is that
control of which underlying file is being written is inside the writer
instead of outside.
Add some helper functions and adjust some error-handling to make the
`Output::dd_out()` method, containing the main loop of the `dd`
program, more concise. This commit also adds documentation and
comments describing the main loop procedure in more detail.
This lets us use fewer reallocations when parsing each line.
The current guess is set to the maximum fields in a line so far. This is
a free performance win in the common case where each line has the same
number of fields, but comes with some memory overhead in the case where
there is a line with lots of fields at the beginning of the file, and
fewer later, but each of these lines are typically not kept for very
long anyway.
Using indexes into the line instead of Vec<u8>s means we don't have to copy
the line to store the fields (indexes instead of slices because it avoids
self-referential structs). Using memchr also empirically saves a lot of
intermediate allocations.
Refactor the code for representing the `df` data table into `Header`
and `Row` structs. These structs live in a new module `table.rs`. When
combined with the `Options` struct, these structs can be
`Display`ed. Organizing the code this way makes it possible to test
the display settings independently of the machinery for getting the
filesystem data. New unit tests have been added to `table.rs` to
demonstrate this benefit.
Show a warning if the `skip=N` command-line argument would cause `dd`
to skip past the end of the input. For example:
$ printf "abcd" | dd bs=1 skip=5 count=0 status=noxfer
'standard input': cannot skip to specified offset
0+0 records in
0+0 records out
Add some structure to errors that can be created during parsing of
settings from command-line options. This commit creates
`StrategyError` and `SettingsError` enumerations to represent the
various parsing and other errors that can arise when transforming
`ArgMatches` into `Settings`.
Show a warning when a block size includes "0x" since this is
ambiguous: the user may have meant "multiply the next number by zero"
or they may have meant "the following characters should be interpreted
as a hexadecimal number".
When specifying `seek=N` and *not* specifying `conv=notrunc`, truncate
the output file to `N` blocks instead of truncating it to zero before
starting to write output. For example
$ printf "abc" > outfile
$ printf "123" | dd bs=1 skip=1 seek=1 count=1 status=noxfer of=outfile
1+0 records in
1+0 records out
$ cat outfile
a2
Fixes#3068.
When this option is present, the files argument is not processed. This option processes the file list from provided file, splitting them by the ascii NUL (\0) character. When files0-from is '-', the file list is processed from stdin.
Correct the behavior of `dd` when multiple arguments are provided.
Before this commit, if the multiple arguments was provided then
the validation error are returned.
For example
```
$ printf '' | ./target/debug/dd status=none status=noxfer
error: The argument '--status=<LEVEL>' was provided more than once, but cannot be used multiple times
USAGE:
dd [OPTIONS]
For more information try --help
```
The unittest was added for this case.
This avoids hacking around the short options of these command line
arguments that have been introduced by clap. Additionally, we test and
correctly handle the combination of both version and help. The GNU
binary will ignore both arguments in this case while clap would perform
the first one. A test for this edge case was added.
Now treats recognized command line options and ignores unrecognized
command line options instead of returning a special exit status for
them.
There is one point of interest, which is related to an implementation
detail in GNU `true`. It may return a non-true exit status (in
particular EXIT_FAIL) if writing the diagnostics of a GNU specific
option fails. For example `true --version > /dev/full` would fail and
have exit status 1.
This behavior was acknowledged in gnu in commit
<9a6a486e6503520fd2581f2d3356b7149f1b225d>. No further
justification provided for keeping this quirk.
POSIX knows no such options, and requires an exit status of 0 in all
cases. We replicate GNU here which is a consistency improvement over the
prior implementation. Adds documentation to clarify the intended
behavior more properly.
Exit with status code 1 for argument parsing errors in `truncate`. When
`clap` encounters an error during argument parsing, it exits with status
code 2. This causes some GNU tests to fail since they expect status code
1.
Refactor the `Mode` enum in the `head.rs` module so that it includes
not only the mode type---lines or bytes---but also whether to read the
first NUM items of that type or all but the last NUM. Before this
commit, these two pieces of information were stored separately. This
made it difficult to read the code through several function calls and
understand at a glance which strategy was being employed.
This allows for `-t` to take invalid unicode (but still single-byte) values
on unix-like platforms. Other platforms, which as of the time of this commit
do not support `OsStr::as_bytes()`, could possibly be supported in the future,
but would require design decisions as to what that means.