Replace `ByteSplitter` and `LineSplitter` with `ByteChunkWriter` and
`LineChunkWriter` respectively. This results in a more maintainable
design and an increase in the speed of splitting by lines.
Correct the `test_split::test_suffixes_exhausted` test case so that it
actually exercises the intended behavior of `split`. Previously, the
test fixture contained 26 bytes. After this commit, the test fixture
contains 27 bytes. When using a suffix width of one, only 26 filenames
should be available when naming chunk files---one for each lowercase
ASCII letter. This commit ensures that the filenames will be exhausted
as intended by the test.
Show a warning if the `skip=N` command-line argument would cause `dd`
to skip past the end of the input. For example:
$ printf "abcd" | dd bs=1 skip=5 count=0 status=noxfer
'standard input': cannot skip to specified offset
0+0 records in
0+0 records out
Show a warning when a block size includes "0x" since this is
ambiguous: the user may have meant "multiply the next number by zero"
or they may have meant "the following characters should be interpreted
as a hexadecimal number".
When specifying `seek=N` and *not* specifying `conv=notrunc`, truncate
the output file to `N` blocks instead of truncating it to zero before
starting to write output. For example
$ printf "abc" > outfile
$ printf "123" | dd bs=1 skip=1 seek=1 count=1 status=noxfer of=outfile
1+0 records in
1+0 records out
$ cat outfile
a2
Fixes#3068.
When this option is present, the files argument is not processed. This option processes the file list from provided file, splitting them by the ascii NUL (\0) character. When files0-from is '-', the file list is processed from stdin.
Using this escaped character will cause `printf` to stop generating characters.
For instance,
```rust
hbina@akarin ~/g/uutils (hbina-add-test-for-additional-escape)> cargo run --quiet -- printf "%s\c%s" a b
a⏎
```
Signed-off-by: Hanif Ariffin <hanif.ariffin.4326@gmail.com>
* test_sort: Output sorted files to a file with different name
Signed-off-by: Hanif Bin Ariffin <hanif.ariffin.43262@gmail.com>
* Fix the test by saving the environment variable
Signed-off-by: Hanif Bin Ariffin <hanif.ariffin.43262@gmail.com>
Co-authored-by: Hanif Bin Ariffin <hanif.ariffin.43262@gmail.com>
This avoids hacking around the short options of these command line
arguments that have been introduced by clap. Additionally, we test and
correctly handle the combination of both version and help. The GNU
binary will ignore both arguments in this case while clap would perform
the first one. A test for this edge case was added.
This allows for `-t` to take invalid unicode (but still single-byte) values
on unix-like platforms. Other platforms, which as of the time of this commit
do not support `OsStr::as_bytes()`, could possibly be supported in the future,
but would require design decisions as to what that means.
Prevent usize underflow when reducing the size of a file by more than
its current size. For example, if `f` is a file with 3 bytes, then
truncate -s-5 f
will now set the size of the file to 0 instead of causing a panic.
Improve the error message that gets printed when a directory does not
exist. After this commit, the error message is
truncate: cannot open '{file}' for writing: No such file or directory
where `{file}` is the name of a file in a directory that does not
exist.
Change a word in the error message displayed when an increment value
of 0 is provided to `seq`. This commit changes the message from "Zero
increment argument" to "Zero increment value" to match the GNU `seq`
error message.
Add an error for division by zero. Previously, running `truncate -s /0
file` or `-s %0` would panic due to division by zero. After this
change, it writes an error message "division by zero" to stderr and
terminates with an error code.
Add support for the `-f FORMAT` option to `seq`. This option instructs
the program to render each value in the generated sequence using a
given `printf`-style floating point format. For example,
$ seq -f %.2f 0.0 0.1 0.5
0.00
0.10
0.20
0.30
0.40
0.50
Fixes issue #2616.
Fix a bug where `tail -f` would terminate with an error due to failing
to parse a UTF-8 string from a sequence of bytes read from the
followed file. This commit replaces the call to `BufRead::read_line()`
with a call to `BufRead::read_until()` so that any sequence of bytes
regardless of encoding can be read.
Fixes#1050.
Correct the behavior of `dd` with the `status=noxfer` option. Before
this commit, the status output was entirely suppressed (as happens
with `status=none`). This was incorrect behavior. After this commit,
the input/output counts are printed to stderr as expected.
For example,
$ printf "" | dd status=noxfer
0+0 records in
0+0 records out
This commit also updates a unit test that was enforcing the wrong
behavior.
Fix the behavior of truncate when given a non-existent file so that it
correctly creates the file before truncating it (unless the
`--no-create` option is also given).
Fix a bug when getting all but the first NUM lines or bytes of a file
via `tail -n +NUM <file>` or `tail -c +NUM <file>`. The bug only
existed when a file is given as an argument; it did not exist when the
input data came from stdin.
Support `-z` option when the input is not a seekable file. Previously,
the option was accepted by the argument parser, but it was being
ignored by the application logic.
This expands the error message that is printed if either input file has
an unsorted line. Both the program name (join) and the offending line
are printed out with the message to match the behaviour of the GNU
utility.
This commit replaces generic Results with UResults in some key
functions in numfmt. As a result of this, we can provide different
exit codes for different errors, which resolves ~70 failing test
cases in the GNU numfmt.pl test suite.
Fix two issues with the filename creation algorithm. First, this
corrects the behavior of the `-a` option. This commit ensures a
failure occurs when the number of chunks exceeds the number of
filenames representable with the specified fixed width:
$ printf "%0.sa" {1..11} | split -d -b 1 -a 1
split: output file suffixes exhausted
Second, this corrects the behavior of the default behavior when `-a`
is not specified on the command line. Previously, it was always
settings the filenames to have length 2 suffixes. This commit corrects
the behavior to follow the algorithm implied by GNU split, where the
filename lengths grow dynamically by two characters once the number of
chunks grows sufficiently large:
$ printf "%0.sa" {1..91} | ./target/debug/coreutils split -d -b 1 \
> && ls x* | tail
x81
x82
x83
x84
x85
x86
x87
x88
x89
x9000
* print error in the correct order by flushing the stdout buffer before printing an error
* print correct GNU error codes
* correct formatting for config.inode, and for dangling links
* correct padding for Format::Long
* remove colors after the -> link symbol as this doesn't match GNU
* correct the major, minor #s for char devices, and correct padding
* improve speed for all metadata intensive ops by not allocating metadata unless in a Sort mode
* new tests, have struggled with how to deal with stderr, stdout ordering in a test though
* tried to implement UIoError, but am still having issues matching the formatting of GNU
Co-authored-by: electricboogie <32370782+electricboogie@users.noreply.github.com>
Fix a bug in which a negative decimal input would not be displayed with
the correct width in the output. Before this commit, the output was
incorrectly
$ seq -w -.1 .1 .11
-0.1
0.0
0.1
After this commit, the output is correctly
$ seq -w -.1 .1 .11
-0.1
00.0
00.1
The code was failing to take into account that the input decimal "-.1"
needs to be displayed with a leading zero, like "-0.1".
Pad infinity and negative infinity values with spaces when using the
`-w` option to `seq`. This corrects the behavior of `seq` to match that
of the GNU version:
$ seq -w 1.000 inf inf | head -n 4
1.000
inf
inf
inf
Previously, it incorrectly padded with 0s instead of spaces.
* seq: use BigDecimal to represent floats
Use `BigDecimal` to represent arbitrary precision floats in order to
prevent numerical precision issues when iterating over a sequence of
numbers. This commit makes several changes at once to accomplish this
goal.
First, it creates a new struct, `PreciseNumber`, that is responsible for
storing not only the number itself but also the number of digits (both
integer and decimal) needed to display it. This information is collected
at the time of parsing the number, which lives in the new
`numberparse.rs` module.
Second, it uses the `BigDecimal` struct to store arbitrary precision
floating point numbers instead of the previous `f64` primitive
type. This protects against issues of numerical precision when
repeatedly accumulating a very small increment.
Third, since neither the `BigDecimal` nor `BigInt` types have a
representation of infinity, minus infinity, minus zero, or NaN, we add
the `ExtendedBigDecimal` and `ExtendedBigInt` enumerations which extend
the basic types with these concepts.
* fixup! seq: use BigDecimal to represent floats
* fixup! seq: use BigDecimal to represent floats
* fixup! seq: use BigDecimal to represent floats
* fixup! seq: use BigDecimal to represent floats
* fixup! seq: use BigDecimal to represent floats