These are the first half of changes needed to pass the dd/bytes.sh tests:
- Add iseek and oseek options (additive with skip and seek options)
- Implement tests for the new flags, matching those from dd/bytes.sh
Implement the `--line-bytes` option to `split`. In this mode, the
program tries to write as many lines of the input as possible to each
chunk of output without exceeding a specified byte limit. The new
`LineBytesChunkWriter` struct represents this functionality.
Implement `-n l/k/N` option, where the `k`th chunk of the input file
is written to stdout. For example,
$ seq -w 0 99 > f; split -n l/3/10 f
20
21
22
23
24
25
26
27
28
29
Add the `-e` flag, which indicates whether to elide (that is, remove)
empty files that would have been created by the `-n` option.
The `-n` command-line argument gives a specific number of chunks into
which the input files will be split. If the number of chunks is
greater than the number of bytes, then empty files will be created for
the excess chunks. But if `-e` is given, then empty files will not be
created.
For example, contrast
$ printf 'a\n' > f && split -e -n 3 f && cat xaa xab xac
a
cat: xac: No such file or directory
with
$ printf 'a\n' > f && split -n 3 f && cat xaa xab xac
a
Clean up unit tests in the `dd` crate to make them easier to
manage. This commit does a few things.
* move test cases that test the complete functionality of the `dd`
program from the `dd_unit_tests` module up to the
`tests/by-util/test_dd.rs` module so that they can take advantage of
the testing framework and common testing tools provided by uutils,
* move test cases that test internal functions of the `dd`
implementation into the `tests` module within `dd.rs` so that they
live closer to the code they are testing,
* replace test cases defined by macros with test cases defined by
plain old functions to make the test cases easier to read at a
glance.
Add support for the `-x` command-line option to `split`. This option
causes `split` to produce filenames with hexadecimal suffixes instead
of the default alphabetic suffixes.
Replace `ByteSplitter` and `LineSplitter` with `ByteChunkWriter` and
`LineChunkWriter` respectively. This results in a more maintainable
design and an increase in the speed of splitting by lines.
Correct the `test_split::test_suffixes_exhausted` test case so that it
actually exercises the intended behavior of `split`. Previously, the
test fixture contained 26 bytes. After this commit, the test fixture
contains 27 bytes. When using a suffix width of one, only 26 filenames
should be available when naming chunk files---one for each lowercase
ASCII letter. This commit ensures that the filenames will be exhausted
as intended by the test.
When this option is present, the files argument is not processed. This option processes the file list from provided file, splitting them by the ascii NUL (\0) character. When files0-from is '-', the file list is processed from stdin.
This allows for `-t` to take invalid unicode (but still single-byte) values
on unix-like platforms. Other platforms, which as of the time of this commit
do not support `OsStr::as_bytes()`, could possibly be supported in the future,
but would require design decisions as to what that means.
Fix two issues with the filename creation algorithm. First, this
corrects the behavior of the `-a` option. This commit ensures a
failure occurs when the number of chunks exceeds the number of
filenames representable with the specified fixed width:
$ printf "%0.sa" {1..11} | split -d -b 1 -a 1
split: output file suffixes exhausted
Second, this corrects the behavior of the default behavior when `-a`
is not specified on the command line. Previously, it was always
settings the filenames to have length 2 suffixes. This commit corrects
the behavior to follow the algorithm implied by GNU split, where the
filename lengths grow dynamically by two characters once the number of
chunks grows sufficiently large:
$ printf "%0.sa" {1..91} | ./target/debug/coreutils split -d -b 1 \
> && ls x* | tail
x81
x82
x83
x84
x85
x86
x87
x88
x89
x9000
* hashsum: support --check for var. length outputs
Add the ability for `hashsum --check` to work with algorithms with
variable output length. Previously, the program would terminate with an
error due to constructing an invalid regular expression.
* fixup! hashsum: support --check for var. length outputs
* tac: correct behavior of -b option
Correct the behavior of `tac -b` to match that of GNU coreutils
`tac`. Specifically, this changes `tac -b` to assume *leading* line
separators instead of the default *trailing* line separators.
Before this commit, the (incorrect) behavior was
$ printf "/abc/def" | tac -b -s "/"
def/abc/
After this commit, the behavior is
$ printf "/abc/def" | tac -b -s "/"
/def/abc
Fixes#2262.
* fixup! tac: correct behavior of -b option
* fixup! tac: correct behavior of -b option
Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>
- Adds words to cspell exceptions
- Converts test macros to use Default trait.
- Converts parser to use Default trait.
- Adds Windows-friendly test files for block/unblock when nl is present
in test/spec file.
This reimplements version_cmp, which is used in sort and ls to sort
according to versions.
However, it is not bug-for-bug identical with GNU's implementation.
I reported a bug with GNU here:
https://lists.gnu.org/archive/html/bug-coreutils/2021-06/msg00045.html
This implementation does not contain the bugs regarding the handling of
file extensions and null bytes.