While the rust coreutils semantics were arguably more correct,
they were different than the gnu split semantics when handling a
file without a trailing EOF. This patch addresses that difference
and allows passing one more GNU test suite.
* Fix a bug in split where chunking would be skipped when the chunk size
happened to be an exact divisor of the buffer size used to read the
input stream.
The issue here was that file was being split byte-wise in chunks of 1G.
The input stream was being read in chunks of 8KB, which evenly divides
the chunk size. Because the check to allocate the next output chunk was
done at the bottom of the loop previously, it would never occur because
the current input chunk was fully consumed at that point. By moving the
check to the top of the loop (but still late enough that we know we have
bytes to write) we resolve this issue.
This scenario is unfortunately hard to write a test for, since we don't
explicitly control the input chunk size.
Fixes https://github.com/uutils/coreutils/issues/3790
Fix a bug in the behavior of `split -e -n NUM` when the input file is
empty. Previously, it would panic due to overflow when subtracting 1
from 0. After this change, it will terminate successfully and produce
no output chunks.
Implement distributing lines of a file in a round-robin manner to a
specified number of chunks. For example,
$ (seq 1 10 | split -n r/3) && head -v xa[abc]
==> xaa <==
1
4
7
10
==> xab <==
2
5
8
==> xac <==
3
6
9
Implement the `--line-bytes` option to `split`. In this mode, the
program tries to write as many lines of the input as possible to each
chunk of output without exceeding a specified byte limit. The new
`LineBytesChunkWriter` struct represents this functionality.
Implement `-n l/k/N` option, where the `k`th chunk of the input file
is written to stdout. For example,
$ seq -w 0 99 > f; split -n l/3/10 f
20
21
22
23
24
25
26
27
28
29
Add support for `split -n l/NUM`. Previously, `split` only supported
`-n NUM`, which splits a file into `NUM` chunks by byte. The `-n
l/NUM` strategy splits a file into `NUM` chunks without splitting
lines across chunks.
Make the `Strategy::Number` enumeration value more general by
replacing the number parameter with a `NumberType` enum parameter.
This allows a future commit to update `split` to support the various
sub-strategies for the `-n`. (This commit does not add support for the
other sub-strategies.)
https://github.com/uutils/coreutils/pull/3084 (2a333ab391) had some
missing coverage and was merged before I had a chance to fix it.
This PR adds some coverage / improved error messages that were missing
from that previous PR.
This should correct the usage strings in both the `--help` and user documentation. Previously, sometimes the name of the utils did not show up correctly.
Add the `-e` flag, which indicates whether to elide (that is, remove)
empty files that would have been created by the `-n` option.
The `-n` command-line argument gives a specific number of chunks into
which the input files will be split. If the number of chunks is
greater than the number of bytes, then empty files will be created for
the excess chunks. But if `-e` is given, then empty files will not be
created.
For example, contrast
$ printf 'a\n' > f && split -e -n 3 f && cat xaa xab xac
a
cat: xac: No such file or directory
with
$ printf 'a\n' > f && split -n 3 f && cat xaa xab xac
a