Fix a bug in the behavior of `split -e -n NUM` when the input file is
empty. Previously, it would panic due to overflow when subtracting 1
from 0. After this change, it will terminate successfully and produce
no output chunks.
Implement distributing lines of a file in a round-robin manner to a
specified number of chunks. For example,
$ (seq 1 10 | split -n r/3) && head -v xa[abc]
==> xaa <==
1
4
7
10
==> xab <==
2
5
8
==> xac <==
3
6
9
Implement the `--line-bytes` option to `split`. In this mode, the
program tries to write as many lines of the input as possible to each
chunk of output without exceeding a specified byte limit. The new
`LineBytesChunkWriter` struct represents this functionality.
Implement `-n l/k/N` option, where the `k`th chunk of the input file
is written to stdout. For example,
$ seq -w 0 99 > f; split -n l/3/10 f
20
21
22
23
24
25
26
27
28
29
Add support for `split -n l/NUM`. Previously, `split` only supported
`-n NUM`, which splits a file into `NUM` chunks by byte. The `-n
l/NUM` strategy splits a file into `NUM` chunks without splitting
lines across chunks.
Make the `Strategy::Number` enumeration value more general by
replacing the number parameter with a `NumberType` enum parameter.
This allows a future commit to update `split` to support the various
sub-strategies for the `-n`. (This commit does not add support for the
other sub-strategies.)
https://github.com/uutils/coreutils/pull/3084 (2a333ab391) had some
missing coverage and was merged before I had a chance to fix it.
This PR adds some coverage / improved error messages that were missing
from that previous PR.
This should correct the usage strings in both the `--help` and user documentation. Previously, sometimes the name of the utils did not show up correctly.
Add the `-e` flag, which indicates whether to elide (that is, remove)
empty files that would have been created by the `-n` option.
The `-n` command-line argument gives a specific number of chunks into
which the input files will be split. If the number of chunks is
greater than the number of bytes, then empty files will be created for
the excess chunks. But if `-e` is given, then empty files will not be
created.
For example, contrast
$ printf 'a\n' > f && split -e -n 3 f && cat xaa xab xac
a
cat: xac: No such file or directory
with
$ printf 'a\n' > f && split -n 3 f && cat xaa xab xac
a
Add support for the `-x` command-line option to `split`. This option
causes `split` to produce filenames with hexadecimal suffixes instead
of the default alphabetic suffixes.
Add a `NumericHexadecimal` member to the `SuffixType` enum so that a
future commit can add support for hexadecimal filename suffixes to the
`split` program.
Refactor the code to use a `SuffixType` enumeration with two members,
`Alphabetic` and `NumericDecimal`, representing the two currently
supported ways of producing filename suffixes. This prepares the code
to more easily support other formats, like numeric hexadecimal.
* include io-blksize parameter
* format changes for including io-blksize
Co-authored-by: DevSabb <devsabb@local>
Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>
Add support for the `-x` command-line option to `split`. This option
causes `split` to produce filenames with hexadecimal suffixes instead
of the default alphabetic suffixes.
Add a `NumericHexadecimal` member to the `SuffixType` enum so that a
future commit can add support for hexadecimal filename suffixes to the
`split` program.
Refactor the code to use a `SuffixType` enumeration with two members,
`Alphabetic` and `NumericDecimal`, representing the two currently
supported ways of producing filename suffixes. This prepares the code
to more easily support other formats, like numeric hexadecimal.
Replace `ByteSplitter` and `LineSplitter` with `ByteChunkWriter` and
`LineChunkWriter` respectively. This results in a more maintainable
design and an increase in the speed of splitting by lines.
Add the `ByteChunkWriter` and `LineChunkWriter` structs and
implementations, but don't use them yet. This structs offer an
alternative approach to writing chunks of output (contrasted with
`ByteSplitter` and `LineSplitter`). The main difference is that
control of which underlying file is being written is inside the writer
instead of outside.
Add some structure to errors that can be created during parsing of
settings from command-line options. This commit creates
`StrategyError` and `SettingsError` enumerations to represent the
various parsing and other errors that can arise when transforming
`ArgMatches` into `Settings`.
Replace the `FilenameFactory` with `FilenameIterator` and calls to
`FilenameFactory::make()` with calls to `FilenameIterator::next()`. We
did not need the fully generality of being able to produce the
filename for an arbitrary chunk index. Instead we need only iterate
over filenames one after another. This allows for a less
mathematically dense algorithm that is easier to understand and
maintain. Furthermore, it can be connected to some familiar concepts
from the representation of numbers as a sequence of digits.
This does not change the behavior of the `split` program, just the
implementation of how filenames are produced.
Co-authored-by: Terts Diepraam <terts.diepraam@gmail.com>