* Fix a bug in split where chunking would be skipped when the chunk size
happened to be an exact divisor of the buffer size used to read the
input stream.
The issue here was that file was being split byte-wise in chunks of 1G.
The input stream was being read in chunks of 8KB, which evenly divides
the chunk size. Because the check to allocate the next output chunk was
done at the bottom of the loop previously, it would never occur because
the current input chunk was fully consumed at that point. By moving the
check to the top of the loop (but still late enough that we know we have
bytes to write) we resolve this issue.
This scenario is unfortunately hard to write a test for, since we don't
explicitly control the input chunk size.
Fixes https://github.com/uutils/coreutils/issues/3790
Fix a bug in the behavior of `split -e -n NUM` when the input file is
empty. Previously, it would panic due to overflow when subtracting 1
from 0. After this change, it will terminate successfully and produce
no output chunks.
Implement distributing lines of a file in a round-robin manner to a
specified number of chunks. For example,
$ (seq 1 10 | split -n r/3) && head -v xa[abc]
==> xaa <==
1
4
7
10
==> xab <==
2
5
8
==> xac <==
3
6
9
Implement the `--line-bytes` option to `split`. In this mode, the
program tries to write as many lines of the input as possible to each
chunk of output without exceeding a specified byte limit. The new
`LineBytesChunkWriter` struct represents this functionality.
Implement `-n l/k/N` option, where the `k`th chunk of the input file
is written to stdout. For example,
$ seq -w 0 99 > f; split -n l/3/10 f
20
21
22
23
24
25
26
27
28
29
Add support for `split -n l/NUM`. Previously, `split` only supported
`-n NUM`, which splits a file into `NUM` chunks by byte. The `-n
l/NUM` strategy splits a file into `NUM` chunks without splitting
lines across chunks.
Make the `Strategy::Number` enumeration value more general by
replacing the number parameter with a `NumberType` enum parameter.
This allows a future commit to update `split` to support the various
sub-strategies for the `-n`. (This commit does not add support for the
other sub-strategies.)
https://github.com/uutils/coreutils/pull/3084 (2a333ab391) had some
missing coverage and was merged before I had a chance to fix it.
This PR adds some coverage / improved error messages that were missing
from that previous PR.
This should correct the usage strings in both the `--help` and user documentation. Previously, sometimes the name of the utils did not show up correctly.
Add the `-e` flag, which indicates whether to elide (that is, remove)
empty files that would have been created by the `-n` option.
The `-n` command-line argument gives a specific number of chunks into
which the input files will be split. If the number of chunks is
greater than the number of bytes, then empty files will be created for
the excess chunks. But if `-e` is given, then empty files will not be
created.
For example, contrast
$ printf 'a\n' > f && split -e -n 3 f && cat xaa xab xac
a
cat: xac: No such file or directory
with
$ printf 'a\n' > f && split -n 3 f && cat xaa xab xac
a
Add support for the `-x` command-line option to `split`. This option
causes `split` to produce filenames with hexadecimal suffixes instead
of the default alphabetic suffixes.
Add a `NumericHexadecimal` member to the `SuffixType` enum so that a
future commit can add support for hexadecimal filename suffixes to the
`split` program.
Refactor the code to use a `SuffixType` enumeration with two members,
`Alphabetic` and `NumericDecimal`, representing the two currently
supported ways of producing filename suffixes. This prepares the code
to more easily support other formats, like numeric hexadecimal.
* include io-blksize parameter
* format changes for including io-blksize
Co-authored-by: DevSabb <devsabb@local>
Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>
Add support for the `-x` command-line option to `split`. This option
causes `split` to produce filenames with hexadecimal suffixes instead
of the default alphabetic suffixes.
Add a `NumericHexadecimal` member to the `SuffixType` enum so that a
future commit can add support for hexadecimal filename suffixes to the
`split` program.
Refactor the code to use a `SuffixType` enumeration with two members,
`Alphabetic` and `NumericDecimal`, representing the two currently
supported ways of producing filename suffixes. This prepares the code
to more easily support other formats, like numeric hexadecimal.
Replace `ByteSplitter` and `LineSplitter` with `ByteChunkWriter` and
`LineChunkWriter` respectively. This results in a more maintainable
design and an increase in the speed of splitting by lines.
Add the `ByteChunkWriter` and `LineChunkWriter` structs and
implementations, but don't use them yet. This structs offer an
alternative approach to writing chunks of output (contrasted with
`ByteSplitter` and `LineSplitter`). The main difference is that
control of which underlying file is being written is inside the writer
instead of outside.
Add some structure to errors that can be created during parsing of
settings from command-line options. This commit creates
`StrategyError` and `SettingsError` enumerations to represent the
various parsing and other errors that can arise when transforming
`ArgMatches` into `Settings`.
Replace the `FilenameFactory` with `FilenameIterator` and calls to
`FilenameFactory::make()` with calls to `FilenameIterator::next()`. We
did not need the fully generality of being able to produce the
filename for an arbitrary chunk index. Instead we need only iterate
over filenames one after another. This allows for a less
mathematically dense algorithm that is easier to understand and
maintain. Furthermore, it can be connected to some familiar concepts
from the representation of numbers as a sequence of digits.
This does not change the behavior of the `split` program, just the
implementation of how filenames are produced.
Co-authored-by: Terts Diepraam <terts.diepraam@gmail.com>
- Change the main! proc_macro to a bin! macro_rules macro.
- Reexport uucore_procs from uucore
- Make utils to not import uucore_procs directly
- Remove the `syn` dependency and don't parse proc_macro input (hopefully for faster compile times)
Create a `Settings::from` method that converts a `clap::ArgMatches`
instance into a `Settings` instance. This eliminates the unnecessary
use of a mutable variable when initializing the settings.
Fix two issues with the filename creation algorithm. First, this
corrects the behavior of the `-a` option. This commit ensures a
failure occurs when the number of chunks exceeds the number of
filenames representable with the specified fixed width:
$ printf "%0.sa" {1..11} | split -d -b 1 -a 1
split: output file suffixes exhausted
Second, this corrects the behavior of the default behavior when `-a`
is not specified on the command line. Previously, it was always
settings the filenames to have length 2 suffixes. This commit corrects
the behavior to follow the algorithm implied by GNU split, where the
filename lengths grow dynamically by two characters once the number of
chunks grows sufficiently large:
$ printf "%0.sa" {1..91} | ./target/debug/coreutils split -d -b 1 \
> && ls x* | tail
x81
x82
x83
x84
x85
x86
x87
x88
x89
x9000
Move the parsing of the output chunk size from inside
`ByteSplitter::new()` and `LineSplitter::new()` to outside. This
eliminates duplicate code and reduces the responsibilities of the
`ByteSplitter` and `LineSplitter` implementations.
This makes clap wrap the help text according to the terminal width,
which improves readability for terminal widths < 120 chars,
because clap defaults to a width of 120 chars without this feature.
`LANGUAGE=C` is not enough, `LC_ALL=C` is needed as the environment
variable that overrides all the other localization settings.
e.g.
```bash
$ LANGUAGE=C id foobar
id: ‘foobar’: no such user
$ LC_ALL=C id foobar
id: 'foobar': no such user
```
* replace `LANGUAGE` with `LC_ALL` as environment variable in the tests
* fix the the date string of affected uutils
* replace `‘` and `’` with `'`
consistent
* add tests for each flag that takes NUM/SIZE arguments
* fix bug in tail where 'quiet' and 'verbose' flags did not override each other POSIX style