35 commits

Author SHA1 Message Date
Jeffrey Finkelstein
77d92883c7 split: implement --line-bytes option
Implement the `--line-bytes` option to `split`. In this mode, the
program tries to write as many lines of the input as possible to each
chunk of output without exceeding a specified byte limit. The new
`LineBytesChunkWriter` struct represents this functionality.
2022-03-10 22:51:49 -05:00
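The packing rule described in commit 77d92883c7 can be illustrated with a minimal sketch. The function `split_line_bytes` is a hypothetical helper written for this note, not the `LineBytesChunkWriter` from the commit, and the handling of a line longer than the limit (cutting it into pieces of at most `limit` bytes, each written as its own chunk) is an assumption.

```rust
/// Pack whole lines into chunks of at most `limit` bytes.
/// Hypothetical helper for illustration; not the uutils implementation.
fn split_line_bytes(input: &[u8], limit: usize) -> Vec<Vec<u8>> {
    assert!(limit > 0, "byte limit must be positive");
    let mut chunks: Vec<Vec<u8>> = Vec::new();
    let mut current: Vec<u8> = Vec::new();
    // `split_inclusive` keeps the trailing newline attached to each line.
    for line in input.split_inclusive(|&b| b == b'\n') {
        // Close the current chunk if this line does not fit in the space left.
        if !current.is_empty() && current.len() + line.len() > limit {
            chunks.push(std::mem::take(&mut current));
        }
        if line.len() <= limit {
            current.extend_from_slice(line);
        } else {
            // Assumption: an over-long line is cut into pieces of at most
            // `limit` bytes, and each piece becomes its own chunk.
            for piece in line.chunks(limit) {
                chunks.push(piece.to_vec());
            }
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

With a limit of 6 bytes, the input `aa\nbb\ncc\n` yields two chunks: the first holds the first two lines (exactly 6 bytes) and the second holds the remaining line.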
Sylvestre Ledru
f3bd1f3020 Add onehundredlines in the spell ignore 2022-03-05 10:27:51 +01:00
Jeffrey Finkelstein
ee36dea1a9 split: implement outputting kth chunk of file
Implement `-n l/k/N` option, where the `k`th chunk of the input file
is written to stdout. For example,

    $ seq -w 0 99 > f; split -n l/3/10 f
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
2022-03-05 10:27:51 +01:00
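The `-n l/k/N` selection from commit ee36dea1a9 can be sketched as follows. `kth_line_chunk` is a hypothetical name, and the rule that a line belongs to the chunk containing its first byte is an assumption of this sketch rather than a statement about the uutils or GNU code.

```rust
/// Return the `k`th (1-based) of `n` line-aligned chunks of `input`.
/// Hypothetical helper for illustration only.
fn kth_line_chunk(input: &[u8], k: usize, n: usize) -> Vec<u8> {
    assert!(n > 0 && (1..=n).contains(&k), "need 1 <= k <= n");
    // Nominal chunk size in bytes; at least 1 to avoid dividing by zero.
    let chunk_size = (input.len() / n).max(1);
    let mut out = Vec::new();
    let mut offset = 0; // byte offset at which the current line starts
    for line in input.split_inclusive(|&b| b == b'\n') {
        // Assumption: a line is assigned to the chunk that contains its
        // first byte; the last chunk absorbs any remainder.
        let index = (offset / chunk_size).min(n - 1);
        if index == k - 1 {
            out.extend_from_slice(line);
        }
        offset += line.len();
    }
    out
}
```

For the 300-byte input produced by `seq -w 0 99`, each of the 10 chunks covers 30 bytes (ten 3-byte lines), so `kth_line_chunk(input, 3, 10)` returns the lines `20` through `29`, matching the example in the commit message.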
Sylvestre Ledru
346cfa060b
Merge pull request #2980 from jfinkels/split-lines-2
split: add support for "-n l/NUM" option to split
2022-03-01 10:13:44 +01:00
Jeffrey Finkelstein
dbbee573ab split: add support for "-n l/NUM" option to split
Add support for `split -n l/NUM`. Previously, `split` only supported
`-n NUM`, which splits a file into `NUM` chunks by byte. The `-n
l/NUM` strategy splits a file into `NUM` chunks without splitting
lines across chunks.
2022-02-22 18:44:08 -05:00
Omer Tuchfeld
0ce22f3a08 Improve coverage / error messages from parse_size PR
https://github.com/uutils/coreutils/pull/3084 (2a333ab391) had some
missing coverage and was merged before I had a chance to fix it.

This PR adds some coverage / improved error messages that were missing
from that previous PR.
2022-02-22 22:09:45 +01:00
Omer Tuchfeld
fa60898354 Adjust 32-bit tests for tail,split,truncate,head 2022-02-22 13:49:20 +01:00
Jeffrey Finkelstein
6718d97f97 split: add support for -e argument
Add the `-e` flag, which indicates whether to elide (that is, remove)
empty files that would have been created by the `-n` option.

The `-n` command-line argument gives a specific number of chunks into
which the input files will be split. If the number of chunks is
greater than the number of bytes, then empty files will be created for
the excess chunks. But if `-e` is given, then empty files will not be
created.

For example, contrast

    $ printf 'a\n' > f && split -e -n 3 f && cat xaa xab xac
    a
    cat: xac: No such file or directory

with

    $ printf 'a\n' > f && split -n 3 f && cat xaa xab xac
    a
2022-02-17 19:03:51 -05:00
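The effect of `-e` in commit 6718d97f97 can be sketched with a small byte-splitting function. `split_bytes` and its `elide_empty` parameter are hypothetical names, and the way bytes are distributed (chunk size of at least one byte, last chunk takes the remainder) is an assumption chosen to reproduce the example in the commit message.

```rust
/// Split `input` into `n` chunks by byte, optionally dropping empty chunks.
/// Hypothetical helper for illustration; not the uutils implementation.
fn split_bytes(input: &[u8], n: usize, elide_empty: bool) -> Vec<Vec<u8>> {
    assert!(n > 0, "number of chunks must be positive");
    let len = input.len();
    // Assumption: each chunk holds at least one byte while bytes remain.
    let chunk_size = (len / n).max(1);
    let mut chunks = Vec::new();
    for i in 0..n {
        let start = (i * chunk_size).min(len);
        // The last chunk takes whatever bytes remain.
        let end = if i + 1 == n { len } else { (start + chunk_size).min(len) };
        let chunk = &input[start..end];
        // With elision enabled, empty chunks are skipped rather than created.
        if !(elide_empty && chunk.is_empty()) {
            chunks.push(chunk.to_vec());
        }
    }
    chunks
}
```

For the two-byte input `a\n` and `n = 3`, the sketch produces three chunks without elision (the third empty) and two chunks with elision, which corresponds to the missing `xac` in the first example above.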
Terts Diepraam
e1a611374a
Merge pull request #2981 from jfinkels/split-hex-numbers
split: add support for -x option (hex suffixes)
2022-02-17 23:20:58 +01:00
DevSabb
63fa3c81ed fix failure in test_split 2022-02-14 20:41:58 -05:00
DevSabb
6d6371741a
include io-blksize parameter (#3064)
* include io-blksize parameter

* format changes for including io-blksize

Co-authored-by: DevSabb <devsabb@local>
Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>
2022-02-14 19:47:18 +01:00
Jeffrey Finkelstein
a4955b4e06 split: add support for -x option (hex suffixes)
Add support for the `-x` command-line option to `split`. This option
causes `split` to produce filenames with hexadecimal suffixes instead
of the default alphabetic suffixes.
2022-02-13 11:18:37 -05:00
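The difference between the default alphabetic suffixes and the hexadecimal suffixes from commit a4955b4e06 can be sketched as below. Both function names are hypothetical, and returning `None` when the index does not fit in the given width is a choice made for this sketch.

```rust
/// Fixed-width lowercase hexadecimal suffix for chunk number `index`.
/// Hypothetical helper; returns `None` when `index` does not fit in `width`.
fn hex_suffix(index: usize, width: usize) -> Option<String> {
    let s = format!("{:0width$x}", index, width = width);
    if s.len() > width { None } else { Some(s) }
}

/// Fixed-width alphabetic suffix (`aa`, `ab`, ...) for chunk number `index`.
/// Hypothetical helper; returns `None` when the suffixes are exhausted.
fn alpha_suffix(mut index: usize, width: usize) -> Option<String> {
    let mut bytes = vec![b'a'; width];
    // Fill from the rightmost position, treating the suffix as base 26.
    for slot in bytes.iter_mut().rev() {
        *slot = b'a' + (index % 26) as u8;
        index /= 26;
    }
    if index > 0 { None } else { String::from_utf8(bytes).ok() }
}
```

With a width of one, `alpha_suffix` yields only 26 distinct names, which is why a 27-byte fixture split one byte per chunk exhausts the suffixes in the `test_suffixes_exhausted` case described further down.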
Sylvestre Ledru
6b6d5ee7db
Merge pull request #2827 from jfinkels/split-std-io-copy
split: use std::io::copy() with new writer implementation to improve maintainability and speed
2022-02-12 11:33:12 +01:00
Jeffrey Finkelstein
2f65b29866 split: error when --additional-suffix contains /
Make `split` terminate with a usage error when the
`--additional-suffix` argument contains a directory separator
character.
2022-02-10 19:33:33 -05:00
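The check described in commit 2f65b29866 amounts to rejecting any suffix that contains a path separator. A minimal sketch, assuming a hypothetical `validate_additional_suffix` function and an illustrative error string (the actual message printed by `split` is not reproduced here):

```rust
/// Reject an `--additional-suffix` value that contains a directory separator.
/// Hypothetical helper; the error text is illustrative only.
fn validate_additional_suffix(suffix: &str) -> Result<(), String> {
    // `std::path::is_separator` is true for `/` on Unix, and for both
    // `/` and `\` on Windows.
    if suffix.chars().any(std::path::is_separator) {
        Err(format!("suffix {:?} contains a directory separator", suffix))
    } else {
        Ok(())
    }
}
```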
Jeffrey Finkelstein
1d7e1b8732 split: use ByteChunkWriter and LineChunkWriter
Replace `ByteSplitter` and `LineSplitter` with `ByteChunkWriter` and
`LineChunkWriter` respectively. This results in a more maintainable
design and an increase in the speed of splitting by lines.
2022-02-08 22:57:57 -05:00
Jeffrey Finkelstein
ca7af808d5 tests: correct a test case for split
Correct the `test_split::test_suffixes_exhausted` test case so that it
actually exercises the intended behavior of `split`. Previously, the
test fixture contained 26 bytes. After this commit, the test fixture
contains 27 bytes. When using a suffix width of one, only 26 filenames
should be available when naming chunk files---one for each lowercase
ASCII letter. This commit ensures that the filenames will be exhausted
as intended by the test.
2022-02-08 22:53:57 -05:00
Jeffrey Finkelstein
e5361a8c11 split: correct error message on invalid arg. to -a
Correct the error message displayed on an invalid parameter to the
`--suffix-length` or `-a` command-line option.
2022-02-06 20:09:29 -05:00
Daniel Eades
ba45fe312a use 'Self' and derive 'Default' where possible 2022-01-30 15:08:26 +01:00
Jeffrey Finkelstein
b636ff04a0 split: implement -n option
Implement the `-n` command-line option to `split`, which splits a file
into a specified number of chunks by byte.
2022-01-27 21:16:27 -05:00
Greg Guthe
771c9f5d9c tests: update random_chars generator to map u8 to char
Fix 'value of type `char` cannot be built from `std::iter::Iterator<Item=u8>`' for split test.

refs: https://docs.rs/rand/0.8.4/rand/distributions/struct.Alphanumeric.html#example
2022-01-24 20:40:31 -05:00
Jeffrey Finkelstein
7af3007204 split: add --verbose option 2022-01-16 09:34:28 -05:00
Jeffrey Finkelstein
cfe5a0d82c split: correct filename creation algorithm
Fix two issues with the filename creation algorithm. First, this
corrects the behavior of the `-a` option. This commit ensures a
failure occurs when the number of chunks exceeds the number of
filenames representable with the specified fixed width:

    $ printf "%0.sa" {1..11} | split -d -b 1 -a 1
    split: output file suffixes exhausted

Second, this corrects the default behavior when `-a`
is not specified on the command line. Previously, it always
set the filenames to have length-2 suffixes. This commit corrects
the behavior to follow the algorithm implied by GNU split, where the
filename lengths grow dynamically by two characters once the number of
chunks grows sufficiently large:

    $ printf "%0.sa" {1..91} | ./target/debug/coreutils split -d -b 1 \
    >   && ls x* | tail
    x81
    x82
    x83
    x84
    x85
    x86
    x87
    x88
    x89
    x9000
2022-01-10 20:43:22 -05:00
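The dynamic suffix growth shown in commit cfe5a0d82c (`x89` followed by `x9000`) can be sketched for the numeric `-d` case. `dynamic_numeric_suffix` is a hypothetical name, and the rule used here (each width reserves a leading `9` as the escape into the next, longer range) is inferred from the example output, not taken from the uutils source.

```rust
/// Numeric suffix for chunk number `index` when `-a` is not given.
/// Hypothetical helper inferred from the `x89` -> `x9000` example.
fn dynamic_numeric_suffix(mut index: usize) -> String {
    let mut width: usize = 2; // digits that follow the run of leading 9s
    let mut prefix = String::new();
    loop {
        // Names available at this width before the first digit would be 9:
        // 90 for two digits, 900 for three, and so on.
        let capacity = 9usize.saturating_mul(10usize.saturating_pow(width as u32 - 1));
        if index < capacity {
            return format!("{}{:0width$}", prefix, index, width = width);
        }
        // Move on to the next range: one more leading 9, one more digit,
        // so the total suffix length grows by two characters.
        index -= capacity;
        prefix.push('9');
        width += 1;
    }
}
```

Under this rule, indices 0 through 89 map to `00` through `89`, index 90 maps to `9000`, and the four-character range ends at `9899` before six-character suffixes begin at `990000`.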
Jan Scheer
c0be979611 fix some issues with locale (replace "LANGUAGE" with "LC_ALL")
`LANGUAGE=C` is not enough, `LC_ALL=C` is needed as the environment
variable that overrides all the other localization settings.

e.g.
```bash
$ LANGUAGE=C id foobar
id: ‘foobar’: no such user

$ LC_ALL=C id foobar
id: 'foobar': no such user
```

* replace `LANGUAGE` with `LC_ALL` as environment variable in the tests
* fix the date string of affected uutils
* replace `‘` and `’` with `'`
2021-06-23 11:30:28 +02:00
Jan Scheer
f8e96150f8 fix clippy warnings and spelling
* add some missing LICENSE headers
2021-06-04 15:39:34 +02:00
Jan Scheer
130bf49e5d Merge branch 'master' of github.com:uutils/coreutils into refactoring_parse_size 2021-06-03 22:32:34 +02:00
Jan Scheer
2f5f7c6fa1 split: use "parse_size" from uucore
* make stderr of parsing SIZE/NUMBER argument consistent with GNU's behavior
* add error handling
* add tests
2021-06-02 21:32:41 +02:00
Roy Ivy III
4e20dedf58 tests ~ refactor/polish spelling (comments, names, and exceptions) 2021-05-31 08:23:57 -05:00
Jan Scheer
3aeccfd802 fix a lot of clippy warnings 2021-05-29 15:11:22 +02:00
Samuel Ainsworth
b8a3a8995f Fix test_split_bytes_prime_part_size 2021-05-08 14:25:21 +02:00
Samuel Ainsworth
7c1395366e Fix split's handling of non-UTF-8 files 2021-05-08 14:25:21 +02:00
Felipe Lema
35a7f01d15
Refactor(split) - migrate from getopts to clap (#1712) 2021-02-11 20:45:23 +01:00
Felipe Lema
88911be6e0
--filter argument for split (#1681) 2021-01-18 14:42:44 +01:00
Jens Humrich
bfca334ec1 style issues 2020-09-17 12:40:48 +02:00
Jens Humrich
5a75905476 Add additional-suffix option to split 2020-09-16 17:59:39 +02:00
Roy Ivy III
de0375f909 tests ~ reorganize tests 2020-06-01 18:30:04 -05:00
Renamed from tests/test_split.rs