Commit graph

63 commits

Author SHA1 Message Date
zhitkoff
6f37b4b4cf split: hyphenated values + tests 2023-08-30 19:29:57 -04:00
zhitkoff
7f905a3b8d split: edge case for obs lines within combined shorts + test 2023-08-29 16:35:00 -04:00
zhitkoff
15c7170d20 split: fix for GNU Tests regression + tests 2023-08-29 15:49:47 -04:00
zhitkoff
fa8d18b826 split: refactor obsolete lines 2023-08-28 18:24:21 -04:00
Yury Zhytkou
1eae064e5c
split: better handle numeric and hex suffixes, short and long, with and without values (#5198)
* split: better handle numeric and hex suffixes, short and long, with and without values
Fixes #5171

* refactoring with overrides_with_all() in args definitions

* fixed comments

* updated help on suffixes to match GNU

* comments

* refactor to remove value_parser()

* split: refactor suffix processing + updated tests

* split: minor formatting
2023-08-28 10:09:52 +02:00
zhitkoff
84d96f9d02 split: refactor for more common use case 2023-08-26 11:11:46 -04:00
Terts Diepraam
c3f9e19a3b all: normalize license notice in all *.rs files 2023-08-24 12:21:09 +02:00
John Shin
631b9892f6 split: loop over chars and remove char_from_digit function 2023-08-11 18:36:08 -07:00
Sylvestre Ledru
d033db3573 split: reject some invalid values
Matches what is done in tests/split/fail.sh
(still doesn't work)
2023-07-03 22:57:37 +02:00
Daniel Hofstetter
7a888da409 tests: remove all "extern crate" statements 2023-04-10 08:31:31 +02:00
Daniel Hofstetter
6988eb7ec6 tests: expand wildcard imports 2023-03-20 15:32:35 +01:00
Daniel Hofstetter
f6b646e4e5 clippy: fix warnings introduced with Rust 1.67.0 2023-01-27 17:37:56 +01:00
Joining7943
1fadeb43b2 tests/util: Do not trim stderr in CmdResult::stderr_is. Add method stderr_trimmed_is.
Fix tests assert whitespace instead of trimming it. Disable some tests in `test_tr` because `tr`
produces too many newlines.
2023-01-22 14:56:19 +01:00
Niyaz Nigmatullin
fdd6a05259 chore: run cargo +nightly clippy --fix 2022-11-16 11:09:44 +02:00
Jeffrey Finkelstein
7dc96697c9 split: implement round-robin arg to --number
Implement distributing lines of a file in a round-robin manner to a
specified number of chunks. For example,

    $ (seq 1 10 | split -n r/3) && head -v xa[abc]
    ==> xaa <==
    1
    4
    7
    10

    ==> xab <==
    2
    5
    8

    ==> xac <==
    3
    6
    9
2022-10-22 23:15:55 -04:00
Sylvestre Ledru
969f821830
Merge pull request #4009 from andrewbaptist/fix_eof_linebytes
Match GNU semantics for missing EOF
2022-10-17 16:23:20 +02:00
Sylvestre Ledru
6e14dea73b Fix some clippy warnings
Fixed with `cargo clippy --features unix  --fix`
and manually
2022-10-13 09:07:22 +02:00
Andrew Baptist
4922d34177 Match GNU semantics for missing EOF
While the rust coreutils semantics were arguably more correct,
they were different than the gnu split semantics when handling a
file without a trailing EOF. This patch addresses that difference
and allows passing one more GNU test suite.
2022-10-07 17:50:26 -04:00
Andrew Baptist
49e1cc6c71 Add support for starting suffix numbers
This commit now allows split to pass split/numeric.sh
2022-10-05 09:52:20 -04:00
Terts Diepraam
9177cb7b24 all: add tests for usage error exit code 2022-09-10 20:59:42 +02:00
Jeffrey Finkelstein
8458bf1387 Clippy fixes in multiple crates 2022-08-23 18:30:43 -04:00
Owen Anderson
9fad6fde35
Fix a bug in split where chunking would be skipped when the chunk size (#3800)
* Fix a bug in split where chunking would be skipped when the chunk size
happened to be an exact divisor of the buffer size used to read the
input stream.

The issue here was that file was being split byte-wise in chunks of 1G.
The input stream was being read in chunks of 8KB, which evenly divides
the chunk size. Because the check to allocate the next output chunk was
done at the bottom of the loop previously, it would never occur because
the current input chunk was fully consumed at that point. By moving the
check to the top of the loop (but still late enough that we know we have
bytes to write) we resolve this issue.

This scenario is unfortunately hard to write a test for, since we don't
explicitly control the input chunk size.

Fixes https://github.com/uutils/coreutils/issues/3790
2022-08-16 11:02:52 +02:00
Andrew Baptist
f2cfc15a70 split: Don't overwrite files
Check that a file exists by calling create_new and changing the
interface of instantiate_current_writer to return a Result rather
than calling unwrap.
2022-07-21 12:06:13 -04:00
Jeffrey Finkelstein
b79ff6b4fd split: avoid writing final empty chunk with -C
Fix a bug in which a final empty file was written when using `split
--line-bytes` mode.
2022-03-20 09:30:58 -04:00
Jeffrey Finkelstein
95f58fbf3c split: handle no final newline with --line-bytes
Fix a panic due to out-of-bounds indexing when using `split
--line-bytes` with an input that had no trailing newline.
2022-03-19 23:50:02 -04:00
Jeffrey Finkelstein
0a226524a6 split: elide all chunks when input file is empty
Fix a bug in the behavior of `split -e -n NUM` when the input file is
empty. Previously, it would panic due to overflow when subtracting 1
from 0. After this change, it will terminate successfully and produce
no output chunks.
2022-03-19 14:32:28 -04:00
Sylvestre Ledru
9796e01df6
Revert "split: implement round-robin arg to --number" 2022-03-18 14:45:29 +01:00
Jeffrey Finkelstein
18bfd1ac68 split: implement round-robin arg to --number
Implement distributing lines of a file in a round-robin manner to a
specified number of chunks. For example,

    $ (seq 1 10 | split -n r/3) && head -v xa[abc]
    ==> xaa <==
    1
    4
    7
    10

    ==> xab <==
    2
    5
    8

    ==> xac <==
    3
    6
    9
2022-03-15 18:22:44 -04:00
Jeffrey Finkelstein
77d92883c7 split: implement --line-bytes option
Implement the `--line-bytes` option to `split`. In this mode, the
program tries to write as many lines of the input as possible to each
chunk of output without exceeding a specified byte limit. The new
`LineBytesChunkWriter` struct represents this functionality.
2022-03-10 22:51:49 -05:00
Sylvestre Ledru
f3bd1f3020 Add onehundredlines in the spell ignore 2022-03-05 10:27:51 +01:00
Jeffrey Finkelstein
ee36dea1a9 split: implement outputting kth chunk of file
Implement `-n l/k/N` option, where the `k`th chunk of the input file
is written to stdout. For example,

    $ seq -w 0 99 > f; split -n l/3/10 f
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
2022-03-05 10:27:51 +01:00
Sylvestre Ledru
346cfa060b
Merge pull request #2980 from jfinkels/split-lines-2
split: add support for "-n l/NUM" option to split
2022-03-01 10:13:44 +01:00
Jeffrey Finkelstein
dbbee573ab split: add support for "-n l/NUM" option to split
Add support for `split -n l/NUM`. Previously, `split` only supported
`-n NUM`, which splits a file into `NUM` chunks by byte. The `-n
l/NUM` strategy splits a file into `NUM` chunks without splitting
lines across chunks.
2022-02-22 18:44:08 -05:00
Omer Tuchfeld
0ce22f3a08 Improve coverage / error messages from parse_size PR
https://github.com/uutils/coreutils/pull/3084 (2a333ab391) had some
missing coverage and was merged before I had a chance to fix it.

This PR adds some coverage / improved error messages that were missing
from that previous PR.
2022-02-22 22:09:45 +01:00
Omer Tuchfeld
fa60898354 Adjust 32-bit tests for tail,split,truncate,head 2022-02-22 13:49:20 +01:00
Jeffrey Finkelstein
6718d97f97 split: add support for -e argument
Add the `-e` flag, which indicates whether to elide (that is, remove)
empty files that would have been created by the `-n` option.

The `-n` command-line argument gives a specific number of chunks into
which the input files will be split. If the number of chunks is
greater than the number of bytes, then empty files will be created for
the excess chunks. But if `-e` is given, then empty files will not be
created.

For example, contrast

    $ printf 'a\n' > f && split -e -n 3 f && cat xaa xab xac
    a
    cat: xac: No such file or directory

with

    $ printf 'a\n' > f && split -n 3 f && cat xaa xab xac
    a
2022-02-17 19:03:51 -05:00
Terts Diepraam
e1a611374a
Merge pull request #2981 from jfinkels/split-hex-numbers
split: add support for -x option (hex suffixes)
2022-02-17 23:20:58 +01:00
DevSabb
63fa3c81ed fix failure in test_split 2022-02-14 20:41:58 -05:00
DevSabb
6d6371741a
include io-blksize parameter (#3064)
* include io-blksize parameter

* format changes for including io-blksize

Co-authored-by: DevSabb <devsabb@local>
Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>
2022-02-14 19:47:18 +01:00
Jeffrey Finkelstein
a4955b4e06 split: add support for -x option (hex suffixes)
Add support for the `-x` command-line option to `split`. This option
causes `split` to produce filenames with hexadecimal suffixes instead
of the default alphabetic suffixes.
2022-02-13 11:18:37 -05:00
Sylvestre Ledru
6b6d5ee7db
Merge pull request #2827 from jfinkels/split-std-io-copy
split: use std::io::copy() with new writer implementation to improve maintainability and speed
2022-02-12 11:33:12 +01:00
Jeffrey Finkelstein
2f65b29866 split: error when --additional-suffix contains /
Make `split` terminate with a usage error when the
`--additional-suffix` argument contains a directory separator
character.
2022-02-10 19:33:33 -05:00
Jeffrey Finkelstein
1d7e1b8732 split: use ByteChunkWriter and LineChunkWriter
Replace `ByteSplitter` and `LineSplitter` with `ByteChunkWriter` and
`LineChunkWriter` respectively. This results in a more maintainable
design and an increase in the speed of splitting by lines.
2022-02-08 22:57:57 -05:00
Jeffrey Finkelstein
ca7af808d5 tests: correct a test case for split
Correct the `test_split::test_suffixes_exhausted` test case so that it
actually exercises the intended behavior of `split`. Previously, the
test fixture contained 26 bytes. After this commit, the test fixture
contains 27 bytes. When using a suffix width of one, only 26 filenames
should be available when naming chunk files---one for each lowercase
ASCII letter. This commit ensures that the filenames will be exhausted
as intended by the test.
2022-02-08 22:53:57 -05:00
Jeffrey Finkelstein
e5361a8c11 split: correct error message on invalid arg. to -a
Correct the error message displayed on an invalid parameter to the
`--suffix-length` or `-a` command-line option.
2022-02-06 20:09:29 -05:00
Daniel Eades
ba45fe312a use 'Self' and derive 'Default' where possible 2022-01-30 15:08:26 +01:00
Jeffrey Finkelstein
b636ff04a0 split: implement -n option
Implement the `-n` command-line option to `split`, which splits a file
into a specified number of chunks by byte.
2022-01-27 21:16:27 -05:00
Greg Guthe
771c9f5d9c tests: update random_chars generator to map u8 to char
Fix 'value of type `char` cannot be built from `std::iter::Iterator<Item=u8>`' for split test.

refs: https://docs.rs/rand/0.8.4/rand/distributions/struct.Alphanumeric.html#example
2022-01-24 20:40:31 -05:00
Jeffrey Finkelstein
7af3007204 split: add --verbose option 2022-01-16 09:34:28 -05:00
Jeffrey Finkelstein
cfe5a0d82c split: correct filename creation algorithm
Fix two issues with the filename creation algorithm. First, this
corrects the behavior of the `-a` option. This commit ensures a
failure occurs when the number of chunks exceeds the number of
filenames representable with the specified fixed width:

    $ printf "%0.sa" {1..11} | split -d -b 1 -a 1
    split: output file suffixes exhausted

Second, this corrects the behavior of the default behavior when `-a`
is not specified on the command line. Previously, it was always
settings the filenames to have length 2 suffixes. This commit corrects
the behavior to follow the algorithm implied by GNU split, where the
filename lengths grow dynamically by two characters once the number of
chunks grows sufficiently large:

    $ printf "%0.sa" {1..91} | ./target/debug/coreutils split -d -b 1 \
    >   && ls x* | tail
    x81
    x82
    x83
    x84
    x85
    x86
    x87
    x88
    x89
    x9000
2022-01-10 20:43:22 -05:00