coreutils

mirror of https://github.com/uutils/coreutils synced 2024-12-14 07:12:44 +00:00

Author	SHA1	Message	Date
Jeffrey Finkelstein	7dc96697c9	split: implement round-robin arg to --number Implement distributing lines of a file in a round-robin manner to a specified number of chunks. For example, $ (seq 1 10 | split -n r/3) && head -v xa[abc] ==> xaa <== 1 4 7 10 ==> xab <== 2 5 8 ==> xac <== 3 6 9	2022-10-22 23:15:55 -04:00
Sylvestre Ledru	969f821830	Merge pull request #4009 from andrewbaptist/fix_eof_linebytes Match GNU semantics for missing EOF	2022-10-17 16:23:20 +02:00
Sylvestre Ledru	6e14dea73b	Fix some clippy warnings Fixed with `cargo clippy --features unix --fix` and manually	2022-10-13 09:07:22 +02:00
Andrew Baptist	4922d34177	Match GNU semantics for missing EOF While the rust coreutils semantics were arguably more correct, they were different than the gnu split semantics when handling a file without a trailing EOF. This patch addresses that difference and allows passing one more GNU test suite.	2022-10-07 17:50:26 -04:00
Andrew Baptist	49e1cc6c71	Add support for starting suffix numbers This commit now allows split to pass split/numeric.sh	2022-10-05 09:52:20 -04:00
Terts Diepraam	9177cb7b24	all: add tests for usage error exit code	2022-09-10 20:59:42 +02:00
Jeffrey Finkelstein	8458bf1387	Clippy fixes in multiple crates	2022-08-23 18:30:43 -04:00
Owen Anderson	9fad6fde35	Fix a bug in split where chunking would be skipped when the chunk size (#3800) * Fix a bug in split where chunking would be skipped when the chunk size happened to be an exact divisor of the buffer size used to read the input stream. The issue here was that file was being split byte-wise in chunks of 1G. The input stream was being read in chunks of 8KB, which evenly divides the chunk size. Because the check to allocate the next output chunk was done at the bottom of the loop previously, it would never occur because the current input chunk was fully consumed at that point. By moving the check to the top of the loop (but still late enough that we know we have bytes to write) we resolve this issue. This scenario is unfortunately hard to write a test for, since we don't explicitly control the input chunk size. Fixes https://github.com/uutils/coreutils/issues/3790	2022-08-16 11:02:52 +02:00
Andrew Baptist	f2cfc15a70	split: Don't overwrite files Check that a file exists by calling create_new and changing the interface of instantiate_current_writer to return a Result rather than calling unwrap.	2022-07-21 12:06:13 -04:00
Jeffrey Finkelstein	b79ff6b4fd	split: avoid writing final empty chunk with -C Fix a bug in which a final empty file was written when using `split --line-bytes` mode.	2022-03-20 09:30:58 -04:00
Jeffrey Finkelstein	95f58fbf3c	split: handle no final newline with --line-bytes Fix a panic due to out-of-bounds indexing when using `split --line-bytes` with an input that had no trailing newline.	2022-03-19 23:50:02 -04:00
Jeffrey Finkelstein	0a226524a6	split: elide all chunks when input file is empty Fix a bug in the behavior of `split -e -n NUM` when the input file is empty. Previously, it would panic due to overflow when subtracting 1 from 0. After this change, it will terminate successfully and produce no output chunks.	2022-03-19 14:32:28 -04:00
Sylvestre Ledru	9796e01df6	Revert "split: implement round-robin arg to --number"	2022-03-18 14:45:29 +01:00
Jeffrey Finkelstein	18bfd1ac68	split: implement round-robin arg to --number Implement distributing lines of a file in a round-robin manner to a specified number of chunks. For example, $ (seq 1 10 | split -n r/3) && head -v xa[abc] ==> xaa <== 1 4 7 10 ==> xab <== 2 5 8 ==> xac <== 3 6 9	2022-03-15 18:22:44 -04:00
Jeffrey Finkelstein	77d92883c7	split: implement --line-bytes option Implement the `--line-bytes` option to `split`. In this mode, the program tries to write as many lines of the input as possible to each chunk of output without exceeding a specified byte limit. The new `LineBytesChunkWriter` struct represents this functionality.	2022-03-10 22:51:49 -05:00
Sylvestre Ledru	f3bd1f3020	Add onehundredlines in the spell ignore	2022-03-05 10:27:51 +01:00
Jeffrey Finkelstein	ee36dea1a9	split: implement outputting kth chunk of file Implement `-n l/k/N` option, where the `k`th chunk of the input file is written to stdout. For example, $ seq -w 0 99 > f; split -n l/3/10 f 20 21 22 23 24 25 26 27 28 29	2022-03-05 10:27:51 +01:00
Sylvestre Ledru	346cfa060b	Merge pull request #2980 from jfinkels/split-lines-2 split: add support for "-n l/NUM" option to split	2022-03-01 10:13:44 +01:00
Jeffrey Finkelstein	dbbee573ab	split: add support for "-n l/NUM" option to split Add support for `split -n l/NUM`. Previously, `split` only supported `-n NUM`, which splits a file into `NUM` chunks by byte. The `-n l/NUM` strategy splits a file into `NUM` chunks without splitting lines across chunks.	2022-02-22 18:44:08 -05:00
Omer Tuchfeld	0ce22f3a08	Improve coverage / error messages from `parse_size` PR https://github.com/uutils/coreutils/pull/3084 (`2a333ab391`) had some missing coverage and was merged before I had a chance to fix it. This PR adds some coverage / improved error messages that were missing from that previous PR.	2022-02-22 22:09:45 +01:00
Omer Tuchfeld	fa60898354	Adjust 32-bit tests for tail,split,truncate,head	2022-02-22 13:49:20 +01:00
Jeffrey Finkelstein	6718d97f97	split: add support for -e argument Add the `-e` flag, which indicates whether to elide (that is, remove) empty files that would have been created by the `-n` option. The `-n` command-line argument gives a specific number of chunks into which the input files will be split. If the number of chunks is greater than the number of bytes, then empty files will be created for the excess chunks. But if `-e` is given, then empty files will not be created. For example, contrast $ printf 'a\n' > f && split -e -n 3 f && cat xaa xab xac a cat: xac: No such file or directory with $ printf 'a\n' > f && split -n 3 f && cat xaa xab xac a	2022-02-17 19:03:51 -05:00
Terts Diepraam	e1a611374a	Merge pull request #2981 from jfinkels/split-hex-numbers split: add support for -x option (hex suffixes)	2022-02-17 23:20:58 +01:00
DevSabb	63fa3c81ed	fix failure in test_split	2022-02-14 20:41:58 -05:00
DevSabb	6d6371741a	include io-blksize parameter (#3064) * include io-blksize parameter * format changes for including io-blksize Co-authored-by: DevSabb <devsabb@local> Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>	2022-02-14 19:47:18 +01:00
Jeffrey Finkelstein	a4955b4e06	split: add support for -x option (hex suffixes) Add support for the `-x` command-line option to `split`. This option causes `split` to produce filenames with hexadecimal suffixes instead of the default alphabetic suffixes.	2022-02-13 11:18:37 -05:00
Sylvestre Ledru	6b6d5ee7db	Merge pull request #2827 from jfinkels/split-std-io-copy split: use std::io::copy() with new writer implementation to improve maintainability and speed	2022-02-12 11:33:12 +01:00
Jeffrey Finkelstein	2f65b29866	split: error when --additional-suffix contains / Make `split` terminate with a usage error when the `--additional-suffix` argument contains a directory separator character.	2022-02-10 19:33:33 -05:00
Jeffrey Finkelstein	1d7e1b8732	split: use ByteChunkWriter and LineChunkWriter Replace `ByteSplitter` and `LineSplitter` with `ByteChunkWriter` and `LineChunkWriter` respectively. This results in a more maintainable design and an increase in the speed of splitting by lines.	2022-02-08 22:57:57 -05:00
Jeffrey Finkelstein	ca7af808d5	tests: correct a test case for split Correct the `test_split::test_suffixes_exhausted` test case so that it actually exercises the intended behavior of `split`. Previously, the test fixture contained 26 bytes. After this commit, the test fixture contains 27 bytes. When using a suffix width of one, only 26 filenames should be available when naming chunk files---one for each lowercase ASCII letter. This commit ensures that the filenames will be exhausted as intended by the test.	2022-02-08 22:53:57 -05:00
Jeffrey Finkelstein	e5361a8c11	split: correct error message on invalid arg. to -a Correct the error message displayed on an invalid parameter to the `--suffix-length` or `-a` command-line option.	2022-02-06 20:09:29 -05:00
Daniel Eades	ba45fe312a	use 'Self' and derive 'Default' where possible	2022-01-30 15:08:26 +01:00
Jeffrey Finkelstein	b636ff04a0	split: implement -n option Implement the `-n` command-line option to `split`, which splits a file into a specified number of chunks by byte.	2022-01-27 21:16:27 -05:00
Greg Guthe	771c9f5d9c	tests: update random_chars generator to map u8 to char Fix 'value of type `char` cannot be built from `std::iter::Iterator<Item=u8>`' for split test. refs: https://docs.rs/rand/0.8.4/rand/distributions/struct.Alphanumeric.html#example	2022-01-24 20:40:31 -05:00
Jeffrey Finkelstein	7af3007204	split: add --verbose option	2022-01-16 09:34:28 -05:00
Jeffrey Finkelstein	cfe5a0d82c	split: correct filename creation algorithm Fix two issues with the filename creation algorithm. First, this corrects the behavior of the `-a` option. This commit ensures a failure occurs when the number of chunks exceeds the number of filenames representable with the specified fixed width: $ printf "%0.sa" {1..11} | split -d -b 1 -a 1 split: output file suffixes exhausted Second, this corrects the default behavior when `-a` is not specified on the command line. Previously, it was always setting the filenames to have length 2 suffixes. This commit corrects the behavior to follow the algorithm implied by GNU split, where the filename lengths grow dynamically by two characters once the number of chunks grows sufficiently large: $ printf "%0.sa" {1..91} | ./target/debug/coreutils split -d -b 1 \ > && ls x* | tail x81 x82 x83 x84 x85 x86 x87 x88 x89 x9000	2022-01-10 20:43:22 -05:00
Jan Scheer	c0be979611	fix some issues with locale (replace "LANGUAGE" with "LC_ALL") `LANGUAGE=C` is not enough, `LC_ALL=C` is needed as the environment variable that overrides all the other localization settings. e.g. ```bash $ LANGUAGE=C id foobar id: ‘foobar’: no such user $ LC_ALL=C id foobar id: 'foobar': no such user ``` * replace `LANGUAGE` with `LC_ALL` as environment variable in the tests * fix the date string of affected uutils * replace `‘` and `’` with `'`	2021-06-23 11:30:28 +02:00
Jan Scheer	f8e96150f8	fix clippy warnings and spelling * add some missing LICENSE headers	2021-06-04 15:39:34 +02:00
Jan Scheer	130bf49e5d	Merge branch 'master' of github.com:uutils/coreutils into refactoring_parse_size	2021-06-03 22:32:34 +02:00
Jan Scheer	2f5f7c6fa1	split: use "parse_size" from uucore * make stderr of parsing SIZE/NUMBER argument consistent with GNU's behavior * add error handling * add tests	2021-06-02 21:32:41 +02:00
Roy Ivy III	4e20dedf58	tests ~ refactor/polish spelling (comments, names, and exceptions)	2021-05-31 08:23:57 -05:00
Jan Scheer	3aeccfd802	fix a lot of clippy warnings	2021-05-29 15:11:22 +02:00
Samuel Ainsworth	b8a3a8995f	Fix test_split_bytes_prime_part_size	2021-05-08 14:25:21 +02:00
Samuel Ainsworth	7c1395366e	Fix split's handling of non-UTF-8 files	2021-05-08 14:25:21 +02:00
Felipe Lema	35a7f01d15	Refactor(split) - migrate from getopts to clap (#1712)	2021-02-11 20:45:23 +01:00
Felipe Lema	88911be6e0	`--filter` argument for `split` (#1681)	2021-01-18 14:42:44 +01:00
Jens Humrich	bfca334ec1	style issues	2020-09-17 12:40:48 +02:00
Jens Humrich	5a75905476	Add additional-suffix option to split	2020-09-16 17:59:39 +02:00
Roy Ivy III	de0375f909	tests ~ reorganize tests	2020-06-01 18:30:04 -05:00

49 commits
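
The round-robin mode described in the `split -n r/3` commits above can be illustrated with a minimal sketch. This is not the uutils implementation; the function name `split_round_robin` and the use of generic in-memory writers are assumptions made for illustration only. It sends line 0 to the first chunk, line 1 to the second, and so on, wrapping around.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper (not from uutils): distribute the lines of `input`
/// across `writers` in round-robin order, wrapping around after the last one.
fn split_round_robin<R: BufRead, W: Write>(mut input: R, writers: &mut [W]) -> io::Result<()> {
    if writers.is_empty() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "need at least one chunk",
        ));
    }
    let mut line = Vec::new();
    let mut index = 0usize;
    loop {
        line.clear();
        // `read_until` keeps the trailing b'\n' and also handles non-UTF-8 bytes.
        if input.read_until(b'\n', &mut line)? == 0 {
            break; // end of input
        }
        writers[index % writers.len()].write_all(&line)?;
        index += 1;
    }
    Ok(())
}
```

Called with the ten lines `1` through `10` and three `Vec<u8>` writers, the chunks receive `1 4 7 10`, `2 5 8`, and `3 6 9`, matching the `xaa`, `xab`, and `xac` contents shown in the commit message.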