Commit graph

217 commits

Author SHA1 Message Date
chordtoll
b77b3cba55 dd: implement iseek + oseek flags
These are the first half of the changes needed to pass the dd/bytes.sh tests:
- Add iseek and oseek options (additive with skip and seek options)
- Implement tests for the new flags, matching those from dd/bytes.sh
2022-03-18 20:45:04 +01:00
Jeffrey Finkelstein
77d92883c7 split: implement --line-bytes option
Implement the `--line-bytes` option to `split`. In this mode, the
program tries to write as many lines of the input as possible to each
chunk of output without exceeding a specified byte limit. The new
`LineBytesChunkWriter` struct represents this functionality.
2022-03-10 22:51:49 -05:00
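A minimal sketch of the packing rule described above. It is written as a plain function over an in-memory buffer, not as the `LineBytesChunkWriter` struct. The name `line_bytes_chunks` is hypothetical. The handling of a line longer than the limit is also an assumption: such a line is cut into limit-sized pieces, the full pieces become their own chunks, and the final piece stays open for the following lines.

```rust
/// Greedily packs whole lines into chunks of at most `limit` bytes.
/// Assumption of this sketch: a line longer than `limit` is cut into
/// `limit`-sized pieces; all but the last become separate chunks.
fn line_bytes_chunks(input: &[u8], limit: usize) -> Vec<Vec<u8>> {
    assert!(limit > 0, "limit must be positive");
    let mut chunks: Vec<Vec<u8>> = Vec::new();
    let mut current: Vec<u8> = Vec::new();
    // split_inclusive keeps the trailing '\n' on each line.
    for line in input.split_inclusive(|&b| b == b'\n') {
        if current.len() + line.len() <= limit {
            current.extend_from_slice(line);
            continue;
        }
        // The line does not fit: close the current chunk first.
        if !current.is_empty() {
            chunks.push(std::mem::take(&mut current));
        }
        if line.len() <= limit {
            current.extend_from_slice(line);
        } else {
            // Over-long line: emit full pieces, keep the last piece open.
            let mut pieces = line.chunks(limit).peekable();
            while let Some(piece) = pieces.next() {
                if pieces.peek().is_some() {
                    chunks.push(piece.to_vec());
                } else {
                    current.extend_from_slice(piece);
                }
            }
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    for (i, chunk) in line_bytes_chunks(b"aa\nbbb\ncc\nd\n", 6).iter().enumerate() {
        println!("chunk {}: {:?}", i, String::from_utf8_lossy(chunk));
    }
}
```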
Jeffrey Finkelstein
ee36dea1a9 split: implement outputting kth chunk of file
Implement `-n l/k/N` option, where the `k`th chunk of the input file
is written to stdout. For example,

    $ seq -w 0 99 > f; split -n l/3/10 f
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
2022-03-05 10:27:51 +01:00
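The chunk selection can be sketched as follows. This is not the uutils code. The function `kth_line_chunk` is hypothetical, and it makes two assumptions: the nominal chunk size is `input.len() / n`, and each line belongs to the chunk that contains its first byte. The arithmetic agrees with the example above. `seq -w 0 99` produces 100 three-byte lines (300 bytes), so each of the 10 nominal chunks spans 30 bytes, and chunk 3 holds the lines `20` through `29`.

```rust
/// Returns the k-th (1-based) of n line-preserving chunks of `input`.
/// Assumptions: nominal chunks are input.len() / n bytes (the last chunk
/// absorbs any remainder) and a line goes to the chunk holding its first byte.
fn kth_line_chunk(input: &[u8], k: usize, n: usize) -> Vec<u8> {
    assert!(n > 0 && (1..=n).contains(&k), "need 1 <= k <= n");
    let chunk_size = input.len() / n;
    let mut out = Vec::new();
    let mut start = 0; // byte offset where the current line begins
    for line in input.split_inclusive(|&b| b == b'\n') {
        // If the input is shorter than n bytes, this sketch arbitrarily
        // assigns everything to chunk 1.
        let idx = if chunk_size == 0 { 1 } else { (start / chunk_size + 1).min(n) };
        if idx == k {
            out.extend_from_slice(line);
        }
        start += line.len();
    }
    out
}

fn main() {
    // Equivalent of `seq -w 0 99`: 100 lines of 3 bytes each.
    let input: String = (0..100).map(|i| format!("{:02}\n", i)).collect();
    let chunk = kth_line_chunk(input.as_bytes(), 3, 10);
    print!("{}", String::from_utf8_lossy(&chunk));
}
```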
Jeffrey Finkelstein
6718d97f97 split: add support for -e argument
Add the `-e` flag, which indicates whether to elide (that is, remove)
empty files that would have been created by the `-n` option.

The `-n` command-line argument gives a specific number of chunks into
which the input files will be split. If the number of chunks is
greater than the number of bytes, then empty files will be created for
the excess chunks. But if `-e` is given, then empty files will not be
created.

For example, contrast

    $ printf 'a\n' > f && split -e -n 3 f && cat xaa xab xac
    a
    cat: xac: No such file or directory

with

    $ printf 'a\n' > f && split -n 3 f && cat xaa xab xac
    a
2022-02-17 19:03:51 -05:00
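The effect of `-e` can be sketched by computing the chunk sizes first and then dropping the empty ones. The helpers `chunk_sizes` and `files_to_create` are hypothetical. The rule that the first `len % n` chunks receive one extra byte is an assumption, but it reproduces the example above. The 2-byte input `a\n` split into 3 chunks gives sizes 1, 1 and 0, so `xac` is created only without `-e`.

```rust
/// Byte sizes of the `n` chunks for `split -n N`.
/// Assumption: the first `len % n` chunks get one extra byte.
fn chunk_sizes(len: u64, n: u64) -> Vec<u64> {
    assert!(n > 0);
    let (base, rem) = (len / n, len % n);
    (0..n).map(|i| base + u64::from(i < rem)).collect()
}

/// Names of the chunk files that would be written; with `elide`
/// (as with `-e`), empty chunks produce no file at all.
fn files_to_create(len: u64, n: u64, elide: bool) -> Vec<String> {
    assert!(n <= 26, "this sketch only names xaa..xaz");
    chunk_sizes(len, n)
        .into_iter()
        .enumerate()
        .filter(|&(_, size)| !(elide && size == 0))
        .map(|(i, _)| format!("xa{}", (b'a' + i as u8) as char))
        .collect()
}

fn main() {
    println!("{:?}", files_to_create(2, 3, false));
    println!("{:?}", files_to_create(2, 3, true));
}
```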
Terts Diepraam
e1a611374a
Merge pull request #2981 from jfinkels/split-hex-numbers
split: add support for -x option (hex suffixes)
2022-02-17 23:20:58 +01:00
Jeffrey Finkelstein
ba1ce7179b dd: move unit tests into dd.rs and test_dd.rs
Clean up unit tests in the `dd` crate to make them easier to
manage. This commit does a few things.

* move test cases that test the complete functionality of the `dd`
  program from the `dd_unit_tests` module up to the
  `tests/by-util/test_dd.rs` module so that they can take advantage of
  the testing framework and common testing tools provided by uutils,
* move test cases that test internal functions of the `dd`
  implementation into the `tests` module within `dd.rs` so that they
  live closer to the code they are testing,
* replace test cases defined by macros with test cases defined by
  plain old functions to make the test cases easier to read at a
  glance.
2022-02-15 21:50:48 -05:00
Jeffrey Finkelstein
a4955b4e06 split: add support for -x option (hex suffixes)
Add support for the `-x` command-line option to `split`. This option
causes `split` to produce filenames with hexadecimal suffixes instead
of the default alphabetic suffixes.
2022-02-13 11:18:37 -05:00
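A sketch of fixed-width suffix generation over an arbitrary alphabet, assuming lowercase hexadecimal digits for `-x`. The helpers `encode` and `fixed_suffix` are hypothetical. A width of `w` over an alphabet of `r` symbols yields `r^w` names. A width-2 hexadecimal suffix therefore runs from `00` to `ff` (256 names), and a width-1 alphabetic suffix is exhausted after 26 names.

```rust
const HEX: &[u8] = b"0123456789abcdef"; // assumed alphabet for -x
const ALPHA: &[u8] = b"abcdefghijklmnopqrstuvwxyz"; // default alphabet

/// Writes `index` in base `alphabet.len()` using exactly `width` symbols.
fn encode(mut index: usize, width: usize, alphabet: &[u8]) -> String {
    let radix = alphabet.len();
    let mut out = vec![alphabet[0]; width];
    for slot in out.iter_mut().rev() {
        *slot = alphabet[index % radix];
        index /= radix;
    }
    String::from_utf8(out).expect("alphabet is ASCII")
}

/// Fixed-width suffix for chunk number `index`, or `None` once all
/// names of this width are used up ("output file suffixes exhausted").
fn fixed_suffix(index: usize, width: usize, alphabet: &[u8]) -> Option<String> {
    match alphabet.len().checked_pow(width as u32) {
        Some(capacity) if index >= capacity => None,
        // In range, or r^width overflows usize and cannot be exhausted.
        _ => Some(encode(index, width, alphabet)),
    }
}

fn main() {
    for i in [0, 9, 10, 255, 256] {
        println!("{} -> {:?}", i, fixed_suffix(i, 2, HEX));
    }
    println!("26 -> {:?}", fixed_suffix(26, 1, ALPHA));
}
```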
Sylvestre Ledru
f9e04ae5ef
Merge pull request #2966 from allan-silva/wc-files0-from-opt
wc: implement files0-from option
2022-02-12 19:05:05 +01:00
Sylvestre Ledru
6b6d5ee7db
Merge pull request #2827 from jfinkels/split-std-io-copy
split: use std::io::copy() with new writer implementation to improve maintainability and speed
2022-02-12 11:33:12 +01:00
Shreyans Jain
3176ad5c1b
tests/hashsum: Fix missing space in checkfile 2022-02-10 13:55:53 +05:30
Shreyans Jain
30d7a4b167
hashsum: Add BLAKE3 to Hashing Algorithms
Signed-off-by: Shreyans Jain <shreyansthebest2007@gmail.com>
2022-02-10 12:46:44 +05:30
Jeffrey Finkelstein
1d7e1b8732 split: use ByteChunkWriter and LineChunkWriter
Replace `ByteSplitter` and `LineSplitter` with `ByteChunkWriter` and
`LineChunkWriter` respectively. This results in a more maintainable
design and an increase in the speed of splitting by lines.
2022-02-08 22:57:57 -05:00
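The sketch below illustrates why a chunk-rotating writer pairs well with `std::io::copy()`. It is a minimal `Write` implementation that starts a new chunk every `chunk_size` bytes. The `InMemoryByteChunkWriter` type is hypothetical: it collects chunks in memory instead of creating files, and it is not the `ByteChunkWriter` from the commit.

```rust
use std::io::{self, Write};

/// Hypothetical writer that starts a new in-memory chunk every `chunk_size` bytes.
struct InMemoryByteChunkWriter {
    chunk_size: usize,
    chunks: Vec<Vec<u8>>,
}

impl InMemoryByteChunkWriter {
    fn new(chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk size must be positive");
        Self { chunk_size, chunks: Vec::new() }
    }
}

impl Write for InMemoryByteChunkWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        // Open a new chunk if there is none yet or the current one is full.
        let need_new = self.chunks.last().map_or(true, |c| c.len() == self.chunk_size);
        if need_new {
            self.chunks.push(Vec::with_capacity(self.chunk_size));
        }
        let current = self.chunks.last_mut().expect("a chunk exists");
        // Accept only what fits in the current chunk.
        let n = (self.chunk_size - current.len()).min(buf.len());
        current.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let mut input: &[u8] = b"abcdefg";
    let mut writer = InMemoryByteChunkWriter::new(3);
    io::copy(&mut input, &mut writer)?;
    for chunk in &writer.chunks {
        println!("{:?}", String::from_utf8_lossy(chunk));
    }
    Ok(())
}
```

The `Write` contract allows `write` to accept fewer bytes than offered. `io::copy` writes each buffer it reads in full, so it keeps calling `write` until the buffer is consumed. The writer therefore only ever deals with one chunk boundary per call.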
Jeffrey Finkelstein
ca7af808d5 tests: correct a test case for split
Correct the `test_split::test_suffixes_exhausted` test case so that it
actually exercises the intended behavior of `split`. Previously, the
test fixture contained 26 bytes. After this commit, the test fixture
contains 27 bytes. When using a suffix width of one, only 26 filenames
should be available when naming chunk files---one for each lowercase
ASCII letter. This commit ensures that the filenames will be exhausted
as intended by the test.
2022-02-08 22:53:57 -05:00
Allan Silva
6a6875012e wc: implement files0-from option
When this option is present, the files argument is not processed. Instead, the file list is read from the provided file and split on the ASCII NUL (\0) character. When files0-from is '-', the file list is read from stdin.
2022-02-04 10:12:08 -03:00
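A minimal sketch of the list handling. It assumes that a trailing NUL terminates the last name rather than starting an empty one. `parse_files0` and `read_list` are hypothetical helpers, not the uutils API.

```rust
use std::io::{self, Read};

/// Splits a NUL-separated file list into names. Assumption: a single
/// trailing NUL terminates the last name instead of adding an empty one.
fn parse_files0(list: &[u8]) -> Vec<&[u8]> {
    let mut names: Vec<&[u8]> = list.split(|&b| b == 0).collect();
    if names.last().map_or(false, |n| n.is_empty()) {
        names.pop();
    }
    names
}

/// Reads the raw list from a file, or from stdin when `source` is "-".
fn read_list(source: &str) -> io::Result<Vec<u8>> {
    if source == "-" {
        let mut buf = Vec::new();
        io::stdin().read_to_end(&mut buf)?;
        Ok(buf)
    } else {
        std::fs::read(source)
    }
}

fn main() -> io::Result<()> {
    // Pass a list file (or "-") as the first argument; otherwise use a demo list.
    let data = match std::env::args().nth(1) {
        Some(source) => read_list(&source)?,
        None => b"a.txt\0b.txt\0".to_vec(),
    };
    for name in parse_files0(&data) {
        println!("{}", String::from_utf8_lossy(name));
    }
    Ok(())
}
```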
Terts Diepraam
7fc82cd376
Merge pull request #2902 from jtracey/join-non-unicode-sep
join: add support for non-unicode field separators
2022-01-31 21:54:56 +01:00
Terts Diepraam
7477761428
Merge pull request #2882 from jtracey/join-bigfields-compat
join: "support" field numbers larger than usize::MAX
2022-01-31 21:52:13 +01:00
Justin Tracey
58d65fb953 join: add support for non-unicode field separators
This allows for `-t` to take invalid unicode (but still single-byte) values
on unix-like platforms. Other platforms, which as of the time of this commit
do not support `OsStr::as_bytes()`, could possibly be supported in the future,
but would require design decisions as to what that means.
2022-01-30 20:04:22 -05:00
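A Unix-only sketch of separator parsing, where `OsStrExt::as_bytes()` exposes the raw bytes of an argument. The function `parse_separator` is hypothetical. It treats the two-character argument `\0` as NUL, in the spirit of the `join: add support for -t '\0'` commit in this log, and rejects anything that is not exactly one byte. The error text is illustrative only.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: OsStr::as_bytes / from_bytes

/// Parses a `-t` argument into a single separator byte (hypothetical helper).
fn parse_separator(arg: &OsStr) -> Result<u8, String> {
    match arg.as_bytes() {
        // The literal two-character string `\0` stands for NUL.
        b"\\0" => Ok(0),
        // Any single byte is accepted, valid UTF-8 or not.
        [byte] => Ok(*byte),
        other => Err(format!(
            "separator must be a single byte: {:?}",
            String::from_utf8_lossy(other)
        )),
    }
}

fn main() {
    let args = [OsStr::new(","), OsStr::from_bytes(&[0xff]), OsStr::new("\\0"), OsStr::new("ab")];
    for arg in args {
        println!("{:?}", parse_separator(arg));
    }
}
```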
Sylvestre Ledru
7c1abdb7d9
Merge pull request #2866 from jfinkels/split-number-2
split: implement -n option
2022-01-30 09:58:04 +01:00
Sylvestre Ledru
52ab6325a0
Merge pull request #2881 from jtracey/join-null-field-sep
join: add support for `-t '\0'`
2022-01-29 10:55:04 +01:00
Jeffrey Finkelstein
b636ff04a0 split: implement -n option
Implement the `-n` command-line option to `split`, which splits a file
into a specified number of chunks by byte.
2022-01-27 21:16:27 -05:00
Cecylia Bocovich
c8f9ea5b15
tests/join: test default check order behaviour 2022-01-22 17:51:29 -05:00
Justin Tracey
ce3df12eaa join: "support" field numbers larger than usize::MAX
They silently get folded to usize::MAX, which is the official GNU behavior.
2022-01-17 17:49:41 -05:00
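A saturating parse can be sketched with the standard library alone. `str::parse::<usize>()` reports overflow as `IntErrorKind::PosOverflow`, and that case can be mapped to `usize::MAX`. The function `parse_field_number` is hypothetical. Rejecting `0`, on the grounds that field numbers are 1-based, is an assumption of this sketch.

```rust
use std::num::IntErrorKind;

/// Parses a 1-based join field number; values too large for `usize`
/// are folded to `usize::MAX` instead of being rejected.
fn parse_field_number(s: &str) -> Result<usize, String> {
    match s.parse::<usize>() {
        Ok(0) => Err(format!("invalid field number: '{}'", s)),
        Ok(n) => Ok(n),
        Err(e) if *e.kind() == IntErrorKind::PosOverflow => Ok(usize::MAX),
        Err(_) => Err(format!("invalid field number: '{}'", s)),
    }
}

fn main() {
    for s in ["3", "99999999999999999999999", "0"] {
        println!("{} -> {:?}", s, parse_field_number(s));
    }
}
```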
Terts Diepraam
08efa1fe5a
Merge branch 'main' into join-null-field-sep 2022-01-17 12:59:52 +01:00
Justin Tracey
109277d405 join: add support for -t '\0' 2022-01-16 18:05:58 -05:00
Justin Tracey
346415e1d2 join: add support for -z option 2022-01-16 17:56:07 -05:00
Sylvestre Ledru
00c11b184f
Merge pull request #2851 from jtracey/join-strless
join: operate on bytes instead of Strings
2022-01-16 16:24:38 +01:00
Jeffrey Finkelstein
cfe5a0d82c split: correct filename creation algorithm
Fix two issues with the filename creation algorithm. First, this
corrects the behavior of the `-a` option. This commit ensures a
failure occurs when the number of chunks exceeds the number of
filenames representable with the specified fixed width:

    $ printf "%0.sa" {1..11} | split -d -b 1 -a 1
    split: output file suffixes exhausted

Second, this corrects the default behavior when `-a` is not
specified on the command line. Previously, it always gave the
filenames length-2 suffixes. This commit corrects the behavior to
follow the algorithm implied by GNU split, where the filename lengths
grow dynamically by two characters once the number of chunks grows
sufficiently large:

    $ printf "%0.sa" {1..91} | ./target/debug/coreutils split -d -b 1 \
    >   && ls x* | tail
    x81
    x82
    x83
    x84
    x85
    x86
    x87
    x88
    x89
    x9000
2022-01-10 20:43:22 -05:00
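The dynamic widening can be modelled as follows. This sketch is reconstructed from the example output above and is not taken from uutils or GNU. At each width, names whose first symbol is the last symbol of the alphabet are skipped. That symbol joins the fixed prefix instead, and the variable part grows by one symbol. With a decimal alphabet this gives the 90 names `00` to `89`, then the 900 names `9000` to `9899`, then `990000` and so on. `auto_suffix` and `encode` are hypothetical helpers.

```rust
const DECIMAL: &[u8] = b"0123456789";
const ALPHA: &[u8] = b"abcdefghijklmnopqrstuvwxyz";

/// Writes `index` in base `alphabet.len()` using exactly `width` symbols.
fn encode(mut index: usize, width: usize, alphabet: &[u8]) -> String {
    let radix = alphabet.len();
    let mut out = vec![alphabet[0]; width];
    for slot in out.iter_mut().rev() {
        *slot = alphabet[index % radix];
        index /= radix;
    }
    String::from_utf8(out).expect("alphabet is ASCII")
}

/// Suffix for chunk `index` when no fixed width (-a) is given.
fn auto_suffix(mut index: usize, alphabet: &[u8]) -> String {
    let radix = alphabet.len();
    let last = alphabet[radix - 1] as char;
    let mut prefix = String::new();
    let mut digits = 2;
    loop {
        // Names at this width whose first symbol is not the last symbol.
        let count = (radix - 1) * radix.pow(digits as u32 - 1);
        if index < count {
            return format!("{}{}", prefix, encode(index, digits, alphabet));
        }
        // Widen: the last symbol moves into the prefix, one more digit follows.
        index -= count;
        prefix.push(last);
        digits += 1;
    }
}

fn main() {
    for i in [88, 89, 90, 91, 989, 990] {
        println!("x{}", auto_suffix(i, DECIMAL));
    }
    println!("x{}", auto_suffix(650, ALPHA));
}
```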
Justin Tracey
4df2f3c148 join: add test for non-Unicode files 2022-01-08 21:28:29 -05:00
Justin Tracey
cdfe64369d join: add test for non-linefeed newline characters 2022-01-08 19:51:16 -05:00
Jan Scheer
94cc966535
tail: change notify backend on macOS from FSEvents to kqueue
On macOS only `kqueue` is suitable for our use case because `FSEvents`
waits until the file is closed to deliver modify events.
2021-10-01 21:33:30 +02:00
Jan Scheer
5615ba9fe1
test_tail: add tests for --follow=name 2021-09-27 23:18:00 +02:00
Jan Verbeek
6f7d740592 wc: Do a chunked read with proper UTF-8 handling
This brings the results mostly in line with GNU wc and solves nasty
behavior with long lines.
2021-08-26 01:38:16 +02:00
jfinkels
bdc0f4b7c3
hashsum: support --check for algorithms with variable output length (#2583)
* hashsum: support --check for var. length outputs

Add the ability for `hashsum --check` to work with algorithms with
variable output length. Previously, the program would terminate with an
error due to constructing an invalid regular expression.

* fixup! hashsum: support --check for var. length outputs
2021-08-23 18:35:19 +02:00
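One way to avoid a regular expression tied to a particular digest length is to split each check line by hand and accept any non-empty hexadecimal digest. This is a hypothetical sketch: `parse_check_line` is not the hashsum API. It handles only the GNU-style `<digest>  <file>` layout, with `*` in place of the second space for binary mode.

```rust
/// Splits a check line "<hex digest>  <file>" (or "<hex digest> *<file>")
/// without assuming any particular digest length.
fn parse_check_line(line: &str) -> Option<(&str, &str)> {
    let (digest, rest) = line.split_once(' ')?;
    // Second separator character: ' ' (text mode) or '*' (binary mode).
    let file = rest.strip_prefix(' ').or_else(|| rest.strip_prefix('*'))?;
    let is_hex = !digest.is_empty() && digest.bytes().all(|b| b.is_ascii_hexdigit());
    if is_hex && !file.is_empty() {
        Some((digest, file))
    } else {
        None
    }
}

fn main() {
    let lines = [
        "d41d8cd98f00b204e9800998ecf8427e  empty.txt",
        "abcd *image.bin",
        "abcd missing-space.txt",
    ];
    for line in lines {
        println!("{:?}", parse_check_line(line));
    }
}
```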
jfinkels
4ef35d4a96
tac: correct behavior of -b option (#2523)
* tac: correct behavior of -b option

Correct the behavior of `tac -b` to match that of GNU coreutils
`tac`. Specifically, this changes `tac -b` to assume *leading* line
separators instead of the default *trailing* line separators.

Before this commit, the (incorrect) behavior was

    $ printf "/abc/def" | tac -b -s "/"
    def/abc/

After this commit, the behavior is

    $ printf "/abc/def" | tac -b -s "/"
    /def/abc

Fixes #2262.

* fixup! tac: correct behavior of -b option

* fixup! tac: correct behavior of -b option

Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>
2021-08-22 21:01:17 +02:00
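A sketch of the two record models, using string slices for brevity. The function `tac` is hypothetical and assumes a non-empty separator. With `before = true`, each record starts at a separator, which reproduces the corrected output above.

```rust
/// Reverses the records of `data`. With `before == false` each record ends
/// with `sep` (the default); with `before == true` (like `-b`) each record
/// starts with `sep`.
fn tac(data: &str, sep: &str, before: bool) -> String {
    assert!(!sep.is_empty(), "separator must be non-empty");
    let mut records: Vec<&str> = Vec::new();
    let mut start = 0;
    for (pos, _) in data.match_indices(sep) {
        if before {
            // The separator opens a new record; close the previous one.
            if pos > start {
                records.push(&data[start..pos]);
            }
            start = pos;
        } else {
            // The separator closes the current record.
            let end = pos + sep.len();
            records.push(&data[start..end]);
            start = end;
        }
    }
    // Whatever follows the last boundary is the final record.
    if start < data.len() {
        records.push(&data[start..]);
    }
    records.reverse();
    records.concat()
}

fn main() {
    println!("{}", tac("/abc/def", "/", true));
    print!("{}", tac("a\nb\n", "\n", false));
}
```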
Justin Tracey
1bb0237281 join: add support for full outer joins 2021-08-12 23:52:35 -04:00
Tyler
601c9fc620 Merge branch 'master' of https://github.com/uutils/coreutils into uutils-master-2 2021-08-03 17:33:43 -07:00
Michael Debertol
418f5b7692 sort: handle empty merge inputs 2021-07-31 21:02:20 +02:00
Tyler
076ff32e85 Removes project-specific cspell files. 2021-07-23 14:53:24 -07:00
Tyler
885a875552 Addresses build errors
- Adds words to cspell exceptions
- Converts test macros to use Default trait.
- Converts parser to use Default trait.
- Adds Windows-friendly test files for block/unblock when nl is present
  in test/spec file.
2021-07-22 16:04:35 -07:00
Tyler
88363858d5 Minor changes. 2021-07-12 10:13:47 -07:00
Tyler
2e9e984b3a Adds test with unicode filename
Filenames should be the only place where Unicode can appear in dd's CLI.
2021-07-06 18:17:30 -07:00
backwaterred
9c38583c6b
Merge pull request #2 from uutils/master
catchup with uutils main
2021-07-02 11:34:22 -07:00
Tyler
92281585a7 Merge branch 'master' of https://github.com/uutils/coreutils into uutils-master 2021-07-01 14:33:30 -07:00
Tyler
17cfba41cc Implements project testing from root.
- conv=FLAG testing. (1) WIP conv=nocreat
- iflag & oflag testing.
- conv=CONV ascii,...,ucase,...,block,...sync tests at unit-test-level
  (project root is todo)
2021-06-30 14:47:48 -07:00
Michael Debertol
233a778963 sort/ls: implement version cmp matching GNU spec
This reimplements version_cmp, which is used in sort and ls to sort
according to versions.
However, it is not bug-for-bug identical with GNU's implementation.
I reported a bug with GNU here:
https://lists.gnu.org/archive/html/bug-coreutils/2021-06/msg00045.html
This implementation does not contain the bugs regarding the handling of
file extensions and null bytes.
2021-06-27 15:29:17 +02:00
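The core of the version comparison can be sketched in Rust as below. It is modelled on the `verrevcmp` loop in gnulib's `filevercmp`, and it is not the uutils implementation. The loop alternates between two kinds of run:

- Non-digit runs are compared with a ranking in which `~` sorts before everything (even the end of the string) and letters sort before other symbols.
- Digit runs are compared numerically, ignoring leading zeros.

The file-extension handling mentioned in the commit message is deliberately left out.

```rust
use std::cmp::Ordering;

/// Rank of a byte inside a non-digit run (None = end of string).
fn order(c: Option<u8>) -> i32 {
    match c {
        None => 0,
        Some(c) if c.is_ascii_digit() => 0,
        Some(c) if c.is_ascii_alphabetic() => i32::from(c),
        Some(b'~') => -1,
        Some(c) => i32::from(c) + 256,
    }
}

fn is_digit(s: &[u8], i: usize) -> bool {
    s.get(i).map_or(false, |c| c.is_ascii_digit())
}

fn verrevcmp(a: &[u8], b: &[u8]) -> Ordering {
    let (mut i, mut j) = (0, 0);
    while i < a.len() || j < b.len() {
        // Non-digit runs: compare rank by rank.
        while (i < a.len() && !is_digit(a, i)) || (j < b.len() && !is_digit(b, j)) {
            let (ac, bc) = (order(a.get(i).copied()), order(b.get(j).copied()));
            if ac != bc {
                return ac.cmp(&bc);
            }
            i += 1;
            j += 1;
        }
        // Digit runs: skip leading zeros; a longer run is a larger number,
        // otherwise the first differing digit decides.
        while a.get(i) == Some(&b'0') { i += 1; }
        while b.get(j) == Some(&b'0') { j += 1; }
        let mut first_diff = 0i32;
        while is_digit(a, i) && is_digit(b, j) {
            if first_diff == 0 {
                first_diff = i32::from(a[i]) - i32::from(b[j]);
            }
            i += 1;
            j += 1;
        }
        if is_digit(a, i) { return Ordering::Greater; }
        if is_digit(b, j) { return Ordering::Less; }
        if first_diff != 0 { return first_diff.cmp(&0); }
    }
    Ordering::Equal
}

fn main() {
    let mut names = vec!["v1.10", "v1.9", "v1.9~beta", "v1.2"];
    names.sort_by(|a, b| verrevcmp(a.as_bytes(), b.as_bytes()));
    println!("{:?}", names);
}
```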
Michael Debertol
548a895cd6 sort: compatibility of human-numeric sort
Closes #1985.
This makes human-numeric sort follow the same algorithm as GNU's/FreeBSD's sort.
As documented by GNU in https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html,
we first compare by sign, then by SI unit and finally by the numeric value.
2021-06-25 18:19:00 +02:00
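A sketch of the three-step comparison named above. `parse_human` and `human_cmp` are hypothetical, and the value is parsed as `f64` for brevity. The suffix set `KMGTPEZY` and the acceptance of a lowercase `k` are assumptions. For two negative numbers the unit-and-value comparison is reversed, so `-1K` sorts before `-5`.

```rust
use std::cmp::Ordering;

/// Splits a value such as "-1.5K" into (sign, unit rank, absolute value).
fn parse_human(s: &str) -> (i8, u8, f64) {
    let s = s.trim_start();
    let (negative, rest) = match s.strip_prefix('-') {
        Some(r) => (true, r),
        None => (false, s),
    };
    let num_len = rest
        .find(|c: char| !(c.is_ascii_digit() || c == '.'))
        .unwrap_or(rest.len());
    let value: f64 = rest[..num_len].parse().unwrap_or(0.0);
    // Unit rank: 0 for no suffix, 1 for K (or k), 2 for M, and so on.
    let unit = match rest[num_len..].chars().next() {
        Some('k') => 1,
        Some(c) => "KMGTPEZY".find(c).map_or(0, |i| i as u8 + 1),
        None => 0,
    };
    let sign = if value == 0.0 { 0 } else if negative { -1 } else { 1 };
    (sign, unit, value)
}

/// Compare by sign, then by unit, then by value (reversed for negatives).
fn human_cmp(a: &str, b: &str) -> Ordering {
    let (sa, ua, va) = parse_human(a);
    let (sb, ub, vb) = parse_human(b);
    sa.cmp(&sb).then_with(|| {
        let ord = ua
            .cmp(&ub)
            .then(va.partial_cmp(&vb).unwrap_or(Ordering::Equal));
        if sa < 0 { ord.reverse() } else { ord }
    })
}

fn main() {
    let mut values = vec!["2K", "1M", "-1K", "-5", "3", "0"];
    values.sort_by(|a, b| human_cmp(a, b));
    println!("{:?}", values);
}
```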
Syukron Rifail M
bc8415c9db du: add --dereference 2021-06-17 14:06:41 +07:00
Terts Diepraam
8afc923796
Merge pull request #2237 from wfscheper/wfscheper/issue2118
chgrp: replace getopts with clap (#2118)
2021-06-12 11:20:24 +02:00
Walter Scheper
cff75f242a chgrp: replace getopts with clap (#2118) 2021-06-10 16:38:44 -04:00
Michael Debertol
66359a0f56 sort: insert line separators after non-empty files
If files don't end with a line separator we have to insert one,
otherwise the last line will be combined with the first line of the next
file.
2021-06-06 18:01:08 +02:00
Michael Debertol
7ffc7d073c cp: test that file descriptors are closed 2021-06-02 19:21:16 +02:00
Michael Debertol
06b3092f5f sort: fix debug output for zeros / invalid numbers
We were reporting "no match" when sorting something like "0 ". This is
because we don't distinguish between 0 and invalid lines when sorting.
For debug output we have to get this information back.
2021-06-01 18:18:51 +02:00
Sylvestre Ledru
badf7aacb7
Merge pull request #2300 from tertsdiepraam/pr
Implement `pr` (resurrection of the resurrected PR)
2021-05-31 21:14:57 +02:00
Roy Ivy III
4e20dedf58 tests ~ refactor/polish spelling (comments, names, and exceptions) 2021-05-31 08:23:57 -05:00
Terts Diepraam
7690dc018f Merge branch 'master' into pr 2021-05-31 15:23:06 +02:00
Michael Debertol
dc63133f14
sort: correctly inherit global flags for keys (#2302)
Closes #2254. We should only inherit global settings for keys when there
are absolutely no options attached to the key.

The default key (matching the whole line) is implicitly added only if no
keys are supplied.

Improved some error messages by including more context.
2021-05-29 23:25:56 +02:00
Terts Diepraam
bc1870c0a7 Merge branch 'master' into pr 2021-05-29 19:21:31 +02:00
Michael Debertol
e9656a6c32
sort: make GNU test sort-debug-keys pass (#2269)
* sort: disable support for thousand separators

In order to be compatible with GNU, we have to disable thousands
separators. GNU does not enable them for the C locale, either.

Once we add support for locales we can add this feature back.

* sort: delete unused fixtures

* sort: compare -0 and 0 equal

I must have misunderstood this when implementing, but GNU considers
-0, 0, and invalid numbers to be equal.

* sort: strip blanks before applying the char index

* sort: don't crash when key start is after key end

* sort: add "no match" for months at the first non-whitespace char

We should put the "^ no match for key" indicator at the first
non-whitespace character of a field.

* sort: improve support for e notation

* sort: use matches! macros
2021-05-28 22:38:29 +02:00
Jeffrey Finkelstein
659bf58a4c head: print headings when reading multiple files
Fix a bug in which `head` failed to print headings for `stdin` inputs
when reading from multiple files, and fix another bug in which `head`
failed to print a blank line between the contents of a file and the
heading for the next file when reading multiple files. The output now
matches that of GNU `head`.
2021-05-16 12:03:10 -04:00
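The expected output layout can be sketched as a formatting function. `head_multi` is hypothetical. `None` stands for stdin and is shown as `standard input`, the label GNU `head` uses.

```rust
/// Formats `head` output for several inputs: each input gets a
/// "==> NAME <==" heading, and a blank line separates consecutive inputs.
fn head_multi(inputs: &[(Option<&str>, &str)], lines: usize) -> String {
    let mut out = String::new();
    for (i, (name, contents)) in inputs.iter().enumerate() {
        if i > 0 {
            // Blank line between the previous file's contents and this heading.
            out.push('\n');
        }
        let name = name.unwrap_or("standard input");
        out.push_str(&format!("==> {} <==\n", name));
        for line in contents.split_inclusive('\n').take(lines) {
            out.push_str(line);
        }
    }
    out
}

fn main() {
    let out = head_multi(&[(Some("a.txt"), "1\n2\n3\n"), (None, "x\n")], 2);
    print!("{}", out);
}
```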
Michael Debertol
e0ebf907a4 sort: make merging stable
When merging files we need to prioritize files that occur earlier in the
command line arguments with -m.

This also makes the extsort merge step (and thus extsort itself) stable again.
2021-05-09 11:43:38 +02:00
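A k-way merge becomes stable if ties are broken by each input's position on the command line. The sketch below uses a min-heap keyed on `(line, input index, position)` and compares whole lines. `stable_merge` is hypothetical and ignores sort's key options.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merges already-sorted inputs; equal lines come out in the order of
/// their inputs, so the merge is stable. Returns (line, input index).
fn stable_merge<'a>(inputs: &[Vec<&'a str>]) -> Vec<(&'a str, usize)> {
    // Min-heap of (line, input index, position within that input).
    let mut heap = BinaryHeap::new();
    for (file, lines) in inputs.iter().enumerate() {
        if let Some(&first) = lines.first() {
            heap.push(Reverse((first, file, 0usize)));
        }
    }
    let mut merged = Vec::new();
    while let Some(Reverse((line, file, pos))) = heap.pop() {
        merged.push((line, file));
        // Refill from the same input so its own order is preserved.
        if let Some(&next) = inputs[file].get(pos + 1) {
            heap.push(Reverse((next, file, pos + 1)));
        }
    }
    merged
}

fn main() {
    let inputs = vec![vec!["a", "b"], vec!["a", "c"]];
    for (line, file) in stable_merge(&inputs) {
        println!("{} (from input {})", line, file);
    }
}
```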
Jeffrey Finkelstein
0cafe2b70d wc: add tests for edge cases for wc on files 2021-05-03 21:07:32 -04:00
Daniel Rocco
c3912d53ac test: add tests for basic tests & edge cases
Some edge cases covered:

- no args
- operator by itself (!, -a, etc.)
- string, file tests of nothing
- compound negations
2021-05-01 22:40:47 -04:00
Ricardo Iglesias
05b20c32a9 base64: Moved argument parsing to clap.
Moved argument parsing to clap and added tests to cover using "-" as
stdin, passing in too many file arguments, and updated the "wrap" error
message in the tests.
2021-05-01 11:36:46 -07:00
Sylvestre Ledru
a37e3181a2
Merge pull request #2130 from electricboogie/master
sort: implement --buffer-size and --temporary-directory (external sort)
2021-04-28 09:21:14 +02:00
Ricardo Iglesias
d56462a4b3 base32: Fixed style violations. Added tests
Tests now cover using "-" as standard input and reading from a file.
2021-04-26 08:00:55 -07:00
electricboogie
4c395146dd Merge branch 'master' of https://github.com/uutils/coreutils 2021-04-25 10:11:27 -05:00
Michael Debertol
e6f6b109a5 sort: implement --debug
This adds a --debug flag, which, when activated, will draw lines below
the characters that are actually used for comparisons.

This is not a complete implementation of --debug. It should, quoting the man page
for GNU sort: "annotate the part of the line used to sort, and warn
about questionable usage to stderr". Warning about "questionable usage"
is not part of this patch.

This change required some adjustments to be able to get the range that
is actually used for comparisons. Most notably, general numeric comparisons
were rewritten, fixing some bugs along the way.

Testing is mostly done by adding fixtures for the expected debug output of
existing tests.
2021-04-23 22:36:15 +02:00
electricboogie
25021f31eb Incorporate overhead of Line struct 2021-04-19 21:24:52 -05:00
Michael Debertol
4bbbe3a3f2
sort: implement numeric string comparison (#2070)
* sort: implement numeric string comparison

This implements -n and -h using a string comparison algorithm instead
of parsing each number to a f64 and comparing those.

This should result in a moderate performance increase and eliminate loss
of precision.

* cache parsed f64 numbers

For general numeric comparisons we have to parse numbers as f64,
as this behavior is explicitly documented by GNU coreutils.
We can however cache the parsed value to speed up comparisons.

* fix leading zeroes for negative numbers

* use more appropriate name for exponent

* improvements to the parse function

* move checks into main loop and fix thousands separator condition

* remove unneeded checks

* rustfmt
2021-04-17 13:49:35 +02:00
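A sketch of comparing numbers as strings, so that no precision is lost for long values. First strip the sign, the leading zeros of the integer part and the trailing zeros of the fraction. Then compare the integer-part length, the integer digits and the fraction digits, in that order. `split_number` and `numeric_cmp` are hypothetical. They ignore thousands separators and SI suffixes, and they treat `-0` as equal to `0`.

```rust
use std::cmp::Ordering;

/// Splits a leading number into (is_negative, integer digits, fraction digits),
/// dropping leading zeros of the integer part and trailing zeros of the fraction.
fn split_number(s: &str) -> (bool, &str, &str) {
    let s = s.trim_start();
    let (neg, s) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s),
    };
    let int_len = s.bytes().take_while(|b| b.is_ascii_digit()).count();
    let (int, rest) = s.split_at(int_len);
    let frac = match rest.strip_prefix('.') {
        Some(r) => &r[..r.bytes().take_while(|b| b.is_ascii_digit()).count()],
        None => "",
    };
    let (int, frac) = (int.trim_start_matches('0'), frac.trim_end_matches('0'));
    // Treat -0 as 0 so the sign never matters for zero.
    let neg = neg && !(int.is_empty() && frac.is_empty());
    (neg, int, frac)
}

/// Compares two numbers digit by digit, never converting to f64.
fn numeric_cmp(a: &str, b: &str) -> Ordering {
    let (a_neg, a_int, a_frac) = split_number(a);
    let (b_neg, b_int, b_frac) = split_number(b);
    let magnitude = a_int
        .len()
        .cmp(&b_int.len())
        .then_with(|| a_int.cmp(b_int))
        .then_with(|| a_frac.cmp(b_frac));
    match (a_neg, b_neg) {
        (false, false) => magnitude,
        (true, true) => magnitude.reverse(),
        (true, false) => Ordering::Less,
        (false, true) => Ordering::Greater,
    }
}

fn main() {
    let mut values = vec!["10", "-9", "9", "-10", "0.5", "-0", "123456789012345678901234567890"];
    values.sort_by(|a, b| numeric_cmp(a, b));
    println!("{:?}", values);
}
```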
electricboogie
a76d452f75
Sort: More small fixes (#2065)
* Various fixes and performance improvements

* fix a typo

Co-authored-by: Michael Debertol <michael.debertol@gmail.com>

* Fix month parse for months with leading whitespace

* Implement test for months whitespace fix

* Confirm human numeric works as expected with whitespace with a test

* Correct arg help value name for --parallel

* Fix SemVer non version lines/empty line sorting with a test

Co-authored-by: Sylvestre Ledru <sledru@mozilla.com>
Co-authored-by: Michael Debertol <michael.debertol@gmail.com>
2021-04-17 10:06:19 +02:00
Árni Dagur
eb4971e6f4
cat: Unrevert splice patch (#2020)
* cat: Unrevert splice patch

* cat: Add fifo test

* cat: Add tests for error cases

* cat: Add tests for character devices

* wc: Make sure we handle short splice writes

* cat: Fix tests for 1.40.0 compiler

* cat: Run rustfmt on test_cat.rs

* Run 'cargo +1.40.0 update'
2021-04-10 22:19:53 +02:00
Sylvestre Ledru
bf1944271c remove .DS_Store 2021-04-10 21:57:03 +02:00
Michael Debertol
69f4410a8a
sort: dedup using compare_by (#2064)
compare_by is the function used for sorting, we should use it for dedup
as well.
2021-04-10 19:49:10 +02:00
electricboogie
e5113ad00e
Sort: Various fixes and performance improvements (#2057)
* Various fixes and performance improvements

* fix a typo

Co-authored-by: Michael Debertol <michael.debertol@gmail.com>

Co-authored-by: Sylvestre Ledru <sledru@mozilla.com>
Co-authored-by: Michael Debertol <michael.debertol@gmail.com>
2021-04-10 11:56:20 +02:00
Sivachandran
ee070028e4
install: implement stripping symbol table (#2047) 2021-04-10 11:53:29 +02:00
Sylvestre Ledru
844e318a67
Merge branch 'master' into pr 2021-04-09 22:02:25 +02:00
electricboogie
8474249e5f
Sort: Implement stable sort, ignore non-printing, month sort dedup, auto parallel sort through rayon, zero terminated sort, check silent (#2008) 2021-04-08 22:07:09 +02:00
Yagiz Degirmenci
c965effe07
fold: move to clap, add tests (#2015) 2021-04-06 22:51:27 +02:00
Sylvestre Ledru
f57eb0fdfa
Merge pull request #1993 from cbjadwani/master
uniq: Implement --group option
2021-04-05 22:33:04 +02:00
Yagiz Degirmenci
cbe07c93c6
cksum: add tests and fixtures (#1923) 2021-04-05 22:21:21 +02:00
Daniel Rocco
e5c61a28be fold: variable width tabs, guard treating tab as whitespace
Treat tab chars as advancing to the next tab stop rather than having a fixed
8-column width.

Also treat tab as a whitespace split target only when splitting on word
boundaries.
2021-04-05 08:55:07 -04:00
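The tab-stop rule can be sketched as a column computation. A tab moves the cursor to the next multiple of the tab width instead of adding a fixed 8 columns. `display_width` is hypothetical and assumes every other character is one column wide.

```rust
/// Column reached after printing `line` from column 0, where a tab
/// advances to the next multiple of `tab_width`.
fn display_width(line: &str, tab_width: usize) -> usize {
    line.chars().fold(0, |col, c| {
        if c == '\t' {
            (col / tab_width + 1) * tab_width
        } else {
            col + 1
        }
    })
}

fn main() {
    for line in ["\t", "abc\t", "abcdefgh\t", "ab\tc"] {
        println!("{:?} -> {}", line, display_width(line, 8));
    }
}
```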
Chirag Jadwani
19c6a42de5 uniq: implement group option 2021-04-04 15:22:17 +05:30
Paul Otten
7859bf885f Consistency with GNU version of du when doing du -h on an empty file 2021-04-01 19:42:43 -04:00
Mikadore
8320b1ec5f
Rewrote head (#1911)
See https://github.com/uutils/coreutils/pull/1911
for the details
2021-03-29 13:08:48 +02:00
jaggededgedjustice
88d0bb01c0
Add shuf tests (#1958)
* Add tests for shuf

* Fixup GNU tests for shuf
2021-03-28 17:52:01 +02:00
Max Semenik
62fe68850e pr: Fixes after rebasing
Only the minimum needed to:
* Make everything compile without warnings
* Move files according to the new project structure
* Make tests pass
2021-03-26 17:57:19 +03:00
tilakpatidar
75b35e6002 pr: remove not required tests 2021-03-26 14:11:15 +03:00
tilakpatidar
054c05d5d8 pr: refactor get_lines_for_printing, write_columns, recreate_arguments
pr: extract recreate_arguments

pr: refactor get_line_for_printing

pr: refactor get_lines_for_printing

pr: refactor fetch_indexes generate for write_columns

pr: refactor write_columns

pr: refactor write_columns
2021-03-26 14:11:15 +03:00
tilakpatidar
40e7f3d900 pr: add -J and -S option
pr: add -J option

pr: add -S option
2021-03-26 14:11:15 +03:00
tilakpatidar
a4b723233a pr: add more tests for form feed and include page_width option W 2021-03-26 14:11:15 +03:00
tilakpatidar
3be5dc6923 pr: fix form feed
pr: fix form feed

pr: Rustfmt

pr: add test for ff and -l option
2021-03-26 14:11:15 +03:00
Tilak Patidar
5956894d00 pr: add -m and -o option
pr: Add -o option
2021-03-26 14:11:14 +03:00
Tilak Patidar
dd07aed4d1 pr: add column separator option 2021-03-26 14:11:14 +03:00
Tilak Patidar
69371ce3ce pr: add tests for --column with across option 2021-03-26 14:11:14 +03:00
Tilak Patidar
f497fb9d88 pr: read from stdin 2021-03-26 14:11:14 +03:00
Tilak Patidar
d9084a7399 pr: implement across option and fix tests 2021-03-26 14:11:14 +03:00
Tilak Patidar
f3676573b5 pr: print padded string for each column and handle tab issues
pr: Print fixed padded string for each column

pr: Fix display length vs str length due to tabs
2021-03-26 14:11:14 +03:00
Tilak Patidar
b578bb6563 pr: add test for -t -l -r option
pr: Add test for -l option

pr: Add test for -r suppress error option
2021-03-26 14:11:14 +03:00
Tilak Patidar
b742230dbb pr: fix page ranges
pr: Fix page ranges
2021-03-26 14:11:14 +03:00
Tilak Patidar
88ec02a61c pr: add support for -n [char][width] and -N
pr: Fix long name for -n

pr: Add -N first line number option

pr: Add -n[char][width] support
2021-03-26 14:11:14 +03:00