This reimplements version_cmp, which is used in sort and ls to sort
according to versions.
However, it is not bug-for-bug identical with GNU's implementation.
I reported a bug with GNU here:
https://lists.gnu.org/archive/html/bug-coreutils/2021-06/msg00045.html
This implementation does not contain the bugs regarding the handling of
file extensions and null bytes.
We were reporting "no match" when sorting something like "0 ". This is
because we don't distinguish between 0 and invalid lines when sorting.
For debug output we have to get this information back.
Closes#2254. We should only inherit global settings for keys when there
are absolutely no options attached to the key.
The default key (matching the whole line) is implicitly added only if no
keys are supplied.
Improved some error messages by including more context.
* sort: disable support for thousand separators
In order to be compatible with GNU, we have to disable thousands
separators. GNU does not enable them for the C locale, either.
Once we add support for locales we can add this feature back.
* sort: delete unused fixtures
* sort: compare -0 and 0 equal
I must have misunderstood this when implementing, but GNU considers
-0, 0, and invalid numbers to be equal.
* sort: strip blanks before applying the char index
* sort: don't crash when key start is after key end
* sort: add "no match" for months at the first non-whitespace char
We should put the "^ no match for key" indicator at the first
non-whitespace character of a field.
* sort: improve support for e notation
* sort: use maches! macros
Fix a bug in which `head` failed to print headings for `stdin` inputs
when reading from multiple files, and fix another bug in which `head`
failed to print a blank line between the contents of a file and the
heading for the next file when reading multiple files. The output now
matches that of GNU `head`.
When merging files we need to prioritize files that occur earlier in the
command line arguments with -m.
This also makes the extsort merge step (and thus extsort itself) stable again.
Moved argument parsing to clap and added tests to cover using "-" as
stdin, passing in too many file arguments, and updated the "wrap" error
message in the tests.
This adds a --debug flag, which, when activated, will draw lines below
the characters that are actually used for comparisons.
This is not a complete implementation of --debug. It should, quoting the man page
for GNU sort: "annotate the part of the line used to sort, and warn
about questionable usage to stderr". Warning about "questionable usage"
is not part of this patch.
This change required some adjustments to be able to get the range that
is actually used for comparisons. Most notably, general numeric comparisons
were rewritten, fixing some bugs along the lines.
Testing is mostly done by adding fixtures for the expected debug output of
existing tests.
* sort: implement numeric string comparison
This implements -n and -h using a string comparison algorithm instead
of parsing each number to a f64 and comparing those.
This should result in a moderate performance increase and eliminate loss
of precision.
* cache parsed f64 numbers
For general numeric comparisons we have to parse numbers as f64,
as this behavior is explicitly documented by GNU coreutils.
We can however cache the parsed value to speed up comparisons.
* fix leading zeroes for negative numbers
* use more appropriate name for exponent
* improvements to the parse function
* move checks into main loop and fix thousands separator condition
* remove unneeded checks
* rustfmt
* Various fixes and performance improvements
* fix a typo
Co-authored-by: Michael Debertol <michael.debertol@gmail.com>
* Fix month parse for months with leading whitespace
* Implement test for months whitespace fix
* Confirm human numeric works as expected with whitespace with a test
* Correct arg help value name for --parallel
* Fix SemVer non version lines/empty line sorting with a test
Co-authored-by: Sylvestre Ledru <sledru@mozilla.com>
Co-authored-by: Michael Debertol <michael.debertol@gmail.com>
* cat: Unrevert splice patch
* cat: Add fifo test
* cat: Add tests for error cases
* cat: Add tests for character devices
* wc: Make sure we handle short splice writes
* cat: Fix tests for 1.40.0 compiler
* cat: Run rustfmt on test_cat.rs
* Run 'cargo +1.40.0 update'
* Various fixes and performance improvements
* fix a typo
Co-authored-by: Michael Debertol <michael.debertol@gmail.com>
Co-authored-by: Sylvestre Ledru <sledru@mozilla.com>
Co-authored-by: Michael Debertol <michael.debertol@gmail.com>
Treat tab chars as advancing to the next tab stop rather than having a fixed
8-column width.
Also treat tab as a whitespace split target only when splitting on word
boundaries.