* Change unchecked unwrapping to unwrap_or_default for argument parsing (resolving #1845)
* Added unit-testing for the collect_str function on invalid utf8 OsStrs
* Added a warning-message for identification purpose to the collect_str method.
* - Add removal of wrongly encoded empty strings to basename
- Add testing of broken encoding to basename
- Changed UCommand to use collect_str in args method to allow for integration testing of that method
- Change UCommand to use unwarp_or_default in arg method to match the behaviour of collect_str
* Trying out a new pattern for convert_str for getting a feeling of how the API feels with more control
* Adding convenience API for compact calls
* Add new API to everywhere, fix test for basename
* Added unit-testing for the conversion options
* Added unit-testing for the conversion options for windows
* fixed compilation and some merge hiccups
* Remove windows tests in order to make merge request build
* Fix formatting to match rustfmt for the merged file
* Improve documentation of the collect_str method and the unit-tests
* Fix compilation problems with test
Co-authored-by: Christopher Regali <chris.vdop@gmail.com>
Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>
* ls: ignore leading period when sorting by name
ls now behaves like GNU ls with respect to sorting files by ignoring
leading periods when sorting by main.
Added tests to ensure "touch a .a b .b ; ls" returns ".a a .b b"
* Replaced clone/collect calls.
* Use buffered stdout to reduce write sys calls.
This simple change yielded the biggest performace gain.
* Use `for_byte_record_with_terminator` from the `bstr` crate.
This is to minimize the per line copying needed by
`BufReader::read_until`. The `cut_fields` and `cut_fields_delimiter`
functions used `read_until` to iterate over lines. That required copying
each input line to the line buffer. With
`for_byte_record_with_terminator` copying is minimized as it calls our
closure with a reference to BufReader's buffer most of the time. It
needs to copy (internally) only to process any incomplete lines at the
end of the buffer.
* Re-write `Searcher` to use `memchr`.
Switch from the naive implementation to one that uses `memchr`.
* Rewrite `cut_bytes` almost entirely.
This was already well optimized. The performance gain in this case is
not from avoiding copying. In fact, it needed zero copying whereas new
implementation introduces some copying similar to `cut_fields` described
above. But the occassional copying cost is more than offset by the use
of the very fast `memchr` inside `for_byte_record_with_terminator`.
This change also simplifies the code significantly. Removed the `buffer`
module.
This adds a --debug flag, which, when activated, will draw lines below
the characters that are actually used for comparisons.
This is not a complete implementation of --debug. It should, quoting the man page
for GNU sort: "annotate the part of the line used to sort, and warn
about questionable usage to stderr". Warning about "questionable usage"
is not part of this patch.
This change required some adjustments to be able to get the range that
is actually used for comparisons. Most notably, general numeric comparisons
were rewritten, fixing some bugs along the lines.
Testing is mostly done by adding fixtures for the expected debug output of
existing tests.
* ls: Remove allocations by eliminating collect/clones
* ls: Introduce PathData structure
- PathData will hold Path related metadata / strings that are required
frequently in subsequent functions
- All data is precomputed and cached and subsequent functions just
use cached data
* ls: Cache more data related to paths
- Cache filename and sort by filename instead of full path
- Cache uid->usr and gid->grp mappings
https://github.com/uutils/coreutils/pull/2099/files
* ls: Add BENCHMARKING.md
* ls: Document PathData structure
* tests/ls: Add testcase for error paths with width option
* ls: Fix unused import warning
cached will be only used for unix currently as current use of
caching gid/uid mappings is only relevant on unix
* ls: Suggest checking syscall count in BENCHMARKING.md
* ls: Remove mentions of sort in BENCHMARKING.md
* ls: Remove dependency on cached
Implement caching using HashMap and lazy_static
* ls: Fix MSRV error related to map_or
Rust 1.40 did not support map_or for result types
Trailing separators were included at the end of the last token, but they
should not be.
This changes tokenize_with_separator as suggested by @cbjadwani.
GNU sort disallows these combinations, presumably because they are
likely not what the user really wants.
Ignoring characters would cause things to be put together that aren't
together in the input. For example, -dn would cause "0.12" or "0,12" to
be parsed as "12" which is highly unexpected and confusing.
This reduces memory usage by only storing two lines of the input file at
a time. The current implementation first builds a list of all duplicate
lines ('group') and then decides which lines of the group should be
printed.