In such cases we have to create a temporary copy of the input file to prevent
overwriting the input with the output. This only affects merge sort, because it
is the only mode where we start writing to the output before having read all inputs.
Since lines that compare equal should be sorted together, we need to first
compare the lines (taking settings into account). Only if they do not compare
equal we should compare the hashes.
Use new macro to construct UIoError types from `std::io::Error` before
printing. This gives consistent error messages across all utilities as it
prepends custom errors to the error description for the respective application
and error type. This saves the developers from manually appending the
`std::io::Error`-specific error messages.
Drop the previous flags that would tell whether a noncritical error occured
during execution in favor of the `show!` macro from the error submodule.
This allows us to generate regular error types during execution to signify
failures inside the program, but without prematurely aborting program execution
if not needed or specified.
Also make verbose outputs use `print!` and friends instead of `show_error!` to
ensure verbose output is redirected to stdout, not stderr.
Add support for
* space-separated list of tab stops, like `--tabs="2 4 6"`,
* mixed comma- and space-separated lists, like `--tabs="2,4 6"`,
* the slash specifier in the last tab stop, like `--tabs=1,/5`,
* the plus specifier in the last tab stop, like `--tabs=1,+5`.
For whatever reason, the following is equivalent,
cargo run -- rm --presume-input-tty
cargo run -- rm ---presume-input-tty
cargo run -- rm -----presume-input-tty
cargo run -- rm ---------presume-input-tty
Signed-off-by: Hanif Bin Ariffin <hanif.ariffin.4326@gmail.com>
The custom usage string does not have to include the "sort\nUsage:" part,
because this part is already printed by clap.
It prevents the following duplication:
USAGE:
sort
Usage:
sort [OPTION]... [FILE]..
Now, only the following is printed:
USAGE:
sort [OPTION]... [FILE]...
- Adds more jargon to the spellchecker jargon file.
- Adds more ignores to the per-project ignores.
- Fixes clippy warnigns about uneeded clones on non-linux.
- Block/Unblock now account for system dependend newlines.
Change the behavior of `tac` when there are no line separators in the
input. Previously, it was appending an extra line separator; this commit
prevents that from happening. The input is now writted directly to
stdout.
Correct the error message produced by `tac` when trying to read from a
directory. Previously if the path 'a' referred to a directory, then
running `tac a` would produce the error message
dir: read error: Invalid argument
after this commit it produces
a: read error: Invalid argument
which matches GNU `tac`.
Handle additional edge cases arising from test(1)’s lack of reserved words.
For example, left parenthesis followed by an operator could indicate
either
* string comparison of a literal left parenthesis, e.g. `( = foo`
* parenthesized expression using an operator as a literal, e.g. `( = != foo )`
- Adds words to cspell exceptions
- Converts test macros to use Default trait.
- Converts parser to use Default trait.
- Adds Windows-friendly test files for block/unblock when nl is present
in test/spec file.
- Removes dd from feat_require_unix (keeps it in feat_common_core)
- Changes name of make_linux_iflags parameter from oflags to iflags.
- Removes roughed out SIGINFO impl.
- Renames plen -> target_len.
- Removes internal fn def for build_blocks and replaces with a call to
chunks from std.
- Renames apply_ct to the more descriptive apply_conversion
- Replaces manual swap steps in perform_swab with call to swap from std.
- Replaces manual solution with chunks where appropriate (in Read, and Write impl).
- Renames xfer -> transfer (in local variable names).
- Improves documentation for dd_filout/dd_stdout issue.
- Removes commented debug_assert statements.
- Modifies ProdUpdate to contain ReadStat and WriteStat rather than
copying their fields.
- Addresses verbose return in `Output<File>::new(...)`
- Resoves compiler warning when built as release when signal handler fails to
register.
- Derives _Default_ trait on ReadStat.
- Adds comments for truncated lines in block unblock tests
- Removes `as u8` in block unblock tests.
- Removes unecessary `#[inline]`
- Delegates multiplier string parsing to uucore::parse_size.
- Renames 'unfailed' -> 'succeeded' for clairity.
- Removes #dead_code warnings. No clippy warnings on my local machine.
- Reworks signal handler to better accomodate platform-specific signals.
- Removes explicit references to "if" and "of" in dd.rs.
- Removes explicit references to "bs", "ibs", "cbs" and "status" in
parseargs.rs.
- Removes `#[allow(deadcode)]` for OFlags, and IFlags.
- Removes spellchecker ignore from all dd files.
- Adds tests for 'traditional' and 'modern' CLI.
- Address Rust fmt issue
- ignores run-without-noatime test which fails on some build machines
- adds cspell:disable to all project files.
- adds .../dd/fixtures/cspell.json to ignore test fixtures.
- Adds cspell.json. Hopefully this will make you happy, spellchecker.
- Removes non-functional spellchecker-ignore tags
- Adds a sleep call to the no noatime test. Some systems were did not notice a changed
atime without the option present.
- Adds a test with a unicode filename.
- Addresses clippy lints and rustfmt issues.
- adds project header to multiple files
- updates spell check skip words
- removes linux only flags direct,noatime from mac_os build
- applies rustfmt to test_dd
- runs rustfmt on test_dd.rs
- eliminates compiler warnings
- adds many words to spellchecker ignore list
- adds sanity test for vexing conv=nocreat issue. Still WIP.
The `uname` `-o` switch reports the operating system used. If the GNU C
standard library (glibc) is not in use, for example if musl is being
used instead, report "Linux" instead of "GNU/Linux".
Fails with:
```
error: use of irregular braces for `write!` macro
--> src/uucore/src/lib/features/encoding.rs:19:17
|
19 | #[derive(Debug, Error)]
| ^^^^^
|
= note: `-D clippy::nonstandard-macro-braces` implied by `-D warnings`
help: consider writing `Error`
--> src/uucore/src/lib/features/encoding.rs:19:17
|
19 | #[derive(Debug, Error)]
| ^^^^^
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#nonstandard_macro_braces
= note: this error originates in the derive macro `Error` (in Nightly builds, run with -Z macro-backtrace for more info)
```
This makes clap wrap the help text according to the terminal width,
which improves readability for terminal widths < 120 chars,
because clap defaults to a width of 120 chars without this feature.
This reimplements version_cmp, which is used in sort and ls to sort
according to versions.
However, it is not bug-for-bug identical with GNU's implementation.
I reported a bug with GNU here:
https://lists.gnu.org/archive/html/bug-coreutils/2021-06/msg00045.html
This implementation does not contain the bugs regarding the handling of
file extensions and null bytes.
Adds the ability to perform file backups before installing newer files on top
of existing ones. Adds a status message about backups to stdout if running in
verbose mode.
Signed-off-by: Hanif Bin Ariffin <hanif.ariffin.4326@gmail.com>
Keep one of the texts in-place
Signed-off-by: Hanif Bin Ariffin <hanif.ariffin.4326@gmail.com>
Reduced the fix to just formatting changes
Signed-off-by: Hanif Bin Ariffin <hanif.ariffin.4326@gmail.com>
The '--backup' option would previously accept arguments separated from the
option either by a space or an equals sign. The GNU implementation strictly
requires an "equals" for argument separation.
As the argument to '--backup' is optional, the equals sign mustn't be ommited
as otherwise there is no way to tell a file argument apart from an argument
that's meant for the '--backup' option. This ensures that if '--backup' is
present it either has no further associated arguments (i.e. fallback to the
default), or the arguments are separated by an equals sign.
Gerenal numeric sort works by comparing pre-parsed floating point
numbers. That means that we don't have to store the &str the float was
parsed from.
As a result, memory usage was slightly reduced for general numeric sort.
`LANGUAGE=C` is not enough, `LC_ALL=C` is needed as the environment
variable that overrides all the other localization settings.
e.g.
```bash
$ LANGUAGE=C id foobar
id: ‘foobar’: no such user
$ LC_ALL=C id foobar
id: 'foobar': no such user
```
* replace `LANGUAGE` with `LC_ALL` as environment variable in the tests
* fix the the date string of affected uutils
* replace `‘` and `’` with `'`
Data that was previously boxed inside the `Line` struct was moved to
separate vectors. Inside of each `Line` remains only an index that
allows to access that data.
This helps with keeping the `Line` struct small and therefore reduces
memory usage in most cases.
Additionally, this improves performance because one big allocation (the
vectors) are faster than many small ones (many boxes inside of each
`Line`). Those vectors can be reused as well, reducing the amount of
(de-)allocations.
- Windows hidden file attribute determination would assume symbolic link
to be valid and would panic
- Check symbolic link's attributes if the link points to non-existing
file
- For dangling symlinks, errors should only be reported if
dereferencing options were passed and dereferencing was applicable to
the particular symlink
- With -i parameter, report '?' as the inode number for dangling
symlinks
When invoked via '[' name, last argument must be ']' or we bail out with
syntax error. Then the trailing ']' is simply disregarded and processing
happens like usual.
- The dd info page mentions a special fast-read framework if no conv=FLAG is
specified (see bs=N) which I left space for. As it turns out, this is performed already
so it does not need to be implemented.
- When we have finished reading from a temproary file, we can immediately
delete it.
- Use one single directory for all temporary files.
- Only create the temporary directory when needed.
- Also compress temporary files created by the merge step if requested.
1. Its better to bump u16 to usize than the other way round.
2. Highly unlikely to have a terminal with usize rows...makes making sense of the code easier.
Signed-off-by: Hanif Bin Ariffin <hanif.ariffin.4326@gmail.com>
- Adds print fn's
- Modifies internal fn's as needed to track read/write state
- Modifies status update thread to respect status level
- Adds signal handler for SIGUSR1 (print xfer stats)
* --check=silent and --check=quiet, which are equivalent to -C.
* --check=diagnose-first, which is the same as --check
We also allow -c=<value>, which confuses GNU sort.
`timeout` used to set the timeout to 0 when -k was not set. This
collided with the behavior of 0 timeouts, which disable the timeout.
When -k is not set the process should not be killed.
To prevent clap from parsing flags for the command to run as flags for
timeout, remove the "args" positional argument, but allow to pass flags
via the "command" positional arg.
As discussed here: https://github.com/uutils/coreutils/pull/2361
the group IDs returned for GNU's 'group' and GNU's 'id --groups'
starts with the effective group ID.
This implements a wrapper for `entris::get_groups()` which mimics
GNU's behaviour.
* add tests for `id`
* add tests for `groups`
* fix `id --groups --real` to no longer ignore `--real`
- Splits read fn into conv=sync and standard (consecutive)
versions.
- Fixes bug in previous read/fill where short reads would copy to wrong
position in output buffer.
- Fixes bug in unit tests. Empty source would pass (since no bytes
failed to match).
- Use `unicode_segmentation` and `unicode_width` to determine proper `break_line` position.
- Keep track of total_width as suggested by @tertsdiepraam.
- Add unittest for ZWJ unicode case
Related to #2319.
consistent
* add tests for each flag that takes NUM/SIZE arguments
* fix bug in tail where 'quiet' and 'verbose' flags did not override each other POSIX style
dst may or may not exist. In case it exists it might already be a symlink.
In neither case we should try to canonicalize dst, only its parent directory.
https://www.gnu.org/software/coreutils/manual/html_node/ln-invocation.html
> Relative symbolic links are generated based on their canonicalized
> **containing directory**, and canonicalized targets.
Port argument parsing from getopts to clap.
The only difference I have observed is that clap auto-generates -h and
-V short options for help and version, and there is no way (in clap 2.x)
to disable them.
Instead of using into_raw_fd(), which transfers ownership and
requires us to close the file descriptor manually,
use as_raw_fd(), which does not transfer ownership to us but drops the
file descriptor when the original file is dropped (in our case at the
end of the function).
We were reporting "no match" when sorting something like "0 ". This is
because we don't distinguish between 0 and invalid lines when sorting.
For debug output we have to get this information back.
GNU seq does not support -t, but always outputs a newline at the end.
Therefore, our default for -t should be \n.
Also removes support for escape sequences (interpreting a literal "\n"
as a newline). This is not what GNU seq is doing, and unexpected.
If we notice that we can represent all arguments as BigInts, take a
different code path. Just like GNU seq this means we can print an
infinite amount of numbers in this case.
When a single directory is passed to ls in recursive mode, uutils ls
won't print the directory name
======================
GNU ls:
z:
======================
======================
uutils ls:
======================
This commit fixes this minor inconsistency and adds corresponding test.
Closes#2254. We should only inherit global settings for keys when there
are absolutely no options attached to the key.
The default key (matching the whole line) is implicitly added only if no
keys are supplied.
Improved some error messages by including more context.
* expr: support arbitrary precision integers
Instead of i64s we now use BigInts for integer operations. This means
that no result or input can be out of range.
The representation of integer flags was changed from i64 to u8 to make
their intention clearer.
* expr: allow big numbers as arguments as well
Also adds some tests
* expr: use num-traits to check bigints for 0 and 1
* expr: remove obsolete refs
match ergonomics made these avoidable.
* formatting
Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>
Reorganize the code in `truncate.rs` into three distinct functions
representing the three modes of operation of the `truncate` program. The
three modes are
- `truncate -r RFILE FILE`, which sets the length of `FILE` to match the
length of `RFILE`,
- `truncate -r RFILE -s NUM FILE`, which sets the length of `FILE`
relative to the given `RFILE`,
- `truncate -s NUM FILE`, which sets the length of `FILE` either
absolutely or relative to its curent length.
This organization of the code makes it more concise and easier to
follow.
Create a method that computes the final target size in bytes for the
file to truncate, given the reference file size and the parameter to the
`TruncateMode`.
Add a helper function to contain the code for parsing the size and the
modifier symbol, if any. This commit also changes the `TruncateMode`
enum so that the parameter for each "mode" is stored along with the
enumeration value. This is because the parameter has a different meaning
in each mode.
Remove "read" permissions from the `OpenOptions` when opening a new file
just to truncate it. We will never read from the file, only write to
it. (Specifically, we will only call `File::set_len()`.)
* sort: crash when failing to open an input file
Instead of ignoring files we fail to open, crash.
The error message does not exactly match gnu, but that would require
more effort.
* use split_whitespace instead of a manual implementation
* fix expected error on windows
* sort: update expected error message
* sort: disable support for thousand separators
In order to be compatible with GNU, we have to disable thousands
separators. GNU does not enable them for the C locale, either.
Once we add support for locales we can add this feature back.
* sort: delete unused fixtures
* sort: compare -0 and 0 equal
I must have misunderstood this when implementing, but GNU considers
-0, 0, and invalid numbers to be equal.
* sort: strip blanks before applying the char index
* sort: don't crash when key start is after key end
* sort: add "no match" for months at the first non-whitespace char
We should put the "^ no match for key" indicator at the first
non-whitespace character of a field.
* sort: improve support for e notation
* sort: use maches! macros
Add some abstractions to simplify the `rbuf_but_last_n_lines()`
function, which implements the "take all but the last `n` lines"
functionality of the `head` program. This commit adds
- `RingBuffer`, a fixed-size ring buffer,
- `ZLines`, an iterator over zero-terminated "lines",
- `TakeAllBut`, an iterator over all but the last `n` elements of an
iterator.
These three together make the implementation of
`rbuf_but_last_n_lines()` concise.
Reorganize the code in `truncate.rs` into three distinct functions
representing the three modes of operation of the `truncate` program. The
three modes are
- `truncate -r RFILE FILE`, which sets the length of `FILE` to match the
length of `RFILE`,
- `truncate -r RFILE -s NUM FILE`, which sets the length of `FILE`
relative to the given `RFILE`,
- `truncate -s NUM FILE`, which sets the length of `FILE` either
absolutely or relative to its curent length.
This organization of the code makes it more concise and easier to
follow.
Create a method that computes the final target size in bytes for the
file to truncate, given the reference file size and the parameter to the
`TruncateMode`.
Add a helper function to contain the code for parsing the size and the
modifier symbol, if any. This commit also changes the `TruncateMode`
enum so that the parameter for each "mode" is stored along with the
enumeration value. This is because the parameter has a different meaning
in each mode.
Remove "read" permissions from the `OpenOptions` when opening a new file
just to truncate it. We will never read from the file, only write to
it. (Specifically, we will only call `File::set_len()`.)
`sort` supports three ways to specify the sort mode: a long option
(e.g. --numeric-sort), a short option (e.g. -n) and the sort flag
(e.g. --sort=numeric).
This adds support for the sort flag.
Additionally, sort modes now conflict, which means that an error is
shown when multiple modes are passed, instead of silently picking a mode.
For consistency, I added the `random` sort mode to the `SortMode` enum,
instead of it being a bool flag.
Change the behavior of `wc` to print the counts for a file as soon as
it is computed, instead of waiting to compute the counts for all files
before writing any output to `stdout`. The new behavior matches the
behavior of GNU `wc`.
The old behavior looked like this (the word "hello" is entered on
`stdin`):
$ wc emptyfile.txt -
hello
0 0 0 emptyfile.txt
1 1 6
1 1 6 total
The new behavior looks like this:
$ wc emptyfile.txt -
0 0 0 emptyfile.txt
hello
1 1 6
1 1 6 total
Instead of overflowing when calculating the buffer size, use
saturating_{pow, mul}.
When failing to parse the buffer size, we now crash instead of silently
ignoring the error.
To make this work we make default sort a special case of external sort.
External sorting uses auxiliary files for intermediate chunks. However,
when we can keep our intermediate chunks in memory, we don't write them
to the file system at all. Only when we notice that we can't keep them
in memory they are written to the disk.
Additionally, we don't allocate buffers with the capacity of their
maximum size anymore. Instead, they start with a capacity of 8kb and are
grown only when needed.
This makes sorting smaller files about as fast as it was before
(I'm seeing a regression of ~3%), and allows us to seamlessly continue
with auxiliary files when needed.
For any commandline arguments, ls should print the argument as is (and
not truncate to just the file name)
For any other files it reaches (say through recursive exploration), ls
should print just the filename (as path is printed once when we enter
the directory)
A lot of tests depend on GNU's coreutils to be installed in order
to obtain reference values during testing.
In these cases testing is limited to `target_os = linux`.
This PR installs GNU's coreutils on "github actions" and adjusts the
tests for `who`, `stat` and `pinky` in order to be compatible with macOS.
* `brew install coreutils` (prefix is 'g', e.g. `gwho`, `gstat`, etc.
* switch paths for testing to something that's available on both OSs,
e.g. `/boot` -> `/bin`, etc.
* switch paths for testing to the macOS equivalent,
e.g. `/dev/pts/ptmx` -> `/dev/ptmx`, etc.
* exclude paths when no equivalent is available,
e.g. `/proc`, `/etc/fstab`, etc.
* refactor tests to make better use of the testing API
* fix a warning in utmpx.rs to print to stderr instead of stdout
* fix long_usage text in `who`
* fix minor output formatting in `stat`
* the `expected_result` function should be refactored
to reduce duplicate code
* more tests should be adjusted to not only run on `target_os = linux`
Fix a bug in which the incorrect character was being used to indicate
"round up to the nearest multiple" mode. The character was "*" but it
should be "%". This commit corrects that.
Change the error message for when the reference file (the `-r` argument)
is not found to match GNU coreutils. This commit also eliminates a
redundant call to `File::open`; the file need not be opened because the
size in bytes can be read from the result of `std::fs::metadata()`.
Change the interface provided by the `parse_size()` function to reduce
its responsibilities to just a single task: parsing a number of bytes
from a string of the form '123KB', etc. Previously, the function was
also responsible for deciding which mode truncate would operate in.
Furthermore, this commit simplifies the code for parsing the number and
unit to be less verbose and use less mutable state.
Finally, this commit adds some unit tests for the `parse_size()`
function.
Previous version would perform an amount of work proportional to `CHUNK_SIZE`,
so this wasn't a valid way to benchmark at multiple values of that constant.
The `TryInto` implementation for `&mut [T]` to `&mut [T; N]` relies on `const`
generics, and is available in (stable) Rust v1.51 and later.
Change the behavior of `head` to display an error for each problematic
file, instead of displaying an error message for the first problematic
file and terminating immediately at that point. This change now matches
the behavior of GNU `head`.
Before this commit, the first error caused the program to terminate
immediately:
$ head a b c
head: error: head: cannot open 'a' for reading: No such file or directory
After this commit:
$ head a b c
head: cannot open 'a' for reading: No such file or directory
head: cannot open 'b' for reading: No such file or directory
head: cannot open 'c' for reading: No such file or directory
Instead of using a BufReader and reading each line separately,
allocating a String for each one, we read to a chunk. Lines are
references to this chunk. This makes the allocator's job much easier
and yields performance improvements.
Chunks are read on a separate thread to further improve performance.
Fix a bug in which `head` failed to print headings for `stdin` inputs
when reading from multiple files, and fix another bug in which `head`
failed to print a blank line between the contents of a file and the
heading for the next file when reading multiple files. The output now
matches that of GNU `head`.
Fix two issues with the string formatting width for counts displayed
by `wc`.
First, the output was previously not using the default minimum width
(seven characters) when reading from `stdin`. This commit corrects
this behavior to match GNU `wc`. For example,
$ cat alice_in_wonderland.txt | wc
5 57 302
Second, if at least 10^7 bytes were read from `stdin` *after* reading
from a smaller regular file, then every output row would have width
8. This disagrees with GNU `wc`, in which only the `stdin` row and the
total row would have width 8. This commit corrects this behavior to
match GNU `wc`. For example,
$ printf "%.0s0" {1..10000000} | wc emptyfile.txt -
0 0 0 emptyfile.txt
0 1 10000000
0 1 10000000 total
Fixes#2186.
Change the error messages that get printed to `stderr` for compatibility
with GNU `wc` when an input is a directory and when an input does not
exist.
Fixes#2211.
This closes#2181.
`who --lookup` is failing with a runtime panic (double free).
Since `crate::dns-lookup` already includes a safe wrapper for `getaddrinfo`
I used this crate instead of further debugging the existing code in
utmpx::canon_host().
* It was neccessary to remove the version constraint for libc in uucore.
Refactor code from the `backwards_thru_file()` function into a new
`ReverseChunks` iterator, and use that iterator to simplify the
implementation of the `backwards_thru_file()` function. The
`ReverseChunks` iterator yields `Vec<u8>` objects, each of which
references bytes of a given file.
- add `==` as undocumented alias of `=`
- handle negated comparison of `=` as literal
- negation generally applies to only the first expression of a Boolean chain,
except when combining evaluation of two literal strings
Refactor common code out of two branches of the `unbounded_tail()`
function into a new `unbounded_tail_collect()` helper function, that
collects from an iterator into a `VecDeque` and keeps either the last
`n` elements or all but the first `n` elements.
This commit also adds a new struct, `RingBuffer`, in a new module,
`ringbuffer.rs`, to be responsible for keeping the last `n` elements
of an iterator.
When merging files we need to prioritize files that occur earlier in the
command line arguments with -m.
This also makes the extsort merge step (and thus extsort itself) stable again.
Refactor the counting code from the inner loop of the `wc` program
into the `WordCount::from_line()` associated function. This commit
also splits that function up into other helper functions that
encapsulate decoding characters and finding word boundaries from raw
bytes.
This commit also implements the `Sum` trait for the `WordCount`
struct, so that we can simply call `sum()` on an iterator that yields
`WordCount` instances.
This is a refactor to reduce duplicate code, it affects chmod/ls/stat.
* merge `stat/src/fsext::pretty_access` into `uucore/src/lib/feature/fs::display_permissions_unix`
* move tests for `fs::display_permissions` from `test_stat::test_access` to `uucore/src/lib/features/fs::test_display_permissions`
* adjust `uu_chmod`, `uu_ls` and `uu_stat` to use `uucore::fs::display_permissions`
FileMerger is much more efficient than the previous algorithm,
which looped over all elements every time to determine the next element.
FileMerger uses a BinaryHeap, which should bring the complexity for
the merge step down from O(n²) to O(n log n).
* ls: Implement total size feature
- Implement total size reporting that was missing
- Fix minor formatting / readability nits
* tests: Add tests for ls total sizes feature
* ls: Fix MSRV build errors due to unsupported attributes for if blocks
* ls: Add windows support for total sizes feature
- Add windows support (defaults to file size as block sizes related
infromation is not avialable on windows)
- Renamed some functions
Add the `WordCountable::lines()` method that returns an iterator over
lines of a file-like object. This mirrors the
`std::io::BufRead::lines()` method, with some minor differences due to
the particular use case of `wc`.
This commit also creates a new module, `countable.rs`, to contain the
`WordCountable` trait and the new `Lines` struct returned by `lines()`.
Use clap for argument parsing instead of getopts
Also, make the following changes
* Use `executable!()` macro to output the name of utility
* Add another usage to help message
- Replace the parser with a recursive descent implementation that handles
parentheses and produces a stack of operations in postfix order.
Parsing now operates directly on OsStrings passed by the uucore framework.
- Replace the dispatch mechanism with a stack machine operating on the
symbol stack produced by the parser.
- Add tests for parenthesized expressions.
- Begin testing character encoding handling.
Moved argument parsing to clap and added tests to cover using "-" as
stdin, passing in too many file arguments, and updated the "wrap" error
message in the tests.
It is much faster to just write the lines to disk, separated by \n
(or \0 if zero-terminated is enabled), instead of serializing to json.
external_sort now knows of the Line struct instead of interacting with
it using the ExternallySortable trait. Similarly, it now uses the
crash_if_err! macro to handle errors, instead of bubbling them up.
Some functions were changed from taking &[Line] as the input to taking
an Iterator<Item = Line>. This removes the need to collect to a Vec
when not necessary.
This removes the need to allocate a new string for each line when used
with -f, -d or -i. Instead, a custom string comparison algorithm takes
care of these cases.
The resulting performance improvement is about 20% per flag (i.e. there
is a 60% improvement when combining all three flags)
As a side-effect, the size of the Line struct was reduced from 96 to 80
bytes, reducing the overhead for each line.
Add crossterm as dependency
Complete the paging portion
Fixed tests
cp: extract linux COW logic into function
cp: add --reflink support for macOS
Fixes#1773
Fix error in Cargo.lock
Quit automatically if not much output is left
Remove unnecessary redox and windows specific code
Handle line wrapping
Put everything according to uutils coding standards
Add support for multiple files
Fix failing test
Use the args argument to get cli arguments
Fix bug where text is repeated multiple times during printing
Add a little prompt
Add a top file prompt for multiple files
Change println in loops to stdout.write and setup terminal only once
Fix bug where all lines were printed in a single row
Remove useless file and fix failing test
Fix another test
* ls: added creation time
* ls: Added most time features
Missing support for posix-,Format+, translating via locales. Also required more tests
* ls: rustfmt
* ls: Additional changes and fixes
Fixed the argument order, fixed a wrong iso format.
* ls: additional tests for styles
* ls: perfected arg parsing on time styles
* fix birthime test
* ls: Use 'stdout_str' in new tests
* ls: Disabled birthtime test for windows
* ls: removed indoc as a dependency
* ls: birthime test, sync first created file
* ls: birthime test, add comment explaining sync
* Removed ruby testfile birth_test.rb
This accidentally got commited in a merge
- Adds duplicate dd fn :-( for differentiating between File backed and
non-File outputs.
- Implements cflag=sparse,fsync,fdatasync which were previously blocked.
- Adds plumbing for IFlags & OFlags incl parsing.
- Partial impl for seek=N and skip=N which were previously blocked.
Note, I needed to change the error messages in one of the tests because
getopt and clap have different error messages when not providing a
default value
* Change unchecked unwrapping to unwrap_or_default for argument parsing (resolving #1845)
* Added unit-testing for the collect_str function on invalid utf8 OsStrs
* Added a warning-message for identification purpose to the collect_str method.
* - Add removal of wrongly encoded empty strings to basename
- Add testing of broken encoding to basename
- Changed UCommand to use collect_str in args method to allow for integration testing of that method
- Change UCommand to use unwarp_or_default in arg method to match the behaviour of collect_str
* Trying out a new pattern for convert_str for getting a feeling of how the API feels with more control
* Adding convenience API for compact calls
* Add new API to everywhere, fix test for basename
* Added unit-testing for the conversion options
* Added unit-testing for the conversion options for windows
* fixed compilation and some merge hiccups
* Remove windows tests in order to make merge request build
* Fix formatting to match rustfmt for the merged file
* Improve documentation of the collect_str method and the unit-tests
* Fix compilation problems with test
Co-authored-by: Christopher Regali <chris.vdop@gmail.com>
Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>
* ls: ignore leading period when sorting by name
ls now behaves like GNU ls with respect to sorting files by ignoring
leading periods when sorting by main.
Added tests to ensure "touch a .a b .b ; ls" returns ".a a .b b"
* Replaced clone/collect calls.
* Use buffered stdout to reduce write sys calls.
This simple change yielded the biggest performace gain.
* Use `for_byte_record_with_terminator` from the `bstr` crate.
This is to minimize the per line copying needed by
`BufReader::read_until`. The `cut_fields` and `cut_fields_delimiter`
functions used `read_until` to iterate over lines. That required copying
each input line to the line buffer. With
`for_byte_record_with_terminator` copying is minimized as it calls our
closure with a reference to BufReader's buffer most of the time. It
needs to copy (internally) only to process any incomplete lines at the
end of the buffer.
* Re-write `Searcher` to use `memchr`.
Switch from the naive implementation to one that uses `memchr`.
* Rewrite `cut_bytes` almost entirely.
This was already well optimized. The performance gain in this case is
not from avoiding copying. In fact, it needed zero copying whereas new
implementation introduces some copying similar to `cut_fields` described
above. But the occassional copying cost is more than offset by the use
of the very fast `memchr` inside `for_byte_record_with_terminator`.
This change also simplifies the code significantly. Removed the `buffer`
module.
This adds a --debug flag, which, when activated, will draw lines below
the characters that are actually used for comparisons.
This is not a complete implementation of --debug. It should, quoting the man page
for GNU sort: "annotate the part of the line used to sort, and warn
about questionable usage to stderr". Warning about "questionable usage"
is not part of this patch.
This change required some adjustments to be able to get the range that
is actually used for comparisons. Most notably, general numeric comparisons
were rewritten, fixing some bugs along the lines.
Testing is mostly done by adding fixtures for the expected debug output of
existing tests.
- conv=sparse option requires knowledge of File/Stdout to change
behaviour.
- Unclear how best to impl this.
- Possible option: develop 4 versions of dd<X,Y> for each valid pair of { File, Stdin,
Stdout }.
* ls: Remove allocations by eliminating collect/clones
* ls: Introduce PathData structure
- PathData will hold Path related metadata / strings that are required
frequently in subsequent functions
- All data is precomputed and cached and subsequent functions just
use cached data
* ls: Cache more data related to paths
- Cache filename and sort by filename instead of full path
- Cache uid->usr and gid->grp mappings
https://github.com/uutils/coreutils/pull/2099/files
* ls: Add BENCHMARKING.md
* ls: Document PathData structure
* tests/ls: Add testcase for error paths with width option
* ls: Fix unused import warning
cached will be only used for unix currently as current use of
caching gid/uid mappings is only relevant on unix
* ls: Suggest checking syscall count in BENCHMARKING.md
* ls: Remove mentions of sort in BENCHMARKING.md
* ls: Remove dependency on cached
Implement caching using HashMap and lazy_static
* ls: Fix MSRV error related to map_or
Rust 1.40 did not support map_or for result types
Trailing separators were included at the end of the last token, but they
should not be.
This changes tokenize_with_separator as suggested by @cbjadwani.
GNU sort disallows these combinations, presumably because they are
likely not what the user really wants.
Ignoring characters would cause things to be put together that aren't
together in the input. For example, -dn would cause "0.12" or "0,12" to
be parsed as "12" which is highly unexpected and confusing.
This reduces memory usage by only storing two lines of the input file at
a time. The current implementation first builds a list of all duplicate
lines ('group') and then decides which lines of the group should be
printed.
- Passing `never` to `--reflink` does not raise an error anymore.
- Remove `Options::reflink` flag as it was redundant with
`reflink_mode`.
- Add basic tests for this option. Does not check that a copy-on-write
rather than a regular copy was made.
* sort: use unstable sort when possible
This results in a very minor performance (speed) improvement.
It does however result in a memory usage reduction, because unstable
sort does not allocate auxiliary memory. There's also an improvement in
overall CPU usage.
* add benchmarking instructions
* add user time
* fix typo
* sort: implement numeric string comparison
This implements -n and -h using a string comparison algorithm instead
of parsing each number to a f64 and comparing those.
This should result in a moderate performance increase and eliminate loss
of precision.
* cache parsed f64 numbers
For general numeric comparisons we have to parse numbers as f64,
as this behavior is explicitly documented by GNU coreutils.
We can however cache the parsed value to speed up comparisons.
* fix leading zeroes for negative numbers
* use more appropriate name for exponent
* improvements to the parse function
* move checks into main loop and fix thousands separator condition
* remove unneeded checks
* rustfmt
* du error output should match GNU
* Created a new error macro which allows the customization of the
"error:" string part
* Match the du output based on the type of error encountered. Can extend
to handling other errors I guess.
* Rustfmt updates
* Added non-windows test for du no permission output
* Various fixes and performance improvements
* fix a typo
Co-authored-by: Michael Debertol <michael.debertol@gmail.com>
* Fix month parse for months with leading whitespace
* Implement test for months whitespace fix
* Confirm human numeric works as expected with whitespace with a test
* Correct arg help value name for --parallel
* Fix SemVer non version lines/empty line sorting with a test
Co-authored-by: Sylvestre Ledru <sledru@mozilla.com>
Co-authored-by: Michael Debertol <michael.debertol@gmail.com>
* cat: Unrevert splice patch
* cat: Add fifo test
* cat: Add tests for error cases
* cat: Add tests for character devices
* wc: Make sure we handle short splice writes
* cat: Fix tests for 1.40.0 compiler
* cat: Run rustfmt on test_cat.rs
* Run 'cargo +1.40.0 update'
* sort: implement basic -k and -t support
This allows to specify keys after the -k flag and a custom field
separator using -t.
Support for options for specific keys is still missing, and the -b flag
is not passed down correctly.
* sort: implement support for key options
* remove unstable feature use
* don't pipe in input when we expect a failure
* only tokenize when needed, remove a clone()
* improve comments
* fix clippy lints
* re-add test
* buffer writes to stdout
* fix ignore_non_printing
and make the test fail in case it is broken :)
* move attribute to the right position
* add more tests
* add my name to the copyright section
* disallow dead code
* move a comment
* re-add a loc
* use smallvec for a perf improvement in the common case
* add BENCHMARKING.md
* add ignore_case to benchmarks
* Various fixes and performance improvements
* fix a typo
Co-authored-by: Michael Debertol <michael.debertol@gmail.com>
Co-authored-by: Sylvestre Ledru <sledru@mozilla.com>
Co-authored-by: Michael Debertol <michael.debertol@gmail.com>
* Replace outdated time 0.1 dependancy with latest version of chrono
I also noticed that times are being miscalculated on linux, so I fixed that.
* Add time test for issue #2042
* Cleanup use declarations
* Tie time test to `touch` feature
- if we compile with the right OS feature flag then we should have it,
even on Windows
Forgot to handle the case where no arguments were passed to the COMMAND.
Because ARGS can be empty, we need two separate cases for handling
options.values_of(options::ARGS)
Fixed some minor ownership issues in converting from the options to the
arguments to the timeout COMMAND.
Additionally, fixed a rustfmt issue in other files (fold/stdbuf.rs)
Changed from optparse to clap.
None of the logic within timeout has been changed, which could use some
refactoring, but that's beyond the scope of this commit.
- Moves all arg parsing & tests to parseargs.rs
- Moves conversion tables to conversion_tables.rs
- Adds ebcdic_lcase and _ucase tables
- Refactors options: This **Breaks Write for Output**
refactor `is_wsl` to `is_wsl_1` and `is_wsl_2`
On my tests with wsl2 ubuntu2004 all tests pass without special cases
I moved wsl detection into uucore so that it can be shared instead of duplicated
I moved `parse_mode` into uucode as it seemed to fit there better and anyway requires libc feature
Treat tab chars as advancing to the next tab stop rather than having a fixed
8-column width.
Also treat tab as a whitespace split target only when splitting on word
boundaries.
- Adds support for calling dd fn from cl
- Adds basic cl tests from project root
- Adds support for multiplier strings (c, w, b, kB, KB, KiB, ... EB, E,
EiB.
The '-d' flag should create all ancestors (or components) of a
directory regardless of the presence of the "-D" flag.
From the man page:
-d, --directory
treat all arguments as directory names; create all components of the specified directories
With GNU:
$ install -v -d dir1/di2
install: creating directory 'dir1'
install: creating directory 'dir1/di2'
With this version:
$ ./target/release/install -v -d dir3/di4
install: dir3/di4: No such file or directory (os error 2)
install: dir3/di4: chmod failed with error No such file or directory (os error 2)
install: created directory 'dir3/di4'
Also, one of the unit tests misinterprets what a "component" is,
and hence was fixed.
Also don't run chmod when we just failed to create the directory.
Behaviour before this patch:
$ ./target/release/install -v -d dir1/dir2
install: dir1/dir2: Permission denied (os error 13)
install: dir1/dir2: chmod failed with error No such file or directory (os error 2)
install: created directory 'dir1/dir2'
* Issue #1622 port `du` to windows
* Attempt to support Rust 1.32
Old version was getting "attributes are not yet allowed on `if`
expressions" on Rust 1.32
* Less #[cfg]
* Less duplicate code.
I need the return and the semicolon after if otherwise the second #[cfg]
leads to unexpected token complilation error
* More accurate size on disk calculations for windows
* Expect the same output on windows as with WSL
* Better matches output from du on WSL
* In the absence of feedback I'm disabling these tests on Windows.
They require `ln`. Windows does not ship with this utility.
* Use the coreutils version of `ln` to test `du`
`fn ccmd` is courtesy of @Artoria2e5
* Look up inodes (file ids) on Windows
* One more #[cfg(windows)] to prevent unreachable statement warning on linux
* cat: Improve performance, especially on Linux
* cat: Don't use io::copy for splice fallback
On my MacBook Pro 2020, it is around 25% faster to not use io::copy.
* cat: Only fall back to generic copy if first splice fails
* cat: Don't double buffer stdout
* cat: Don't use experimental or-pattern syntax
* cat: Remove nix symbol use from non-Linux
* wc: Don't read() if we only need to count number of bytes
* Resolve a few code review comments
* Use write macros instead of print
* Fix wc tests in case only one thing is printed
* wc: Fix style
* wc: Use return value of first splice rather than second
* wc: Make main loop more readable
* wc: Don't unwrap on failed write to stdout
* wc: Increment error count when stats fail to print
* Re-add Cargo.lock
* Implemented --indicator-style flag on ls.
* Rust fmt
* Grouped indicator_style args.
* Added tests for sockets and pipes.
Needed to modify util.rs to add support for pipes (aka FIFOs).
* Updated util.rs to remove FIFO operations on Windows
* Fixed slight error in specifying (not(windows))
* Fixed style violations and added indicator_style test for non-unix systems
+ aligned 'tee' output with GNU tee when one of the files is '/dev/full'
+ don't stop tee when one of the outputs fails; just continue and return
error status from tee in the end
Co-authored-by: Ivan Rymarchyk <irymarchyk@arlo.com>
* mkfifo: general refactor, move to clap, add unimplemented flags
* chore: update Cargo.lock
* chore: delete unused variables, simplify multiple lines with crash
* test: add tests
* chore: revert the use of crash
* test: use even more invalid mod mode
* install: implement `-C` / `--compare`
GNU coreutils [1] checks the following: whether
- either file is nonexistent,
- there's a sticky bit or set[ug]id bit in play,
- either file isn't a regular file,
- the sizes of both files mismatch,
- the destination file's owner differs from intended, or
- the contents of both files mismatch.
[1] https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/install.c?h=v8.32#n174
* Add test: non-regular files
* Forgot a #[test]
* Give up on non-regular file test
* `cargo fmt` install.rs
Previously this used `print` instead of `println`, and as a result the
prompt would never appear and the command would hang. The Rust docs
note this about print:
> Note that stdout is frequently line-buffered by default so it may be
> necessary to use io::stdout().flush() to ensure the output is emitted
> immediately.
Changing to `println` fixes the issue.
Fixes#1889.
Co-authored-by: Kevin Burke <kevin@burke.dev>
* feat: move unexpand to clap
* chore: allow muliple files
* test: add test fixture, test reading from a file
* test: fix typo on file name, add test for multiple inputs
* chore: use 'success()' instead of asserting
* chore: delete unused variables
* chore: use help instead of long_help, break long line
* fix: use settings to allow leading hyphen and trailing var arg
fixes: https://github.com/uutils/coreutils/issues/1873
* test: add test cases
* test: add more test cases with different order in hyphen values
* chore: add comment to explain why we need TrailingVarArg
- changed some error return codes to match GNU implementation
- changed warning/error messages to match GNU nohup
- replaced getopts dependency with clap
- added a test