This allows us to check files without bringing them entirely into
memory. It also makes it easier to find the disorder in
`(seq 9; echo 0) | sort --check`: the error now points at the end of
the file, where our previous version pointed at the start of the file.
Itertools' .coalesce() was the most useful helper that I could find
for comparing adjacent values in an iterator. It is designed for
implementing things like .dedup(), so the resulting code is a little
unintuitive.
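To illustrate the adjacent-comparison idea, here is a minimal std-only sketch. It remembers the previous line instead of using itertools' .coalesce(), and the name find_disorder is hypothetical. Only one previous line is held in memory. The reported position is the first out-of-order line, which for `(seq 9; echo 0)` is line 10.

```rust
use std::io::{self, BufRead};

/// Returns the 1-based line number and text of the first line that sorts
/// before its predecessor, or `None` if the input is sorted.
/// Only the previous line is kept, so input size does not matter.
fn find_disorder<R: BufRead>(input: R) -> io::Result<Option<(usize, String)>> {
    let mut prev: Option<String> = None;
    for (idx, line) in input.lines().enumerate() {
        let line = line?;
        if let Some(p) = &prev {
            // Plain lexicographic comparison of adjacent lines.
            if line < *p {
                return Ok(Some((idx + 1, line)));
            }
        }
        prev = Some(line);
    }
    Ok(None)
}
```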
FileMerger receives Lines iterables of the pre-sorted input files
via push_file(). It implements Iterator, yielding lines from the
input files in (merged) sorted order. If the input files are not sorted,
the behavior is undefined.
Internally, FileMerger uses a
std::collections::BinaryHeap<MergeableFile>.
MergeableFile is an internal helper that implements Ord in a way that
BinaryHeap can use (note that we want smallest-first, but BinaryHeap
returns largest-first, so MergeableFile::cmp() calls reverse() on
whatever compare_by() returns).
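Here is a minimal sketch of the FileMerger/MergeableFile design described above. It assumes plain String lines, with lexicographic order standing in for compare_by(). The boxed-iterator storage and the source-index tie-break are assumptions, not necessarily what the real code does.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Heap entry: the current head line of one input and that input's index.
struct MergeableFile {
    line: String,
    source: usize,
}

impl Ord for MergeableFile {
    fn cmp(&self, other: &Self) -> Ordering {
        // BinaryHeap pops the largest element, so reverse the comparison
        // to get smallest-first. Ties are broken by source index.
        self.line
            .cmp(&other.line)
            .then(self.source.cmp(&other.source))
            .reverse()
    }
}
impl PartialOrd for MergeableFile {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for MergeableFile {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for MergeableFile {}

struct FileMerger {
    inputs: Vec<Box<dyn Iterator<Item = String>>>,
    heap: BinaryHeap<MergeableFile>,
}

impl FileMerger {
    fn new() -> Self {
        FileMerger { inputs: Vec::new(), heap: BinaryHeap::new() }
    }

    /// Adds one pre-sorted input; its first line goes onto the heap.
    fn push_file<I: Iterator<Item = String> + 'static>(&mut self, mut lines: I) {
        let source = self.inputs.len();
        if let Some(line) = lines.next() {
            self.heap.push(MergeableFile { line, source });
        }
        self.inputs.push(Box::new(lines));
    }
}

impl Iterator for FileMerger {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        let smallest = self.heap.pop()?;
        // Refill the heap from the input the popped line came from.
        if let Some(line) = self.inputs[smallest.source].next() {
            self.heap.push(MergeableFile { line, source: smallest.source });
        }
        Some(smallest.line)
    }
}
```

Each next() pops the smallest head line and refills from the same input, so at most one line per input is buffered at a time.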
Added a new function sort_by(lines, compare_fns), which accepts a
list of compare_fns and calls lines.sort_by() with a closure that
calls each compare_fn in turn until one returns something other
than Equal.
By default, String::cmp is appended as the last element of the
compare_fns list (referred to as 'last resort' sorting by man sort).
Passing --stable (-s) turns this behaviour off.
Test cases are provided for `sort --month` and `sort --month --stable`.
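Here is a sketch of the compare_fns chaining. It assumes a plain fn-pointer type for the comparisons. CompareFn, by_len and last_resort are hypothetical names, and by_len only stands in for a real key such as --month.

```rust
use std::cmp::Ordering;

/// Hypothetical type for one comparison key.
type CompareFn = fn(&String, &String) -> Ordering;

/// Example key (stand-in for e.g. month comparison): shorter lines first.
fn by_len(a: &String, b: &String) -> Ordering {
    a.len().cmp(&b.len())
}

/// 'Last resort' comparison: whole-line byte order.
fn last_resort(a: &String, b: &String) -> Ordering {
    a.cmp(b)
}

/// Sorts with each comparison in turn until one is not Equal.
/// Unless `stable` is set, the last-resort comparison is appended.
fn sort_by(lines: &mut Vec<String>, mut compare_fns: Vec<CompareFn>, stable: bool) {
    if !stable {
        compare_fns.push(last_resort);
    }
    // slice::sort_by is stable, so with --stable ties keep input order.
    lines.sort_by(|a, b| {
        for f in &compare_fns {
            let ord = f(a, b);
            if ord != Ordering::Equal {
                return ord;
            }
        }
        Ordering::Equal
    });
}
```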
* Add options -c, -F, -L, -l, -r, -R, -S, -t, -U, --color
* Fix options -a, -A
* Remove unused options
* Output in columns when not using -l
* Output date with -l
Without the all flag (-a), directory entries starting with a dot were
not being hidden, although the help message indicates they should be.
The implementation now checks whether the name starts with a dot and
uses '-a' to determine whether a DirEntry is to be printed.
When making earlier changes, I forgot that -v refers to "verbose",
not "version". I fixed that and, for good measure, also implemented
the verbose flag.
* Added flag -t/--target-directory
* No longer assumes that the source arguments are files in the CWD (in other words, can copy files from directories other than CWD)
We now accept symbolic and numeric mode strings using the
--mode or -m option for install. This is used either when
moving files into a directory, or when creating component
directories with the -d option. This feature was designed
to mirror the GNU implementation, including the possibly
quirky behaviour of `install --mode=u+wx file dir`
resulting in dir/file having exactly permissions 0300.
Extensive integration tests are included.
This change required a newer version of the libc dependency.
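Here is a simplified sketch of applying a mode string. It assumes only the r, w and x permission letters: no X, s or t, no copying between classes such as g=u, and no umask handling. parse_mode is a hypothetical name. Applying the symbolic clauses to a base of 0 reproduces the `u+wx` result of 0300 noted above.

```rust
/// Parses a numeric (octal) or symbolic mode such as "755" or "a=r,u+w",
/// applying symbolic clauses to `base`. Simplified: r/w/x only.
fn parse_mode(mode: &str, base: u32) -> Result<u32, String> {
    let err = || format!("invalid mode '{}'", mode);
    // Numeric modes are plain octal.
    if !mode.is_empty() && mode.bytes().all(|b| (b'0'..=b'7').contains(&b)) {
        let m = u32::from_str_radix(mode, 8).map_err(|_| err())?;
        return if m <= 0o7777 { Ok(m) } else { Err(err()) };
    }
    let mut result = base;
    for clause in mode.split(',') {
        let op_pos = clause
            .find(|c: char| c == '+' || c == '-' || c == '=')
            .ok_or_else(err)?;
        let (who, rest) = clause.split_at(op_pos);
        let op = rest.as_bytes()[0];
        // Permission bits for one class: r=4, w=2, x=1.
        let mut bits: u32 = 0;
        for c in rest[1..].chars() {
            bits |= match c {
                'r' => 4,
                'w' => 2,
                'x' => 1,
                _ => return Err(err()),
            };
        }
        // Empty "who" is treated as "a" here (real tools also apply umask).
        let who = if who.is_empty() { "a" } else { who };
        let mut shifts: Vec<u32> = Vec::new();
        for c in who.chars() {
            match c {
                'u' => shifts.push(6),
                'g' => shifts.push(3),
                'o' => shifts.push(0),
                'a' => shifts.extend([6, 3, 0]),
                _ => return Err(err()),
            }
        }
        for shift in shifts {
            let mask = 0o7 << shift;
            let value = bits << shift;
            result = match op {
                b'+' => result | value,
                b'-' => result & !value,
                _ => (result & !mask) | value, // '='
            };
        }
    }
    Ok(result)
}
```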
We check whether the user has passed one of the (many)
not-yet-implemented command-line arguments. If so, we
print the specific offending argument to stderr and
exit with status 2.
This behaviour is tested in one new integration test.
Bare minimum functionality of `install file dir` implemented.
Also added TODO markers in code for outstanding parameters
and split main function into smaller logical chunks.
Add install utility skeleton source, based on
mv, including the getopts setup mirroring
GNU's `man install` documentation. Also
add a single test and build system code.
Before each line of content is printed, check if it's from a different
file than the last one we printed for. If so, print a '==> file <=='
header to separate the output in the way tail does.
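Here is a minimal sketch of the header logic. It assumes the output arrives as (file name, line) pairs, and print_with_headers is a hypothetical name. The blank line printed before every header except the first is an assumption based on how GNU tail separates files.

```rust
use std::io::{self, Write};

/// Writes each line, emitting a "==> name <==" header whenever the
/// source file differs from the one we last printed for.
fn print_with_headers<'a, W: Write>(
    out: &mut W,
    lines: impl IntoIterator<Item = (&'a str, &'a str)>,
) -> io::Result<()> {
    let mut last: Option<&str> = None;
    for (file, line) in lines {
        if last != Some(file) {
            // Assumed GNU-style spacing: blank line before later headers.
            if last.is_some() {
                writeln!(out)?;
            }
            writeln!(out, "==> {} <==", file)?;
            last = Some(file);
        }
        writeln!(out, "{}", line)?;
    }
    Ok(())
}
```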
If multiple files are passed as arguments with the -f option, a vector
of BufReaders is built as the files are first tailed, so that follow()
can take control for the rest of the time the program is running.
follow() loops over each reader and prints all new available content on
each file before moving on to the next.
To get the -f option to follow multiple files, bounded_tail should just
tail a single file and return, instead of blocking processing of other
files by calling follow() (which loops forever).
Makes `parse_size` return a `Result` where the `Err` part indicates whether
there was a parsing error or the parsed size is too big to store. Also makes the
parsed value a `u64` rather than a `usize`.
Adds unit tests for `parse_size` and integration tests using the suffix
multiplier in a number passed with the `-n` flag.
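Here is a sketch of a `parse_size` with the two error cases described above. The enum and variant names are hypothetical. The suffix table is an assumption modelled on GNU tail's documentation (b = 512, kB = 1000, K = 1024, and so on up to E), not a copy of the real implementation. All arithmetic is checked, so overflow is reported rather than wrapped.

```rust
#[derive(Debug, PartialEq)]
enum ParseSizeErr {
    /// Not a number with a recognised suffix.
    ParseFailure(String),
    /// The value does not fit in a u64.
    SizeTooBig(String),
}

/// Parses sizes such as "10", "2K" or "3kB".
fn parse_size(size: &str) -> Result<u64, ParseSizeErr> {
    let split = size.find(|c: char| !c.is_ascii_digit()).unwrap_or(size.len());
    let (number, suffix) = size.split_at(split);
    if number.is_empty() {
        return Err(ParseSizeErr::ParseFailure(size.to_string()));
    }
    // (base, exponent): the multiplier is base^exponent.
    let (base, exponent): (u64, u32) = match suffix {
        "" => (1, 0),
        "b" => (512, 1),
        "kB" | "KB" => (1000, 1),
        "K" => (1024, 1),
        "MB" => (1000, 2),
        "M" => (1024, 2),
        "GB" => (1000, 3),
        "G" => (1024, 3),
        "TB" => (1000, 4),
        "T" => (1024, 4),
        "PB" => (1000, 5),
        "P" => (1024, 5),
        "EB" => (1000, 6),
        "E" => (1024, 6),
        _ => return Err(ParseSizeErr::ParseFailure(size.to_string())),
    };
    // A non-empty all-digit string can only fail to parse by overflowing.
    let value: u64 = number
        .parse()
        .map_err(|_| ParseSizeErr::SizeTooBig(size.to_string()))?;
    base.checked_pow(exponent)
        .and_then(|multiplier| value.checked_mul(multiplier))
        .ok_or_else(|| ParseSizeErr::SizeTooBig(size.to_string()))
}
```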
The main issue is that -octal or -[rwx] is interpreted as an option by
getopts.
Search the args for such a pattern, remove it before parsing and
manually handle it afterwards.
Fixes #788.
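Here is a sketch of the pre-parsing step. It assumes only the two patterns named above: a dash followed by octal digits, or a dash followed by r/w/x letters. strip_mode_arg is a hypothetical name, and applying the returned mode is left to the caller.

```rust
/// Removes the first argument that looks like "-755" or "-rw" so that
/// getopts does not treat it as an unknown option. Returns the remaining
/// arguments and the extracted mode (without the leading '-').
fn strip_mode_arg(args: Vec<String>) -> (Vec<String>, Option<String>) {
    let mut mode = None;
    let mut rest = Vec::with_capacity(args.len());
    for arg in args {
        let body = arg.strip_prefix('-').unwrap_or("");
        let is_octal = !body.is_empty() && body.chars().all(|c| ('0'..='7').contains(&c));
        let is_rwx = !body.is_empty() && body.chars().all(|c| "rwx".contains(c));
        if mode.is_none() && (is_octal || is_rwx) {
            mode = Some(body.to_string());
        } else {
            rest.push(arg);
        }
    }
    (rest, mode)
}
```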
When tailing a file, as opposed to stdin, and we are tailing bytes rather than
lines, we can seek the requested number of bytes from the end of the file. This
sidesteps the whole `backwards_thru_file` loop and its reads in blocks.
Fixes #833.
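For a seekable input the idea reduces to a couple of seek calls. Here is a minimal sketch, generic over Read + Seek so that an in-memory Cursor can stand in for a File. tail_bytes is a hypothetical name.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Returns the last `n` bytes of `input` by seeking rather than reading
/// the whole stream. If `n` exceeds the length, everything is returned.
fn tail_bytes<R: Read + Seek>(input: &mut R, n: u64) -> io::Result<Vec<u8>> {
    let len = input.seek(SeekFrom::End(0))?;
    // Clamp so we never try to seek before the start of the file.
    input.seek(SeekFrom::Start(len.saturating_sub(n)))?;
    let mut buf = Vec::new();
    input.read_to_end(&mut buf)?;
    Ok(buf)
}
```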
When tail'ing a file, we do not need to read the whole file from start to finish
just to find the last n lines or bytes. Instead, we can seek to the end of the
file, and then read the file "backwards" in chunks until we find the location of
the first line/byte we wish to print. This ends up being a nice performance win
for very large files.
Fixes #764
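Here is a sketch of the backwards search for the last n lines. It assumes '\n' line endings and a caller-chosen chunk size. last_lines_offset is a hypothetical name, and this is not the `backwards_thru_file` code itself. Once the offset is known, the caller seeks there and copies the rest of the file to the output.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Finds the byte offset where the last `n` lines of `input` begin,
/// reading backwards from the end in blocks of `chunk_size` bytes.
fn last_lines_offset<R: Read + Seek>(input: &mut R, n: usize, chunk_size: usize) -> io::Result<u64> {
    let chunk_size = chunk_size.max(1);
    let len = input.seek(SeekFrom::End(0))?;
    if n == 0 || len == 0 {
        return Ok(len);
    }
    let mut buf = vec![0u8; chunk_size];
    let mut end = len; // exclusive end of the region not yet scanned
    let mut newlines = 0;
    while end > 0 {
        let start = end.saturating_sub(chunk_size as u64);
        let size = (end - start) as usize;
        input.seek(SeekFrom::Start(start))?;
        input.read_exact(&mut buf[..size])?;
        for i in (0..size).rev() {
            let pos = start + i as u64;
            // A newline as the very last byte only terminates the final
            // line, so it is not counted as a line boundary.
            if buf[i] == b'\n' && pos != len - 1 {
                newlines += 1;
                if newlines == n {
                    return Ok(pos + 1);
                }
            }
        }
        end = start;
    }
    Ok(0) // fewer than n lines: start from the beginning
}
```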
The `BufReader` argument passed to the `fn tail<T: Read>(&mut BufReader<T>,
settings: &settings)` function is never reused, so the `tail` function should
just take ownership of it.
calling install goal overrides utility build settings with utility install settings
calling install goal defaults profile to --release
PROG_PREFIX is now applied to all utilities
modify uutils.rs to make symbolic link bins possible
binary install paths are removed first to prevent errors from ln
simplify vars for more readable install target
other minor fixes
In order to work around lines() removing the newline byte and CRLF, I
switched from the iterator methods (lines/bytes) to the direct methods
(read_line/read). I also manually skipped lines/bytes.
Fixes #744.
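The following small sketch shows the difference. BufRead::read_line() appends the line including its terminator, so both "\n" and "\r\n" survive, whereas lines() strips them. lines_with_endings is a hypothetical helper.

```rust
use std::io::{self, BufRead};

/// Collects lines with their terminators intact, unlike `lines()`.
fn lines_with_endings<R: BufRead>(mut input: R) -> io::Result<Vec<String>> {
    let mut out = Vec::new();
    let mut line = String::new();
    // read_line returns Ok(0) at end of input.
    while input.read_line(&mut line)? > 0 {
        out.push(std::mem::take(&mut line));
    }
    Ok(out)
}
```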
For coreutils, there are two build artifacts:
1. multicall executable (each utility is a separate static library)
2. individual utilities (still separate library with main wrapper)
To avoid namespace collision, each utility crate is defined as
"uu_{CMD}". The end user only sees the original utility name. This
simplifies build.rs.
Also, the thin wrapper for the main() function is no longer contained in
the crate. It has been separated into a dedicated file. This was
necessary to work around Cargo's need for the crate name attribute to
match the name in the respective Cargo.toml.
Since several utilities check whether the standard streams are interactive, I
moved this check into the uucore::fs library as is_std*_interactive(). I also
added Windows versions of these methods, which currently always return false
(at least until someone finds a way to support this).
When determining the range from which to select portions of a line, the
upper limit of the range is a usize. Its maximum value is usize::MAX, but
at one point this value is incremented, causing an overflow. Setting the
maximum upper value to usize::MAX-1 averts the bug. Since the upper limit
of the range is an index (ranging from 0 to 2^64-1 on 64-bit platforms),
the maximum usize should never be reached in practice, so the lower cap
loses nothing.
I separated test's main() into a separate file to override Cargo's
requirement for matching crate names. I had to update the build command
to use a special extern reference for test.
Fixes issues caused by #728.
To avoid linking issues with Rust's libtest, the crate for the test
utility was changed to 'uutest'. However, the user doesn't need to see
this so a few hoops were jumped through to make this transparent.
I also updated the make rules to build the individual features first and
then uutils. This makes 'make && make test' look more organized.
Everything in src/common has been moved to src/uucore. This is defined
as a Cargo library, instead of directly included. This gives us
flexibility to make the library an external crate in the future.
Fixes #717.
Implemented as follows:
Usage: expr EXPRESSION
  or:  expr OPTION

      --help     display this help and exit
      --version  output version information and exit

Print the value of EXPRESSION to standard output.  A blank line below
separates increasing precedence groups.  EXPRESSION may be:

  ARG1 | ARG2       ARG1 if it is neither null nor 0, otherwise ARG2

  ARG1 & ARG2       ARG1 if neither argument is null or 0, otherwise 0

  ARG1 < ARG2       ARG1 is less than ARG2
  ARG1 <= ARG2      ARG1 is less than or equal to ARG2
  ARG1 = ARG2       ARG1 is equal to ARG2
  ARG1 != ARG2      ARG1 is unequal to ARG2
  ARG1 >= ARG2      ARG1 is greater than or equal to ARG2
  ARG1 > ARG2       ARG1 is greater than ARG2

  ARG1 + ARG2       arithmetic sum of ARG1 and ARG2
  ARG1 - ARG2       arithmetic difference of ARG1 and ARG2

  ARG1 * ARG2       arithmetic product of ARG1 and ARG2
  ARG1 / ARG2       arithmetic quotient of ARG1 divided by ARG2
  ARG1 % ARG2       arithmetic remainder of ARG1 divided by ARG2

  STRING : REGEXP   [NOT IMPLEMENTED] anchored pattern match of REGEXP in STRING

  match STRING REGEXP        [NOT IMPLEMENTED] same as STRING : REGEXP
  substr STRING POS LENGTH   [NOT IMPLEMENTED] substring of STRING, POS counted from 1
  index STRING CHARS         [NOT IMPLEMENTED] index in STRING where any CHARS is found, or 0
  length STRING              [NOT IMPLEMENTED] length of STRING
  + TOKEN                    interpret TOKEN as a string, even if it is a
                               keyword like 'match' or an operator like '/'

  ( EXPRESSION )             value of EXPRESSION

Beware that many operators need to be escaped or quoted for shells.
Comparisons are arithmetic if both ARGs are numbers, else lexicographical.
Pattern matches return the string matched between \( and \) or null; if
\( and \) are not used, they return the number of characters matched or 0.

Exit status is 0 if EXPRESSION is neither null nor 0, 1 if EXPRESSION is null
or 0, 2 if EXPRESSION is syntactically invalid, and 3 if an error occurred.
Environment variables:
* EXPR_DEBUG_TOKENS=1 dump expression's tokens
* EXPR_DEBUG_RPN=1 dump expression represented in reverse polish notation
* EXPR_DEBUG_SYA_STEP=1 dump each parser step
* EXPR_DEBUG_AST=1 dump the expression represented as an abstract syntax tree
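The EXPR_DEBUG_RPN and EXPR_DEBUG_SYA_STEP variables point to a shunting-yard parser that produces reverse Polish notation. Below is a heavily simplified sketch of that technique. It assumes integer operands only, with no string comparisons, no `:`/`match`/`substr`, and no `+ TOKEN`. Precedence levels are taken from the groups in the help text, and to_rpn, eval_rpn and precedence are hypothetical names.

```rust
/// Binary operator precedence, lowest first, following the help groups.
fn precedence(op: &str) -> Option<u8> {
    match op {
        "|" => Some(1),
        "&" => Some(2),
        "<" | "<=" | "=" | "!=" | ">=" | ">" => Some(3),
        "+" | "-" => Some(4),
        "*" | "/" | "%" => Some(5),
        _ => None,
    }
}

/// Shunting-yard: infix tokens to RPN. All operators are left-associative.
fn to_rpn<'a>(tokens: &[&'a str]) -> Result<Vec<&'a str>, String> {
    let mut output: Vec<&'a str> = Vec::new();
    let mut stack: Vec<&'a str> = Vec::new();
    for &tok in tokens {
        if let Some(p) = precedence(tok) {
            // Pop operators of higher or equal precedence.
            while let Some(&top) = stack.last() {
                match precedence(top) {
                    Some(q) if q >= p => {
                        stack.pop();
                        output.push(top);
                    }
                    _ => break,
                }
            }
            stack.push(tok);
        } else if tok == "(" {
            stack.push(tok);
        } else if tok == ")" {
            loop {
                match stack.pop() {
                    Some("(") => break,
                    Some(op) => output.push(op),
                    None => return Err("unmatched ')'".to_string()),
                }
            }
        } else {
            output.push(tok); // operand
        }
    }
    while let Some(op) = stack.pop() {
        if op == "(" {
            return Err("unmatched '('".to_string());
        }
        output.push(op);
    }
    Ok(output)
}

/// Evaluates integer-only RPN; comparisons yield 1 or 0.
fn eval_rpn(rpn: &[&str]) -> Result<i64, String> {
    let mut stack: Vec<i64> = Vec::new();
    for &tok in rpn {
        if precedence(tok).is_none() {
            stack.push(tok.parse().map_err(|_| format!("non-integer argument: {}", tok))?);
            continue;
        }
        let b = stack.pop().ok_or("syntax error")?;
        let a = stack.pop().ok_or("syntax error")?;
        let v = match tok {
            "|" => if a != 0 { a } else { b },
            "&" => if a != 0 && b != 0 { a } else { 0 },
            "<" => (a < b) as i64,
            "<=" => (a <= b) as i64,
            "=" => (a == b) as i64,
            "!=" => (a != b) as i64,
            ">=" => (a >= b) as i64,
            ">" => (a > b) as i64,
            "+" => a.checked_add(b).ok_or("overflow")?,
            "-" => a.checked_sub(b).ok_or("overflow")?,
            "*" => a.checked_mul(b).ok_or("overflow")?,
            "/" => a.checked_div(b).ok_or("invalid division")?,
            "%" => a.checked_rem(b).ok_or("invalid division")?,
            _ => unreachable!(),
        };
        stack.push(v);
    }
    if stack.len() == 1 { Ok(stack[0]) } else { Err("syntax error".to_string()) }
}
```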
Builds the uutils multicall binary containing all utils (except stdbuf)
by default. To build only a subset, run
`cargo build --no-default-features --features <utils>`.
What's missing is building the standalone binaries and a mechanism to
automatically disable the build of Unix-only utils on Windows.
I cleaned up string references, whitespace, and use of unstable
features. I also added a comment about reverting to connect, making
others aware that the method should be replaced by join after 1.3.
We are using connect() instead of join() until Rust 1.3 is stable.
Currently, connect() is just a thin wrapper over join(). Keeping the
deprecated method allows us to build on all releases.
The method, fs::canonicalize(), is unstable and can't be used for stable
builds. We already have our own implementation of canonicalize(), which
supports more options than the Rust library implementation.
Improve handling of unicode on Windows
Disable a few crates on Windows that abuse unix APIs too much
Signed-off-by: Peter Atashian <retep998@gmail.com>
I removed unused linker flags, added platform-specific linker flags, and
used DYLD_LIBRARY_PATH (instead of DYLD_INSERT_LIBRARIES) for loading
the dynamic library. I also removed an unused variable mutation.
There are several areas needing improvement:
1) add tests for hard links
2) add implementation for uncommon flags (-d, -L, -n, -P, -r)
3) align error messages more closely with GNU implementation
I switched over to the getopts crate on crates.io, instead of Rust's
private implementation. This will allow coreutils to build for Rust 1.0.
I'm splitting the updates into several commits for easier reviewing.
This commit adds `cargo update` to the distclean target in the
makefile. This updates the Cargo.lock file when clearing the
deps directory.
In addition, it adds a faster implementation of the Sieve of
Eratosthenes for use by `src/factor/gen_table.rs` and `test/factor.rs`.
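For reference, here is a plain Sieve of Eratosthenes. It is a textbook version, not the faster variant this commit adds.

```rust
/// Returns all primes <= limit.
fn sieve(limit: usize) -> Vec<usize> {
    if limit < 2 {
        return Vec::new();
    }
    let mut is_composite = vec![false; limit + 1];
    let mut primes = Vec::new();
    for n in 2..=limit {
        if !is_composite[n] {
            primes.push(n);
            // Start at n*n: smaller multiples were marked by smaller primes.
            let mut m = n.saturating_mul(n);
            while m <= limit {
                is_composite[m] = true;
                m += n;
            }
        }
    }
    primes
}
```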
In addition to upgrading to the nightly build, I flattened the Stat struct
to embed the metadata fields. This simplified access to the values, but
needed a constructor method for ergonomic reasons.
In addition to upgrading to the nightly build, I refactored the method
that creates the directories by switching from a recursive approach to
an iterative one. I also replaced the obsolete fs::mkdir() with a custom
method using fs::create_dir() and libc::chmod(). I added several
diagnostic messages that match the GNU implementation.
I updated to the nightly build, completed support for the verbose flag,
and refactored the canonicalization method to simplify and add support
for Windows paths.
This commit updates `cut` to build on rust nightly.
In addition, it adds support for null input and output delimiters,
and fixes a bug in the `cut_characters()` function that would cause
incorrect output when two adjacent fields were specified in the range
list.
Aside from the usual upgrades to sync with the nightly build, I fixed an
unwrap() panic when reading lines with only a newline. I also refactored
the repeated command calls to use helper functions.
I created random data to test several cases. I verified that the data is
split into the correct number of files and can also be reassembled into
the original file.
The GNU implementation first strips all trailing slashes before deleting
the directory portion. This case wasn't handled.
I also rewrote the method that strips the directory to use the PathBuf
methods for improved platform independence.
In addition, this commit brings the behavior of `rm` more closely in line
with that of GNU Coreutils rm, especially regarding recursive
interactive deletion of directories. This version asks to delete files
in a different order from GNU rm, but it now gives the option of stopping
the recursion at each new directory that is reached.
This change does the following:
1. Updates the arithmetic functions in `src/factor/numeric.rs` to
correctly handle all cases up to 2^64. When numbers are larger
than 2^63, we fall back to slightly slower routines that check
for and handle overflow.
2. Since the arithmetic functions will now not overflow, we no longer
need the safety net trial division implementation. We now always
use Pollard's rho after eliminating small (<=13 bit) primes.
3. Slight tweak in `src/factor/gen_table.rs` to generate the first
1027 primes, which means we test every prime of 13 or fewer bits
before going into Pollard's rho. Includes corresponding update in
`src/factor/prime_table.rs` and the Makefile to reflect this.
4. Add a new test that generates random numbers with exclusively
large (14 to 50 bit) prime factors. This exercises the possible
overflow paths.
5. Add another new test that checks the `is_prime()` function against
a few dozen 64-bit primes. Again this is to exercise possible
overflow paths.
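The commit handles numbers above 2^63 with overflow-checking routines. A different, simpler technique on modern Rust is to widen to u128 for the modular multiplication. Below is a sketch of a deterministic Miller-Rabin is_prime() built that way. Using the first 12 primes as witnesses is known to be sufficient for every 64-bit input. This is not the code in `src/factor/numeric.rs`.

```rust
/// (a * b) mod m without overflow, by widening to u128.
fn mul_mod(a: u64, b: u64, m: u64) -> u64 {
    ((a as u128 * b as u128) % m as u128) as u64
}

/// base^exp mod m by square-and-multiply.
fn pow_mod(mut base: u64, mut exp: u64, m: u64) -> u64 {
    let mut result = 1 % m;
    base %= m;
    while exp > 0 {
        if exp & 1 == 1 {
            result = mul_mod(result, base, m);
        }
        base = mul_mod(base, base, m);
        exp >>= 1;
    }
    result
}

/// Deterministic Miller-Rabin for all u64 values.
fn is_prime(n: u64) -> bool {
    const WITNESSES: [u64; 12] = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37];
    if n < 2 {
        return false;
    }
    // Small factors (and the witnesses themselves) by trial division.
    for &p in &WITNESSES {
        if n % p == 0 {
            return n == p;
        }
    }
    // Write n - 1 = d * 2^s with d odd.
    let s = (n - 1).trailing_zeros();
    let d = (n - 1) >> s;
    'witness: for &a in &WITNESSES {
        let mut x = pow_mod(a, d, n);
        if x == 1 || x == n - 1 {
            continue;
        }
        for _ in 1..s {
            x = mul_mod(x, x, n);
            if x == n - 1 {
                continue 'witness;
            }
        }
        return false; // a proves n composite
    }
    true
}
```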
Add a test for `factor`.
This commit also pulls factor's Sieve implementation into its own module
so that the factor test can use it.
Finally, slight refactoring for clarity in gen_table.rs.
This commit builds upon @wikol's Pollard rho implementation.
It adds the following:
1. A generator for prime inverse tables. With these, we can do
very fast divisibility tests (a single multiply and comparison)
for small primes (presently, the first 1000 primes are in the
table, which means all numbers of ~26 bits or less can be
factored very quickly).
2. Always try prime inverse tables before jumping into Pollard's
rho method or using trial division.
3. Since we have eliminated all small factors by the time we're
done with the table division, only use slow trial division when
the number is big enough to cause overflow issues in Pollard's
rho, and jump out of trial division and into Pollard's rho as
soon as the number is small enough.
4. Updates the Makefile to regenerate the prime table if it's not
up-to-date.
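The prime inverse trick works because multiplying by the inverse of an odd p modulo 2^64 maps the multiples 0, p, 2p, ... exactly onto 0, 1, 2, .... As a result, n is divisible by p precisely when n * inv (mod 2^64) is at most u64::MAX / p. Below is a sketch of one table entry with hypothetical names (PrimeInverse, modular_inverse). It computes the inverse by Newton iteration. This is the standard multiply-by-inverse test, not necessarily the generator's exact layout.

```rust
/// Inverse of odd `p` modulo 2^64 via Newton's iteration. p*p ≡ 1 (mod 8)
/// gives 3 correct bits; each step doubles them (3, 6, 12, 24, 48, 96).
fn modular_inverse(p: u64) -> u64 {
    assert!(p % 2 == 1, "p must be odd");
    let mut inv = p;
    for _ in 0..5 {
        inv = inv.wrapping_mul(2u64.wrapping_sub(p.wrapping_mul(inv)));
    }
    inv
}

/// One table entry: the prime, its inverse, and the largest quotient.
struct PrimeInverse {
    p: u64,
    inv: u64,
    limit: u64,
}

impl PrimeInverse {
    fn new(p: u64) -> Self {
        PrimeInverse { p, inv: modular_inverse(p), limit: u64::MAX / p }
    }

    /// One multiply and one comparison.
    fn divides(&self, n: u64) -> bool {
        n.wrapping_mul(self.inv) <= self.limit
    }

    /// When p divides n, the same product is the exact quotient n / p.
    fn quotient(&self, n: u64) -> u64 {
        debug_assert!(self.divides(n), "{} does not divide {}", self.p, n);
        n.wrapping_mul(self.inv)
    }
}
```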
The utility needed a substantial rewrite due to library changes and
lifetime issues. I needed to implement the MultiWriter struct since it
was no longer available.
I upgraded to the recent Rust release. The only major change was that
the sleep duration in milliseconds went from u64 to u32 (this
matches the thread::sleep_ms() method).
This is a reworked version of expand. I did this for two main
reasons:
1. The previous version assumed the input was UTF-8. This
version is compatible with both UTF-8 and non-UTF-8 inputs.
2. This version has a new flag, -U, which forces expand to
treat input as 8-bit ASCII rather than interpreting it
as UTF-8. This might be handy in some cases.
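Here is a minimal sketch of byte-oriented tab expansion, the behaviour -U asks for. It assumes a single uniform tab width and ignores backspaces. expand_bytes is a hypothetical name. Because it counts one column per byte, it works on any input, whether or not it is valid UTF-8.

```rust
/// Expands tabs to spaces with a fixed tab width, one column per byte.
fn expand_bytes(input: &[u8], tab_width: usize) -> Vec<u8> {
    assert!(tab_width > 0, "tab width must be positive");
    let mut out = Vec::with_capacity(input.len());
    let mut col = 0;
    for &b in input {
        match b {
            b'\t' => {
                // Pad up to the next multiple of tab_width.
                let spaces = tab_width - col % tab_width;
                out.extend(std::iter::repeat(b' ').take(spaces));
                col += spaces;
            }
            b'\n' => {
                out.push(b);
                col = 0;
            }
            _ => {
                out.push(b);
                col += 1;
            }
        }
    }
    out
}
```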
This is a reworked version of unexpand. I did this for three main
reasons:
1. The previous version of unexpand had issues correctly computing
tabstops when the `-a` flag was supplied.
2. The previous version assumed the input was UTF-8. This version works
with non-UTF-8 inputs.
3. This version has a new flag, -U, which forces unexpand to
treat input as 8-bit ASCII rather than interpreting it
as UTF-8. This might be handy in some cases.
With this change, individual submodules can specify their dependencies with
an additional file called "deps.mk" in the subdir. When building, only
the dependencies that are necessary are built, using cargo, and then linked.
This greatly simplifies adding new dependencies: add the package in
deps/Cargo.toml, and add the appropriate line in "deps.mk" in the
src/utilname/ directory, and the dependency will be built automatically
as needed.
This also removes the need to use git submodules.
This patch begins the work of modernizing uutils to work with 1.0-ish
Rust. In particular, it
1. Updates to the latest submodules.
2. Convert mkmain.rs, mkuutils.rs, and src/uutils/uutils.rs
to new slice syntax and use of new io, fs, and path APIs.
3. Convert src/common/util.rs to new io, fs, and path APIs.
4. Convert fmt to use new APIs.
Prior to this CL, --address-radix was being used to determine the format
of the output bytes. This was wrong: this flag controls the printing of
the address (in the POSIX spec for od, this is called the "input offset
base"), not the printing of the content bytes.
+ Make parse_radix terser and clearer.
+ Make purpose of radix clearer at use site
(note that the code currently completely misuses the --address-radix
flag; this is inherited from the previous code)
+ Don't panic! inside parse_radix; instead return Result<> and let the
caller handle errors (currently we panic, but probably we'll want to use
some less alarming error routine); this will be more testable later as
well.
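Here is a sketch of a parse_radix() that returns Result instead of panicking. It assumes the four POSIX -A values (d, o, x, n). The Radix enum and its variant names are hypothetical. The radix only governs how the address (input offset) is printed, not the content bytes.

```rust
/// Base used to print the address ("input offset base" in POSIX).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Radix {
    Decimal,
    Octal,
    Hexadecimal,
    NoAddress,
}

/// Parses the --address-radix (-A) argument; the caller decides how to
/// report an error instead of this function panicking.
fn parse_radix(arg: &str) -> Result<Radix, String> {
    match arg {
        "d" => Ok(Radix::Decimal),
        "o" => Ok(Radix::Octal),
        "x" => Ok(Radix::Hexadecimal),
        "n" => Ok(Radix::NoAddress),
        _ => Err(format!("invalid radix '{}'; valid values are d, o, x, n", arg)),
    }
}
```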