* seq: use BigDecimal to represent floats
Use `BigDecimal` to represent arbitrary precision floats in order to
prevent numerical precision issues when iterating over a sequence of
numbers. This commit makes several changes at once to accomplish this
goal.
First, it creates a new struct, `PreciseNumber`, that is responsible for
storing not only the number itself but also the number of digits (both
integer and decimal) needed to display it. This information is collected
at the time of parsing the number, which lives in the new
`numberparse.rs` module.
Second, it uses the `BigDecimal` struct to store arbitrary precision
floating point numbers instead of the previous `f64` primitive
type. This protects against issues of numerical precision when
repeatedly accumulating a very small increment.
Third, since neither the `BigDecimal` nor `BigInt` types have a
representation of infinity, minus infinity, minus zero, or NaN, we add
the `ExtendedBigDecimal` and `ExtendedBigInt` enumerations which extend
the basic types with these concepts.
* fixup! seq: use BigDecimal to represent floats
* fixup! seq: use BigDecimal to represent floats
* fixup! seq: use BigDecimal to represent floats
* fixup! seq: use BigDecimal to represent floats
* fixup! seq: use BigDecimal to represent floats
Replace the custom `split::walk_lines()` function with a call to
`std::io::copy()`, using a new `TakeLines` reader as the source and
`stdout` as the destination. The `TakeLines` reader is an adaptor that
scans the bytes being read for line ending characters and stops the
reading after a given number of lines has been read (similar to the
`std::io::Take` adaptor).
This change
* makes the `read_n_lines()` function more concise,
* allows it to mirror the implementation of `read_n_bytes()`,
* increases the speed of `head -n NUM`.
Since tac must read its input files completely to start processing them
from the end, it is particularly suited to use memory maps to benefit
from the page cache maintained by the operating systems to bring the
necessary data into memory as required.
This does also include situations where the input is stdin, but not via
a pipe but for example a file descriptor set up by the user's shell
through an input redirection.
This makes it no longer possible to pass multiple filenames, but every
other implementation I tried (GNU, busybox, FreeBSD, sbase) also
forbids that so I think it's for the best.
Fix a mix-up between ownership and mode. The latter (mode / file permissions)
can also be set on windows (which however only affects the read-only flag),
while there doesn't seem to be a straight-forward way to change file ownership
on windows.
Copy the acl as well when copying the mode. This is a non-default feature and can be
enabled with --features feat_acl, because it doesn't seem to work on CI.
It is only available for unix so far.
Copy the SELinux context if possible.
- Implement all of GNU's fiddly little details
- Don't assume Linux for error codes
- Accept badly-encoded filenames
- Report errors after the fact instead of checking ahead of time
- General cleanup
rmdir now passes GNU's tests.
Errors are now always shown with the corresponding filename.
Errors are no longer converted into warnings. Previously `wc < .`
would cause a loop.
Checking whether something is a directory is no longer done in
advance. This removes race conditions and the edge case where stdin is
a directory.
The custom error type is removed because io::Error is now enough.
- Reuse allocations for read lines
- Increase splice size
- Check if /dev/null was opened correctly
- Do not discard read bytes after I/O error
- Add fast line counting with bytecount
Remove a copy operation of the input buffer being read for digest when
reading in text mode on Windows. Previously, the code was copying the
buffer to a completely new `Vec`, replacing "\r\n" with "\n". Instead,
the code now scans for the indices at which each "\r\n" occurs in the
input buffer and inputs into the digest only the characters before the
"\r" and after it.
Report errors properly instead of panicking.
Replace zero_copy by a simpler specialized private module.
Do not assume splices move all data at once.
Use the modern uutils machinery.
Remove the "latency" feature. The time it takes to prepare the buffer
is drowned out by the startup time anyway.
yes: Add tests
yes: Fix long input test on Windows
Fix a bug in `tac` where multi-character line separators would cause
incorrect behavior when there was overlap between candidate matches in
the input string. This commit adds a dependency on `memchr` in order to
use the `memchr::memmem::rfind_iter()` function to scan for
non-overlapping instances of the specified line separator characters,
scanning from right to left.
Fixes#2580.
When hitting Ctrl+C sort now deletes any temporary files. To make this easier I
created a new struct `TmpDirWrapper` that handles the creation of new temporary
files and the registration of the signal handler.