Change `df` so that it correctly scales numbers of bytes by the
default block size, 1024, when neither -h nor -H is specified on the
command line. Previously, it was not scaling the number of bytes in
this case.
Fixes #3058.
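A minimal sketch of the scaling, using a hypothetical helper `bytes_to_blocks`. The commit only says that byte counts are scaled by 1024. Rounding a partial block up is an assumption made here for illustration.

```rust
/// Default block size used by `df` when neither `-h` nor `-H` is given.
const DEFAULT_BLOCK_SIZE: u64 = 1024;

/// Hypothetical helper: convert a byte count to a block count.
/// Rounding a partial block up is an assumption of this sketch.
fn bytes_to_blocks(bytes: u64, block_size: u64) -> u64 {
    // Ceiling division written so it cannot overflow for large byte counts.
    bytes / block_size + u64::from(bytes % block_size != 0)
}
```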
Add support for `split -n l/NUM`. Previously, `split` only supported
`-n NUM`, which splits a file into `NUM` chunks by byte. The `-n
l/NUM` strategy splits a file into `NUM` chunks without splitting
lines across chunks.
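One way to picture the `l/NUM` strategy is shown below. It is a sketch, not the `split` implementation: each chunk's nominal end is at byte `i * len / NUM`, and if that falls inside a line the chunk is extended to the end of that line. The helper name `split_lines_into_chunks` is hypothetical.

```rust
/// Hypothetical helper: split `data` into `n` chunks of roughly equal byte
/// size without breaking any line across two chunks.
fn split_lines_into_chunks(data: &[u8], n: usize) -> Vec<&[u8]> {
    let len = data.len();
    let mut chunks = Vec::with_capacity(n);
    let mut start = 0;
    for i in 1..=n {
        // Nominal (byte-based) end of chunk `i`, never before `start`.
        let mut end = (i * len / n).max(start);
        // If the nominal end falls inside a line, extend to that line's end.
        if end > start && end < len && data[end - 1] != b'\n' {
            end = match data[end..].iter().position(|&b| b == b'\n') {
                Some(p) => end + p + 1,
                None => len,
            };
        }
        chunks.push(&data[start..end]);
        start = end;
    }
    chunks
}
```

Chunks may come out empty when long lines absorb the bytes of later chunks; the last chunk always ends at the end of the input.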
Make the `Strategy::Number` enumeration value more general by
replacing the number parameter with a `NumberType` enum parameter.
This allows a future commit to update `split` to support the various
sub-strategies for the `-n` option. (This commit does not add support for the
other sub-strategies.)
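A sketch of the shape this refactor enables. `Strategy::Number` and `NumberType` come from the commit; the variants `Bytes` and `Lines` and the parser `parse_number_type` are hypothetical names chosen for illustration.

```rust
/// Hypothetical shape of the generalized `-n` parameter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NumberType {
    /// `-n NUM`: split into NUM chunks by byte.
    Bytes(u64),
    /// `-n l/NUM`: split into NUM chunks without splitting lines.
    Lines(u64),
}

#[derive(Debug, PartialEq, Eq)]
enum Strategy {
    Number(NumberType),
}

/// Hypothetical parser for the value given to `-n`.
fn parse_number_type(s: &str) -> Option<NumberType> {
    match s.strip_prefix("l/") {
        Some(rest) => rest.parse().ok().map(NumberType::Lines),
        None => s.parse().ok().map(NumberType::Bytes),
    }
}
```

Adding another sub-strategy then means adding a variant and a parser arm, without changing `Strategy` itself.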
https://github.com/uutils/coreutils/pull/3084 (2a333ab391) had some
missing coverage and was merged before I had a chance to fix it.
This PR adds some of the test coverage and improved error messages that were missing
from that previous PR.
If `conv=block,sync` command-line arguments are given and there is at
least one partial record read from the input (for example, if the
length of the input is not divisible by the value of the `ibs`
argument), then output an extra block of `cbs` spaces.
For example, no extra spaces are printed in this example because the
input is of length 10, a multiple of `ibs`:
$ printf "012\nabcde\n" \
> | dd ibs=5 cbs=5 conv=block,sync status=noxfer \
> && echo $
012 abcde$
2+0 records in
0+1 records out
But in this example, 5 extra spaces are printed because the length of
the input is not a multiple of `ibs`:
$ printf "012\nabcdefg\n" \
> | dd ibs=5 cbs=5 conv=block,sync status=noxfer \
> && echo $
012 abcde $
2+1 records in
0+1 records out
1 truncated record
The number of spaces printed is the size of the conversion block,
given by `cbs`.
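The rule above can be sketched on an in-memory buffer. This simplified model ignores record-by-record I/O and models only two effects: each newline-terminated line is padded with spaces or truncated to `cbs` bytes, and one extra block of `cbs` spaces is appended when the input length is not a multiple of `ibs`. The function names are hypothetical.

```rust
/// Simplified `conv=block`: pad or truncate each newline-terminated line to
/// exactly `cbs` bytes (newline removed). Returns output and truncation count.
fn block_convert(input: &[u8], cbs: usize) -> (Vec<u8>, usize) {
    let mut out = Vec::new();
    let mut truncated = 0;
    let mut lines: Vec<&[u8]> = input.split(|&b| b == b'\n').collect();
    // `split` yields one trailing empty slice for "" or for input ending in '\n'.
    if input.is_empty() || input.ends_with(b"\n") {
        lines.pop();
    }
    for line in lines {
        if line.len() > cbs {
            truncated += 1;
            out.extend_from_slice(&line[..cbs]);
        } else {
            out.extend_from_slice(line);
            out.resize(out.len() + (cbs - line.len()), b' ');
        }
    }
    (out, truncated)
}

/// Hypothetical `conv=block,sync` model: if a partial record was read
/// (input length not a multiple of `ibs`), append `cbs` spaces.
fn block_sync(input: &[u8], ibs: usize, cbs: usize) -> (Vec<u8>, usize) {
    let (mut out, truncated) = block_convert(input, cbs);
    if input.len() % ibs != 0 {
        out.resize(out.len() + cbs, b' ');
    }
    (out, truncated)
}
```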
This should correct the usage strings in both the `--help` output and the user documentation. Previously, the name of the utility sometimes did not show up correctly.
Create a new module `blocks.rs` to contain the block-related helper
functions. This commit only moves the location of the code and related
tests; it does not change the functionality of `dd`.
Collect structs, implementations, and functions that have to do with
reporting the number of blocks read and written into their own new module,
`progress.rs`. This commit also adds docstrings for everything and
unit tests for the significant methods. This commit does not change
the behavior of `dd`, just the organization of the code to make it
more maintainable and testable.
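To illustrate why a separate module is easy to test, here is a hypothetical counter in the style of `dd`'s "records in" line. The struct name `ReadStat` and its fields are assumptions, not necessarily those in `progress.rs`.

```rust
use std::fmt;
use std::ops::AddAssign;

/// Hypothetical counters for complete and partial records read.
#[derive(Debug, Default, Clone, Copy)]
struct ReadStat {
    reads_complete: u64,
    reads_partial: u64,
}

impl fmt::Display for ReadStat {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}+{} records in", self.reads_complete, self.reads_partial)
    }
}

impl AddAssign for ReadStat {
    // Accumulate the statistics of one read into the running total.
    fn add_assign(&mut self, other: Self) {
        self.reads_complete += other.reads_complete;
        self.reads_partial += other.reads_partial;
    }
}
```

Because formatting is a pure function of the counters, it can be unit tested without running the whole copy loop.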
Prevent `dd` from terminating with an error when given the
command-line argument `of=/dev/null`. This commit allows the call to
`File::set_len()` to result in an error without causing the process to
terminate prematurely.
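A minimal sketch of treating truncation as best-effort. The helper name `try_truncate` is hypothetical. Outputs such as `/dev/null` may reject `set_len`; the caller learns whether truncation succeeded but is not forced to exit.

```rust
use std::fs::File;

/// Hypothetical helper: try to truncate `file` to `len` bytes and report
/// success instead of propagating the error.
fn try_truncate(file: &File, len: u64) -> bool {
    file.set_len(len).is_ok()
}
```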
- Configured clap to take the crate version, so the version is now visible in the docs.
- Added the ABOUT string from the `expr` help output, so the about section is now rendered in the docs.
- Added a USAGE section.
- Added a HELP section for each argument.
Place the "truncated records" line below the "records out" line in the
status report produced by `dd` and properly handle the singularization
of the word "record" in the case of 1 truncated record. This matches
the behavior of GNU `dd`.
For example,
$ printf "ab" | dd cbs=1 conv=block status=noxfer > /dev/null
0+1 records in
0+1 records out
1 truncated record
$ printf "ab\ncd\n" | dd cbs=1 conv=block status=noxfer > /dev/null
0+1 records in
0+1 records out
2 truncated records
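The singular/plural handling can be sketched as a small hypothetical helper. Following the examples, the line is produced only when at least one record was truncated.

```rust
/// Hypothetical helper producing the "truncated record(s)" status line.
/// Returns `None` when nothing was truncated, so no line is printed.
fn truncated_line(count: u64) -> Option<String> {
    match count {
        0 => None,
        1 => Some("1 truncated record".to_string()),
        n => Some(format!("{n} truncated records")),
    }
}
```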
Add the `-e` flag, which indicates whether to elide (that is, remove)
empty files that would have been created by the `-n` option.
The `-n` command-line argument gives a specific number of chunks into
which the input files will be split. If the number of chunks is
greater than the number of bytes, then empty files will be created for
the excess chunks. But if `-e` is given, then empty files will not be
created.
For example, contrast
$ printf 'a\n' > f && split -e -n 3 f && cat xaa xab xac
a
cat: xac: No such file or directory
with
$ printf 'a\n' > f && split -n 3 f && cat xaa xab xac
a
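A sketch of the eliding logic. It assumes chunk boundaries at `i * total / n`, which is an illustrative choice and not necessarily the distribution `split` uses; with it, the 2-byte file split into 3 chunks gives one empty chunk, which `-e` drops. The helper name `chunk_ranges` is hypothetical.

```rust
/// Hypothetical helper: byte ranges `(start, end)` of `n` chunks over `total`
/// bytes. With `elide_empty` (like `-e`), zero-length chunks are dropped.
fn chunk_ranges(total: u64, n: u64, elide_empty: bool) -> Vec<(u64, u64)> {
    (0..n)
        .map(|i| (i * total / n, (i + 1) * total / n))
        .filter(|(start, end)| !elide_empty || end > start)
        .collect()
}
```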
Add support for the `-x` command-line option to `split`. This option
causes `split` to produce filenames with hexadecimal suffixes instead
of the default alphabetic suffixes.
Add a `NumericHexadecimal` member to the `SuffixType` enum so that a
future commit can add support for hexadecimal filename suffixes to the
`split` program.
Refactor the code to use a `SuffixType` enumeration with two members,
`Alphabetic` and `NumericDecimal`, representing the two currently
supported ways of producing filename suffixes. This prepares the code
to more easily support other formats, like numeric hexadecimal.
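The three suffix types differ only in their radix and digit alphabet, which the following sketch makes explicit. `SuffixType` and its members come from the commits above; the `radix` method and `suffix` function are hypothetical, and lowercase hexadecimal digits are an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SuffixType {
    Alphabetic,
    NumericDecimal,
    NumericHexadecimal,
}

impl SuffixType {
    /// Number of distinct "digits" for this suffix type.
    fn radix(self) -> u64 {
        match self {
            SuffixType::Alphabetic => 26,
            SuffixType::NumericDecimal => 10,
            SuffixType::NumericHexadecimal => 16,
        }
    }
}

/// Hypothetical helper: render chunk `index` as a `width`-character suffix.
/// Returns `None` if `index` does not fit in `width` digits.
fn suffix(kind: SuffixType, mut index: u64, width: usize) -> Option<String> {
    let radix = kind.radix();
    let mut digits = vec![0u8; width];
    // Fill digits from least significant (rightmost) to most significant.
    for slot in digits.iter_mut().rev() {
        let d = (index % radix) as u8;
        *slot = match kind {
            SuffixType::Alphabetic => b'a' + d,
            _ => b"0123456789abcdef"[d as usize],
        };
        index /= radix;
    }
    if index != 0 {
        return None;
    }
    String::from_utf8(digits).ok()
}
```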
Clean up unit tests in the `dd` crate to make them easier to
manage. This commit does a few things:
* move test cases that test the complete functionality of the `dd`
program from the `dd_unit_tests` module up to the
`tests/by-util/test_dd.rs` module so that they can take advantage of
the testing framework and common testing tools provided by uutils,
* move test cases that test internal functions of the `dd`
implementation into the `tests` module within `dd.rs` so that they
live closer to the code they are testing,
* replace test cases defined by macros with test cases defined by
plain old functions to make the test cases easier to read at a
glance.
* include io-blksize parameter
* format changes for including io-blksize
Co-authored-by: DevSabb <devsabb@local>
Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>
This change is needed to fix the missing USAGE section for `od` in the user docs.
With reference to issue
https://github.com/uutils/coreutils/issues/2991 and the missing USAGE
section in the `od` docs at
https://uutils.github.io/coreutils-docs/user/utils/od.html, it was
found that the USAGE string for `od` started with an empty line, and uudoc
only uses the first line of that string for the USAGE section in the docs.
This resulted in an empty USAGE section for `od`.
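The failure mode can be reproduced in a few lines. The helper below is a hypothetical stand-in for "take only the first line", and the usage strings are illustrative.

```rust
/// Hypothetical stand-in for a doc generator that keeps only the first
/// line of a usage string.
fn first_usage_line(usage: &str) -> &str {
    usage.lines().next().unwrap_or("")
}
```

A usage string that begins with `\n` therefore yields an empty USAGE section, which is why the fix removes the leading empty line from the string itself.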
Correct the accounting for partial records written by `dd` to the
output file. After this commit, if fewer than `obs` bytes are written,
then that is counted as a partial record. For example,
$ printf 'abc' | dd bs=2 status=noxfer > /dev/null
1+1 records in
1+1 records out
That is, one complete record and one partial record are read from the
input, and one complete record and one partial record are written to the
output. Previously, `dd` reported two complete records and zero
partial records written to the output in this case.
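A sketch of the corrected accounting, using a hypothetical helper: a write of exactly `obs` bytes is a complete record, and any shorter non-empty write is a partial one. For `printf 'abc'` with `bs=2`, the writes are 2 and 1 bytes, giving `1+1`.

```rust
/// Hypothetical helper: count complete and partial output records from
/// the sizes of the individual writes.
fn count_writes(write_sizes: &[usize], obs: usize) -> (u64, u64) {
    let mut complete = 0;
    let mut partial = 0;
    for &n in write_sizes {
        if n == obs {
            complete += 1;
        } else if n > 0 {
            partial += 1;
        }
    }
    (complete, partial)
}
```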
Change the `filter_mount_list()` function so that it always produces
the same order of `MountInfo` objects. This change ultimately results
in `df` printing its table of filesystems in the same order on each
execution. Previously, the table was in an arbitrary order because the
`MountInfo` objects were read from a `HashMap`.
Fixes #3086.
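A common way to deduplicate while keeping a deterministic order is to store results in a `Vec` and use a `HashMap` only to remember each key's position. In the sketch below, `MountInfo` is reduced to two fields, and letting later entries replace earlier ones for the same device is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced `MountInfo`.
#[derive(Debug, Clone, PartialEq)]
struct MountInfo {
    dev_id: String,
    mount_dir: String,
}

/// Keep one entry per device, preserving the order of first appearance.
/// The map only records positions, so output order never depends on hashing.
fn filter_mount_list(mounts: Vec<MountInfo>) -> Vec<MountInfo> {
    let mut index_of: HashMap<String, usize> = HashMap::new();
    let mut result: Vec<MountInfo> = Vec::new();
    for m in mounts {
        let existing = index_of.get(&m.dev_id).copied();
        match existing {
            // Assumption: a later entry for the same device replaces the earlier one.
            Some(i) => result[i] = m,
            None => {
                index_of.insert(m.dev_id.clone(), result.len());
                result.push(m);
            }
        }
    }
    result
}
```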
* ls: add new optional arguments to --classify flag
The --classify flag in ls now takes an optional WHEN argument
that may have the values always, auto and none.
Modified clap argument to allow an optional parameter and
changed the classify flag value parsing logic to account for
this change.
* ls: add test for indicator-style, ind and classify with value none
* ls: require the optional parameter to --classify to be given with `=`
* ls: account for all the undocumented possible values for the --classify flag
Added the other values for the --classify flag along with modifications to tests.
Also documented an inconsistency with GNU coreutils: we accept the
flag value even for the short version of the flag.
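A sketch of the value parsing. A bare `--classify` is treated as `always`, as in GNU `ls`. The synonym list mirrors the values GNU accepts for its WHEN arguments, but treat the exact list, and the names `When` and `parse_classify`, as assumptions.

```rust
#[derive(Debug, PartialEq, Eq)]
enum When {
    Always,
    Auto,
    Never,
}

/// Hypothetical parser for the optional WHEN value of `--classify`.
/// `None` means the flag was given without a value.
fn parse_classify(value: Option<&str>) -> Option<When> {
    match value {
        None | Some("always" | "yes" | "force") => Some(When::Always),
        Some("auto" | "tty" | "if-tty") => Some(When::Auto),
        Some("never" | "no" | "none") => Some(When::Never),
        // Unknown values are rejected; the caller reports an error.
        Some(_) => None,
    }
}
```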
The iflag, oflag and conv command-line arguments take a list of values,
and the correct behavior is to collect all values from multiple
occurrences of them.
For example, if we call `dd --iflag=directory --iflag=skip_bytes`, this should
collect the two values, `directory` and `skip_bytes`, for iflag.
A unit test was added for this case.
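The collection step amounts to flattening the comma-separated lists from every occurrence, in order. The helper name `collect_flags` is hypothetical.

```rust
/// Hypothetical helper: gather every comma-separated value from all
/// occurrences of a list-valued argument such as `iflag`, in order.
fn collect_flags<'a>(occurrences: &[&'a str]) -> Vec<&'a str> {
    occurrences
        .iter()
        .copied()
        .flat_map(|occ| occ.split(','))
        .filter(|s| !s.is_empty())
        .collect()
}
```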
Replace `ByteSplitter` and `LineSplitter` with `ByteChunkWriter` and
`LineChunkWriter` respectively. This results in a more maintainable
design and an increase in the speed of splitting by lines.
Add the `ByteChunkWriter` and `LineChunkWriter` structs and
implementations, but don't use them yet. These structs offer an
alternative approach to writing chunks of output (contrasted with
`ByteSplitter` and `LineSplitter`). The main difference is that
control of which underlying file is being written is inside the writer
instead of outside.
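A minimal sketch of "control inside the writer": the writer itself decides when to start a new chunk. To stay self-contained, chunks are in-memory `Vec<u8>` buffers instead of files. This is a hypothetical model, not the actual `ByteChunkWriter`.

```rust
use std::io::{self, Write};

/// Hypothetical chunk writer that rotates to a new chunk every `chunk_size` bytes.
struct ByteChunkWriter {
    chunk_size: usize,
    written_in_chunk: usize,
    chunks: Vec<Vec<u8>>,
}

impl ByteChunkWriter {
    fn new(chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk size must be positive");
        Self { chunk_size, written_in_chunk: 0, chunks: Vec::new() }
    }
}

impl Write for ByteChunkWriter {
    fn write(&mut self, mut buf: &[u8]) -> io::Result<usize> {
        let total = buf.len();
        while !buf.is_empty() {
            // Start a new chunk when there is none yet or the current one is full.
            if self.chunks.is_empty() || self.written_in_chunk == self.chunk_size {
                self.chunks.push(Vec::new());
                self.written_in_chunk = 0;
            }
            let room = self.chunk_size - self.written_in_chunk;
            let n = room.min(buf.len());
            self.chunks.last_mut().unwrap().extend_from_slice(&buf[..n]);
            self.written_in_chunk += n;
            buf = &buf[n..];
        }
        Ok(total)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Callers just call `write_all`; they never need to know which chunk is current.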
Add some helper functions and adjust some error-handling to make the
`Output::dd_out()` method, containing the main loop of the `dd`
program, more concise. This commit also adds documentation and
comments describing the main loop procedure in more detail.
This lets us use fewer reallocations when parsing each line.
The current guess is set to the maximum fields in a line so far. This is
a free performance win in the common case where each line has the same
number of fields, but comes with some memory overhead in the case where
there is a line with lots of fields at the beginning of the file, and
fewer later, but these lines are typically not kept for very
long anyway.
Using indexes into the line instead of Vec<u8>s means we don't have to copy
the line to store the fields (indexes instead of slices because it avoids
self-referential structs). Using memchr also empirically saves a lot of
intermediate allocations.
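The index-based approach can be sketched with `Range<usize>` values into the line plus a capacity guess that tracks the most fields seen so far. The commit uses `memchr`; this std-only sketch uses `Iterator::position` instead. The function name is hypothetical.

```rust
use std::ops::Range;

/// Hypothetical parser: record each field as an index range into `line`
/// instead of copying it. `capacity_guess` tracks the most fields seen so far.
fn field_ranges(line: &[u8], sep: u8, capacity_guess: &mut usize) -> Vec<Range<usize>> {
    let mut fields = Vec::with_capacity(*capacity_guess);
    let mut start = 0;
    while let Some(p) = line[start..].iter().position(|&b| b == sep) {
        fields.push(start..start + p);
        start += p + 1;
    }
    // The last field runs to the end of the line.
    fields.push(start..line.len());
    *capacity_guess = (*capacity_guess).max(fields.len());
    fields
}
```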
Refactor the code for representing the `df` data table into `Header`
and `Row` structs. These structs live in a new module `table.rs`. When
combined with the `Options` struct, these structs can be
`Display`ed. Organizing the code this way makes it possible to test
the display settings independently of the machinery for getting the
filesystem data. New unit tests have been added to `table.rs` to
demonstrate this benefit.
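A sketch of how display settings can be tested without touching real filesystems. `Options` and `Row` are reduced, hypothetical versions, and pairing them in a `DisplayRow` wrapper is an assumption about how "combined with `Options`" might look.

```rust
use std::fmt;

/// Hypothetical, reduced display options.
struct Options {
    show_fs_type: bool,
}

/// Hypothetical, reduced table row.
struct Row {
    fs_device: String,
    fs_type: String,
    bytes: u64,
}

/// A row paired with the options that control how it is shown.
struct DisplayRow<'a> {
    row: &'a Row,
    options: &'a Options,
}

impl fmt::Display for DisplayRow<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:<12}", self.row.fs_device)?;
        // The type column appears only when requested.
        if self.options.show_fs_type {
            write!(f, " {:<6}", self.row.fs_type)?;
        }
        write!(f, " {:>10}", self.row.bytes)
    }
}
```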
Show a warning if the `skip=N` command-line argument would cause `dd`
to skip past the end of the input. For example:
$ printf "abcd" | dd bs=1 skip=5 count=0 status=noxfer
'standard input': cannot skip to specified offset
0+0 records in
0+0 records out
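One way to detect the condition on a non-seekable input is to read and discard bytes, then compare how many were actually skipped with how many were requested. The helper `skip_bytes` is hypothetical; a short count is when the warning would be shown.

```rust
use std::io::{self, Read};

/// Hypothetical helper: discard up to `n` bytes from `input` and return
/// how many were actually skipped.
fn skip_bytes<R: Read>(input: &mut R, n: u64) -> io::Result<u64> {
    io::copy(&mut input.take(n), &mut io::sink())
}
```

For the example above, skipping 5 bytes from the 4-byte input returns 4, so the warning applies.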
Add some structure to errors that can be created during parsing of
settings from command-line options. This commit creates
`StrategyError` and `SettingsError` enumerations to represent the
various parsing and other errors that can arise when transforming
`ArgMatches` into `Settings`.
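A sketch of the layered error types. `StrategyError` and `SettingsError` come from the commit; all variants and messages below are hypothetical. The `From` impl lets `?` lift a strategy error into a settings error.

```rust
use std::fmt;

/// Hypothetical variants for errors while choosing a splitting strategy.
#[derive(Debug, PartialEq)]
enum StrategyError {
    /// The size or count given to the strategy was not a valid number.
    InvalidNumber(String),
    /// More than one splitting strategy was requested (e.g. both `-b` and `-l`).
    MultipleWays,
}

/// Hypothetical variants for errors while building `Settings`.
#[derive(Debug, PartialEq)]
enum SettingsError {
    Strategy(StrategyError),
    SuffixLength(String),
}

impl From<StrategyError> for SettingsError {
    fn from(e: StrategyError) -> Self {
        SettingsError::Strategy(e)
    }
}

impl fmt::Display for SettingsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Strategy(StrategyError::InvalidNumber(s)) => write!(f, "invalid number: {s}"),
            Self::Strategy(StrategyError::MultipleWays) => write!(f, "cannot split in more than one way"),
            Self::SuffixLength(s) => write!(f, "invalid suffix length: {s}"),
        }
    }
}
```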
Show a warning when a block size includes "0x" since this is
ambiguous: the user may have meant "multiply the next number by zero"
or they may have meant "the following characters should be interpreted
as a hexadecimal number".
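A sketch of one detection rule, via a hypothetical helper. It flags a multiplication factor that is exactly `0` followed by `x`, so `0x10` is flagged but `10x2` is not. Whether `dd` uses precisely this rule is an assumption.

```rust
/// Hypothetical check: true if some factor in `AxBxC` is exactly "0" and
/// is followed by an 'x', as in "0x10" or "2x0x3".
fn has_ambiguous_zero_x(arg: &str) -> bool {
    let factors: Vec<&str> = arg.split('x').collect();
    // Every factor except the last is followed by an 'x'.
    factors[..factors.len() - 1].iter().any(|f| *f == "0")
}
```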
When specifying `seek=N` and *not* specifying `conv=notrunc`, truncate
the output file to `N` blocks instead of truncating it to zero before
starting to write output. For example:
$ printf "abc" > outfile
$ printf "123" | dd bs=1 skip=1 seek=1 count=1 status=noxfer of=outfile
1+0 records in
1+0 records out
$ cat outfile
a2
Fixes #3068.
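The new behavior corresponds to this sequence of standard-library calls: open without truncating, truncate to `seek * bs` bytes, seek there, and write. The helper `write_at_seek` is hypothetical. With `bs=1`, `seek=1`, and the byte `2`, an `outfile` containing `abc` becomes `a2`, matching the example.

```rust
use std::fs::OpenOptions;
use std::io::{self, Seek, SeekFrom, Write};
use std::path::Path;

/// Hypothetical sketch of `seek=N` without `conv=notrunc`.
fn write_at_seek(path: &Path, bs: u64, seek: u64, data: &[u8]) -> io::Result<()> {
    // Opening without `.truncate(true)` keeps the existing contents.
    let mut f = OpenOptions::new().write(true).create(true).open(path)?;
    let offset = seek * bs;
    // Keep exactly the first `seek` blocks, then write after them.
    f.set_len(offset)?;
    f.seek(SeekFrom::Start(offset))?;
    f.write_all(data)
}
```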
When this option is present, the files argument is not processed. Instead, the list of files is read from the provided file, with names separated by the ASCII NUL (`\0`) character. When the value of files0-from is `-`, the file list is read from standard input.
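Reading a NUL-separated list takes only a few lines. Here is a hypothetical helper that works on any `Read`, including standard input for `-`. Skipping empty entries (for example from a trailing NUL) is a simplification of this sketch, not necessarily how the utility treats them.

```rust
use std::io::{self, Read};

/// Hypothetical helper: read a NUL-separated list of file names.
fn read_files0<R: Read>(mut input: R) -> io::Result<Vec<Vec<u8>>> {
    let mut buf = Vec::new();
    input.read_to_end(&mut buf)?;
    Ok(buf
        .split(|&b| b == 0)
        .filter(|name| !name.is_empty())
        .map(|name| name.to_vec())
        .collect())
}
```

Names are kept as raw bytes because file names need not be valid UTF-8.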
Correct the behavior of `dd` when the same argument is provided more than once.
Before this commit, if an argument was provided multiple times,
a validation error was returned.
For example:
```
$ printf '' | ./target/debug/dd status=none status=noxfer
error: The argument '--status=<LEVEL>' was provided more than once, but cannot be used multiple times
USAGE:
dd [OPTIONS]
For more information try --help
```
A unit test was added for this case.
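A sketch of accepting repeated operands by letting the last occurrence win, as GNU `dd` does when it processes operands in order. The helper `last_operand` is hypothetical.

```rust
/// Hypothetical helper: value of the last `key=value` operand for `key`.
fn last_operand<'a>(args: &[&'a str], key: &str) -> Option<&'a str> {
    args.iter().rev().copied().find_map(|arg| {
        let (k, v) = arg.split_once('=')?;
        (k == key).then_some(v)
    })
}
```

With this, `status=none status=noxfer` behaves like `status=noxfer`.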
This avoids hacking around the short options of these command line
arguments that have been introduced by clap. Additionally, we test and
correctly handle the combination of both version and help. The GNU
binary will ignore both arguments in this case while clap would perform
the first one. A test for this edge case was added.
Now handles recognized command-line options and ignores unrecognized
command-line options instead of returning a special exit status for
them.
There is one point of interest, which is related to an implementation
detail in GNU `true`. It may return a non-true exit status (in
particular EXIT_FAIL) if writing the diagnostics of a GNU specific
option fails. For example `true --version > /dev/full` would fail and
have exit status 1.
This behavior was acknowledged in GNU in commit
<9a6a486e6503520fd2581f2d3356b7149f1b225d>. No further
justification was provided for keeping this quirk.
POSIX knows no such options and requires an exit status of 0 in all
cases. We replicate GNU here, which is a consistency improvement over the
prior implementation. Adds documentation to clarify the intended
behavior.
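The described behavior can be sketched as a function from arguments and an output writer to an exit status. It assumes, as in GNU `true`, that `--help` or `--version` is recognized only when it is the sole argument; the help and version texts are placeholders.

```rust
use std::io::Write;

/// Hypothetical model of `true`: returns 0 unless writing the requested
/// help or version text fails, in which case it returns 1.
fn true_status<W: Write>(args: &[&str], out: &mut W) -> i32 {
    let text = match args {
        // Placeholder texts; the real output differs.
        ["--help"] => "Usage: true [ignored command line arguments]\n",
        ["--version"] => "true (sketch)\n",
        // Anything else, including unknown options, is ignored.
        _ => return 0,
    };
    match out.write_all(text.as_bytes()).and_then(|()| out.flush()) {
        Ok(()) => 0,
        Err(_) => 1,
    }
}
```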
Exit with status code 1 for argument parsing errors in `truncate`. When
`clap` encounters an error during argument parsing, it exits with status
code 2. This causes some GNU tests to fail since they expect status code
1.
Refactor the `Mode` enum in the `head.rs` module so that it includes
not only the mode type (lines or bytes) but also whether to read the
first NUM items of that type or all but the last NUM. Before this
commit, these two pieces of information were stored separately. This
made it difficult to read the code through several function calls and
understand at a glance which strategy was being employed.
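With both pieces of information in one enum, a single `match` shows which strategy applies. The sketch below applies such a mode to an in-memory buffer; the variant names and the `apply` function are hypothetical.

```rust
/// Hypothetical combined mode: unit (lines or bytes) plus first-N vs. all-but-last-N.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    FirstLines(usize),
    AllButLastLines(usize),
    FirstBytes(usize),
    AllButLastBytes(usize),
}

/// Byte offset just past the first `n` lines of `input`.
fn line_offset(input: &[u8], n: usize) -> usize {
    input.split_inclusive(|&b| b == b'\n').take(n).map(|l| l.len()).sum()
}

/// Apply `mode` to an in-memory input; lines are '\n'-terminated.
fn apply(mode: Mode, input: &[u8]) -> &[u8] {
    match mode {
        Mode::FirstBytes(n) => &input[..n.min(input.len())],
        Mode::AllButLastBytes(n) => &input[..input.len().saturating_sub(n)],
        Mode::FirstLines(n) => &input[..line_offset(input, n)],
        Mode::AllButLastLines(n) => {
            let total = input.split_inclusive(|&b| b == b'\n').count();
            &input[..line_offset(input, total.saturating_sub(n))]
        }
    }
}
```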
This allows for `-t` to take invalid Unicode (but still single-byte) values
on unix-like platforms. Other platforms, which as of the time of this commit
do not support `OsStr::as_bytes()`, could possibly be supported in the future,
but would require design decisions as to what that means.
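On Unix, `OsStrExt::as_bytes` exposes the raw bytes of an argument, so any single byte can serve as a separator even if it is not valid UTF-8 on its own. Here is a sketch with a hypothetical helper name.

```rust
use std::ffi::OsStr;

/// Hypothetical helper: accept any argument that is exactly one byte.
#[cfg(unix)]
fn separator_byte(arg: &OsStr) -> Option<u8> {
    use std::os::unix::ffi::OsStrExt;
    match arg.as_bytes() {
        [b] => Some(*b),
        _ => None,
    }
}
```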
Replace the `FilenameFactory` with `FilenameIterator` and calls to
`FilenameFactory::make()` with calls to `FilenameIterator::next()`. We
did not need the full generality of being able to produce the
filename for an arbitrary chunk index. Instead we need only iterate
over filenames one after another. This allows for a less
mathematically dense algorithm that is easier to understand and
maintain. Furthermore, it can be connected to some familiar concepts
from the representation of numbers as a sequence of digits.
This does not change the behavior of the `split` program, just the
implementation of how filenames are produced.
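The "sequence of digits" idea works like an odometer: increment the last digit, and on overflow reset it and carry left. The sketch below is a hypothetical alphabetic-only `FilenameIterator`; the real one may differ in fields and suffix handling.

```rust
/// Hypothetical iterator over `prefix` + fixed-width alphabetic suffixes.
struct FilenameIterator {
    prefix: String,
    digits: Vec<u8>, // each digit is 0..26, most significant first
    exhausted: bool,
}

impl FilenameIterator {
    fn new(prefix: &str, width: usize) -> Self {
        Self { prefix: prefix.to_string(), digits: vec![0; width], exhausted: false }
    }
}

impl Iterator for FilenameIterator {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        if self.exhausted {
            return None;
        }
        let suffix: String = self.digits.iter().map(|&d| (b'a' + d) as char).collect();
        let name = format!("{}{}", self.prefix, suffix);
        // Increment the least significant digit, carrying leftwards; if every
        // digit wraps around, all suffixes have been used.
        self.exhausted = true;
        for d in self.digits.iter_mut().rev() {
            if *d + 1 < 26 {
                *d += 1;
                self.exhausted = false;
                break;
            }
            *d = 0;
        }
        Some(name)
    }
}
```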
Co-authored-by: Terts Diepraam <terts.diepraam@gmail.com>
- Change the main! proc_macro to a bin! macro_rules macro.
- Reexport uucore_procs from uucore
- Make utils to not import uucore_procs directly
- Remove the `syn` dependency and don't parse proc_macro input (hopefully for faster compile times)
Prevent usize underflow when reducing the size of a file by more than
its current size. For example, if `f` is a file with 3 bytes, then
truncate -s-5 f
will now set the size of the file to 0 instead of causing a panic.
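The fix corresponds to using saturating subtraction, which clamps at zero. Plain subtraction would underflow, panicking in debug builds and wrapping in release builds. The helper name is hypothetical, and it uses `u64` for file sizes.

```rust
/// Hypothetical helper: new size for a relative reduction such as `-s-5`.
fn reduced_size(current: u64, reduce_by: u64) -> u64 {
    current.saturating_sub(reduce_by)
}
```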
Improve the error message that gets printed when a directory does not
exist. After this commit, the error message is
truncate: cannot open '{file}' for writing: No such file or directory
where `{file}` is the name of a file in a directory that does not
exist.