fmt:
- Implemented Knuth-Plass optimal linebreaking strategy.
- Added command-line switch -q for "quick" (greedy) split
mode that does not use Knuth-Plass.
- Right now, Knuth-Plass runs about half as fast. It also
uses more memory.
- Updated fmt to use char_width (see below) instead of
assuming each character width is 1.
- Use i64 for demerits instead of int in K-P, since int is
pointer sized and will only be 32 bits on some
architectures.
- Incremented version number.
- Incorporated improvements suggested by huonw and Arcterus.
- K-P uses indices of linebreaks vector instead of raw
pointers. This gets rid of a lot of allocation of boxes
and improves safety to boot.
- Added a support module for computing displayed widths of Unicode
strings based on Markus Kuhn's free implementation at
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
- This is in `charwidth.rs`, but this is a temporary measure
until the Char trait implements .width(). I am submitting
a PR for this soon, and the code in charwidth() is what's
generated for libcore.
Closes #223
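
For illustration, here is a minimal sketch of the index-based idea from the
list above. It is not the code from this commit: it is a simplified
minimum-raggedness dynamic program (no glue, stretch, or penalties as in full
Knuth-Plass), and the names `LineBreak` and `break_lines` are hypothetical.
Each break stores the index of its predecessor in the same vector instead of a
pointer, and demerits are kept in an `i64`. Word widths are passed in directly;
in fmt they would come from the display-width computation.

```rust
/// One chosen break. `prev` is an index into the same vector of breaks,
/// not a pointer, so no boxes are allocated. (Hypothetical type.)
#[derive(Clone, Copy)]
struct LineBreak {
    prev: usize,
    demerits: i64, // i64 so the width does not depend on the architecture
}

/// Given the display width of each word, returns the index of the first
/// word on each line. Assumption: demerits are the squared slack of every
/// line except the last, and a single overlong word gets a line to itself.
fn break_lines(widths: &[usize], max_width: usize) -> Vec<usize> {
    let n = widths.len();
    // breaks[i] is the cheapest layout of words[..i] ending with a break.
    let mut breaks: Vec<LineBreak> = Vec::with_capacity(n + 1);
    breaks.push(LineBreak { prev: 0, demerits: 0 });

    for end in 1..=n {
        let mut best = LineBreak { prev: end - 1, demerits: i64::MAX };
        let mut line_width = 0usize;
        // Grow the candidate line backwards, one word at a time.
        for start in (0..end).rev() {
            line_width += widths[start];
            let single_word = start + 1 == end;
            if !single_word {
                line_width += 1; // the space between two words
            }
            if line_width > max_width && !single_word {
                break; // adding more words only makes the line longer
            }
            let slack = max_width.saturating_sub(line_width) as i64;
            let cost = if end == n { 0 } else { slack * slack };
            let total = breaks[start].demerits + cost;
            if total < best.demerits {
                best = LineBreak { prev: start, demerits: total };
            }
        }
        breaks.push(best);
    }

    // Follow the indices back from the end to recover the line starts.
    let mut starts = Vec::new();
    let mut at = n;
    while at > 0 {
        at = breaks[at].prev;
        starts.push(at);
    }
    starts.reverse();
    starts
}
```

With word widths 3, 2, 2, 5 and a maximum width of 6, a greedy split would put
the first two words on one line and leave the third alone; the sketch instead
starts lines at words 0, 1, and 3, which gives lower total demerits.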
Adds an implementation for cut_fields() and creates a separate function
for the --output-delimiter, for performance reasons.
This implementation relies on ::read_until() to find the newline for us,
but read_until() allocates a new vector for its result on every call.
This is not ideal and should be improved upon by passing a buffer to
read().
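
As a sketch of the buffer-passing improvement suggested above (not the code in
this commit): in current Rust, `std::io::BufRead::read_until` appends into a
caller-supplied `Vec<u8>`, so one buffer can be reused for every line. The
function below is a hypothetical, simplified byte-delimiter field cutter; the
0-based `fields` slice and the rule that lines without the delimiter are
passed through unchanged are assumptions for the example.

```rust
use std::io::{self, BufRead, Write};

/// Writes the selected 0-based fields of every input line to `output`.
/// Hypothetical helper: one buffer is allocated and reused for all lines.
fn cut_fields<R: BufRead, W: Write>(
    mut input: R,
    mut output: W,
    delim: u8,
    fields: &[usize],
) -> io::Result<()> {
    let mut line = Vec::new(); // reused across iterations
    loop {
        line.clear();
        // read_until appends to `line` and returns the number of bytes read.
        if input.read_until(b'\n', &mut line)? == 0 {
            break; // end of input
        }
        if line.last() == Some(&b'\n') {
            line.pop();
        }
        if !line.contains(&delim) {
            // Assumption: lines without the delimiter are printed whole.
            output.write_all(&line)?;
        } else {
            let mut first = true;
            for (idx, field) in line.split(|&b| b == delim).enumerate() {
                if fields.contains(&idx) {
                    if !first {
                        output.write_all(&[delim])?;
                    }
                    output.write_all(field)?;
                    first = false;
                }
            }
        }
        output.write_all(b"\n")?;
    }
    Ok(())
}
```

Calling it with a `&[u8]` reader and a `Vec<u8>` writer is enough to try it
out without touching stdin or stdout.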
This follows/implements the POSIX specification and all the GNU
conventions. It is a drop-in replacement for GNU cut.
One improvement over GNU is that the --delimiter option accepts any
UTF-8 character, whereas GNU cut accepts only a single byte.
Performance is about twice as slow as GNU cut.
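
To show what a UTF-8 delimiter means in practice, here is a small sketch (not
the code in this commit). `select_fields` is a hypothetical helper that takes
the delimiter as a `char`, so a multi-byte character such as `→` works the same
way as `:`. Field numbers are 0-based in this example.

```rust
/// Returns the selected 0-based fields of `line`, re-joined with `delim`.
/// Hypothetical helper; `delim` may be any Unicode scalar value.
fn select_fields(line: &str, delim: char, fields: &[usize]) -> String {
    let picked: Vec<&str> = line
        .split(delim) // str::split accepts a char pattern of any byte length
        .enumerate()
        .filter(|(idx, _)| fields.contains(idx))
        .map(|(_, field)| field)
        .collect();
    let sep = delim.to_string();
    picked.join(sep.as_str())
}
```

A byte-oriented splitter would see `→` as three separate bytes and could not
use it as a delimiter at all.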
Remove the ranges' sentinel value. All cut functions iterate over the
ranges, so the sentinel only adds an extra iteration instead of improving
performance.
This follows the cut_bytes() approach of letting read_line() create a
buffer and find the newline. read_line() guarantees our buffer is a
string of valid UTF-8 characters.
When writing out the byte segments we need to make sure we are cutting
on UTF-8 boundaries; therefore we must iterate over the buffer
from read_line(). This implementation is (or should be) efficient, as it
only iterates once over the buffer.
The previous performance was about 4x as slow as cut_bytes() and now it
is about 2x as slow as cut_bytes().
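
The single pass over the buffer described above can be sketched as follows.
This is an illustration, not the code in this commit: `Range` and
`cut_characters` are hypothetical names, and the ranges are assumed to be
1-based, inclusive, sorted, and non-overlapping. `char_indices()` yields the
byte offset of each character, so every slice taken from the line starts and
ends on a UTF-8 boundary.

```rust
/// An inclusive, 1-based character range, as in `cut -c 2-4`. (Hypothetical.)
struct Range {
    low: usize,
    high: usize,
}

/// Returns the characters of `line` selected by `ranges`, walking the
/// line once. Assumption: `ranges` is sorted and non-overlapping.
fn cut_characters(line: &str, ranges: &[Range]) -> String {
    let mut out = String::new();
    let mut ranges = ranges.iter().peekable();
    for (pos, (byte_idx, ch)) in line.char_indices().enumerate() {
        let char_no = pos + 1;
        // Skip ranges that end before the current character.
        while ranges.peek().map_or(false, |r| r.high < char_no) {
            ranges.next();
        }
        match ranges.peek() {
            // Slice by byte offsets that are known character boundaries.
            Some(r) if r.low <= char_no => {
                out.push_str(&line[byte_idx..byte_idx + ch.len_utf8()]);
            }
            Some(_) => {}  // before the next range: nothing to emit
            None => break, // no ranges left: stop early
        }
    }
    out
}
```

Because both the characters and the ranges only move forward, the work is
linear in the length of the line plus the number of ranges.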