Commit graph

7 commits

Author SHA1 Message Date
Arcterus
643d9f0f32 Remove a couple of warnings 2014-06-28 16:57:19 -07:00
Arcterus
ae4ad2bb04 Remove useless main functions and fix nohup on Macs 2014-06-28 16:45:10 -07:00
polyphemus
798af52077 Implement fields cutting
Adds an implementation for cut_fields() and creates a separate funtion
for the --output-delimiter, for performance reasons.

This implementation relies on ::read_until() to find the newline for us
but read_until() allocates a vector every time to return it's result.
This is not ideal and should be improved upon by passing a buffer to
read().

This follows/implements the POSIX specification and all the GNU
conventions. It is a drop-in replacement for GNU cut.

One improvement to GNU is that the --delimter option takes a character
as UTF8 as apposed to single byte only for GNU.

Performance is about two times slower than that of GNU cut.

Remove ranges' sentinel value, All cut functions iterate over the ranges
and therefore it only adds an extra iteration instead of improving
performance.
2014-06-27 17:39:49 +02:00
polyphemus
0e46d453b7 Rewrite cut_characters
This follows the cut_bytes() approach of letting read_line() create a
buffer and find the newline. read_line() guarantees our buffer is a
string of utf8 characters.

When writing out the bytes segment we need to make sure we are cutting
on utf8 boundaries, there for we must iterate over the buffer
from read_line(). This implementation is(/should be) efficient as it
only iterates once over the buffer.

The previous performance was about 4x as slow as cut_bytes() and now it
is about 2x as slow as cut_bytes().
2014-06-27 17:39:49 +02:00
polyphemus
b1c2d7ac7c Rewrite cut_bytes()
Do no longer iterate over each byte and instead rely on the Buffer trait
to find the newline for us. Iterate over the ranges to specify slices of
the line which need to be printed out.

This rewrite gives a signifcant performance increase:
Old:    1.32s
mahkoh: 0.90s
New:    0.20s
GNU:    0.15s
2014-06-27 17:39:49 +02:00
polyphemus
8b1ff08bd5 Add cut_characters implementation, based on cut_bytes
This implementation uses rust's concept of characters and fails if the
input isn't valid utf-8. GNU cut implements '--characters' as an alias
for '--bytes' and thus has different semantics, for this option, from
this implemtation.
2014-06-27 17:39:49 +02:00
polyphemus
2ab586459b Add initial cut support, only bytes cutting 2014-06-27 17:39:41 +02:00