coreutils

mirror of https://github.com/uutils/coreutils synced 2024-12-15 15:52:42 +00:00

Author	SHA1	Message	Date
Arcterus	643d9f0f32	Remove a couple of warnings	2014-06-28 16:57:19 -07:00
Arcterus	ae4ad2bb04	Remove useless main functions and fix nohup on Macs	2014-06-28 16:45:10 -07:00
polyphemus	798af52077	Implement fields cutting. Adds an implementation for cut_fields() and creates a separate function for the --output-delimiter, for performance reasons. This implementation relies on ::read_until() to find the newline for us, but read_until() allocates a vector every time to return its result. This is not ideal and should be improved upon by passing a buffer to read(). This follows/implements the POSIX specification and all the GNU conventions. It is a drop-in replacement for GNU cut. One improvement over GNU is that the --delimiter option takes a UTF-8 character, as opposed to a single byte only for GNU. Performance is about two times slower than that of GNU cut. Remove the ranges' sentinel value: all cut functions iterate over the ranges, and therefore it only adds an extra iteration instead of improving performance.	2014-06-27 17:39:49 +02:00
polyphemus	0e46d453b7	Rewrite cut_characters. This follows the cut_bytes() approach of letting read_line() create a buffer and find the newline. read_line() guarantees our buffer is a string of UTF-8 characters. When writing out the byte segments we need to make sure we are cutting on UTF-8 boundaries; therefore we must iterate over the buffer from read_line(). This implementation is (or should be) efficient, as it only iterates once over the buffer. The previous performance was about 4x as slow as cut_bytes() and now it is about 2x as slow as cut_bytes().	2014-06-27 17:39:49 +02:00
polyphemus	b1c2d7ac7c	Rewrite cut_bytes(). No longer iterate over each byte and instead rely on the Buffer trait to find the newline for us. Iterate over the ranges to specify slices of the line which need to be printed out. This rewrite gives a significant performance increase: Old: 1.32s, mahkoh: 0.90s, New: 0.20s, GNU: 0.15s	2014-06-27 17:39:49 +02:00
polyphemus	8b1ff08bd5	Add cut_characters implementation, based on cut_bytes. This implementation uses Rust's concept of characters and fails if the input isn't valid UTF-8. GNU cut implements '--characters' as an alias for '--bytes' and thus has different semantics for this option from this implementation.	2014-06-27 17:39:49 +02:00
polyphemus	2ab586459b	Add initial cut support, only bytes cutting	2014-06-27 17:39:41 +02:00

7 commits
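
The cut_bytes() and cut_characters() commits above describe two ideas: let the buffered reader find the newline and then slice the line by the requested ranges, and, for characters, walk the line once so that cuts only land on UTF-8 boundaries. The following is a minimal sketch of both ideas, not the code from these commits. It is written against the current Rust standard library rather than the 2014 API the commits used, and the names `Range`, `cut_bytes`, and `select_chars` are hypothetical. It assumes ranges are 1-based, inclusive, sorted, and non-overlapping.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical 1-based, inclusive range (assumed: low >= 1, low <= high).
#[derive(Clone, Copy)]
struct Range {
    low: usize,
    high: usize,
}

/// Write the selected byte ranges of every input line to `out`.
/// Assumes `ranges` is sorted and non-overlapping.
fn cut_bytes<R: BufRead, W: Write>(
    mut input: R,
    out: &mut W,
    ranges: &[Range],
) -> io::Result<()> {
    let mut line = Vec::new();
    loop {
        // Reuse one buffer instead of allocating a new vector per line.
        line.clear();
        // read_until() locates the newline for us; 0 bytes read means EOF.
        if input.read_until(b'\n', &mut line)? == 0 {
            break;
        }
        if line.last() == Some(&b'\n') {
            line.pop();
        }
        // Each range becomes one slice of the line.
        for r in ranges {
            if r.low > line.len() {
                break; // later ranges start even further right
            }
            let high = r.high.min(line.len());
            out.write_all(&line[r.low - 1..high])?;
        }
        out.write_all(b"\n")?;
    }
    Ok(())
}

/// Select characters (not bytes) from one line in a single pass.
/// Iterating over `chars()` guarantees every cut is on a UTF-8 boundary.
fn select_chars(line: &str, ranges: &[Range]) -> String {
    let mut out = String::new();
    let mut pending = ranges.iter().peekable();
    for (pos, ch) in line.chars().enumerate() {
        let idx = pos + 1; // 1-based character position
        // Drop ranges that ended before the current character.
        while let Some(r) = pending.peek() {
            if r.high < idx {
                pending.next();
            } else {
                break;
            }
        }
        match pending.peek() {
            Some(r) if r.low <= idx => out.push(ch),
            Some(_) => {}  // between two ranges
            None => break, // no ranges left, stop early
        }
    }
    out
}
```

Passing the line buffer into `read_until()` is one way to address the per-line allocation that the cut_fields() commit message flags as a shortcoming. `select_chars` copies characters one at a time for clarity; recording byte offsets and writing whole slices would be a possible refinement.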