Add string shorten

This is essentially the inverse of `string pad`. Where that adds characters to get up to the specified width, this adds an ellipsis to a string if it goes over a specific maximum width.

The ellipsis can be given with --char, but defaults to our ellipsis string ("…" if the locale can handle it and "..." otherwise). If the ellipsis string is empty, it just truncates.

Arguments given via argv are handled line-by-line, because otherwise width makes no sense. If "--no-newline" is given, it adds an ellipsis instead and removes all subsequent lines.

Like pad and `length --visible`, it goes by visible width, skipping recognized escape sequences, as those have no influence on width.

The default target width is the shortest non-zero width among the inputs.

If the ellipsis is already wider than the target width, we truncate instead. This is safer overall, because it avoids e.g. spilling over into a new line. This is especially important given that our default ellipsis might be width 3.
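Below is a minimal C++ sketch of the shortening rules described above. It is not the code in this commit. It makes two simplifying assumptions: every code point is one column wide, and there are no escape sequences. The real code uses `fish_wcwidth` and skips recognized escapes instead. The names `default_width` and `shorten` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumption of this sketch: one column per code point, no escape sequences.
static std::size_t width_of(const std::u32string &s) { return s.size(); }

// Default target width: the smallest non-zero width among the inputs.
// This sketch returns 0 if every input is empty (fish leaves SIZE_MAX).
std::size_t default_width(const std::vector<std::u32string> &lines) {
    std::size_t min_width = 0;
    for (const auto &line : lines) {
        std::size_t w = width_of(line);
        if (w > 0 && (min_width == 0 || w < min_width)) min_width = w;
    }
    return min_width;
}

// Shorten `line` to at most `max_width` columns, keeping the longest prefix
// (or suffix, if `left` is set) that fits together with the ellipsis.
std::u32string shorten(const std::u32string &line, std::size_t max_width,
                       std::u32string ellipsis, bool left = false) {
    if (width_of(line) <= max_width) return line;  // Fits: output unchanged.
    // An ellipsis wider than the target would overflow, so truncate instead.
    if (width_of(ellipsis) > max_width) ellipsis.clear();
    std::size_t keep = max_width - width_of(ellipsis);
    if (left) return ellipsis + line.substr(line.size() - keep);
    return line.substr(0, keep) + ellipsis;
}
```

For example, `shorten(U"foobar", default_width({U"foo", U"foobar"}), U"\u2026")` yields `fo…`. This matches the `string shorten foo foobar` example in the documentation.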
2024-09-20 14:32:04 +00:00 · 2022-08-16 17:57:19 +02:00 · 2022-08-16 17:57:19 +02:00 · 41c22d5e60
commit 41c22d5e60
parent 3e3996c9a5
5 changed files with 401 additions and 12 deletions
--- a/doc_src/cmds/string-shorten.rst
+++ b/doc_src/cmds/string-shorten.rst
@@ -0,0 +1,89 @@
+string-shorten - shorten strings to a width, with an ellipsis
+===============================================================
+
+Synopsis
+--------
+
+.. BEGIN SYNOPSIS
+
+.. synopsis::
+
+    string shorten [(-c | --char) CHARS] [(-m | --max) INTEGER]
+                   [(-N | --no-newline)] [(-l | --left)] [STRING ...]
+
+.. END SYNOPSIS
+
+Description
+-----------
+
+.. BEGIN DESCRIPTION
+
+``string shorten`` truncates each *STRING* to the given visible width and adds an ellipsis to indicate that it was shortened. "Visible width" means the width of all visible characters added together, excluding escape sequences and accounting for :envvar:`fish_emoji_width` and :envvar:`fish_ambiguous_width`. It is the number of columns in a terminal the *STRING* occupies.
+
+The escape sequences that are skipped reflect what fish knows about and how it computes its own output. Your terminal might support more escapes, or might not support some escape sequences that fish knows about.
+
+If **-m** or **--max** is given, truncate at the given width. Otherwise, the lowest non-zero width of all input strings is used.
+
+If **-N** or **--no-newline** is given, only the first line (or the last line with **--left**) of each *STRING* is used, and an ellipsis is added if it had more than one line. This only works for STRINGs given as arguments; multiple lines given on stdin are treated as separate STRINGs instead.
+
+If **-c** or **--char** is given, add *CHARS* instead of an ellipsis. *CHARS* can also be empty, which just truncates, or more than one character.
+
+If **-l** or **--left** is given, remove text from the left instead, so this prints the longest *suffix* of the string that fits. With **--no-newline**, this will take from the last line instead of the first.
+
+The default ellipsis is ``…``. If fish determines that your locale cannot display it, it uses ``...`` instead. Whenever the ellipsis is wider than the target width, it is left out and the string is simply truncated.
+
+.. END DESCRIPTION
+
+Examples
+--------
+
+.. BEGIN EXAMPLES
+
+::
+
+    >_ string shorten foo foobar
+    # No width was given, so we infer it: "foo" is the shortest.
+    foo
+    fo…
+
+    >_ string shorten --char="..." foo foobar
+    # The target width is 3 because of "foo",
+    # and our ellipsis is 3 too, so we can't really show anything.
+    # This is the default ellipsis if your locale doesn't allow "…".
+    foo
+    ...
+
+    >_ string shorten --char="" --max 4 abcdef 123456
+    # Leaving the char empty means no ellipsis is added,
+    # so this just truncates at 4 columns:
+    abcd
+    1234
+
+    >_ touch "a multiline"\n"file"
+    >_ for file in *; string shorten -N -- $file; end
+    # Shorten the multiline file so we only show one line per file:
+    a multiline…
+
+    >_ ss -p | string shorten -m$COLUMNS -c ""
+    # `ss` from Linux's iproute2 shows socket information, but prints extremely long lines.
+    # This shortens input so it fits on the screen without overflowing lines.
+
+    >_ git branch | string match -rg '^\* (.*)' | string shorten -m20
+    # Take the current git branch and shorten it at 20 columns.
+    # Here the branch is "builtin-path-with-expand"
+    builtin-path-with-e…
+
+    >_ git branch | string match -rg '^\* (.*)' | string shorten -m20 --left
+    # Taking 20 columns from the right instead:
+    …in-path-with-expand
+
+See Also
+--------
+
+- :ref:`string<cmd-string>`'s ``pad`` subcommand does the inverse of this command, adding padding to a specific width instead.
+
+- The :ref:`printf <cmd-printf>` command can do simple padding, for example ``printf %10s\n`` works like ``string pad -w10``.
+
+- :ref:`string length <cmd-string-length>` with the ``--visible`` option can be used to show what fish thinks the width is.
+
+.. END EXAMPLES
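The documentation above describes how **--no-newline** works. The sketch below shows that input preparation. It mirrors what `string_shorten` in this commit does with `split_string`. However, `split_lines` and `prepare_inputs` are hypothetical helpers written for illustration, and they work on `std::u32string` rather than fish's `wcstring`. As in the commit's code, the ellipsis is appended to the kept line even with **--left**.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split on '\n' into one element per line (hypothetical helper).
std::vector<std::u32string> split_lines(const std::u32string &s) {
    std::vector<std::u32string> out;
    std::size_t start = 0;
    for (;;) {
        std::size_t nl = s.find(U'\n', start);
        if (nl == std::u32string::npos) {
            out.push_back(s.substr(start));
            return out;
        }
        out.push_back(s.substr(start, nl - start));
        start = nl + 1;
    }
}

// With --no-newline, keep only the first line (the last with --left) of a
// multiline argument and append the ellipsis to mark the dropped lines.
// Otherwise every line becomes a separate input to shorten.
std::vector<std::u32string> prepare_inputs(const std::u32string &arg, bool no_newline,
                                           bool left, const std::u32string &ellipsis) {
    std::vector<std::u32string> lines = split_lines(arg);
    if (no_newline && lines.size() > 1) {
        return {(left ? lines.back() : lines.front()) + ellipsis};
    }
    return lines;
}
```

Each prepared line is then shortened on its own. This means the appended ellipsis counts towards that line's width, as it does in the commit.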
--- a/doc_src/cmds/string.rst
+++ b/doc_src/cmds/string.rst
@@ -24,6 +24,8 @@ Synopsis
                  [-q | --quiet] [STRING ...]
    string replace [-a | --all] [-f | --filter] [-i | --ignore-case]
                   [-r | --regex] [-q | --quiet] PATTERN REPLACE [STRING ...]
+    string shorten [(-c | --char) CHARS] [(-m | --max) INTEGER]
+                   [(-N | --no-newline)] [(-l | --left)] [STRING ...]
    string split [(-f | --fields) FIELDS] [(-m | --max) MAX] [-n | --no-empty] 
                 [-q | --quiet] [-r | --right] SEP [STRING ...]
    string split0 [(-f | --fields) FIELDS] [(-m | --max) MAX] [-n | --no-empty]
@@ -152,8 +154,8 @@ Examples
   :start-after: BEGIN EXAMPLES
   :end-before: END EXAMPLES

-"pad" subcommand
------------------
+"pad" and "shorten" subcommands
+---------------------------------

 .. include:: string-pad.rst
   :start-after: BEGIN SYNOPSIS
@@ -167,6 +169,18 @@ Examples
   :start-after: BEGIN EXAMPLES
   :end-before: END EXAMPLES

+.. include:: string-shorten.rst
+   :start-after: BEGIN SYNOPSIS
+   :end-before: END SYNOPSIS
+
+.. include:: string-shorten.rst
+   :start-after: BEGIN DESCRIPTION
+   :end-before: END DESCRIPTION
+
+.. include:: string-shorten.rst
+   :start-after: BEGIN EXAMPLES
+   :end-before: END EXAMPLES
+
 "repeat" subcommand
 -------------------

--- a/share/completions/string.fish
+++ b/share/completions/string.fish
@@ -56,3 +56,8 @@ complete -f -c string -n "test (count (commandline -opc)) -lt 2" -a pad
 complete -x -c string -n "test (count (commandline -opc)) -ge 2" -n "contains -- (commandline -opc)[2] pad" -s r -l right -d "Pad right instead of left"
 complete -x -c string -n "test (count (commandline -opc)) -ge 2" -n "contains -- (commandline -opc)[2] pad" -s c -l char -x -d "Character to use for padding"
 complete -x -c string -n "test (count (commandline -opc)) -ge 2" -n "contains -- (commandline -opc)[2] pad" -s w -l width -x -d "Integer width of the result, default is maximum width of inputs"
+complete -f -c string -n "test (count (commandline -opc)) -lt 2" -a shorten
+complete -f -c string -n "test (count (commandline -opc)) -ge 2" -n "contains -- (commandline -opc)[2] shorten" -s l -l left -d "Remove from the left instead"
+complete -x -c string -n "test (count (commandline -opc)) -ge 2" -n "contains -- (commandline -opc)[2] shorten" -s c -l char -x -d "Characters to use as ellipsis"
+complete -x -c string -n "test (count (commandline -opc)) -ge 2" -n "contains -- (commandline -opc)[2] shorten" -s m -l max -x -d "Integer width of the result, default is minimum non-zero width of inputs"
+complete -f -c string -n "test (count (commandline -opc)) -ge 2" -n "contains -- (commandline -opc)[2] shorten" -s N -l no-newline -d "Only keep one line of each input"
--- a/src/builtins/string.cpp
+++ b/src/builtins/string.cpp
@@ -144,6 +144,7 @@ struct options_t {  //!OCLINT(too many fields)
    bool all_valid = false;
    bool char_to_pad_valid = false;
    bool chars_to_trim_valid = false;
+    bool chars_to_shorten_valid = false;
    bool count_valid = false;
    bool entire_valid = false;
    bool filter_valid = false;
@@ -205,9 +206,10 @@ struct options_t {  //!OCLINT(too many fields)
    escape_string_style_t escape_style = STRING_STYLE_SCRIPT;
 };

-static size_t width_without_escapes(const wcstring &ins) {
+static size_t width_without_escapes(const wcstring &ins, size_t start_pos = 0) {
    ssize_t width = 0;
-    for (auto c : ins) {
+    for (size_t i = start_pos; i < ins.size(); i++) {
+        wchar_t c = ins[i];
        auto w = fish_wcwidth_visible(c);
        // We assume that this string is on its own line,
        // in which case a backslash can't bring us below 0.
@@ -218,7 +220,7 @@ static size_t width_without_escapes(const wcstring &ins) {

    // ANSI escape sequences like \e\[31m contain printable characters. Subtract their width
    // because they are not rendered.
-    size_t pos = 0;
+    size_t pos = start_pos;
    while ((pos = ins.find('\x1B', pos)) != std::string::npos) {
        auto len = escape_code_length(ins.c_str() + pos);
        if (len) {
@@ -294,7 +296,7 @@ static int handle_flag_a(const wchar_t **argv, parser_t &parser, io_streams_t &s

 static int handle_flag_c(const wchar_t **argv, parser_t &parser, io_streams_t &streams,
                         const wgetopter_t &w, options_t *opts) {
-    if (opts->chars_to_trim_valid) {
+    if (opts->chars_to_trim_valid || opts->chars_to_shorten_valid) {
        opts->chars_to_trim = w.woptarg;
        return STATUS_CMD_OK;
    } else if (opts->char_to_pad_valid) {
@@ -557,6 +559,7 @@ static wcstring construct_short_opts(options_t *opts) {  //!OCLINT(high npath co
    if (opts->all_valid) short_opts.append(L"a");
    if (opts->char_to_pad_valid) short_opts.append(L"c:");
    if (opts->chars_to_trim_valid) short_opts.append(L"c:");
+    if (opts->chars_to_shorten_valid) short_opts.append(L"c:");
    if (opts->count_valid) short_opts.append(L"n:");
    if (opts->entire_valid) short_opts.append(L"e");
    if (opts->filter_valid) short_opts.append(L"f");
@@ -1655,6 +1658,166 @@ static int string_sub(parser_t &parser, io_streams_t &streams, int argc, const w
    return nsub > 0 ? STATUS_CMD_OK : STATUS_CMD_ERROR;
 }

+static int string_shorten(parser_t &parser, io_streams_t &streams, int argc, const wchar_t **argv) {
+    options_t opts;
+    opts.chars_to_shorten_valid = true;
+    opts.chars_to_trim = get_ellipsis_str();
+    opts.max_valid = true;
+    opts.no_newline_valid = true;
+    opts.quiet_valid = true;
+    opts.max = -1;
+    opts.left_valid = true;
+    int optind;
+    int retval = parse_opts(&opts, &optind, 0, argc, argv, parser, streams);
+    if (retval != STATUS_CMD_OK) return retval;
+
+    // Find the minimum non-zero width of the strings and keep the inputs
+    size_t min_width = SIZE_MAX;
+    std::vector<wcstring> inputs;
+    wcstring ell = opts.chars_to_trim;
+
+    auto ell_width = fish_wcswidth(ell);
+
+    arg_iterator_t aiter_width(argv, optind, streams);
+    while (const wcstring *arg = aiter_width.nextstr()) {
+        // Visible width only makes sense line-wise.
+        // So either we have no-newlines (which means we shorten on the first newline),
+        // or we handle the lines separately.
+        auto splits = split_string(*arg, L'\n');
+        if (opts.no_newline && splits.size() > 1) {
+            wcstring str = !opts.left ? splits[0] : splits[splits.size() - 1];
+            str.append(ell);
+            ssize_t width = width_without_escapes(str);
+            if (width > 0 && (size_t)width < min_width) min_width = width;
+            inputs.push_back(str);
+        } else {
+            for (auto &input_string : splits) {
+                ssize_t width = width_without_escapes(input_string);
+                if (width > 0 && (size_t)width < min_width) min_width = width;
+                inputs.push_back(std::move(input_string));
+            }
+        }
+    }
+
+    // opts.max is signed for other subcommands,
+    // but we compare against .size() a bunch,
+    // this shuts the compiler up.
+    size_t ourmax = min_width;
+    if (opts.max > 0) {
+        ourmax = opts.max;
+    }
+
+    // We could also error out here if the width of our ellipsis is larger
+    // than the target width.
+    // That seems excessive - specifically because the ellipsis on LANG=C
+    // is "..." (width 3!).
+    if (ell_width > (ssize_t)ourmax) {
+        // If we can't even print our ellipsis, we substitute nothing,
+        // truncating instead.
+        ell = L"";
+        ell_width = 0;
+    }
+
+    int nsub = 0;
+
+    auto skip_escapes = [&](const wcstring &l, size_t pos) {
+        size_t totallen = 0;
+        while (l[pos + totallen] == L'\x1B') {
+            auto len = escape_code_length(l.c_str() + pos + totallen);
+            if (!len) break;
+            totallen += *len;
+        }
+        return totallen;
+    };
+
+    for (auto &line : inputs) {
+        size_t pos = 0;
+        size_t max = 0;
+        // Collect how much of the string we can use without going over the maximum.
+        if (opts.left) {
+            // Our strategy for keeping from the end.
+            // This is rather unoptimized - actually going *backwards*
+            // is extremely tricky because we would have to subtract escapes again.
+            // Also we need to avoid hacking combiners into bits.
+            // This should work for most cases considering the combiners typically have width 0.
+            wcstring out;
+            while (pos < line.size()) {
+                auto w = width_without_escapes(line, pos);
+                // If the entire line fits (we're still at the start), keep it as-is.
+                //
+                // Otherwise the remaining suffix must leave room for the ellipsis.
+                if ((w <= ourmax && pos == 0) || w + ell_width <= ourmax) {
+                    out = line.substr(pos);
+                    break;
+                }
+
+                auto skip = skip_escapes(line, pos);
+                pos += skip > 0 ? skip : 1;
+            }
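+            // With --quiet, print nothing and succeed at the first line that needs shortening.
+            if (opts.quiet && pos != 0) return STATUS_CMD_OK;
+            if (opts.quiet) continue;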
+
+            if (pos == 0) {
+                streams.out.append(line);
+                streams.out.append(L'\n');
+            } else {
+                // We have an ellipsis, construct our string and print it.
+                nsub++;
+                out = ell + out + L'\n';
+                streams.out.append(out);
+            }
+            continue;
+        } else {
+            // Going from the left.
+            // This is somewhat easier.
+            while (max <= ourmax && pos < line.size()) {
+                // If only escape sequences were left, the whole line fits.
+                if ((pos += skip_escapes(line, pos)) >= line.size()) break;
+                auto w = fish_wcwidth(line[pos]);
+                if (w <= 0 || max + w + ell_width <= ourmax) {
+                    // If it still fits, even if it is the last, we add it.
+                    max += w;
+                    pos++;
+                } else {
+                    // We're at the limit, so see if the entire string fits.
+                    auto max2 = max + w;
+                    auto pos2 = pos + 1;
+                    while (pos2 < line.size()) {
+                        pos2 += skip_escapes(line, pos2);
+                        if (pos2 >= line.size()) break;  // Only escapes were left.
+                        max2 += fish_wcwidth(line[pos2++]);
+                    }
+
+                    if (max2 <= ourmax) {
+                        // We're at the end and everything fits,
+                        // no ellipsis.
+                        pos = pos2;
+                    }
+                    break;
+                }
+            }
+        }
+
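+        // With --quiet, print nothing and succeed at the first line that needs shortening.
+        if (opts.quiet && pos != line.size()) return STATUS_CMD_OK;
+        if (opts.quiet) continue;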
+
+        if (pos == line.size()) {
+            streams.out.append(line);
+            streams.out.append(L'\n');
+        } else {
+            nsub++;
+            wcstring newl = line.substr(0, pos);
+            newl.append(ell);
+            newl.push_back(L'\n');
+            streams.out.append(newl);
+        }
+    }
+
+    // Return true if we have shortened something and false otherwise.
+    return nsub > 0 ? STATUS_CMD_OK : STATUS_CMD_ERROR;
+}
+
 static int string_trim(parser_t &parser, io_streams_t &streams, int argc, const wchar_t **argv) {
    options_t opts;
    opts.chars_to_trim_valid = true;
@@ -1744,12 +1907,12 @@ static constexpr const struct string_subcommand {
    int (*handler)(parser_t &, io_streams_t &, int argc,  //!OCLINT(unused param)
                   const wchar_t **argv);                 //!OCLINT(unused param)
 } string_subcommands[] = {
-    {L"collect", &string_collect}, {L"escape", &string_escape}, {L"join", &string_join},
-    {L"join0", &string_join0},     {L"length", &string_length}, {L"lower", &string_lower},
-    {L"match", &string_match},     {L"pad", &string_pad},       {L"repeat", &string_repeat},
-    {L"replace", &string_replace}, {L"split", &string_split},   {L"split0", &string_split0},
-    {L"sub", &string_sub},         {L"trim", &string_trim},     {L"unescape", &string_unescape},
-    {L"upper", &string_upper},
+    {L"collect", &string_collect},   {L"escape", &string_escape},   {L"join", &string_join},
+    {L"join0", &string_join0},       {L"length", &string_length},   {L"lower", &string_lower},
+    {L"match", &string_match},       {L"pad", &string_pad},         {L"repeat", &string_repeat},
+    {L"replace", &string_replace},   {L"shorten", &string_shorten}, {L"split", &string_split},
+    {L"split0", &string_split0},     {L"sub", &string_sub},         {L"trim", &string_trim},
+    {L"unescape", &string_unescape}, {L"upper", &string_upper},
 };
 ASSERT_SORTED_BY_NAME(string_subcommands);
 }  // namespace
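`width_without_escapes` and `skip_escapes` in the changes above treat escape sequences as zero columns. The standalone sketch below illustrates that idea, and it rests on several assumptions:

- `csi_length` is a hypothetical, simplified stand-in for fish's `escape_code_length`.
- It only recognizes CSI sequences, such as the SGR color codes emitted by `set_color`.
- Every other code point is taken to be one column wide.

The optional `start` parameter mirrors the new `start_pos` argument. The `--left` path uses it to measure each candidate suffix.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of a CSI sequence ("\x1B[" + parameter/intermediate bytes + one
// final byte in 0x40-0x7E) starting at pos, or 0 if none starts there.
std::size_t csi_length(const std::u32string &s, std::size_t pos) {
    if (pos + 1 >= s.size() || s[pos] != U'\x1B' || s[pos + 1] != U'[') return 0;
    for (std::size_t i = pos + 2; i < s.size(); i++) {
        if (s[i] >= 0x40 && s[i] <= 0x7E) return i - pos + 1;  // Final byte.
        if (s[i] < 0x20 || s[i] > 0x3F) return 0;  // Not a valid CSI byte.
    }
    return 0;  // Unterminated sequence.
}

// Visible width from `start`: escape sequences count as zero columns and,
// as an assumption of this sketch, every other code point as one column.
std::size_t visible_width(const std::u32string &s, std::size_t start = 0) {
    std::size_t width = 0;
    for (std::size_t i = start; i < s.size();) {
        if (std::size_t len = csi_length(s, i)) {
            i += len;  // Skip the whole escape sequence.
        } else {
            width += 1;
            i += 1;
        }
    }
    return width;
}
```

Measuring forward from every candidate start position is quadratic in the worst case. The comment in `string_shorten` explains the trade-off: walking backwards instead would require undoing the escape handling.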
--- a/tests/checks/string.fish
+++ b/tests/checks/string.fish
@@ -832,3 +832,121 @@ printf \<
 printf my-password | string replace -ra . \*
 printf \>\n
 # CHECK: <***********>
+
+string shorten -m 3 foo
+# CHECK: foo
+string shorten -m 2 foo
+# CHECK: f…
+
+string shorten -m 5 foobar
+# CHECK: foob…
+
+# The ellipsis is wider than the target width, so we truncate instead.
+string shorten -m 5 --char ........ foobar
+# CHECK: fooba
+
+string shorten --max 4 -c /// foobar
+# CHECK: f///
+
+string shorten --max 4 -c /// foobarnana
+# CHECK: f///
+
+string shorten --max 2 --chars "" foo
+# CHECK: fo
+
+string shorten foo foobar
+# CHECK: foo
+# CHECK: fo…
+
+# A weird case - our minimum width here is 1,
+# so everything that goes over the width becomes "x"
+for i in (seq 1 10)
+    math 2 ^ $i
+end | string shorten -c x
+# CHECK: 2
+# CHECK: 4
+# CHECK: 8
+# CHECK: x
+# CHECK: x
+# CHECK: x
+# CHECK: x
+# CHECK: x
+# CHECK: x
+# CHECK: x
+
+string shorten -N -cx bar\nfooo
+# CHECK: barx
+
+# Shorten and emoji width.
+begin
+    # \U1F4A9 was widened in Unicode 9, so it's affected
+    # by $fish_emoji_width
+    # "…" isn't and always has width 1.
+    #
+    # "abcde" has width 5, so with a single-width poo the total is 6,
+    # and the ellipsis has to replace the "e".
+    fish_emoji_width=1 string shorten --max=5 -- abcde💩
+    # CHECK: abcd…
+    # This fits assuming the poo fits in one column
+    fish_emoji_width=1 string shorten --max=6 -- abcde💩
+    # CHECK: abcde💩
+
+    # This has a total width of 7 (assuming double-wide poo),
+    # so the ellipsis again replaces the "e"
+    fish_emoji_width=2 string shorten --max=5 -- abcde💩
+    # CHECK: abcd…
+    # This still doesn't fit!
+    fish_emoji_width=2 string shorten --max=6 -- abcde💩
+    # CHECK: abcde…
+    fish_emoji_width=2 string shorten --max=7 -- abcde💩
+    # CHECK: abcde💩
+end
+
+# See that colors aren't counted
+string shorten -m6 (set_color blue)s(set_color red)t(set_color --bold brwhite)rin(set_color red)g(set_color yellow)-shorten | string escape
+# Renders like "strin…" in colors.
+# Note the trailing red sequence, which we still pass on because it has width 0.
+# CHECK: \e\[34ms\e\[31mt\e\[1m\e\[37mrin\e\[31m…
+
+set -l str (set_color blue)s(set_color red)t(set_color --bold brwhite)rin(set_color red)g(set_color yellow)-shorten
+for i in (seq 1 (string length -V -- $str))
+    set -l len (string shorten -m$i -- $str | string length -V)
+    test $len = $i
+    or echo Oopsie ellipsizing to $i failed
+end
+
+string shorten -m4 foobar\nbananarama
+# CHECK: foo…
+# CHECK: ban…
+
+# The first line is empty and printed as-is.
+# The other lines are truncated to the shortest non-zero width, that of "1. line".
+printf '
+1. line
+2. another line
+3. third line' | string shorten
+# CHECK:
+# CHECK: 1. line
+# CHECK: 2. ano…
+# CHECK: 3. thi…
+
+printf '
+1. line
+2. another line
+3. third line' | string shorten --left
+# CHECK:
+# CHECK: 1. line
+# CHECK: …r line
+# CHECK: …d line
+
+string shorten -m12 -l (set_color blue)s(set_color red)t(set_color --bold brwhite)rin(set_color red)(set_color green)g(set_color yellow)-shorten | string escape
+# Renders like "…ing-shorten" with "g" in green and "-shorten" in yellow.
+# Yes, that's a red escape before the green one; it has width 0, so it is kept.
+# CHECK: …in\e\[31m\e\[32mg\e\[33m-shorten
+
+set -l str (set_color blue)s(set_color red)t(set_color --bold brwhite)rin(set_color red)g(set_color yellow)-shorten
+for i in (seq 1 (string length -V -- $str))
+    set -l len (string shorten -m$i --left -- $str | string length -V)
+    test $len = $i
+    or echo Oopsie ellipsizing to $i failed
+end