The regex struct is pretty large at 560 bytes, with the entire
Abbreviation being 664 bytes.
If it's an "Option<Regex>", any abbr gets to pay the price. Boxing it
means abbrs without a regex are over 500 bytes smaller.
IfStatement is 680 bytes, much larger than the other
variants (SwitchStatement is next at 232). An enum is as large as its
largest variant, so this saves a bunch, especially since
DecoratedStatement is much more likely than IfStatement.
This will speed up the no-execute benchmark by 1.07x.
Unlike C++, Rust requires "char" to be a valid Unicode code point. As a
workaround, we take the raw (probably UTF-8-encoded) input and convert each
input byte to a char representation from the private use area (see commit
3b15e995e (str2wcs: encode invalid Unicode characters in the private use
area, 2023-04-01)). We convert back whenever we output the string, which
is correct as long as the encoding didn't change since the data was input.
We also need to convert keyboard input; do that.
Quick testing shows that our reader drops PUA characters. Since this patch
converts both invalid Unicode input as well as PUA input into a safe PUA
representation, there's no longer a reason to not add PUA characters to
the commandline, so let's do that to restore traditional behavior.
Render them as � (REPLACEMENT CHARACTER); unfortunately we show one per
input byte instead of one per code point. To fix this we probably need our
own char type.
While at it, remove some special cases that try to prevent insertion of
control characters. I don't think they are necessary. Could be wrong..
This makes it so code like
```fish
echo foo
echo bar
```
is collapsed into
```fish
echo foo
echo bar
```
One empty line is allowed, more is overkill.
We could also allow more than one for e.g. function endings.
We don't need to know that it tried these five before finally getting
one, the list is *right there*.
It is also very unlikely that someone has "xterm" or "ansi" but not "xterm-256color"
For xterm-256color, we don't warn *at all* because we have that one hardcoded.
This allows us to get the terminfo information without linking against curses.
That means we can get by without a bunch of awkward C-API trickery.
There is no global "cur_term" kept by a library for us that we need to invalidate.
Note that it still requires a "unhashed terminfo database", and I don't know how well it handles termcap.
I am not actually sure if there are systems that *can't* have terminfo, everything I looked at
has the ncurses terminfo available to install at least.
... even if the file hasn't changed. This addresses an oddity in the following
case:
* Shell is started,
* function `foo` is sourced from foo.fish
* foo.fish is *externally* edited and saved
* <Loaded definition of `foo` is now stale, but fish is unaware>
* `funced foo` loads `type -p foo` showing changed definition, user exits
$EDITOR saving no changes (or with $status 0, more generally).
* Stale definition of `foo` remains
If a hostname starts with a dash `-` character, the prompt_hostname function
fails because the `string` function interprets it as an option instead
of an argument.
We still don't support tabs but as of the parent commit, there are no more
weird glitches, so it should be fine to recall those lines?
This reverts commit cc0e366037.
Inserting Tab or Backspace characters causes weird glitches. Sometimes it's
useful to paste tabs as part of a code block.
Render tabs as "␉" and so on for other ASCII control characters, see
https://unicode-table.com/en/blocks/control-pictures/. This fixes the
width-related glitches.
You can see it in action by inserting some control characters into the
command line:
set chars
for x in (seq 1 0x1F)
set -a chars (printf "%02x\\\\x%02x" $x $x)
end
eval set chars $chars
commandline -i "echo '" $chars
Fixes#6923Fixes#5274Closes#7295
We could extend this approach to display a fallback symbol for every unknown
nonprintable character, not just ASCII control characters.
In future we might want to support tab properly.