This is a *tiny* commit code-wise, but the explanation is a bit
longer.
When I made string read in chunks, I picked a chunk size from bash's
read, under the assumption that they had picked a good one.
It turns out, on the (linux) systems I've tested, that's simply not
true.
My tests show that a bigger chunk size of up to 4096 is better *across
the board*:
- It's better with very large inputs
- It's equal-to-slightly-better with small inputs
- It's equal-to-slightly-better even if we quit early
My test setup:
0. Create various fish builds with various sizes for
STRING_CHUNK_SIZE, name them "fish-$CHUNKSIZE".
1. Download the npm package names from
https://github.com/nice-registry/all-the-package-names/blob/master/names.json (I
used commit 87451ea77562a0b1b32550124e3ab4a657bf166c, so it's 46.8MB)
2. Extract the names so we get a line-based version:
```fish
jq '.[]' names.json | string trim -c '"' >/tmp/all
```
3. Create various sizes of random extracts:
```fish
for f in 10000 1000 500 50
shuf /tmp/all | head -n $f > /tmp/$f
end
```
(the idea here is to defeat any form of pattern in the input).
4. Run benchmarks:
hyperfine -w 3 ./fish-{128,512,1024,2048,4096}"
-c 'for i in (seq 1000)
string match -re foot < $f
end; true'"
(reduce the seq size for the larger files so you don't have to wait
for hours - the idea here is to have some time running string and not
just fish startup time)
This shows results pretty much like
```
Summary
'./fish-2048 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true'' ran
1.01 ± 0.02 times faster than './fish-4096 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
1.02 ± 0.03 times faster than './fish-1024 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
1.08 ± 0.03 times faster than './fish-512 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
1.47 ± 0.07 times faster than './fish-128 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
```
So we see that up to 1024 there's a difference, and after that the
returns are marginal. So we stick with 1024 because of the memory
trade-off.
----
Fun extra:
Comparisons with `grep` (GNU grep 3.7) are *weird*. Because you both
get
```
'./fish-4096 -c 'for i in (seq 100); string match -re foot < /tmp/500; end; true'' ran
11.65 ± 0.23 times faster than 'fish -c 'for i in (seq 100); command grep foot /tmp/500; end''
```
and
```
'fish -c 'for i in (seq 2); command grep foot /tmp/all; end'' ran
66.34 ± 3.00 times faster than './fish-4096 -c 'for i in (seq 2);
string match -re foot < /tmp/all; end; true''
100.05 ± 4.31 times faster than './fish-128 -c 'for i in (seq 2);
string match -re foot < /tmp/all; end; true''
```
Basically, if you *can* give grep a lot of work at once (~40MB in this
case), it'll churn through it like butter. But if you have to call it
a lot, string beats it by virtue of cheating.
clang-format (since 10) can output diagnostics which indicate
lines needing formatting with --dry-run and -Werror: the exit
code indicates if a file is correctly formatted or not.
We used to copy each .cpp file, run clang_format on the duplicate
and then `cmp` to see if there were changes made, before just
printing a line with the filename and moving the new ontop of
the original.
Now we show clang-format diagnostics which indicate which
lines will be changed, prompt for confirmation and then let
clang-format modify the files in-place without the juggling.
Looks like this: https://user-images.githubusercontent.com/291142/184561633-c16754c8-179e-426b-ba15-345ba65b9cf9.png
Rephrase this to more explicitly indicate that the uvar actually
was successfully set. I believe the prior phrasing can leave some
ambiguity as far as wether set just failed with an error, whether it
has done anything or not.
Now uses the same macro other builtins use for a missing -e arg,
and the error message show the short or long option as it was used.
e.g. before
$ set -e
set: Erase needs a variable name
after
$ set --erase
set: --erase: option requires an argument
$ set -e
set: -e: option requires an argument
This moves the stuff that creates skeleton/boilerplate files to
the same place we initialize uvars for the first time or on upgrade.
Being a bit less aggresssive here theoretically makes launch a little
lighter but really I personally just found it weird I couldn't
just delete my empty config.fish file without it getting recreated
and sourced every launch.
A recenty commit was loathe to assume the unicode ellipsis character
was safe so just used '..' instead. However I noticed we actually
already do use that character elsehwere in the completions.
So, just make both spots try to somewhat carefully use it.
We do this same `string match` check on LANG in fish_job_summary.fish
Intern'd strings were intended to be "shared" to reduce memory usage but
this optimization doesn't carry its weight. Remove it. No functional
change expected.
We store filenames in function definitions to indicate where the
function comes from. Previously these were intern'd strings. Switch them
to a shared_ptr<wcstring>, intending to remove intern'd strings.
The history pager will show multiline commands in single-line cells.
We escape newline characters as \\n but that looks awkward if the next line
starts with a letter. Let's render control characters using their corresponding
symbol from the Control Pictures Unicode block.
This means there is also no need to escape backslashes, which further improves
the history pager - now the rendering has exactly as many backslashes as
the eventual command.
This means that (multiline) commands in the history pager will be rendered
with the same amount of characters as are in the actual command (unless
they contain funny nonprintables). This makes it easy for the next commit
to highlight multiline commands correctly in the history pager.
The font size for these symbols (for example ␉) is quite small, but that's
okay since for the proposed uses it's not so important that they readable.
The important thing is that the stand out from surrounding text.
When adding a VLAN-enabled interface, it is named like enp0s31f6.100@enp0s31f6
with the physical interface being appended behind an @.
But subsequent ip commands operate on the interface name without this suffix,
so it needs to be removed when completing interface names in __fish_ip_device
This checked specifically for "| and" and "a=b" and then just gave the
error without a caret at all.
E.g. for a /tmp/broken.fish that contains
```fish
echo foo
echo foo | and cat
```
This would print:
```
/tmp/broken.fish (line 3): The 'and' command can not be used in a pipeline
warning: Error while reading file /tmp/broken.fish
```
without any indication other than the line number as to the location
of the error.
Now we do
```
/tmp/broken.fish (line 3): The 'and' command can not be used in a pipeline
echo foo | and cat
^~^
warning: Error while reading file /tmp/broken.fish
```
Another nice one:
```
fish --no-config -c 'echo notprinted; echo foo; a=b'
```
failed to give the error message!
(Note: Is it really a "warning" if we failed to read the one file we
wer told to?)
We should check if we should either centralize these error messages
completely, or always pass them and remove this "code" system, because
it's only used in some cases.
This skipped printing a "^" line if the start or length of the error
was longer than the source.
That seems like the correc thing at first glance, however it means
that the caret line isn't skipped *if the file goes on*.
So, for example
```fish
echo "$abc["
```
by itself, in a file or via `fish -c`, would not print an error, but
```fish
echo "$abc["
true
```
would. That's not a great way to print errors.
So instead we just.. imagine the start was at most at the end.
The underlying issue why `echo "$abc["` causes this is that `wcstol`
didn't move the end pointer for the index value (because there is no
number there). I'd fix this, but apparently some of
our recursive variable calls absolutely rely on this position value.
This makes the awkward case
fish: Unexpected end of string, square brackets do not match
echo f[oo # not valid, no matching ]
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
(that `]` is simply the last character on the line, it's firmly in a comment)
less awkward by only marking the starting brace.
The implementation here is awkward mostly because the tok_t
communicates two things: The error location and how to carry on.
So we need to store the error length separately, and this is the first
time we've done so.
It's possible we can make this simpler.
This makes it so instead of marking the error location with a simple
`^`, we mark it with a caret, then a run of `~`, and then an ending `^`.
This makes it easier to see where exactly an error occured, e.g. which
command substitution was meant.
Note: Because this uses error locations that haven't been exposed like
that, it's likely to shake out weirdnesses and inaccuracies. For that
reason I've not adjusted the tests yet.
This stops us from loading the completions for e.g. `./foo` if there
is no `foo` in path.
This is because the completion scripts will call an unqualified `foo`,
and then error out.
This of course means if the script would work because it never calls
the command, we still don't load it.
Pathed completions via `complete --path` should be unaffected because
they aren't autoloaded anyway.
Workaround for #3117Fixes#9133
These should be friendlier, but aren't as pedantically accurate.
I think the term "index" is terrible and much prefer "staging area".
Also "rev-parse" simply must be believed to be seen, it can't be
described in a single paragraph. (did you know you can use `git
rev-parse --parseopt` as a replacement for `getopt` in arbitrary
shell scripts?)
This was misguidedly "fixed" in
9e08609f85, which made printf error out
with any "-"-prefixed words as the first argument.
Note: This means currently `printf --help` doesn't print the help.
This also matches `echo`, and we currently don't have anything to make
a literal `--help` execute a builtin help except for keywords. Oh well.
Fixes#9132
"socket.has_ipv6" is basically useless - it tells you python has
been *compiled* with ipv6 support.
Instead just try ipv6 and if that fails with EAFNOSUPPORT (checking
the actual errno), try v4.
Yes, I explicitly do not care to test this on python2.
Fixes#3857
I have an alias called "lg" for
log --color --graph --pretty=format:\'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset\' --abbrev-commit --first-parent
Having that in my completions ensures that git commands essentially
always use one column at most. That's not great, so we now shorten it
to 35 chars (plus an annoying 2 for ".." because I can't be bothered
to check for unicode support - an argument for a "string ellipsize", I guess?)
* Disclose pager to screen height immediately
This removes that bit where we only show 4 rows at most at first,
instead we disclose between half of terminal height up to the full terminal height (but still at least 4 rows).
This results in less pressing of tab to get the other results, and
better visibility of results.
Unlike moving it to the actual top of the screen, it's not as jarring and doesn't push terminal history off-screen as much.
Fixes#2698
This used to be kept, so e.g. testing it with
fish_read_limit=5 echo (string repeat -n 10 a)
would cause the prompt and such to error as well.
Also there was no good way to get back to the default value
afterwards.
* string repeat: Don't allocate repeated string all at once
This used to allocate one string and fill it with the necessary
repetitions, which could be a very very large string.
Now, it instead uses one buffer and fills it to a chunk size,
and then writes that.
This fixes:
1. We no longer crash with too large max/count values. Before they
caused a bad_alloc because we tried to fill all RAM.
2. We no longer fill all RAM if given a big-but-not-too-big value. You
could've caused fish to eat *most* of your RAM here.
3. It can start writing almost immediately, instead of waiting
potentially minutes to start.
Performance is about the same to slightly faster overall.