The translation is fairly direct though it adds some duplication, for example
there are multiple "match" statements that mimic function overloading.
Rust has no overloading, and we cannot have generic methods in the Node trait
(due to a Rust limitation, the error is like "cannot be made into an object")
so we include the type name in method names.
Give clients like "indent_visitor_t" a Rust companion ("IndentVisitor")
that takes care of the AST traversal while the AST consumption remains
in C++ for now. In future, "IndentVisitor" should absorb the entirety of
"indent_visitor_t". This pattern requires that "fish_indent" be exposed
includable header to the CXX bridge.
Alternatively, we could define FFI wrappers for recursive AST traversal.
Rust requires we separate the AST visitors for "mut" and "const"
scenarios. Take this opportunity to concretize both visitors:
The only client that requires mutable access is the populator. To match the
structure of the C++ populator which makes heavy use of function overloading,
we need to add a bunch of functions to the trait. Since there is no other
mutable visit, this seems acceptable.
The "const" visitors never use "will_visit_fields_of()" or
"did_visit_fields_of()", so remove them (though this is debatable).
Like in the C++ implementation, the AST nodes themselves are largely defined
via macros. Union fields like "Statement" and "ArgumentOrRedirection"
do currently not use macros but may in future.
This commit also introduces a precedent for a type that is defined in one
CXX bridge and used in another one - "ParseErrorList". To make this work
we need to manually define "ExternType".
There is one annoyance with CXX: functions that take explicit lifetime
parameters require to be marked as unsafe. This makes little sense
because functions that return `&Foo` with implicit lifetime can be
misused the same way on the C++ side.
One notable change is that we cannot directly port "find_block_open_keyword()"
(which is used to compute an error) because it relies on the stack of visited
nodes. We cannot modify a stack of node references while we do the "mut"
walk. Happily, an idiomatic solution is easy: we can tell the AST visitor
to backtrack to the parent node and create the error there.
Since "node_t::accept_base" is no longer a template we don't need the
"node_visitation_t" trampoline anymore.
The added copying at the FFI boundary makes things slower (memcpy dominates
the profile) but it's not unusable, which is good news:
$ hyperfine ./fish.{old,new}" -c 'source ../share/completions/git.fish'"
Benchmark 1: ./fish.old -c 'source ../share/completions/git.fish'
Time (mean ± σ): 195.5 ms ± 2.9 ms [User: 190.1 ms, System: 4.4 ms]
Range (min … max): 193.2 ms … 205.1 ms 15 runs
Benchmark 2: ./fish.new -c 'source ../share/completions/git.fish'
Time (mean ± σ): 677.5 ms ± 62.0 ms [User: 665.4 ms, System: 10.0 ms]
Range (min … max): 611.7 ms … 805.5 ms 10 runs
Summary
'./fish.old -c 'source ../share/completions/git.fish'' ran
3.47 ± 0.32 times faster than './fish.new -c 'source ../share/completions/git.fish''
Leftovers:
- Enum variants are still snakecase; I didn't get around to changing this yet.
- "ast_type_to_string()" still returns a snakecase name. This could be
changed since it's not user visible.
More ugliness with types that cxx bridge can't recognize as being POD. Using
pointers to get/set `termios` values with an assert to make sure we're using
identical definitions on both sides (in cpp from the system headers and in rust
from the libc crate as exported).
I don't know why cxx bridge doesn't allow `SharedPtr<OpaqueRustType>` but we can
work around it in C++ by converting a `Box<T>` to a `shared_ptr<T>` then convert
it back when it needs to be destructed. I can't find a clean way of doing it
from the cxx bridge wrapper so for now it needs to be done manually in the C++
code.
Types/values that are drop-in ready over ffi are renamed to match the old cpp
names but for types that now differ due to ffi difficulties I've left the `_ffi`
in the function names to indicate that this isn't the "correct" way of using the
types/methods.
This works around an autocxx limitations where different types cannot
have the same name even if they live in different namespace.
ast::job_t conflicts with job_t.
It seems to have originally been thought that the only possible way a stack
overflow could happen is via function calls, but there are other possibilities.
Issue #9302 reports how `eval` can be abused to recursively execute a string
substitution ad infinitum, triggering a stack overflow in fish.
This patch extends the stack overflow check to also check the current
`eval_level` against a new constant `FISH_MAX_EVAL_DEPTH`, currently set to a
conservative but hopefully still fair limit of 500. For future reference, with
the default stack size for the main/foreground thread of 8 MiB, we actually have
room for a stack depth around 2800, but that's only with extremely minimal state
stored in each stack frame.
I'm not entirely sure why we don't check `eval_depth` regardless of block type;
it can't be for performance reasons since it's just a simple integer comparison
- and a ridiculously easily one for the branch predictor handle, at that - but
maybe it's to try and support non-recursive nested execution blocks of greater
than `FISH_MAX_STACK_DEPTH`? But even without recursion, the stack can still
overflow so may be we should just bump the limit up some (to 500 like the new
`FISH_MAX_EVAL_DEPTH`?) and check it all the time?
Closes#9302.
Closes#9240.
Squash of the following commits (in reverse-chronological order):
commit 03b5cab3dc40eca9d50a9df07a8a32524338a807
Author: Mahmoud Al-Qudsi <mqudsi@neosmart.net>
Date: Sun Sep 25 15:09:04 2022 -0500
Handle differently declared posix_spawnxxx_t on macOS
On macOS, posix_spawnattr_t and posix_spawn_file_actions_t are declared as void
pointers, so we can't use maybe_t's bool operator to test if it has a value.
commit aed83b8bb308120c0f287814d108b5914593630a
Author: Mahmoud Al-Qudsi <mqudsi@neosmart.net>
Date: Sun Sep 25 14:48:46 2022 -0500
Update maybe_t tests to reflect dynamic bool conversion
maybe_t<T> is now bool-convertible only if T _isn't_ already bool-convertible.
commit 2b5a12ca97b46f96b1c6b56a41aafcbdb0dfddd6
Author: Mahmoud Al-Qudsi <mqudsi@neosmart.net>
Date: Sun Sep 25 14:34:03 2022 -0500
Make maybe_t a little harder to misuse
We've had a few bugs over the years stemming from accidental misuse of maybe_t
with bool-convertible types. This patch disables maybe_t's bool operator if the
type T is already bool convertible, forcing the (barely worth mentioning) need
to use maybe_t::has_value() instead.
This patch both removes maybe_t's bool conversion for bool-convertible types and
updates the existing codebase to use the explicit `has_value()` method in place
of existing implicit bool conversions.
Let's hope this doesn't causes build failures for e.g. musl: I just
know it's good on macOS and our Linux CI.
It's been a long time.
One fix this brings, is I discovered we #include assert.h or cassert
in a lot of places. If those ever happen to be in a file that doesn't
include common.h, or we are before common.h gets included, we're
unawaringly working with the system 'assert' macro again, which
may get disabled for debug builds or at least has different
behavior on crash. We undef 'assert' and redefine it in common.h.
Those were all eliminated, except in one catch-22 spot for
maybe.h: it can't include common.h. A fix might be to
make a fish_assert.h that *usually* common.h exports.
ESCAPE_ALL is not really a helpful name. Also it's the most common flag.
Let's make it the default so we can remove this unhelpful name.
While at it, let's add a default value for the flags argument, which helps
most callers.
The absence of ESCAPE_ALL makes it only escape nonprintable characters
(with some exceptions). We use this for displaying strings in the completion
pager as well as for the human-readable output of "set", "set -S", "bind"
and "functions".
No functional change.
Prior to this commit, setting a universal variable may trigger syncing
against the file which will modify other universal variables. But if we
want to support multiple environments we need the parser to decide when to
sync uvars. Shift the decision of when to sync to the parser itself. When a
universal variable is modified, now we just set a flag and it's up to the
(main) parser when to pick it up. This is hopefully just a refactoring with
no user-visible changes.
This adds a path builtin to deal with paths.
It offers the following subcommands:
filter to go through a list of paths and only print the ones that pass some filter - exist, are a directory, have read permission, ...
is as a shortcut for filter -q to only return true if one of the paths passed the filter
basename, dirname and extension to print certain parts of the path
change-extension to change the extension to a different one (as a string operation)
normalize and resolve to canonicalize the paths in various flavors
sort to sort paths, also only using the basename or dirname as a key
The definition of "extension" here was carefully considered and should line up with how extensions are actually used - ~/.bashrc doesn't have an extension, but ~/.conf.d does (".d").
These subcommands all compose well - they can read from arguments or stdin (like string), they can use null-delimited input or output (input is autodetected - if a NULL happens in the first PATH_MAX bytes it switches automatically).
It is both a failglob exception (so like set if a glob passed to it fails it just doesn't get any arguments for it instead of triggering an error), and passes output to command substitution buffers explicitly split (like string split0) so newlines are easy to handle.
This is needed because you might feasibly give e.g. `path filter`
globs to further match, and they might already present no results.
It's also well-handled since path simply does nothing if given no paths.
This cleans up the path_get_path function which is used to resolve a
command name against $PATH, by removing the dependence on errno and
being explicit about which error is returned.
Should be no user-visible change here.
This was already apparently supposed to work, but didn't because we
just overrode errno again.
This now means that, if a correctly named candidate exists, we don't
start the command-not-found handler.
See #8804
Cancellation groups were meant to reflect the following idea: if you ran a
simple block:
begin
cmd1
cmd2
end
then under job control, cmd1 and cmd2 would get separate groups; however if
either exits due to SIGINT or SIGQUIT we also want to propagate that to the
outer block. So the outermost block and its interior jobs would share a
cancellation group. However this is more complex than necessary; it's
sufficient for the execution context to just store an int internally.
This ought not to affect anything user-visible.
This is a cleanup of job groups, rationalizing a bunch of stuff. Some
notable changes (none user-visible hopefully):
1. Previously, if a job group wanted a pgid, then we would assign it to the
first process to run in the job group. Now we deliberately mark which
process will own the pgroup, via a new `leads_pgrp` flag in process_t. This
eliminates a source of ambiguity.
2. Previously, if a job were run inside fish's pgroup, we would set fish's
pgroup as the group of the job. But this meant we had to check if the job
had fish's pgroup in lots of places, for example when calling tcsetpgrp.
Now a job group only has a pgrp if that pgrp is external (i.e. the job is
under job control).
+ No functional change here, just renames and #include changes.
+ CMake can't have slashes in the target names. I'm suspciious of
that weird machinery for test, but I made it work.
+ A couple of builtins did not include their own headers, that
is no longer the case.
- Introduce BUILTIN_ERR_COMBO2_EXCLUSIVE
- Distill generally more terse, unambiguous error descriptions.
Remember English is not everyone's language.
- Do not capitalize sentence fragments
- Use the modality where problem input is in a %s: prefix, then
is explained.
- Do not address the user (the "You cannot do ..." kraderism)
- Spell out 'arguments' rather than 'args' for consistency
- Mention 'function' as a scope
Since #4376, for-loops would set the loop variable outside, so it
stays valid.
They did this by doing the equivalent of
```fish
set -l foo $foo
for foo in 1 2 3
```
And that first imaginary `set -l` would also fire a set-event.
Since there's no use for it and the variable isn't actually set, we
remove it.
Fixes#8384.
This used to construct a vector, which was then passed down and filled
with a new event_t each go around the loop. That's useless - we fire
one event here, and it's simply the variable event.
This reduces the overhead of a for-loop by ~10%:
```fish
for i in (seq 100000)
true
end
```
runs in about 90% of the time now.
This disables job control inside command substitutions. Prior to this
change, a cmdsub might get its own process group. This caused it to fail
to cancel loops properly. For example:
while true ; echo (sleep 5) ; end
could not be control-C cancelled, because the signal would go to sleep,
and so the loop would continue on. The simplest way to fix this is to
match other shells and not use job control in cmdsubs.
Related is #1362
for PWD in foo; true; end
prints:
>..src/parse_execution.cpp:461: end_execution_reason_t parse_execution_context_t::run_for_statement(const ast::for_header_t&, const ast::job_list_t&): Assertion `retval == ENV_OK' failed.
because this used the wrong way to see if something is read-only.
is_block is a field which supports 'status is-block', and also controls
whether notifications get posted. However there is no reason to store
this as a distinct field since it is trivially computed from the block
list. Stop storing it. No functional changes in this commit.
The user may write for example:
echo foo >&5
and fish would try to output to file descriptor 5, within the fish process
itself. This has unpredictable effects and isn't useful. Make this an
error.
Note that the reverse is "allowed" but ignored:
echo foo 5>&1
this conceptually dup2s stdout to fd 5, but since no builtin writes to fd
5 we ignore it.