Commit graph

609 commits

Author SHA1 Message Date
Johannes Altmanninger
fdeb0d9f06 Port the rest of wcstringutil 2023-04-18 12:54:19 +02:00
Xiretza
aab2f660a7 Port math builtin, tinyexpr and wcstod_underscores to Rust 2023-04-16 22:26:46 +02:00
Johannes Altmanninger
971d257e67 Port AST to Rust
The translation is fairly direct though it adds some duplication, for example
there are multiple "match" statements that mimic function overloading.

Rust has no overloading, and we cannot have generic methods in the Node trait
(due to a Rust limitation, the error is like "cannot be made into an object")
so we include the type name in method names.

Give clients like "indent_visitor_t" a Rust companion ("IndentVisitor")
that takes care of the AST traversal while the AST consumption remains
in C++ for now.  In future, "IndentVisitor" should absorb the entirety of
"indent_visitor_t".  This pattern requires that "fish_indent" be exposed
includable header to the CXX bridge.

Alternatively, we could define FFI wrappers for recursive AST traversal.

Rust requires we separate the AST visitors for "mut" and "const"
scenarios. Take this opportunity to concretize both visitors:

The only client that requires mutable access is the populator.  To match the
structure of the C++ populator which makes heavy use of function overloading,
we need to add a bunch of functions to the trait. Since there is no other
mutable visit, this seems acceptable.

The "const" visitors never use "will_visit_fields_of()" or
"did_visit_fields_of()", so remove them (though this is debatable).

Like in the C++ implementation, the AST nodes themselves are largely defined
via macros.  Union fields like "Statement" and "ArgumentOrRedirection"
do currently not use macros but may in future.

This commit also introduces a precedent for a type that is defined in one
CXX bridge and used in another one - "ParseErrorList".  To make this work
we need to manually define "ExternType".

There is one annoyance with CXX: functions that take explicit lifetime
parameters require to be marked as unsafe. This makes little sense
because functions that return `&Foo` with implicit lifetime can be
misused the same way on the C++ side.

One notable change is that we cannot directly port "find_block_open_keyword()"
(which is used to compute an error) because it relies on the stack of visited
nodes. We cannot modify a stack of node references while we do the "mut"
walk. Happily, an idiomatic solution is easy: we can tell the AST visitor
to backtrack to the parent node and create the error there.

Since "node_t::accept_base" is no longer a template we don't need the
"node_visitation_t" trampoline anymore.

The added copying at the FFI boundary makes things slower (memcpy dominates
the profile) but it's not unusable, which is good news:

    $ hyperfine ./fish.{old,new}" -c 'source ../share/completions/git.fish'"
    Benchmark 1: ./fish.old -c 'source ../share/completions/git.fish'
      Time (mean ± σ):     195.5 ms ±   2.9 ms    [User: 190.1 ms, System: 4.4 ms]
      Range (min … max):   193.2 ms … 205.1 ms    15 runs

    Benchmark 2: ./fish.new -c 'source ../share/completions/git.fish'
      Time (mean ± σ):     677.5 ms ±  62.0 ms    [User: 665.4 ms, System: 10.0 ms]
      Range (min … max):   611.7 ms … 805.5 ms    10 runs

    Summary
      './fish.old -c 'source ../share/completions/git.fish'' ran
        3.47 ± 0.32 times faster than './fish.new -c 'source ../share/completions/git.fish''

Leftovers:
- Enum variants are still snakecase; I didn't get around to changing this yet.
- "ast_type_to_string()" still returns a snakecase name. This could be
  changed since  it's not user visible.
2023-04-16 17:46:56 +02:00
Johannes Altmanninger
05bad5eda1 Port common.{h,cpp} to Rust
Most of it is duplicated, hence untested.

Functions like mbrtowc are not exposed by the libc crate, so declare them
ourselves.
Since we don't know the definition of C macros, add two big hacks to make
this work:
1. Replace MB_LEN_MAX and mbstate_t with values (resp types) that should
   be large enough for any implementation.
2. Detect the definition of MB_CUR_MAX in the build script. This requires
   more changes for each new libc. We could also use this approach for 1.

Additionally, this commit brings a small behavior change to
read_unquoted_escape(): we cannot decode surrogate code points like \UDE01
into a Rust char, so use � (\UFFFD, replacement character) instead.
Previously, we added such code points to a wcstring; looks like they were
ignored when printed.
2023-04-02 15:17:06 +02:00
Johannes Altmanninger
998cb7f1cd New wcs2zstring to explicitly convert to zero-terminated strings
wcs2string converts a wide string to a narrow one.  The result is
null-terminated and may also contain interior null-characters.
std::string allows this.

Rust's null-terminated string, CString, does not like interior null-characters.
This means we will need to use Vec<u8> or OsString for the places where we
use interior null-characters.
On the other hand, we want to use CString for places that require a
null-terminator, because other Rust types don't guarantee the null-terminator.

Turns out there is basically no overlap between the two use cases, so make
it two functions. Their equivalents in Rust will have the same name, so
we'll only need to adjust the type when porting.
2023-04-02 15:17:06 +02:00
ridiculousfish
732f7284d4 Adopt the new termsize
This eliminates the C++ version.
2023-03-19 16:13:41 -07:00
ridiculousfish
57f4571a01 Rewrite wait handles and wait handle store in Rust 2023-03-18 18:53:04 -07:00
Mahmoud Al-Qudsi
455b744bca Port fd_monitor tests to rust
This shows some of the ugliness of the rust borrow checker when it comes to
safely implementing any sort of recursive access and the need to be overly
explicit about which types are actually used across threads and which aren't.

We're forced to use an `Arc` for `ItemMaker` (née `item_maker_t`) because
there's no other way to make it clear that its lifetime will last longer than
the FdMonitor's. But once we've created an `Arc<T>` we can't call
`Arc::get_mut()` to get an `&mut T` once we've created even a single weak
reference to the Arc (because that weak ref could be upgraded to a strong ref at
any time). This means we need to finish configuring any non-atomic properties
(such as `ItemMaker::always_exit`) before we initialize the callback (which
needs an `Arc<ItemMaker>` to do its thing).

Because rust doesn't like self-referential types and because of the fact that we
now need to create both the `ItemMaker` and the `FdMonitorItem` separately
before we set the callback (at which point it becomes impossible to get a
mutable reference to the `ItemMaker`), `ItemMaker::item` is dropped from the
struct and we instead have the "constructor" for `ItemMaker` take a reference to
an `FdMonitor` instance and directly add itself to the monitor's set, meaning we
don't need to move the item out of the `ItemMaker` in order to add it to the
`FdMonitor` set later.
2023-03-05 00:33:53 -06:00
Xiretza
8427e05bf7 Move escape_string tests to Rust
This way, both the Rust FFI wrapper and the actual C++ implementation are
tested.
2023-03-04 12:42:06 -08:00
Fabian Boehm
1aa3393f05 Test ifind bug with non-ascii codepoints 2023-03-02 16:33:20 +01:00
Johannes Altmanninger
14f3a5f79a Re-add highlighter tests
These were removed by accident.
2023-03-02 08:54:58 +01:00
Neeraj Jaiswal
f52569a800 abbr: port abbreviation and abbr builtin to rust 2023-02-25 12:24:58 +01:00
Neeraj Jaiswal
e384e63b24 re: port regex make anchored to rust and helper ffi funtions for regex 2023-02-25 12:24:57 +01:00
Johannes Altmanninger
b7041ad89b clang-format C++ files 2023-02-25 12:24:25 +01:00
Mahmoud Al-Qudsi
aaf2d1c19d Use * const u8 instead of * const c_void
The way cxx bridge works, it doesn't recognize any types from another module as
being shared cxx bridge types with generations native to both C++ and Rust,
meaning every module that was going to use function pointers would have to
define its own `c_void` type (because cxx bridge doesn't recognize any of
libc::c_void, std::ffi::c_void, or autocxx::c_void).

FFI on other platforms has long used the equivalent of `uint8_t *` as an
alternative to `void *` for code where `void` was not available or was
undesirable for some reason. We can join the club - this way we can always use
`* {const|mut} u8` in our rust code and `uint8_t *` in our C++ code to pass
around parameters or values over the C abi.
2023-02-19 15:42:07 -06:00
Mahmoud Al-Qudsi
ce559bc20e Port fd_monitor (and its needed components)
I needed to rename some types already ported to rust so they don't clash with
their still-extant cpp counterparts. Helper ffi functions added to avoid needing
to dynamically allocate an FdMonitorItem for every fd (we use dozens per basic
prompt).

I ported some functions from cpp to rust that are used only in the backend but
without removing their existing cpp counterparts so cpp code can continue to use
their version of them (`wperror` and `make_detached_pthread`).

I ran into issues porting line-by-line logic because rust inverts the behavior
of `std::remove_if(..)` by making it (basically) `Vec::retain_if(..)` so I
replaced bools with an explict enum to make everything clearer.

I'll port the cpp tests for this separately, for now they're using ffi.

Porting closures was ugly. It's nothing hard, but it's very ugly as now each
capturing lambda has been changed into an explicit struct that contains its
parameters (that needs to be dynamically allocated), a standalone callback
(member) function to replace the lambda contents, and a separate trampoline
function to call it from rust over the shared C abi (not really relevant to
x86_64 w/ its single calling convention but probably needed on other platforms).

I don't like that `fd_monitor.rs` has its own `c_void`. I couldn't find a way to
move that to `ffi.rs` but still get cxx bridge to consider it a shared POD.
Every time I moved it to a different module, it would consider it to be an
opaque rust type instead. I worry this means we're going to have multiple
`c_void1`, `c_void2`, etc. types as we continue to port code to use function
pointers.

Also, rust treats raw pointers as foreign so you can't do `impl Send for * const
Foo` even if `Foo` is from the same module. That necessitated a wrapper type
(`void_ptr`) that implements `Send` and `Sync` so we can move stuff between
threads.

The code in fd_monitor_t has been split into two objects, one that is used by
the caller and a separate one associated with the background thread (this is
made nice and clean by rust's ownership model). Objects not needed under the
lock (i.e. accessed by the background thread exclusively) were moved to the
separate `BackgroundFdMonitor` type.
2023-02-19 15:42:03 -06:00
Mahmoud Al-Qudsi
a1a8bc3d8d Port timer.cpp to rust 2023-02-14 15:54:18 -06:00
Johannes Altmanninger
39f3c894d7 Port tokenizer.cpp to Rust
In hindsight, I should probably have split this into three different commits.
2023-02-09 00:37:22 +01:00
Johannes Altmanninger
7f8d247211 Port parse_constants.h to Rust 2023-02-09 00:37:22 +01:00
Johannes Altmanninger
25816627de Port redirection.cpp to Rust 2023-02-09 00:37:22 +01:00
Johannes Altmanninger
9ca160eac2 Convert parse_error_code_t to a scoped enum
This will make the Rust port's diff smaller.
2023-02-08 21:49:54 +01:00
Johannes Altmanninger
4639f7ec40 Follow Rust naming convention for some types
But don't do it for enum variants just yet.
2023-02-08 21:49:54 +01:00
ridiculousfish
c2df63f586 Remove an errant printf from fish_tests 2023-02-04 11:24:54 -07:00
Johannes Altmanninger
83fd7ea7c4 Port future_feature_flags.cpp to Rust
This is early work but I guess there's no harm in pushing it?
Some thoughts on the conventions:

Types that live only inside Rust follow Rust naming convention
("FeatureMetadata").

Types that live on both sides of the language boundary follow the existing
naming ("feature_flag_t").
The alternative is to define a type alias ("using feature_flag_t =
rust::FeatureFlag") but that doesn't seem to be supported in "[cxx::bridge]"
blocks. We could put it in a header ("future_feature_flags.h").

"feature_metadata_t" is a variant of "FeatureMetadata" that can cross
the language boundary. This has the advantage that we can avoid tainting
"FeatureMetadata" with "CxxString" and such. This is an experimental approach,
probably not what we should do in general.
2023-02-03 18:55:06 +01:00
Johannes Altmanninger
517d53dc46 Port util.cpp to Rust
The original implementation without the test took me 3 hours (first time
seriously looking into this)

The functions take "wcharz_t" for smooth integration with existing C++ callers.
This is at the expense of Rust callers, which would prefer "&wstr".  Would be
nice to declare a function parameter that accepts both but I don't think
that really works since "wcharz_t" drops the lifetime annotation.
2023-02-03 18:55:06 +01:00
ridiculousfish
681a165721 Add an FFI test facility
This allow testing Rust functions (from fish_tests.cpp) which need to
cross the FFI. See the example in smoke.rs.
2023-02-02 19:34:48 -07:00
ridiculousfish
d843b67d2d Initial Rust commit 2023-02-02 19:34:47 -07:00
ridiculousfish
7fa13e4451 Remove enum_iter_t
This was unused.
2023-01-14 12:58:20 -08:00
Johannes Altmanninger
224f81e250 fish_tests: use default argument for abbreviation tests
Some tests place the cursor at the end of the command line.  This is the
obvious default, so let's make it a default argument.
2022-12-17 18:09:54 +01:00
Johannes Altmanninger
2df8bde5eb fish_tests: test that make_anchored regex helper actually anchors 2022-12-17 18:09:54 +01:00
ridiculousfish
01039537b0 Remove abbreviation triggers
Per code review, this does not add enough value to introduce now.
Leaving the feature in history should want want to revisit this
in the future.
2022-12-10 16:15:00 -08:00
ridiculousfish
35a4688650 Rename abbreviation triggers
This renames abbreviation triggers from `--trigger-on entry` and
`--trigger-on exec` to `--on-space` and `--on-enter`. These names are less
precise, as abbreviations trigger on any character that terminates a word
or any key binding that triggers exec, but they're also more human friendly
and that's a better tradeoff.
2022-12-10 15:38:50 -08:00
ridiculousfish
5841e9f712 Remove '--quiet' feature of abbreviations
Per code review, this is too risky to introduce now. Leaving the feature
in history should want want to revisit this in the future.
2022-12-10 15:38:50 -08:00
ridiculousfish
c51a1f1f60 Implement trigger-on for abbreviations
trigger-on enables abbreviations to trigger only on "entry" (anything
which closes a token, like space) or only on "exec" (typically enter key).
2022-12-10 15:38:50 -08:00
ridiculousfish
7118cb1ae1 Implement set-cursor for abbreviations
set-cursor enables abbreviations to specify the cursor location after
expansion, by passing in a string which is expected to be found in the
expansion. For example you may create an abbreviation like `L!`:

    abbr L! --position anywhere --set-cursor ! "! | less"

and the cursor will be positioned where the "!" is after expansion, with
the "| less" appearing to its right.
2022-12-10 15:38:50 -08:00
ridiculousfish
1d205d0bbd Reimplement abbreviation expansion to support quiet abbreviations
This reimplements abbreviation to support quiet abbreviations. Quiet
abbreviations expand "in secret" before execution.
2022-12-10 15:38:46 -08:00
ridiculousfish
8135c52c13 Abbreviations to support functions
This adds support for the `--function` option of abbreviations, so that the
expansion of an abbreviation may be generated dynamically via a fish
function.
2022-12-10 15:29:04 -08:00
ridiculousfish
d15855d3e3 Abbreviations to support matching via regex
This adds the --regex option to abbreviations, allowing them to match a
pattern of tokens.
2022-12-10 15:29:04 -08:00
ridiculousfish
470153c0df Refactor abbreviation set into its own type
Previously the abbreviation map was just an unordered map; switch it to a
real class so we can hang methods off of it.
2022-12-10 15:29:04 -08:00
ridiculousfish
1402bae7f4 Re-implement abbreviations as a built-in
Prior to this change, abbreviations were stored as fish variables, often
universal. However we intend to add additional features to abbreviations
which would be very awkward to shoe-horn into variables.

Re-implement abbreviations using a builtin, managing them internally.

Existing abbreviations stored in universal variables are still imported,
for compatibility. However new abbreviations will need to be added to a
function. A follow-up commit will add it.

Now that abbr is a built-in, remove the abbr function; but leave the
abbr.fish file so that stale files from past installs do not override
the abbr builtin.
2022-12-10 15:29:03 -08:00
ridiculousfish
d2daa921e9 Introduce re::make_anchored
This allows adjusting a pattern string so that it matches an entire
string, by wrapping the regex in a group like ^(?:...)$

This is a workaround for the fact that PCRE2_ENDANCHORED is unavailable
on PCRE2 prior to 2017, so we have to adjust the pattern instead.

Also introduce an overload of match() which creates its own
match_data_t.
2022-12-10 12:24:43 -08:00
ridiculousfish
fe7d095647 Add maybe_t::value_or
This enables getting the value or returning the passed-in value.
This is helpful for "default if none."
2022-12-10 12:24:43 -08:00
ridiculousfish
b0ec7e07b8 Fix a wgetopt crash and add a test
This has apparently been a problem since forever.
2022-12-09 13:54:00 -08:00
Johannes Altmanninger
6072ea1900 Fix false positive cd higlighting when token ends in slash
We wrongly highlight this as prefix when actually the trailing slash should
invalidate it. Turns out path normalization drops the slash, so let's
sidestep that.

Fixes #9394
2022-12-03 22:36:56 +01:00
ridiculousfish
f1e73a1839 Increase debounce timeout in debounce test
On slow machines this was spuriously failing.
2022-11-12 14:08:22 -08:00
Aaron Gyes
df546e01f6 IWYU fixup 2022-10-26 20:04:04 -07:00
Johannes Altmanninger
f637fb31b5 highlight: underline prefixes of valid paths only if at cursor
As the user is typing an argument, fish continually checks if the input is
the prefix of a valid file path. If yes, the input is underlined.

The same prefix-logic is used for all tokens on the command line, even for
"finished" tokens. This means we highlight any token that happens to be
a prefix of a valid file path. We actually want this to only apply to the
token that the user is currently typing.

Let's use the prefix-logic only for tokens adjacent to the cursor.  This should
better match user expectations (and reduce IO traffic). I don't think this is
the perfect criteria but I don't know how else we can determine if a token is
"unfinished".
2022-10-26 16:12:43 +02:00
Johannes Altmanninger
6667c9f50c highlighter: pass the cursor position to the highlighter
This allows the next commit to correct highlighting based on the cursor
position.
2022-10-26 16:11:00 +02:00
Johannes Altmanninger
861ac00a61 highlighter: underline valid "cd" arguments also if they come from CDPATH
When visiting the "cd" node, we mark invalid paths as error, but don't
underline valid paths.  This works fine most of the time because we later
underline paths (for any command, not just "cd").
However the latter check fails to honor CDPATH.  Let's correct that, which
also allows to simplify the logic.
2022-10-26 16:11:00 +02:00
Johannes Altmanninger
45da77c5c5 Format some C++ files with clang-format 2022-10-26 14:53:06 +02:00