Most of it is duplicated, hence untested.
Functions like mbrtowc are not exposed by the libc crate, so declare them
ourselves.
Since we don't know the definition of C macros, add two big hacks to make
this work:
1. Replace MB_LEN_MAX and mbstate_t with values (resp types) that should
be large enough for any implementation.
2. Detect the definition of MB_CUR_MAX in the build script. This requires
more changes for each new libc. We could also use this approach for 1.
Additionally, this commit brings a small behavior change to
read_unquoted_escape(): we cannot decode surrogate code points like \UDE01
into a Rust char, so use � (\UFFFD, replacement character) instead.
Previously, we added such code points to a wcstring; looks like they were
ignored when printed.
wcs2string converts a wide string to a narrow one. The result is
null-terminated and may also contain interior null-characters.
std::string allows this.
Rust's null-terminated string, CString, does not like interior null-characters.
This means we will need to use Vec<u8> or OsString for the places where we
use interior null-characters.
On the other hand, we want to use CString for places that require a
null-terminator, because other Rust types don't guarantee the null-terminator.
Turns out there is basically no overlap between the two use cases, so make
it two functions. Their equivalents in Rust will have the same name, so
we'll only need to adjust the type when porting.
This shows some of the ugliness of the rust borrow checker when it comes to
safely implementing any sort of recursive access and the need to be overly
explicit about which types are actually used across threads and which aren't.
We're forced to use an `Arc` for `ItemMaker` (née `item_maker_t`) because
there's no other way to make it clear that its lifetime will last longer than
the FdMonitor's. But once we've created an `Arc<T>` we can't call
`Arc::get_mut()` to get an `&mut T` once we've created even a single weak
reference to the Arc (because that weak ref could be upgraded to a strong ref at
any time). This means we need to finish configuring any non-atomic properties
(such as `ItemMaker::always_exit`) before we initialize the callback (which
needs an `Arc<ItemMaker>` to do its thing).
Because rust doesn't like self-referential types and because of the fact that we
now need to create both the `ItemMaker` and the `FdMonitorItem` separately
before we set the callback (at which point it becomes impossible to get a
mutable reference to the `ItemMaker`), `ItemMaker::item` is dropped from the
struct and we instead have the "constructor" for `ItemMaker` take a reference to
an `FdMonitor` instance and directly add itself to the monitor's set, meaning we
don't need to move the item out of the `ItemMaker` in order to add it to the
`FdMonitor` set later.
The way cxx bridge works, it doesn't recognize any types from another module as
being shared cxx bridge types with generations native to both C++ and Rust,
meaning every module that was going to use function pointers would have to
define its own `c_void` type (because cxx bridge doesn't recognize any of
libc::c_void, std::ffi::c_void, or autocxx::c_void).
FFI on other platforms has long used the equivalent of `uint8_t *` as an
alternative to `void *` for code where `void` was not available or was
undesirable for some reason. We can join the club - this way we can always use
`* {const|mut} u8` in our rust code and `uint8_t *` in our C++ code to pass
around parameters or values over the C abi.
I needed to rename some types already ported to rust so they don't clash with
their still-extant cpp counterparts. Helper ffi functions added to avoid needing
to dynamically allocate an FdMonitorItem for every fd (we use dozens per basic
prompt).
I ported some functions from cpp to rust that are used only in the backend but
without removing their existing cpp counterparts so cpp code can continue to use
their version of them (`wperror` and `make_detached_pthread`).
I ran into issues porting line-by-line logic because rust inverts the behavior
of `std::remove_if(..)` by making it (basically) `Vec::retain_if(..)` so I
replaced bools with an explict enum to make everything clearer.
I'll port the cpp tests for this separately, for now they're using ffi.
Porting closures was ugly. It's nothing hard, but it's very ugly as now each
capturing lambda has been changed into an explicit struct that contains its
parameters (that needs to be dynamically allocated), a standalone callback
(member) function to replace the lambda contents, and a separate trampoline
function to call it from rust over the shared C abi (not really relevant to
x86_64 w/ its single calling convention but probably needed on other platforms).
I don't like that `fd_monitor.rs` has its own `c_void`. I couldn't find a way to
move that to `ffi.rs` but still get cxx bridge to consider it a shared POD.
Every time I moved it to a different module, it would consider it to be an
opaque rust type instead. I worry this means we're going to have multiple
`c_void1`, `c_void2`, etc. types as we continue to port code to use function
pointers.
Also, rust treats raw pointers as foreign so you can't do `impl Send for * const
Foo` even if `Foo` is from the same module. That necessitated a wrapper type
(`void_ptr`) that implements `Send` and `Sync` so we can move stuff between
threads.
The code in fd_monitor_t has been split into two objects, one that is used by
the caller and a separate one associated with the background thread (this is
made nice and clean by rust's ownership model). Objects not needed under the
lock (i.e. accessed by the background thread exclusively) were moved to the
separate `BackgroundFdMonitor` type.
This is early work but I guess there's no harm in pushing it?
Some thoughts on the conventions:
Types that live only inside Rust follow Rust naming convention
("FeatureMetadata").
Types that live on both sides of the language boundary follow the existing
naming ("feature_flag_t").
The alternative is to define a type alias ("using feature_flag_t =
rust::FeatureFlag") but that doesn't seem to be supported in "[cxx::bridge]"
blocks. We could put it in a header ("future_feature_flags.h").
"feature_metadata_t" is a variant of "FeatureMetadata" that can cross
the language boundary. This has the advantage that we can avoid tainting
"FeatureMetadata" with "CxxString" and such. This is an experimental approach,
probably not what we should do in general.
The original implementation without the test took me 3 hours (first time
seriously looking into this)
The functions take "wcharz_t" for smooth integration with existing C++ callers.
This is at the expense of Rust callers, which would prefer "&wstr". Would be
nice to declare a function parameter that accepts both but I don't think
that really works since "wcharz_t" drops the lifetime annotation.
This renames abbreviation triggers from `--trigger-on entry` and
`--trigger-on exec` to `--on-space` and `--on-enter`. These names are less
precise, as abbreviations trigger on any character that terminates a word
or any key binding that triggers exec, but they're also more human friendly
and that's a better tradeoff.
set-cursor enables abbreviations to specify the cursor location after
expansion, by passing in a string which is expected to be found in the
expansion. For example you may create an abbreviation like `L!`:
abbr L! --position anywhere --set-cursor ! "! | less"
and the cursor will be positioned where the "!" is after expansion, with
the "| less" appearing to its right.
This adds support for the `--function` option of abbreviations, so that the
expansion of an abbreviation may be generated dynamically via a fish
function.
Prior to this change, abbreviations were stored as fish variables, often
universal. However we intend to add additional features to abbreviations
which would be very awkward to shoe-horn into variables.
Re-implement abbreviations using a builtin, managing them internally.
Existing abbreviations stored in universal variables are still imported,
for compatibility. However new abbreviations will need to be added to a
function. A follow-up commit will add it.
Now that abbr is a built-in, remove the abbr function; but leave the
abbr.fish file so that stale files from past installs do not override
the abbr builtin.
This allows adjusting a pattern string so that it matches an entire
string, by wrapping the regex in a group like ^(?:...)$
This is a workaround for the fact that PCRE2_ENDANCHORED is unavailable
on PCRE2 prior to 2017, so we have to adjust the pattern instead.
Also introduce an overload of match() which creates its own
match_data_t.
We wrongly highlight this as prefix when actually the trailing slash should
invalidate it. Turns out path normalization drops the slash, so let's
sidestep that.
Fixes#9394
As the user is typing an argument, fish continually checks if the input is
the prefix of a valid file path. If yes, the input is underlined.
The same prefix-logic is used for all tokens on the command line, even for
"finished" tokens. This means we highlight any token that happens to be
a prefix of a valid file path. We actually want this to only apply to the
token that the user is currently typing.
Let's use the prefix-logic only for tokens adjacent to the cursor. This should
better match user expectations (and reduce IO traffic). I don't think this is
the perfect criteria but I don't know how else we can determine if a token is
"unfinished".
When visiting the "cd" node, we mark invalid paths as error, but don't
underline valid paths. This works fine most of the time because we later
underline paths (for any command, not just "cd").
However the latter check fails to honor CDPATH. Let's correct that, which
also allows to simplify the logic.
This is a false positive as a result of disabling TLS support in LSAN due to an
incompatibility with newer versions of glibc.
Also remove the older workaround (because it didn't work).
Up to now, in normal locales \x was essentially the same as \X, except
that it errored if given a value > 0x7f.
That's kind of annoying and useless.
A subtle change is that `\xHH` now represents the character (if any)
encoded by the byte value "HH", so even for values <= 0x7f if that's
not the same as the ASCII value we would diverge.
I do not believe anyone has ever run fish on a system where that
distinction matters. It isn't a thing for UTF-8, it isn't a thing for
ASCII, it isn't a thing for UTF-16, it isn't a thing for any extended
ASCII scheme - ISO8859-X, it isn't a thing for SHIFT-JIS.
I am reasonably certain we are making that same assumption in other
places.
Fixes#1352
Closes#9240.
Squash of the following commits (in reverse-chronological order):
commit 03b5cab3dc40eca9d50a9df07a8a32524338a807
Author: Mahmoud Al-Qudsi <mqudsi@neosmart.net>
Date: Sun Sep 25 15:09:04 2022 -0500
Handle differently declared posix_spawnxxx_t on macOS
On macOS, posix_spawnattr_t and posix_spawn_file_actions_t are declared as void
pointers, so we can't use maybe_t's bool operator to test if it has a value.
commit aed83b8bb308120c0f287814d108b5914593630a
Author: Mahmoud Al-Qudsi <mqudsi@neosmart.net>
Date: Sun Sep 25 14:48:46 2022 -0500
Update maybe_t tests to reflect dynamic bool conversion
maybe_t<T> is now bool-convertible only if T _isn't_ already bool-convertible.
commit 2b5a12ca97b46f96b1c6b56a41aafcbdb0dfddd6
Author: Mahmoud Al-Qudsi <mqudsi@neosmart.net>
Date: Sun Sep 25 14:34:03 2022 -0500
Make maybe_t a little harder to misuse
We've had a few bugs over the years stemming from accidental misuse of maybe_t
with bool-convertible types. This patch disables maybe_t's bool operator if the
type T is already bool convertible, forcing the (barely worth mentioning) need
to use maybe_t::has_value() instead.
This patch both removes maybe_t's bool conversion for bool-convertible types and
updates the existing codebase to use the explicit `has_value()` method in place
of existing implicit bool conversions.