This adds support for `fish_trace`, a new variable intended to serve the
same purpose as `set -x` in bash. Setting this variable to anything
non-empty causes execution to be traced. In the future we may give more
specific meaning to the value of the variable.
The user's prompt is not traced unless you run it explicitly. Events are
also not traced, because they are noisy; autoloading, however, is.
Fixes #3427
Soon we will have more complicated logic around whether to call tcsetpgrp.
Prepare to centralize that logic by passing in the new terminal owner pgrp,
instead of having child_setup_process make the decision.
When executing a job, if the first process is fish internal, then have
fish claim the job's pgroup.
The idea here is that the terminal must be owned by a pgroup containing
the process reading from the terminal. If the first process is fish
internal (a function or builtin) then the pgroup must contain the fish
process.
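A minimal sketch of the arrangement in plain POSIX terms (the function name here is illustrative, not fish's actual code):

    #include <unistd.h>

    // If the first process in the job is fish-internal, the job's pgroup is
    // fish's own, so the terminal owner contains the process doing the reads.
    void claim_terminal_for_internal_job(int tty_fd) {
        pid_t fish_pgrp = getpgrp();       // fish's own process group
        if (tcgetpgrp(tty_fd) != fish_pgrp) {
            tcsetpgrp(tty_fd, fish_pgrp);  // make fish's pgroup the terminal owner
        }
    }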
This is a bit of a workaround of the behavior where the first process that
executes in a job becomes the process group leader. If there's a deferred
process, then we will execute processes out of order so the pgroup can be
wrong. Fix this by setting the process group leader explicitly as fish
when necessary.
Fixes #5855
25afc9b377 made this unnecessary by
having child processes wait for a signal after fork(), but this change
was later reverted. If we artificially slow down fish (e.g. with a sleep)
after the fork call, we see commands getting backgrounded by mistake.
Put back the tcsetpgrp() call.
Prior to this fix, a function_block stored a process_t, which was only used
when printing backtraces. Switch this to an array of arguments, and make
various other cleanups around null terminated argument arrays.
This runs build_tools/style.fish, which runs clang-format on C++, fish_indent on fish scripts, and (newly) black on Python.
If anything is wrong with the formatting, we should fix the tools, but automated formatting is worth it.
Prior to this change, fish used a global flag to decide if we should check
for changes to universal variables. This flag was then checked at arbitrary
locations, potentially triggering variable updates and event handlers for
those updates; this was very hard to reason about.
Switch to triggering a universal variable update at a fixed location,
after running an external command. The common case is that the variable
file has not changed, which we can identify with just a stat() call, so
this is pretty cheap.
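A rough sketch of the cheap check (illustrative; fish's real logic lives in its universal-variable machinery):

    #include <sys/stat.h>

    // One stat() call decides whether the variable file could have changed;
    // only then is it worth re-reading the file and firing change events.
    bool uvar_file_maybe_changed(const char *path, struct stat *last) {
        struct stat now;
        if (stat(path, &now) != 0) return true;  // treat errors as "changed"
        bool changed = now.st_mtime != last->st_mtime ||
                       now.st_size != last->st_size ||
                       now.st_ino != last->st_ino;
        *last = now;
        return changed;
    }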
This reverts commit cdce8511a1.
This change was unsafe. The prior version (now restored) took the lock and
then copied the data. By returning a reference, the caller holds a
reference to data outside of the lock.
This function isn't worth optimizing. Hardly any functions use this
facility, and for those that do, they typically just capture one or two
variables.
* Convert `function_get_inherit_vars()` to return a reference to the
(possibly) existing map, rather than a copy;
* Preallocate and reuse a static (read-only) map for the (very) common
case of no inherited vars;
* Pass references to the inherit vars map around thereafter, never
triggering the map copy (or even move) constructor.
NB: If it turns out the reference is unsafe, we can switch the inherit vars
to be a shared_ptr and return that instead.
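The two patterns at issue look roughly like this (illustrative types, not fish's actual code); the reference version races because the caller dereferences it after the lock is released:

    #include <map>
    #include <mutex>
    #include <string>

    using var_map_t = std::map<std::wstring, std::wstring>;
    static std::mutex inherit_lock;
    static std::map<std::wstring, var_map_t> inherit_vars;

    // Safe (restored) version: copy the data while holding the lock.
    var_map_t inherit_vars_copy(const std::wstring &func) {
        std::lock_guard<std::mutex> guard(inherit_lock);
        auto it = inherit_vars.find(func);
        return it == inherit_vars.end() ? var_map_t{} : it->second;
    }

    // Unsafe (reverted) version: the caller reads through the reference
    // after the lock is gone, racing with any concurrent writer.
    const var_map_t &inherit_vars_ref(const std::wstring &func) {
        static const var_map_t empty;  // shared read-only map for the common case
        std::lock_guard<std::mutex> guard(inherit_lock);
        auto it = inherit_vars.find(func);
        return it == inherit_vars.end() ? empty : it->second;
    }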
I did not realize builtins could safely call into the parser and inject
jobs during execution. This is much cleaner than hacking around the
required shape of a plain_statement.
`eval` has always been implemented as a function, which was a bit of a
hack that caused some issues, such as triggering the creation of a
new scope. This turns `eval` into a decorator.
The scoping issues with eval prevented it from being usable to actually
implement other shell components in fish script, such as the problems
described in #4442, which should now no longer be the case.
Closes #4443.
While `eval` is still a function, this paves the way for changing that
in the future, and lets the proc/exec functions detect when an eval is
used to allow/disallow certain behaviors and optimizations.
Prior to this fix, a job would only inherit a pgrp from its parent if the
first command were external. There seems to be no reason for this
restriction, and it causes tcsetpgrp() churn, potentially causing SIGTTIN.
Switch to unconditionally inheriting a pgrp from parents.
This should fix most of #5765, the only remaining question is
tcsetpgrp from builtins.
Prior to this fix, in every call to job_continue, fish would reclaim the
foreground pgrp. This would cause other jobs in the pipeline (which may
have another pgrp) to receive SIGTTIN / SIGTTOU.
Only reclaim the foreground pgrp if it was held at the point of job_continue.
This partially addresses #5765
If a function process is deferred, allow it to be unbuffered.
This permits certain simple cases where functions are piped to external
commands to execute without buffering.
This is a somewhat-hacky stopgap measure that can't really be extended
to more general concurrent processes. However it is overall an improvement
in user experience that might help flush out some bugs too.
In a job, a deferred process is the last fish-internal process that pipes
to an external command (for example, in `my_function | grep foo`, the
function process is deferred). Execute the deferred process last; this
allows for streaming its output.
In fish we play fast and loose with status codes set directly (e.g. on
failed redirections), versus status codes returned from waitpid(), versus
the value of $status. Introduce a new value type, proc_status_t, to encapsulate
this logic.
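A sketch of the idea, wrapping the waitpid() status macros (a hypothetical shape, not fish's exact class):

    #include <sys/wait.h>

    // Encapsulates "where did this status come from": waitpid() format vs.
    // a status fish set directly (e.g. after a failed redirection).
    class proc_status_t {
        int status_;  // always stored in waitpid() encoding
        explicit proc_status_t(int s) : status_(s) {}
      public:
        static proc_status_t from_waitpid(int s) { return proc_status_t(s); }
        // Uses the conventional encoding: exit code in the high byte.
        static proc_status_t from_exit_code(int ec) { return proc_status_t(ec << 8); }
        bool exited() const { return WIFEXITED(status_); }
        bool signaled() const { return WIFSIGNALED(status_); }
        int exit_code() const { return WEXITSTATUS(status_); }
    };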
Now that we use an internal process to perform builtin output, simplify the
logic around how it is performed. In particular we no longer have to be
careful about async-safe functions since we do not fork.
Also fix a bunch of comments that no longer apply.
This uses the new internal process mechanism to write output for builtins.
After this the only reason fish ever forks is to execute external processes.
This introduces "internal processes" which are backed by a pthread instead
of a normal process. Internal processes are reaped using the topic
machinery, plugging in neatly alongside the sigchld topic; this means that
process_mark_finished_children() can wait for internal and external
processes simultaneously.
Initially internal processes replace the forked process that fish uses to
write out the output of blocks and functions.
The sigchld generation expresses the idea that, if we receive a sigchld
signal, the generation will be different than when we last recorded it. A
process cannot exit before it has launched, so check the generation count
before process launch. This is an optimization that reduces failing
waitpid calls.
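The optimization amounts to the pattern below (a sketch; fish's topic machinery generalizes it):

    #include <atomic>
    #include <cstdint>

    static std::atomic<uint32_t> sigchld_generation{0};

    extern "C" void on_sigchld(int) { sigchld_generation.fetch_add(1); }

    // Record the generation before launching a process. If it has not moved
    // since, no SIGCHLD has arrived, so the process cannot have exited and
    // the nonblocking waitpid() call can be skipped.
    bool may_have_exited(uint32_t gen_at_launch) {
        return sigchld_generation.load() != gen_at_launch;
    }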
This is a large change to how io_buffers are filled. The essential problem
comes about with code like (example):
echo ( /bin/pwd )
The output of /bin/pwd must go to fish, not the tty. To arrange for this,
fish does the following:
1. Invoke pipe() to create a pipe.
2. Add an io_bufferfill_t redirection that owns the write end of the pipe.
3. After fork (or equiv), call dup2() to replace pwd's stdout with this pipe.
Now when /bin/pwd writes, the output lands in the pipe, available at its read end.
But who reads it?
Prior to this fix, fish would do the following in a loop:
1. select() on the pipe with a 10 msec timeout
2. waitpid(WNOHANG) on the pwd proc
This polling is ugly and confusing and is what is replaced here.
With this new change, fish now reads from the pipe via a background thread:
1. Spawn a background pthread, which select()s on the pipe's read end with
a long (100 msec) timeout.
2. In the foreground, waitpid() (allowing hanging) on the pwd proc.
The big win here is a major simplification of job_t::continue_job() since
it no longer has to worry about filling buffers. This will make things
easier for concurrent execution.
It may not be obvious why the background thread still needs a poll (100 msec).
The answer is for cases where the write end of the fd escapes, in particular
background processes invoked inside command substitutions. psub is perhaps
the only important case of this (other shells typically just hang here).
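In plain POSIX terms, the background fill loop looks roughly like this (a sketch, not fish's actual io code):

    #include <cerrno>
    #include <string>
    #include <sys/select.h>
    #include <unistd.h>

    // Runs on a background thread: drain the read end of the pipe into a
    // buffer. The 100 msec timeout only matters when the write end escaped
    // (e.g. a backgrounded process inside a command substitution); a real
    // implementation would also check whether it should give up.
    std::string fill_from_pipe(int read_fd) {
        std::string buf;
        char chunk[4096];
        for (;;) {
            fd_set fds;
            FD_ZERO(&fds);
            FD_SET(read_fd, &fds);
            struct timeval timeout = {0, 100 * 1000};  // 100 msec
            int ready = select(read_fd + 1, &fds, nullptr, nullptr, &timeout);
            if (ready < 0 && errno != EINTR) break;  // select failed
            if (ready <= 0) continue;                // timed out: poll again
            ssize_t amt = read(read_fd, chunk, sizeof chunk);
            if (amt > 0) buf.append(chunk, static_cast<size_t>(amt));
            else break;  // EOF (writer closed) or read error
        }
        return buf;
    }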
This makes some significant architectural improvements to io_pipe_t and
io_buffer_t.
Prior to this fix, io_buffer_t subclassed io_pipe_t. io_buffer_t is now
replaced with a class io_bufferfill_t, which does not subclass pipe.
io_pipe_t no longer remembers both fds. Instead it has an autoclose_fd_t,
so that the file descriptor ownership is clear.
This switches IO redirections after fork() to use the dup2_list_t,
instead of io_chain_t. This results in simpler code with much simpler
error handling.
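Conceptually, the io_chain is resolved up front into a flat list of dup2()/close() actions, which is then trivial to apply after fork (shapes here are hypothetical):

    #include <unistd.h>
    #include <vector>

    // One resolved redirection: dup2(src, target), or close(target) if src < 0.
    struct dup2_action_t {
        int src;
        int target;
    };

    // Applying the list post-fork cannot fail in interesting ways; all of
    // the error-prone resolution already happened before forking.
    void apply_dup2_list(const std::vector<dup2_action_t> &actions) {
        for (const dup2_action_t &act : actions) {
            if (act.src >= 0)
                dup2(act.src, act.target);
            else
                close(act.target);
        }
    }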
Now jobs are aware of their parent jobs, and can interrogate those jobs,
to determine if every job in the chain is fully constructed.
Remove flags and the static stacks that manipulated them.
The parent of a job is the parent pipeline that executed the function or
block corresponding to this job. This will help simplify
process_mark_finished_children().
When a function is encountered by exec_job, a new context is created for
its execution from the ground up, with a new job and all, ultimately
resulting in a recursive call to exec_job from the same (main) thread.
Since each time exec_job encounters a new job with external commands
that needs terminal control it creates a new pgrp and gives it control
of the terminal (tcsetpgrp & co), this effectively takes control away
from the previously spawned external commands which may be (and likely
are) expecting to still have terminal access.
This commit attempts to detect when such a situation arises by handling
recursive calls to exec_job (which can only happen if the pipeline
included a function) by borrowing the pgrp from the (necessarily still
active) parent job and spawning new external commands into it.
When a parent job spawns new jobs due to the evaluation of a new
function (which shouldn't be the case in the first place), we end up
with two distinct jobs sharing one pgrp (to fix #3952). This can lead to
early termination of a pgrp if finished parent job children are reaped
before future processes in either the parent or future child jobs can
join it.
While the parent job is under construction, require that waitpid(2)
calls for the child job be done by process id and not job pgrp.
Closes #3952.
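The restriction is just the choice between the two forms of waitpid() (sketch):

    #include <sys/wait.h>

    // While the parent job is still under construction, reap only specific
    // pids; reaping by -pgrp too early can empty the group before later
    // processes have had a chance to join it.
    pid_t reap_one(pid_t child_pid, pid_t job_pgrp, bool job_fully_constructed) {
        int status = 0;
        if (!job_fully_constructed)
            return waitpid(child_pid, &status, WNOHANG);  // this child only
        return waitpid(-job_pgrp, &status, WNOHANG);      // any child in the pgrp
    }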
* Convert JOB_* enums to scoped enums
* Convert standalone job_is_* functions to member functions
* Convert standalone job_{promote, signal, continue} to member functions
* Convert standalone job_get{,_from_pid} to `job_t` static functions
* Reduce usage of JOB_* enums outside of proc.cpp by using new
`job_t::is_foo()` const helper methods instead.
This patch is only a refactor and should not change any functionality or
behavior (both observed and unobserved).
* Debug level 3: describe all commands being executed (this is, after all,
a shell, and one can argue that this is the most important debug
information available)
* Debug level 4: details of execution, mainly fork vs no-fork and io
handling
Also introduced j->preview() to print a short descriptor of the job
based on the head of the first process, so we don't overwhelm with
needless repetition, and also so that we don't have to rely on
distinguishing between repeated, non-unique/non-monotonic job ids that
are often recycled within a single "execution cycle" (pressing enter
once).
* Instead of reaping all child processes when we receive a SIGCHLD, try
reaping only processes belonging to process groups from fully-
constructed jobs, which should eliminate the need for the keepalive
process entirely (WSL's lack of zombies notwithstanding), as now
completed processes are not reaped until the job has been fully
constructed (i.e. all processes launched), which means their process
group should still be around for new processes to join.
* When `tcgetpgrp()` calls return 0, attempt to `tcsetpgrp()` before
invoking failure handling code.
* When forking a builtin and not running interactively, do not bail if
unable to set/restore terminal attributes.
Fixes #4178. Fixes #3805. Fixes #5210.
This reverts commit 8c14f0f30f.
This list is not reliable - there are many ways for fish to quit that do
not invoke these functions. It's also not necessary, since the history is
correctly saved on exec.
Fix #5133 changed builtins to acquire the terminal, but this caused a
regression: fish could be stopped when running in the background via `sudo fish`.
Fix this by only acquiring the terminal if the terminal was owned by the
builtin's pgroup.
Fixes #5147
This adds a new string command split0, which splits on zero bytes.
split0 has superpowers because its output is not further split on
newlines when used in command substitutions.
separated_buffer_t encapsulates the logic around discarding (which
was previously duplicated between output_stream_t and io_buffer_t),
and will also encapsulate the logic around explicitly separated
output.
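A sketch of the shared discard logic (hypothetical shape):

    #include <cstddef>
    #include <string>

    // Once the buffer would exceed its limit, drop further input and
    // remember that truncation happened, so callers can report it.
    class separated_buffer_t {
        std::string contents_;
        size_t limit_;
        bool discarded_ = false;
      public:
        explicit separated_buffer_t(size_t limit) : limit_(limit) {}
        void append(const char *data, size_t len) {
            if (discarded_) return;
            if (contents_.size() + len > limit_) {
                discarded_ = true;
                return;
            }
            contents_.append(data, len);
        }
        bool discarded() const { return discarded_; }
        const std::string &contents() const { return contents_; }
    };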
This should speed things up on slower PCs given that the vast majority
of shell commands are simple jobs consisting of a single command without
any pipelines, in which case there's no need for a keepalive process at
all. Applies to WSL only.
As a temporary workaround for the behavior described in
Microsoft/WSL#2997 wherein WSL does not correctly assign the spawned
child its own PID as its PGID, explicitly set the PGID for the newly
spawned process.
The job control functions were a bit messy, in particular
`set_child_group`'s name would imply that all it does is set the child
group, but in reality it used to set the child group (via `setpgid`),
set the job's pgrp if it hasn't been set, and possibly assign control of
the terminal to the newly-created job.
These have been split into separate functions. Now `set_child_group`
does just (and only) that, `maybe_assign_terminal` might assign the
terminal to the new pgrp, and `on_process_created` is used to set the
job properties the first time an external process is created. This might
also speed things up (but probably not noticeably) as there are no more
repeated calls to `getpgrp()` if JOB_CONTROL is not set.
Additionally, this closes #4715 by no longer unconditionally calling
`setpgid` on all new processes, including those created by `posix_spawn`,
which does not need this since the child's pgrp is set in the
arguments to that API call.
This switches function execution from the function's source code to
its stored node and pstree. This means we no longer have to re-parse
the function every time we execute it.
This concerns block nodes with redirections, like
begin ... end | grep ...
Prior to this fix, we passed in a pointer to the node. Switch to passing
in the tnode and parsed source ref. This improves type safety and better
aligns with the function-node plans.
keepalive processes are typically killed by the main shell process.
However if the main shell exits the keepalive may linger. In WSL
keepalives are used more often, and the lingering keepalives are both
leaks and prevent the tests from finishing.
Have keepalives poll for their parent process ID and exit when it
changes, so they can clean themselves up. The polling frequency can be
low.
Have WSL use a keepalive whenever the first process is external.
This works around the fact that WSL prohibits setting an exited
process as the group leader.
We had pid_status defined as a pid_t instance, which was fine since on
most platforms pid_t is an alias for int. However, that is not
universally the case and waitpid takes an int *, not a pid_t *.
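That is, the out-parameter must be declared as a plain int to match the signature:

    #include <sys/wait.h>

    pid_t wait_for(pid_t pid) {
        int status = 0;  // must be int: waitpid takes int *, and pid_t
                         // is not an alias for int on every platform
        return waitpid(pid, &status, 0);
    }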
This eliminates the "missing" notion of env_var_t. Instead
env_get returns a maybe_t<env_var_t>, which forces callers to
handle the possibility that the variable is missing.
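The shape of the change, sketched with std::optional standing in for fish's maybe_t (signatures here are illustrative):

    #include <optional>
    #include <string>

    struct env_var_t {
        std::wstring value;
    };

    // Absence is now a distinct state the caller is forced to test; it can
    // no longer be confused with a variable that is set but empty.
    std::optional<env_var_t> env_get(const std::wstring &name) {
        (void)name;
        return std::nullopt;  // toy stub: pretend the variable is missing
    }

    void example() {
        if (auto var = env_get(L"PATH")) {
            // use var->value
        } else {
            // handle the missing variable explicitly
        }
    }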
Internally fish should store vars as a vector of elements. The current
flat string representation is a holdover from when the code was written
in C.
Fixes #4200
It's bugged me forever that the scope is the second arg to `env_get()`
but not `env_set()`. And since I'll be introducing some helper functions
that wrap `env_set()` now is a good time to change the order of its
arguments.
There is no more race condition between parent and child with
regards to setting the process groups. Each child sets it for themselves
and then blocks indefinitely until the parent does what it needs to for
them (having waited for them to set their process groups). They are not
SIGCONT'd until the next process in the chain (if any) starts so that
that process can join their process group and open the pipes.
In the last commit, we introduced an indiscriminate if !EXTERNAL check
that unblocks a previously SIGSTOP'd command (if any) to allow the main
loop in exec_job to read from it without deadlocking (since builtins and
functions read directly from input as an optimization, sometimes).
Now only unblocking where a fork will not happen to ensure that if a
builtin ends up forking, that fork'd process is guaranteed to be able to
join the previous process' process group and access its output pipes.
We were having child processes SIGSTOP themselves immediately after
setting their process group and before launching their intended targets,
but they were not necessarily stopped by the time the next command was
being executed (so the opposite of the original race condition where
they might have finished executing by the time the next command came
around), and as a result, when we sent them SIGCONT, the signal might
never reach them. Now waitpid is used to synchronize the SIGSTOP/SIGCONT
between the two.
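A sketch of that synchronization in plain POSIX (fish's actual code differs):

    #include <signal.h>
    #include <sys/wait.h>
    #include <unistd.h>

    // Child side, after fork(): join the job's pgroup, then stop. The later
    // SIGCONT cannot be lost, because the parent first observes the stop.
    void child_after_fork(pid_t job_pgrp) {
        setpgid(0, job_pgrp);  // join (or create) the job's process group
        raise(SIGSTOP);        // park here until the parent releases us
        // ... exec the real command ...
    }

    // Parent side: wait until the child has actually stopped before sending
    // SIGCONT, closing the race where SIGCONT arrived before the SIGSTOP.
    void parent_release(pid_t child) {
        int status = 0;
        waitpid(child, &status, WUNTRACED);  // returns once the child stops
        kill(child, SIGCONT);                // guaranteed to be seen
    }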
If we had a good, unnamed inter-process event/semaphore, we could use
that to have a child process conditionally stop itself if the next
command in the job chain hadn't yet been started / setup, but this is
probably a lot more straightforward and less-confusing, which isn't a
bad thing.
Additionally, there was a bug caused by the fact that the main exec_job
loop actually blocks to read from previous commands in the job if the
current command is a built-in that doesn't need to fork.
With this waitpid code, I was able to finally add the SIGSTOP code to
all the fork'd processes in the main exec_job loop without introducing
deadlocks; it turns out that they should be treated just like the main
EXTERNAL fork, but they tend to execute faster causing the same deadlock
described above to occur more readily.
The only thing I'm not sure about is whether we should execute
unblock_pid unconditionally for all !EXTERNAL commands. It makes more
sense to *only* do that if a blocking read were about to be done in the
main loop, otherwise the original race condition could still appear
(though it is probably mitigated by whatever duration the SIGSTOP lasted
for, even if it is SIGCONT'd before the next command tries to join the
process group).
I hadn't realized that the for loop is called multiple times for a given
"single input" (anything that doesn't include semicolons, etc) to fish,
and so processes were being blocked but blocked_pid was lost by the time
that the next job (which was reading from the last process in the
previous job) came around.
Now using a static variable to store the last blocked PID. AFAICT, this
main job control loop is always executed from the same process and
thread, so this shouldn't need to be wrapped in atomics/mutexes, etc.
This code should be more portable, and certainly cleaner. We are
currently always sending SIGCONT to the last process (if it was part of
a job chain) regardless of whether it called SIGSTOP on itself or not,
which should be fine.
Need to explore whether or not the other forks in src/exec.cpp need to
be SIGSTOP'd on run or only the one that we included in this patch.
I'm not sure if this happens on all platforms, but under WSL with the
existing codebase, processes in the job chain that pipe their
stdout/stderr to the next process in the job could terminate before the
next job started (on fast enough machines for quick enough jobs).
This caused issues like #4235 and possibly #3952, at least for external
commands. What was happening is that the first process was finishing
before the second process was fully set up. fish would then try to
assign (in both the child and the parent) the process group id belonging
to the process group leader to the new process; and if the first process
had already terminated, it would have ended its process group with it as
well before that happened.
I'm not sure if there was already a mechanism in place for ensuring that
a process remains running at least as long as it takes for the next
process in the chain to join its group, etc., but if that code was
there, it wasn't working in my test setup (WSL).
This patch definitely needs some review; I'm not sure how I should
handle non-external commands (and external commands executed via
posix_spawn). I don't know if they are affected by the race condition in
the first place, but when I tried to add the same "wait for next command
in chain to run before unblocking" that would cause black screens
requiring ctrl+c to bypass.
The "unblock previous command" code was originally run by the next child
to be forked, but was then moved to the shell code instead, making it
more-centrally located and less error-prone.
Note that additional headers may be required for the mmap system call on
other platforms.
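For reference, the portable include set for mmap() is:

    #include <sys/mman.h>   // mmap, munmap, PROT_*, MAP_*
    #include <sys/types.h>  // additionally required on some older platforms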
The first step in implementing issue #4200 is to stop subclassing
env_var_t from wcstring. Not too surprisingly, doing this identified
several places that were incorrectly treating env_var_t and wcstring as
interchangeable types. I'm not talking about those places that passed
an env_var_t instance to a function that takes a wcstring. I'm talking
about doing things like assigning the former to the latter type, relying
on the implicit conversion, and thus losing information.
We also rename `env_get_string()` to `env_get()` for symmetry with
`env_set()` and to make it clear the function does not return a string.
This makes command substitutions impose the same limit on the amount
of data they accept as the `read` builtin. It does not limit output of
external commands or builtins in other contexts.
Fixes #3822
PR #3691 made most calls to `signal_block()` and `signal_unblock()`
no-ops unless a magic env var is set when fish starts running. It's
been seven months since that change was made and no problems have been
reported. This finishes that work by removing those no-op function calls
and support for the magic env var in our next major release (which won't
happen till at least six months from now).
This implements `status is-breakpoint` that returns true if the current
shell prompt is displayed in the context of a `breakpoint` command.
This also fixes several bugs. Most notably making `breakpoint` a no-op if
the shell isn't interactive. Also, typing `breakpoint` at an interactive
prompt should be an error rather than creating a new nested debugging
context.
Partial fix for #1310
This is the next step in determining whether we can disable blocking
signals without a good reason to do so. This makes not blocking signals
the default behavior. If someone finds a problem they can add this to
their ~/.config/fish/config.fish file:
set FISH_NO_SIGNAL_BLOCK 0
Alternatively set that env var before starting fish. I won't be surprised
if people report problems. Till now we have relied on people opting in
to this behavior to tell us whether it causes problems. This makes the
experimental behavior the default that has to be opted out of. This will
give us a lot more confidence this change doesn't cause problems before
the next minor release.
Note that there are still a few places where we force blocking of
signals. Primarily to keep SIGTSTP from interfering with the shell in
response to manipulating the controlling tty. Bash is more selective
in the signals it blocks around the problematic syscalls (cf. its
`give_terminal_to()` function). However, I don't see any value in that
refinement.
I recently upgraded the software on my macOS server and was dismayed to
see that cppcheck reported a huge number of format string errors due to
mismatches between the format string and its arguments from calls to
`assert()`. It turns out they are due to the macOS header using `%lu`
for the line number, which is obviously wrong, since it uses the C
preprocessor `__LINE__` symbol, which evaluates to a signed int.
I also noticed that the macOS implementation writes to stdout, rather
than stderr. It also uses `printf()` which can be a problem on some
platforms if the stream is already in wide mode which is the normal case
for fish.
So implement our own `assert()` implementation. This also eliminates
double-negative warnings that we get from some of our calls to
`assert()` on some platforms by oclint.
Also reimplement the `DIE()` macro in terms of our internal
implementation.
Rewrite `assert(0 && msg)` statements to `DIE(msg)` for clarity and to
eliminate oclint warnings about constant expressions.
Fixes #3276, albeit not in the fashion I originally envisioned.
Currently the block stack is just a vector of pointers.
Clients must manually use new() to allocate a block, and then
transfer ownership to the stack (so must NOT delete it).
Give the parser itself responsibility for allocating blocks too,
so that it takes over both allocation and deletion. Use unique_ptr
to make deletion less error-prone.
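The resulting ownership model is roughly (illustrative types):

    #include <memory>
    #include <vector>

    struct block_t {
        virtual ~block_t() = default;
    };

    // The parser owns its blocks outright: push allocates, pop both removes
    // and deletes, so clients can no longer leak or double-delete a block.
    class parser_t {
        std::vector<std::unique_ptr<block_t>> block_stack;
      public:
        block_t *push_block() {
            block_stack.push_back(std::make_unique<block_t>());
            return block_stack.back().get();
        }
        void pop_block() { block_stack.pop_back(); }
    };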