Fixes a race condition in output redirection in job chain

I'm not sure if this happens on all platforms, but under WSL with the
existing codebase, processes in the job chain that pipe their
stdout/stderr to the next process in the job could terminate before the
next job started (on fast enough machines for quick enough jobs).

This caused issues like #4235 and possibly #3952, at least for external
commands. What was happening is that the first process was finishing
before the second process was fully set up. fish would then try to
assign (in both the child and the parent) the process group id belonging
to the process group leader to the new process; and if the first process
had already terminated, it would have ended its process group with it as
well before that happened.

I'm not sure if there was already a mechanism in place for ensuring that
a process remains running at least as long as it takes for the next
process in the chain to join its group, etc., but if that code was
there, it wasn't working in my test setup (WSL).

This patch definitely needs some review; I'm not sure how I should
handle non-external commands (and external commands executed via
posix_spawn). I don't know if they are affected by the race condition in
the first place, but when I tried to add the same "wait for next command
in chain to run before unblocking" that would cause black screens
requiring ctrl+c to bypass.

The "unblock previous command" code was originally run by the next child
to be forked, but was then moved to the shell code instead, making it
more-centrally located and less error-prone.

Note that additional headers may be required for the mmap system call on
other platforms.
This commit is contained in:
Mahmoud Al-Qudsi 2017-07-25 18:14:42 -05:00 committed by Kurtis Rader
parent 35ee28ff24
commit cdb72b7024

View file

@ -23,6 +23,8 @@
#include <string>
#include <type_traits>
#include <vector>
#include <sys/mman.h>
#include <semaphore.h>
#include "builtin.h"
#include "common.h"
@ -493,6 +495,7 @@ void exec_job(parser_t &parser, job_t *j) {
//
// We are careful to set these to -1 when closed, so if we exit the loop abruptly, we can still
// close them.
sem_t *chained_wait_prev = nullptr;
int pipe_current_read = -1, pipe_current_write = -1, pipe_next_read = -1;
for (std::unique_ptr<process_t> &unique_p : j->processes) {
if (exec_error) {
@ -511,6 +514,14 @@ void exec_job(parser_t &parser, job_t *j) {
// See if we need a pipe.
const bool pipes_to_next_command = !p->is_last_in_job;
//these semaphores will be used to make sure the first process lives long enough for the
//next process in the chain to open its handles, process group, etc.
//this child will block on this one, the next child will have to call sem_post against it.
sem_t *chained_wait_next = !pipes_to_next_command ? nullptr : (sem_t*)mmap(NULL, sizeof(sem_t*), PROT_READ|PROT_WRITE,MAP_SHARED|MAP_ANONYMOUS,-1, 0);
if (chained_wait_next != nullptr) {
sem_init(chained_wait_next, 1, 0);
}
// The pipes the current process write to and read from. Unfortunately these can't be just
// allocated on the stack, since j->io wants shared_ptr.
//
@ -1060,6 +1071,14 @@ void exec_job(parser_t &parser, job_t *j) {
{
pid = execute_fork(false);
if (pid == 0) {
// a hack that fixes any tcsetpgrp errors caused by race conditions
// usleep(20 * 1000);
if (chained_wait_next != nullptr) {
debug(3, L"Waiting for next command in chain to start.\n");
sem_wait(chained_wait_next);
sem_destroy(chained_wait_next);
munmap(chained_wait_next, sizeof(sem_t));
}
// This is the child process.
p->pid = getpid();
setup_child_process(j, p, process_net_io_chain);
@ -1101,6 +1120,16 @@ void exec_job(parser_t &parser, job_t *j) {
exec_close(pipe_current_write);
pipe_current_write = -1;
}
//now that next command in the chain has been started, unblock the previous command
if (chained_wait_prev != nullptr) {
debug(3, L"Unblocking previous command in chain.\n");
sem_post(chained_wait_prev);
munmap(chained_wait_prev, sizeof(sem_t));
}
if (chained_wait_next != nullptr) {
chained_wait_prev = chained_wait_next;
}
}
// Clean up any file descriptors we left open.