Export arrays as colon delimited, and support path-style variables

This commit begins to bake in a notion of path-style variables.

Prior to this fix, fish would export arrays as ASCII record separator
delimited, except for a whitelist (PATH, CDPATH, MANPATH). This is
surprising and awkward for other programs to deal with, and there's no way
to get similar behavior for other variables like GOPATH or LD_LIBRARY_PATH.

This commit does the following:

1. Exports all arrays as colon delimited strings, instead of RS.

2. Introduces a notion of "path variable." A path variable will be
"colon-delimited" which means it gets colon-separated in quoted expansion,
and automatically splits on colons. In this commit we only do the exporting
part.

Colons are not escaped in exporting; this is deliberate to support uses
like

    `set -x PYTHONPATH "/foo:/bar"`

which ought to work (and already do, we don't want  to make a compat break
here).
This commit is contained in:
ridiculousfish 2018-09-09 18:32:15 -07:00
parent ff042bbb7b
commit 3f3b3a7006
9 changed files with 104 additions and 61 deletions

View file

@ -11,6 +11,7 @@ fish 3.0 is a major release which brings with it both improvements in functional
## Notable non-backward compatible changes
- `%` is no longer used for process and job expansion, except for `%self` which is retained. Some commands have been wrapped to still understand process expansion, including `bg`, `fg` and `kill` (#4230, #1202)
- Incoming environment variables are no longer split into arrays based on RS. Instead, variables are not split, unless their name ends in PATH, in which case they are split on colons. (#436)
- A literal `{}` now expands to itself, rather than nothing. This makes working with `find -exec` easier. (#1109, #4632)
- Successive commas in brace expansions are handled in less surprising manner (`{,,,}` expands to four empty strings rather than an empty string, a comma and an empty string again). (#3002, #4632).
- `for` loop control variables are no longer local to the `for` block (#1935).

View file

@ -848,7 +848,18 @@ A range of indices can be specified, see <a href='#expand-index-range'>index ran
All arrays are one-dimensional and cannot contain other arrays, although it is possible to fake nested arrays using the dereferencing rules of <a href="#expand-variable">variable expansion</a>.
`fish` automatically creates arrays from the variables `PATH`, `CDPATH` and `MANPATH` when it is started. (Previous versions created arrays from *all* colon-delimited environment variables.)
When an array is exported as an environment variable, it is either space or colon delimited, depending on whether it is a path variable:
\fish
set -x smurf blue small
set -x smurf_PATH forest mushroom
env | grep smurf
<outp>
# smurf=blue small
# smurf_PATH=forest:mushroom
</outp>
\endfish
`fish` automatically creates arrays from all environment variables whose name ends in PATH, by splitting them on colons. Other variables are not automatically split.
\subsection variables-special Special variables

View file

@ -18,6 +18,7 @@
#endif
#include <algorithm>
#include <iterator>
#include <memory>
#include <mutex>
#include <sstream>
@ -301,6 +302,13 @@ bool contains(const Col &col, const T2 &val) {
return std::find(std::begin(col), std::end(col), val) != std::end(col);
}
/// Append a vector \p donator to the vector \p receiver.
template <typename T>
void vec_append(std::vector<T> &receiver, std::vector<T> &&donator) {
receiver.insert(receiver.end(), std::make_move_iterator(donator.begin()),
std::make_move_iterator(donator.end()));
}
/// Print a stack trace to stderr.
void show_stackframe(const wchar_t msg_level, int frame_count = 100, int skip_levels = 0);

View file

@ -76,6 +76,10 @@ extern char **environ;
#define READ_BYTE_LIMIT 10 * 1024 * 1024
size_t read_byte_limit = READ_BYTE_LIMIT;
/// The character used to delimit path and non-path variables in exporting and in string expansion.
static const wchar_t PATH_ARRAY_SEP = L':';
static const wchar_t NONPATH_ARRAY_SEP = L' ';
bool g_use_posix_spawn = false; // will usually be set to true
bool curses_initialized = false;
@ -298,20 +302,9 @@ static bool is_read_only(const wchar_t *val) {
static bool is_read_only(const wcstring &val) { return is_read_only(val.c_str()); }
// Here is the whitelist of variables that we colon-delimit, both incoming from the environment and
// outgoing back to it. This is deliberately very short - we don't want to add language-specific
// values like CLASSPATH.
static const string_set_t colon_delimited_variable = {L"PATH", L"CDPATH", L"MANPATH"};
static bool variable_is_colon_delimited_var(const wchar_t *str) {
/// List of "path" like variable names that need special handling. This includes automatic
/// splitting and joining on import/export. As well as replacing empty elements, which
/// implicitly refer to the CWD, with an explicit '.' in the case of PATH and CDPATH. Note this
/// is sorted
return string_set_contains(colon_delimited_variable, str);
}
static bool variable_is_colon_delimited_var(const wcstring &str) {
return variable_is_colon_delimited_var(str.c_str());
/// Return true if a variable should become a path variable by default. See #436.
static bool variable_should_auto_pathvar(const wcstring &name) {
return string_suffixes_string(L"PATH", name);
}
/// Table of variables whose value is dynamically calculated, such as umask, status, etc.
@ -353,9 +346,6 @@ static void handle_timezone(const wchar_t *env_var_name) {
/// Unfortunately that convention causes problems for fish scripts. So this function replaces the
/// empty path element with an explicit ".". See issue #3914.
static void fix_colon_delimited_var(const wcstring &var_name) {
// While we auto split/join MANPATH we do not want to replace empty elements with "." (#4158).
if (var_name == L"MANPATH") return;
const auto paths = env_get(var_name);
if (paths.missing_or_empty()) return;
@ -527,9 +517,9 @@ static bool initialize_curses_using_fallback(const char *term) {
/// Ensure the content of the magic path env vars is reasonable. Specifically, that empty path
/// elements are converted to explicit "." to make the vars easier to use in fish scripts.
static void init_path_vars() {
for (const wchar_t *var_name : colon_delimited_variable) {
fix_colon_delimited_var(var_name);
}
// Do not replace empties in MATHPATH - see #4158.
fix_colon_delimited_var(L"PATH");
fix_colon_delimited_var(L"CDPATH");
}
/// Update the value of g_guessed_fish_emoji_width
@ -847,10 +837,8 @@ static void setup_var_dispatch_table() {
var_dispatch_table.emplace(var_name, handle_curses_change);
}
for (const auto &var_name : colon_delimited_variable) {
var_dispatch_table.emplace(var_name, handle_magic_colon_var_change);
}
var_dispatch_table.emplace(L"PATH", handle_magic_colon_var_change);
var_dispatch_table.emplace(L"CDPATH", handle_magic_colon_var_change);
var_dispatch_table.emplace(L"fish_term256", handle_fish_term_change);
var_dispatch_table.emplace(L"fish_term24bit", handle_fish_term_change);
var_dispatch_table.emplace(L"fish_escape_delay_ms", handle_escape_delay_change);
@ -885,10 +873,9 @@ void env_init(const struct config_paths_t *paths /* or NULL */) {
env_set_empty(key_and_val, ENV_EXPORT | ENV_GLOBAL);
} else {
key.assign(key_and_val, 0, eql);
if (is_read_only(key) || is_electric(key)) continue;
val.assign(key_and_val, eql+1, wcstring::npos);
wchar_t sep = variable_is_colon_delimited_var(key) ? L':' : ARRAY_SEP;
env_set(key, ENV_EXPORT | ENV_GLOBAL, split_string(val, sep));
if (is_read_only(key) || is_electric(key)) continue;
env_set(key, ENV_EXPORT | ENV_GLOBAL, {val});
}
}
@ -1056,8 +1043,10 @@ static int set_umask(const wcstring_list_t &list_val) {
/// * ENV_SCOPE, the variable cannot be set in the given scope. This applies to readonly/electric
/// variables set from the local or universal scopes, or set as exported.
/// * ENV_INVALID, the variable value was invalid. This applies only to special variables.
static int env_set_internal(const wcstring &key, env_mode_flags_t var_mode, wcstring_list_t val) {
static int env_set_internal(const wcstring &key, env_mode_flags_t input_var_mode,
wcstring_list_t val) {
ASSERT_IS_MAIN_THREAD();
env_mode_flags_t var_mode = input_var_mode;
bool has_changed_old = vars_stack().has_changed_exported;
int done = 0;
@ -1109,13 +1098,12 @@ static int env_set_internal(const wcstring &key, env_mode_flags_t var_mode, wcst
// Determine the node.
bool has_changed_new = false;
env_node_t *preexisting_node = env_get_node(key);
bool preexisting_entry_exportv = false;
maybe_t<env_var_t::env_var_flags_t> preexisting_flags{};
if (preexisting_node != NULL) {
var_table_t::const_iterator result = preexisting_node->env.find(key);
assert(result != preexisting_node->env.end());
const env_var_t &var = result->second;
if (var.exports()) {
preexisting_entry_exportv = true;
preexisting_flags = result->second.get_flags();
if (*preexisting_flags & env_var_t::flag_export) {
has_changed_new = true;
}
}
@ -1129,8 +1117,9 @@ static int env_set_internal(const wcstring &key, env_mode_flags_t var_mode, wcst
node = preexisting_node;
if ((var_mode & (ENV_EXPORT | ENV_UNEXPORT)) == 0) {
// Use existing entry's exportv status.
var_mode = //!OCLINT(parameter reassignment)
preexisting_entry_exportv ? ENV_EXPORT : 0;
if (preexisting_flags && (*preexisting_flags & env_var_t::flag_export)) {
var_mode |= ENV_EXPORT;
}
}
} else {
if (!get_proc_had_barrier()) {
@ -1158,6 +1147,27 @@ static int env_set_internal(const wcstring &key, env_mode_flags_t var_mode, wcst
}
if (!done) {
// Resolve if we should mark ourselves as a path variable or not.
// If there's an existing variable, use its path flag; otherwise infer it.
if ((var_mode & (ENV_PATHVAR | ENV_UNPATHVAR)) == 0) {
bool should_pathvar = false;
if (auto existing = node->find_entry(key)) {
should_pathvar = existing->is_pathvar();
} else {
should_pathvar = variable_should_auto_pathvar(key);
}
var_mode |= should_pathvar ? ENV_PATHVAR : ENV_UNPATHVAR;
}
// Split about ':' if it's a path variable.
if (var_mode & ENV_PATHVAR) {
wcstring_list_t split_val;
for (const wcstring &str : val) {
vec_append(split_val, split_string(str, PATH_ARRAY_SEP));
}
val = std::move(split_val);
}
// Set the entry in the node. Note that operator[] accesses the existing entry, or
// creates a new one.
env_var_t &var = node->env[key];
@ -1167,6 +1177,7 @@ static int env_set_internal(const wcstring &key, env_mode_flags_t var_mode, wcst
}
var.set_vals(std::move(val));
var.set_pathvar(var_mode & ENV_PATHVAR);
if (var_mode & ENV_EXPORT) {
// The new variable is exported.
@ -1291,10 +1302,9 @@ int env_remove(const wcstring &key, int var_mode) {
const wcstring_list_t &env_var_t::as_list() const { return vals; }
/// Return a string representation of the var. At the present time this uses the legacy 2.x
/// encoding.
/// Return a string representation of the var.
wcstring env_var_t::as_string() const {
wchar_t sep = (flags & flag_colon_delimit) ? L':' : ARRAY_SEP;
wchar_t sep = is_pathvar() ? PATH_ARRAY_SEP : NONPATH_ARRAY_SEP;
return join_strings(vals, sep);
}
@ -1305,7 +1315,6 @@ void env_var_t::to_list(wcstring_list_t &out) const {
env_var_t::env_var_flags_t env_var_t::flags_for(const wchar_t *name) {
env_var_flags_t result = 0;
if (is_read_only(name)) result |= flag_read_only;
if (variable_is_colon_delimited_var(name)) result |= flag_colon_delimit;
return result;
}
@ -1496,13 +1505,6 @@ static std::vector<std::string> get_export_list(const var_table_t &envs) {
for (const auto &kv : envs) {
std::string ks = wcs2string(kv.first);
std::string vs = wcs2string(kv.second.as_string());
// Arrays in the value are ASCII record separator (0x1e) delimited. But some variables
// should have colons. Add those.
if (variable_is_colon_delimited_var(kv.first)) {
// Replace ARRAY_SEP with colon.
std::replace(vs.begin(), vs.end(), (char)ARRAY_SEP, ':');
}
// Create and append a string of the form ks=vs
std::string str;
str.reserve(ks.size() + 1 + vs.size());

View file

@ -16,10 +16,6 @@
extern size_t read_byte_limit;
extern bool curses_initialized;
/// Character for separating two array elements. We use 30, i.e. the ascii record separator since
/// that seems logical.
#define ARRAY_SEP (wchar_t)0x1e
// Flags that may be passed as the 'mode' in env_set / env_get.
enum {
/// Default mode. Used with `env_get()` to indicate the caller doesn't care what scope the var
@ -35,11 +31,15 @@ enum {
ENV_EXPORT = 1 << 3,
/// Flag for unexported variable.
ENV_UNEXPORT = 1 << 4,
/// Flag to mark a variable as a path variable.
ENV_PATHVAR = 1 << 5,
/// Flag to unmark a variable as a path variable.
ENV_UNPATHVAR = 1 << 6,
/// Flag for variable update request from the user. All variable changes that are made directly
/// by the user, such as those from the `read` and `set` builtin must have this flag set. It
/// serves one purpose: to indicate that an error should be returned if the user is attempting
/// to modify a var that should not be modified by direct user action; e.g., a read-only var.
ENV_USER = 1 << 5,
ENV_USER = 1 << 7,
};
typedef uint32_t env_mode_flags_t;
@ -63,16 +63,18 @@ void env_init(const struct config_paths_t *paths = NULL);
void misc_init();
class env_var_t {
private:
public:
using env_var_flags_t = uint8_t;
private:
wcstring_list_t vals; // list of values assigned to the var
env_var_flags_t flags;
public:
enum {
flag_export = 1 << 0, // whether the variable is exported
flag_colon_delimit = 1 << 1, // whether the variable is colon delimited
flag_read_only = 1 << 2 // whether the variable is read only
flag_read_only = 1 << 1, // whether the variable is read only
flag_pathvar = 1 << 2, // whether the variable is a path variable
};
// Constructors.
@ -92,6 +94,8 @@ class env_var_t {
bool empty() const { return vals.empty() || (vals.size() == 1 && vals[0].empty()); };
bool read_only() const { return flags & flag_read_only; }
bool exports() const { return flags & flag_export; }
bool is_pathvar() const { return flags & flag_pathvar; }
env_var_flags_t get_flags() const { return flags; }
wcstring as_string() const;
void to_list(wcstring_list_t &out) const;
@ -107,6 +111,14 @@ class env_var_t {
}
}
void set_pathvar(bool pathvar) {
if (pathvar) {
flags |= flag_pathvar;
} else {
flags &= ~flag_pathvar;
}
}
static env_var_flags_t flags_for(const wchar_t *name);
env_var_t &operator=(const env_var_t &var) = default;

View file

@ -200,16 +200,20 @@ static bool append_file_entry(fish_message_type_t type, const wcstring &key_in,
/// Encoding of a null string.
static const wchar_t * const ENV_NULL = L"\x1d";
/// Character used to separate arrays in universal variables file.
/// This is 30, the ASCII record separator.
static const wchar_t UVAR_ARRAY_SEP = 0x1e;
/// Decode a serialized universal variable value into a list.
static wcstring_list_t decode_serialized(const wcstring &val) {
if (val == ENV_NULL) return {};
return split_string(val, ARRAY_SEP);
return split_string(val, UVAR_ARRAY_SEP);
}
/// Decode a a list into a serialized universal variable value.
static wcstring encode_serialized(const wcstring_list_t &vals) {
if (vals.empty()) return ENV_NULL;
return join_strings(vals, ARRAY_SEP);
return join_strings(vals, UVAR_ARRAY_SEP);
}
env_universal_t::env_universal_t(wcstring path) : explicit_vars_path(std::move(path)) {}

View file

@ -105,7 +105,7 @@ $var6: not set in universal scope
# Exporting works
TESTVAR0=
TESTVAR1=a
TESTVAR2=a^^b
TESTVAR2=a b
####################
# if/for/while scope

View file

@ -300,10 +300,14 @@ env SHLVL=" 3" ../test/root/bin/fish -c 'echo SHLVL: $SHLVL'
env DISPLAY="localhost:0.0" ../test/root/bin/fish -c 'echo Elements in DISPLAY: (count $DISPLAY)'
# We can't use PATH for this because the global configuration will modify PATH
# based on /etc/paths and /etc/paths.d.
# Exported arrays should use record separator, with a few exceptions. So we can use an arbitrary variable for this.
# Exported arrays are colon delimited; they are automatically split on colons if they end in PATH.
set -gx FOO one two three four
../test/root/bin/fish -c 'echo Elements in FOO: (count $FOO)'
set -gx FOOPATH one two three four
../test/root/bin/fish -c 'echo Elements in FOO and FOOPATH: (count $FOO) (count $FOOPATH)'
# some must use colon separators!
set -lx MANPATH man1 man2 man3 ; env | grep '^MANPATH='
# ensure we don't escape colon values
set -x DONT_ESCAPE_COLONS 1: 2: :3: ; env | grep '^DONT_ESCAPE_COLONS='
true

View file

@ -41,5 +41,6 @@ SHLVL: 1
SHLVL: 4
SHLVL: 4
Elements in DISPLAY: 1
Elements in FOO: 4
Elements in FOO and FOOPATH: 1 4
MANPATH=man1:man2:man3
DONT_ESCAPE_COLONS=1: 2: :3: