Updates after review comments

- make match/replace without -a operate on the first match in each
  argument
- use different exit codes for "no operation performed" and errors, as
  grep does
- refactor regex compile code
- use human-friendly error messages from pcre2
- improve error handling & reporting elsewhere
- add a few tests
- make some doc fixes
- some simplification & cleanup
- fix CI build failure (I hope)
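
The first two items can be illustrated with a minimal sketch. It is not the code from this commit: `replace_literal` and the `STATUS_*` names are hypothetical, and only literal (non-regex) matching is shown. The per-argument counter is what limits replacement to the first match in each argument, while the running total decides between exit status 0 and 1; status 2 stands in for a usage error.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical status names following the grep-like convention:
// 0 = something was done, 1 = nothing was done, 2 = error.
enum { STATUS_OK = 0, STATUS_NONE = 1, STATUS_ERROR = 2 };

// Hypothetical helper: replace the literal `pattern` in every argument.
// Without `all`, only the first match in *each* argument is replaced.
int replace_literal(const std::vector<std::wstring> &args,
                    const std::wstring &pattern,
                    const std::wstring &replacement,
                    bool all,
                    std::vector<std::wstring> *out)
{
    if (out == nullptr)
    {
        return STATUS_ERROR; // stands in for a usage error
    }

    int total_replaced = 0; // across all arguments, decides the exit status
    for (const std::wstring &arg : args)
    {
        std::wstring result;
        int replaced = 0; // reset for every argument
        std::size_t pos = 0;
        while (pos < arg.size())
        {
            // An empty pattern never matches, so the argument is copied as-is.
            bool may_replace = !pattern.empty() && (all || replaced == 0);
            if (may_replace && arg.compare(pos, pattern.size(), pattern) == 0)
            {
                result += replacement;
                pos += pattern.size();
                replaced++;
                total_replaced++;
            }
            else
            {
                result += arg[pos];
                pos++;
            }
        }
        out->push_back(result);
    }
    return total_replaced > 0 ? STATUS_OK : STATUS_NONE;
}
```

With a single shared counter instead of the per-argument one, the second argument in a call such as `replace_literal({L"aa", L"ba"}, L"a", L"x", false, &out)` would be left untouched once the first argument had matched.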
Michael Steed 2015-08-19 18:03:49 -06:00
parent efd47dcbda
commit 896a2c2b27
6 changed files with 311 additions and 271 deletions


@ -798,7 +798,7 @@ fish: $(FISH_OBJS) obj/fish.o $(PCRE2_LIB)
$(CXX) $(CXXFLAGS) $(LDFLAGS_FISH) $(FISH_OBJS) obj/fish.o $(LIBS) -o $@
$(PCRE2_H):
(cd $(PCRE2_DIR) && ./configure $(PCRE2_CONFIG) && make)
(cd $(PCRE2_DIR) && autoconf && ./configure $(PCRE2_CONFIG) && make)
$(PCRE2_LIB): $(PCRE2_H)


@ -28,23 +28,25 @@ Arguments beginning with `-` are normally interpreted as switches; `--` causes t
Most subcommands accept a `-q` or `--quiet` switch, which suppresses the usual output but exits with the documented status.
In addition to the exit codes documented below, all the string subcommands exit with a value of 2 to indicate that an error occurred.
The following subcommands are available:
- `length` reports the length of each string argument in characters. Exit status: 0 if at least one non-empty STRING was given, or 1 otherwise.
- `sub` prints a substring of each string argument. The start of the substring can be specified with `-s` or `--start` followed by a 1-based index value. Positive index values are relative to the start of the string and negative index values are relative to the end of the string. The default start value is 1. The length of the substring can be specified with `-l` or `--length`. If the length is not specified, the substring continues to the end of each STRING. Exit status: 0 if at least one substring operation was performed, 1 otherwise.
- `split` splits each STRING on the separator SEP, which can be an empty string. If `-m` or `--max` is specified, at most MAX splits are done. If `-r` or `--right` is given, splitting is performed right-to-left. This is useful in combination with `-m` or `--max`. Exit status: 0 if at least one split was performed, or 1 otherwise.
- `split` splits each STRING on the separator SEP, which can be an empty string. If `-m` or `--max` is specified, at most MAX splits are done on each STRING. If `-r` or `--right` is given, splitting is performed right-to-left. This is useful in combination with `-m` or `--max`. Exit status: 0 if at least one split was performed, or 1 otherwise.
- `join` joins its STRING arguments into a single string separated by SEP, which can be an empty string. Exit status: 0 if at least one join was performed, or 1 otherwise.
- `trim` removes leading and trailing whitespace from each STRING. If `-l` or `--left` is given, only leading whitespace is removed. If `-r` or `--right` is given, only trailing whitespace is trimmed. The `-c` or `--chars` switch causes the characters in CHARS to be removed instead of whitespace. Exit status: 0 if at least one character was trimmed, or 1 otherwise.
- `escape` escapes each STRING such that it can be passed back to `eval` to produce the original argument again. By default, all special characters are escaped, and quotes are used to simplify the output when possible. If `-q` or `--no-quote` is given, the simplifying quoted format is not used. Exit status: 0 if at least one string was escaped, or 1 otherwise.
- `escape` escapes each STRING such that it can be passed back to `eval` to produce the original argument again. By default, all special characters are escaped, and quotes are used to simplify the output when possible. If `-n` or `--no-quote` is given, the simplifying quoted format is not used. Exit status: 0 if at least one string was escaped, or 1 otherwise.
- `match` tests each STRING against a pattern and prints matching substrings. Only the first match is printed unless `-a` or `--all` is given, in which case all matches are reported. Matching can be made case-insensitive with `-i` or `--ignore-case`. If `-n` or `--index` is given, each match is reported as a 1-based start position, or 0 for no match. By default, PATTERN is interpreted as a glob pattern matched against each entire string argument. If `-r` or `--regex` is given, PATTERN is interpreted as a Perl-compatible regular expression. Note that for a regular expressions containing capturing groups, multiple items will be reported for each match, one for the entire match and one for each capturing group. Exit status: 0 if at least one match was found, or 1 otherwise.
- `match` tests each STRING against PATTERN and prints matching substrings. Only the first match for each STRING is reported unless `-a` or `--all` is given, in which case all matches are reported. Matching can be made case-insensitive with `-i` or `--ignore-case`. If `-n` or `--index` is given, each match is reported as a 1-based start position and a length. By default, PATTERN is interpreted as a glob pattern matched against each entire STRING argument. If `-r` or `--regex` is given, PATTERN is interpreted as a Perl-compatible regular expression. For a regular expression containing capturing groups, multiple items will be reported for each match, one for the entire match and one for each capturing group. Exit status: 0 if at least one match was found, or 1 otherwise.
- `replace` is similar to `match` but replaces non-overlapping matching substrings with a replacement string and prints the result. By default, PATTERN is treated as a literal substring to be matched by the literal string REPLACEMENT. If `-r` or `--regex` is given, PATTERN is interpreted as a Perl-compatible regular expression, and REPLACEMENT can refer to capturing groups by number or name as `$n` or `${n}`. Exit status: 0 if at least one replacement was performed, or 1 otherwise.
- `replace` is similar to `match` but replaces non-overlapping matching substrings with a replacement string and prints the result. By default, PATTERN is treated as a literal substring to be matched. If `-r` or `--regex` is given, PATTERN is interpreted as a Perl-compatible regular expression, and REPLACEMENT can contain C-style escape sequences as well as references to capturing groups by number or name as `$n` or `${n}`. Exit status: 0 if at least one replacement was performed, or 1 otherwise.
\subsection string-example Examples
@ -128,7 +130,7 @@ string match -i 'a??B' Axxb
# Output:
# Axxb
string match -a -i '[aeiou]' A B C D E
string match -i '[aeiou]' A B C D E
# Output:
# A
# E
@ -161,11 +163,13 @@ string match -r '^(\\w{2,4})\\g1$' papa mud murmur
# murmur
# mur
string match -r -n at catch
string match -r -a -n at ratatat
# Output:
# 2
# 2 2
# 4 2
# 6 2
string match -r -i '0x[0-9a-f]{1,8}' 'int xyzzy = 0xBadC0de;'
string match -r -i '0x[0-9a-f]{1,8}' 'int magic = 0xBadC0de;'
# Output:
# 0xBadC0de
\endfish
@ -194,7 +198,12 @@ string replace -r -a '[^\\d.]+' ' ' '0 one two 3.14 four 5x'
# Output:
# 0 3.14 5
string replace -r '(\\w+)\\s+(\\w+)' '$1 $2 $1' 'left right'
string replace -r '(\\w+)\\s+(\\w+)' '$2 $1 $$' 'left right'
# Output:
# left right left
# right left $
string replace -r '\s*newline\s*' '\n' 'put a newline here'
# Output:
# put a
# here
\endfish


@ -12,23 +12,26 @@
enum
{
BUILTIN_STRING_OK = STATUS_BUILTIN_OK,
BUILTIN_STRING_ERROR = STATUS_BUILTIN_ERROR
BUILTIN_STRING_OK = 0,
BUILTIN_STRING_NONE = 1,
BUILTIN_STRING_ERROR = 2
};
static void string_fatal_error(const wchar_t *fmt, ...)
static void string_error(const wchar_t *fmt, ...)
{
va_list va;
va_start(va, fmt);
wcstring errstr = vformat_string(fmt, va);
va_end(va);
if (!errstr.empty() && errstr.at(errstr.length() - 1) != L'\n')
{
errstr += L'\n';
stderr_buffer += L"string ";
stderr_buffer += errstr;
}
stderr_buffer += errstr;
static void string_unknown_option(parser_t &parser, const wchar_t *subcmd, const wchar_t *opt)
{
string_error(BUILTIN_ERR_UNKNOWN, subcmd, opt);
builtin_print_help(parser, L"string", stderr_buffer);
}
static const wchar_t *string_get_arg_stdin()
@ -135,7 +138,7 @@ static int string_escape(parser_t &parser, int argc, wchar_t **argv)
break;
case '?':
builtin_unknown_option(parser, argv[0], argv[w.woptind - 1]);
string_unknown_option(parser, argv[0], argv[w.woptind - 1]);
return BUILTIN_STRING_ERROR;
}
}
@ -143,7 +146,7 @@ static int string_escape(parser_t &parser, int argc, wchar_t **argv)
int i = w.woptind;
if (!isatty(builtin_stdin) && argc > i)
{
string_fatal_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
string_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
return BUILTIN_STRING_ERROR;
}
@ -156,7 +159,7 @@ static int string_escape(parser_t &parser, int argc, wchar_t **argv)
nesc++;
}
return (nesc > 0) ? 0 : 1;
return (nesc > 0) ? BUILTIN_STRING_OK : BUILTIN_STRING_NONE;
}
static int string_join(parser_t &parser, int argc, wchar_t **argv)
@ -188,7 +191,7 @@ static int string_join(parser_t &parser, int argc, wchar_t **argv)
break;
case '?':
builtin_unknown_option(parser, argv[0], argv[w.woptind - 1]);
string_unknown_option(parser, argv[0], argv[w.woptind - 1]);
return BUILTIN_STRING_ERROR;
}
}
@ -197,13 +200,13 @@ static int string_join(parser_t &parser, int argc, wchar_t **argv)
const wchar_t *sep;
if ((sep = string_get_arg_argv(&i, argv)) == 0)
{
string_fatal_error(BUILTIN_ERR_MISSING, argv[0]);
string_error(BUILTIN_ERR_MISSING, argv[0]);
return BUILTIN_STRING_ERROR;
}
if (!isatty(builtin_stdin) && argc > i)
{
string_fatal_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
string_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
return BUILTIN_STRING_ERROR;
}
@ -224,7 +227,7 @@ static int string_join(parser_t &parser, int argc, wchar_t **argv)
stdout_buffer += L'\n';
}
return (nargs > 1) ? 0 : 1;
return (nargs > 1) ? BUILTIN_STRING_OK : BUILTIN_STRING_NONE;
}
static int string_length(parser_t &parser, int argc, wchar_t **argv)
@ -256,7 +259,7 @@ static int string_length(parser_t &parser, int argc, wchar_t **argv)
break;
case '?':
builtin_unknown_option(parser, argv[0], argv[w.woptind - 1]);
string_unknown_option(parser, argv[0], argv[w.woptind - 1]);
return BUILTIN_STRING_ERROR;
}
}
@ -264,7 +267,7 @@ static int string_length(parser_t &parser, int argc, wchar_t **argv)
int i = w.woptind;
if (!isatty(builtin_stdin) && argc > i)
{
string_fatal_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
string_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
return BUILTIN_STRING_ERROR;
}
@ -284,7 +287,7 @@ static int string_length(parser_t &parser, int argc, wchar_t **argv)
}
}
return (nnonempty > 0) ? 0 : 1;
return (nnonempty > 0) ? BUILTIN_STRING_OK : BUILTIN_STRING_NONE;
}
struct match_options_t
@ -300,24 +303,23 @@ struct match_options_t
class string_matcher_t
{
protected:
match_options_t opts;
const wchar_t *argv0;
int nmatch;
const wchar_t *pattern;
match_options_t opts;
int total_matched;
public:
string_matcher_t(const wchar_t *argv0_, const match_options_t &opts_)
: opts(opts_), argv0(argv0_), nmatch(0)
string_matcher_t(const wchar_t *argv0_, const wchar_t *pattern_, const match_options_t &opts_)
: argv0(argv0_), pattern(pattern_), opts(opts_), total_matched(0)
{ }
virtual ~string_matcher_t() { }
virtual bool report_matches(const wchar_t *arg) = 0;
int match_count() { return nmatch; }
int match_count() { return total_matched; }
};
class wildcard_matcher_t: public string_matcher_t
{
const wchar_t *pattern;
bool arg_matches(const wchar_t *pat, const wchar_t *arg)
{
for (; *arg != L'\0'; arg++, pat++)
@ -413,21 +415,20 @@ class wildcard_matcher_t: public string_matcher_t
}
public:
wildcard_matcher_t(const wchar_t *argv0_, const match_options_t &opts_, const wchar_t *pattern_)
: string_matcher_t(argv0_, opts_),
pattern(pattern_)
wildcard_matcher_t(const wchar_t *argv0_, const wchar_t *pattern_, const match_options_t &opts_)
: string_matcher_t(argv0_, pattern_, opts_)
{ }
virtual ~wildcard_matcher_t() { }
bool report_matches(const wchar_t *arg)
{
if (opts.all || nmatch == 0)
{
// Note: --all is a no-op for glob matching since the pattern is always
// matched against the entire argument
bool match = arg_matches(pattern, arg);
if (match)
{
nmatch++;
total_matched++;
}
if (!opts.quiet)
{
@ -435,7 +436,9 @@ public:
{
if (opts.index)
{
stdout_buffer += L"1\n";
stdout_buffer += L"1 ";
stdout_buffer += to_string(wcslen(arg));
stdout_buffer += L'\n';
}
else
{
@ -444,15 +447,74 @@ public:
}
}
}
}
return true;
}
};
static const wchar_t *pcre2_strerror(int err_code)
{
static wchar_t buf[128];
pcre2_get_error_message(err_code, (PCRE2_UCHAR *)buf, sizeof(buf) / sizeof(wchar_t));
return buf;
}
struct compiled_regex_t
{
const wchar_t *argv0;
pcre2_code *code;
pcre2_match_data *match;
compiled_regex_t(const wchar_t *argv0_, const wchar_t *pattern, bool ignore_case)
: argv0(argv0_), code(0), match(0)
{
// Disable some sequences that can lead to security problems
uint32_t options = PCRE2_NEVER_UTF;
#if PCRE2_CODE_UNIT_WIDTH < 32
options |= PCRE2_NEVER_BACKSLASH_C;
#endif
int err_code = 0;
PCRE2_SIZE err_offset = 0;
code = pcre2_compile(
PCRE2_SPTR(pattern),
PCRE2_ZERO_TERMINATED,
options | (ignore_case ? PCRE2_CASELESS : 0),
&err_code,
&err_offset,
0);
if (code == 0)
{
string_error(_(L"%ls: Regular expression compile error: %ls\n"),
argv0, pcre2_strerror(err_code));
string_error(L"%ls: %ls\n", argv0, pattern);
string_error(L"%ls: %*ls\n", argv0, err_offset, L"^");
return;
}
match = pcre2_match_data_create_from_pattern(code, 0);
if (match == 0)
{
DIE_MEM();
}
}
~compiled_regex_t()
{
if (match != 0)
{
pcre2_match_data_free(match);
}
if (code != 0)
{
pcre2_code_free(code);
}
}
};
class pcre2_matcher_t: public string_matcher_t
{
pcre2_code *regex;
pcre2_match_data *match;
compiled_regex_t regex;
int report_match(const wchar_t *arg, int pcre2_rc)
{
@ -463,17 +525,17 @@ class pcre2_matcher_t: public string_matcher_t
}
if (pcre2_rc < 0)
{
// see http://www.pcre.org/current/doc/html/pcre2api.html#SEC30
string_fatal_error(_(L"%ls: Regular expression match error %d"), argv0, pcre2_rc);
string_error(_(L"%ls: Regular expression match error: %ls\n"),
argv0, pcre2_strerror(pcre2_rc));
return -1;
}
if (pcre2_rc == 0)
{
// The output vector wasn't big enough. Should not happen.
string_fatal_error(_(L"%ls: Regular expression internal error"), argv0);
string_error(_(L"%ls: Regular expression internal error\n"), argv0);
return -1;
}
PCRE2_SIZE *ovector = pcre2_get_ovector_pointer(match);
PCRE2_SIZE *ovector = pcre2_get_ovector_pointer(regex.match);
for (int j = 0; j < pcre2_rc; j++)
{
PCRE2_SIZE begin = ovector[2*j];
@ -485,6 +547,8 @@ class pcre2_matcher_t: public string_matcher_t
if (opts.index)
{
stdout_buffer += to_string(begin + 1);
stdout_buffer += ' ';
stdout_buffer += to_string(end - begin);
}
else if (end > begin) // may have end < begin if \K is used
{
@ -498,70 +562,28 @@ class pcre2_matcher_t: public string_matcher_t
}
public:
pcre2_matcher_t(const wchar_t *argv0_, const match_options_t &opts_, const wchar_t *pattern)
: string_matcher_t(argv0_, opts_),
regex(0), match(0)
{
// Disable some sequences that can lead to security problems
uint32_t options = PCRE2_NEVER_UTF;
#if PCRE2_CODE_UNIT_WIDTH < 32
options |= PCRE2_NEVER_BACKSLASH_C;
#endif
pcre2_matcher_t(const wchar_t *argv0_, const wchar_t *pattern_, const match_options_t &opts_)
: string_matcher_t(argv0_, pattern_, opts_),
regex(argv0_, pattern, opts.ignore_case)
{ }
int err_code = 0;
PCRE2_SIZE err_offset = 0;
regex = pcre2_compile(
PCRE2_SPTR(pattern),
PCRE2_ZERO_TERMINATED,
options | (opts.ignore_case ? PCRE2_CASELESS : 0),
&err_code,
&err_offset,
0);
if (regex == 0)
{
string_fatal_error(_(L"%ls: Regular expression compilation failed at offset %d"),
argv0, int(err_offset));
return;
}
match = pcre2_match_data_create_from_pattern(regex, 0);
if (match == 0)
{
DIE_MEM();
}
}
virtual ~pcre2_matcher_t()
{
if (match != 0)
{
pcre2_match_data_free(match);
}
if (regex != 0)
{
pcre2_code_free(regex);
}
}
virtual ~pcre2_matcher_t() { }
bool report_matches(const wchar_t *arg)
{
// A return value of true means all is well (even if no matches were
// found), false indicates an unrecoverable error.
if (regex == 0)
if (regex.code == 0)
{
// pcre2_compile() failed
return false;
}
if (!opts.all && nmatch > 0)
{
return true;
}
int matched = 0;
// See pcre2demo.c for an explanation of this logic
PCRE2_SIZE arglen = wcslen(arg);
int rc = report_match(arg, pcre2_match(regex, PCRE2_SPTR(arg), arglen, 0, 0, match, 0));
int rc = report_match(arg, pcre2_match(regex.code, PCRE2_SPTR(arg), arglen, 0, 0, regex.match, 0));
if (rc < 0)
{
// pcre2 match error
@ -572,15 +594,16 @@ public:
// no match
return true;
}
nmatch++;
matched++;
total_matched++;
// Report any additional matches
PCRE2_SIZE *ovector = pcre2_get_ovector_pointer(match);
while (opts.all || nmatch == 0)
PCRE2_SIZE *ovector = pcre2_get_ovector_pointer(regex.match);
while (opts.all || matched == 0)
{
uint32_t options = 0;
PCRE2_SIZE offset = ovector[1]; // Start at end of previous match
PCRE2_SIZE old_offset = pcre2_get_startchar(match);
PCRE2_SIZE old_offset = pcre2_get_startchar(regex.match);
if (offset <= old_offset)
{
offset = old_offset + 1;
@ -595,7 +618,7 @@ public:
options = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED;
}
rc = report_match(arg, pcre2_match(regex, PCRE2_SPTR(arg), arglen, offset, options, match, 0));
rc = report_match(arg, pcre2_match(regex.code, PCRE2_SPTR(arg), arglen, offset, options, regex.match, 0));
if (rc < 0)
{
return false;
@ -610,7 +633,8 @@ public:
ovector[1] = offset + 1;
continue;
}
nmatch++;
matched++;
total_matched++;
}
return true;
}
@ -666,7 +690,7 @@ static int string_match(parser_t &parser, int argc, wchar_t **argv)
break;
case '?':
builtin_unknown_option(parser, argv[0], argv[w.woptind - 1]);
string_unknown_option(parser, argv[0], argv[w.woptind - 1]);
return BUILTIN_STRING_ERROR;
}
}
@ -675,24 +699,24 @@ static int string_match(parser_t &parser, int argc, wchar_t **argv)
const wchar_t *pattern;
if ((pattern = string_get_arg_argv(&i, argv)) == 0)
{
string_fatal_error(BUILTIN_ERR_MISSING, argv[0]);
string_error(BUILTIN_ERR_MISSING, argv[0]);
return BUILTIN_STRING_ERROR;
}
if (!isatty(builtin_stdin) && argc > i)
{
string_fatal_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
string_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
return BUILTIN_STRING_ERROR;
}
string_matcher_t *matcher;
if (regex)
{
matcher = new pcre2_matcher_t(argv[0], opts, pattern);
matcher = new pcre2_matcher_t(argv[0], pattern, opts);
}
else
{
matcher = new wildcard_matcher_t(argv[0], opts, pattern);
matcher = new wildcard_matcher_t(argv[0], pattern, opts);
}
const wchar_t *arg;
@ -705,7 +729,7 @@ static int string_match(parser_t &parser, int argc, wchar_t **argv)
}
}
int rc = matcher->match_count() > 0 ? 0 : 1;
int rc = matcher->match_count() > 0 ? BUILTIN_STRING_OK : BUILTIN_STRING_NONE;
delete matcher;
return rc;
}
@ -722,18 +746,18 @@ struct replace_options_t
class string_replacer_t
{
protected:
replace_options_t opts;
const wchar_t *argv0;
int nreplace;
replace_options_t opts;
int total_replaced;
public:
string_replacer_t(const wchar_t *argv0_, const replace_options_t &opts_)
: opts(opts_), argv0(argv0_), nreplace(0)
: argv0(argv0_), opts(opts_), total_replaced(0)
{ }
virtual ~string_replacer_t() {}
virtual bool replace_matches(const wchar_t *arg) = 0;
int replace_count() { return nreplace; }
int replace_count() { return total_replaced; }
};
class literal_replacer_t: public string_replacer_t
@ -743,9 +767,9 @@ class literal_replacer_t: public string_replacer_t
int patlen;
public:
literal_replacer_t(const wchar_t *argv0_, const replace_options_t &opts_, const wchar_t *pattern_,
const wchar_t *replacement_)
: string_replacer_t(argv0_, opts_),
literal_replacer_t(const wchar_t *argv0, const wchar_t *pattern_, const wchar_t *replacement_,
const replace_options_t &opts)
: string_replacer_t(argv0, opts),
pattern(pattern_), replacement(replacement_), patlen(wcslen(pattern))
{ }
@ -753,33 +777,35 @@ public:
bool replace_matches(const wchar_t *arg)
{
wcstring replaced;
wcstring result;
if (patlen == 0)
{
replaced = arg;
result = arg;
}
else
{
int replaced = 0;
const wchar_t *cur = arg;
while (*cur != L'\0')
{
if ((opts.all || nreplace == 0) &&
if ((opts.all || replaced == 0) &&
(opts.ignore_case ? wcsncasecmp(cur, pattern, patlen) : wcsncmp(cur, pattern, patlen)) == 0)
{
replaced += replacement;
result += replacement;
cur += patlen;
nreplace++;
replaced++;
total_replaced++;
}
else
{
replaced += *cur;
result += *cur;
cur++;
}
}
}
if (!opts.quiet)
{
stdout_buffer += replaced;
stdout_buffer += result;
stdout_buffer += L'\n';
}
return true;
@ -788,78 +814,49 @@ public:
class regex_replacer_t: public string_replacer_t
{
pcre2_code *regex;
pcre2_match_data *match;
const wchar_t *replacement;
compiled_regex_t regex;
wcstring replacement;
wcstring interpret_escapes(const wchar_t *orig)
{
wcstring result;
while (*orig != L'\0')
{
if (*orig == L'\\')
{
orig += read_unquoted_escape(orig, &result, true, false);
}
else
{
result += *orig;
orig++;
}
}
return result;
}
public:
regex_replacer_t(const wchar_t *argv0_, const replace_options_t &opts_, const wchar_t *pattern,
const wchar_t *replacement_)
: string_replacer_t(argv0_, opts_),
regex(0), match(0), replacement(replacement_)
{
// Disable some sequences that can lead to security problems
uint32_t options = PCRE2_NEVER_UTF;
#if PCRE2_CODE_UNIT_WIDTH < 32
options |= PCRE2_NEVER_BACKSLASH_C;
#endif
regex_replacer_t(const wchar_t *argv0, const wchar_t *pattern, const wchar_t *replacement_,
const replace_options_t &opts)
: string_replacer_t(argv0, opts),
regex(argv0, pattern, opts.ignore_case),
replacement(interpret_escapes(replacement_))
{ }
int err_code = 0;
PCRE2_SIZE err_offset = 0;
regex = pcre2_compile(
PCRE2_SPTR(pattern),
PCRE2_ZERO_TERMINATED,
options | (opts.ignore_case ? PCRE2_CASELESS : 0),
&err_code,
&err_offset,
0);
if (regex == 0)
{
string_fatal_error(_(L"%ls: Regular expression compilation failed at offset %d"),
argv0, int(err_offset));
return;
}
match = pcre2_match_data_create_from_pattern(regex, 0);
if (match == 0)
{
DIE_MEM();
}
}
virtual ~regex_replacer_t()
{
if (match != 0)
{
pcre2_match_data_free(match);
}
if (regex != 0)
{
pcre2_code_free(regex);
}
}
virtual ~regex_replacer_t() { }
bool replace_matches(const wchar_t *arg)
{
// A return value of true means all is well (even if no replacements
// were performed), false indicates an unrecoverable error.
if (regex == 0)
if (regex.code == 0)
{
// pcre2_compile() failed
return false;
}
if (!opts.all && nreplace > 0)
{
if (!opts.quiet)
{
stdout_buffer += arg;
stdout_buffer += L'\n';
}
return true;
}
uint32_t options = opts.all ? PCRE2_SUBSTITUTE_GLOBAL : 0;
int arglen = wcslen(arg);
PCRE2_SIZE outlen = (arglen == 0) ? 16 : 2 * arglen;
@ -872,14 +869,14 @@ public:
for (;;)
{
pcre2_rc = pcre2_substitute(
regex,
regex.code,
PCRE2_SPTR(arg),
arglen,
0, // start offset
options,
match,
regex.match,
0, // match context
PCRE2_SPTR(replacement),
PCRE2_SPTR(replacement.c_str()),
PCRE2_ZERO_TERMINATED,
(PCRE2_UCHAR *)output,
&outlen);
@ -896,7 +893,7 @@ public:
}
continue;
}
string_fatal_error(_(L"%ls: Replacement string too large"), argv0);
string_error(_(L"%ls: Replacement string too large\n"), argv0);
free(output);
return false;
}
@ -904,14 +901,10 @@ public:
}
bool rc = true;
if (pcre2_rc == PCRE2_ERROR_BADREPLACEMENT)
if (pcre2_rc < 0)
{
string_fatal_error(_(L"%ls: Invalid use of $ in replacement string"), argv0);
rc = false;
}
else if (pcre2_rc < 0)
{
string_fatal_error(_(L"%ls: Regular expression match error %d"), argv0, pcre2_rc);
string_error(_(L"%ls: Regular expression substitute error: %ls\n"),
argv0, pcre2_strerror(pcre2_rc));
rc = false;
}
else
@ -921,7 +914,7 @@ public:
stdout_buffer += output;
stdout_buffer += L'\n';
}
nreplace += pcre2_rc;
total_replaced += pcre2_rc;
}
free(output);
@ -974,7 +967,7 @@ static int string_replace(parser_t &parser, int argc, wchar_t **argv)
break;
case '?':
builtin_unknown_option(parser, argv[0], argv[w.woptind - 1]);
string_unknown_option(parser, argv[0], argv[w.woptind - 1]);
return BUILTIN_STRING_ERROR;
}
}
@ -983,29 +976,29 @@ static int string_replace(parser_t &parser, int argc, wchar_t **argv)
const wchar_t *pattern, *replacement;
if ((pattern = string_get_arg_argv(&i, argv)) == 0)
{
string_fatal_error(BUILTIN_ERR_MISSING, argv[0]);
string_error(BUILTIN_ERR_MISSING, argv[0]);
return BUILTIN_STRING_ERROR;
}
if ((replacement = string_get_arg_argv(&i, argv)) == 0)
{
string_fatal_error(BUILTIN_ERR_MISSING, argv[0]);
string_error(BUILTIN_ERR_MISSING, argv[0]);
return BUILTIN_STRING_ERROR;
}
if (!isatty(builtin_stdin) && argc > i)
{
string_fatal_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
string_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
return BUILTIN_STRING_ERROR;
}
string_replacer_t *replacer;
if (regex)
{
replacer = new regex_replacer_t(argv[0], opts, pattern, replacement);
replacer = new regex_replacer_t(argv[0], pattern, replacement, opts);
}
else
{
replacer = new literal_replacer_t(argv[0], opts, pattern, replacement);
replacer = new literal_replacer_t(argv[0], pattern, replacement, opts);
}
const wchar_t *arg;
@ -1018,7 +1011,7 @@ static int string_replace(parser_t &parser, int argc, wchar_t **argv)
}
}
int rc = replacer->replace_count() > 0 ? 0 : 1;
int rc = replacer->replace_count() > 0 ? BUILTIN_STRING_OK : BUILTIN_STRING_NONE;
delete replacer;
return rc;
}
@ -1034,7 +1027,7 @@ static int string_split(parser_t &parser, int argc, wchar_t **argv)
{ 0, 0, 0, 0 }
};
int max = 0;
long max = LONG_MAX;
bool quiet = false;
bool right = false;
wgetopter_t w;
@ -1052,8 +1045,17 @@ static int string_split(parser_t &parser, int argc, wchar_t **argv)
break;
case 'm':
max = int(wcstol(w.woptarg, 0, 10));
{
errno = 0;
wchar_t *endptr = 0;
max = wcstol(w.woptarg, &endptr, 10);
if (*endptr != L'\0' || errno != 0)
{
string_error(BUILTIN_ERR_NOT_NUMBER, argv[0], w.woptarg);
return BUILTIN_STRING_ERROR;
}
break;
}
case 'q':
quiet = true;
@ -1064,11 +1066,11 @@ static int string_split(parser_t &parser, int argc, wchar_t **argv)
break;
case ':':
string_fatal_error(BUILTIN_ERR_MISSING, argv[0]);
string_error(BUILTIN_ERR_MISSING, argv[0]);
return BUILTIN_STRING_ERROR;
case '?':
builtin_unknown_option(parser, argv[0], argv[w.woptind - 1]);
string_unknown_option(parser, argv[0], argv[w.woptind - 1]);
return BUILTIN_STRING_ERROR;
}
}
@ -1077,13 +1079,13 @@ static int string_split(parser_t &parser, int argc, wchar_t **argv)
const wchar_t *sep;
if ((sep = string_get_arg_argv(&i, argv)) == 0)
{
string_fatal_error(BUILTIN_ERR_MISSING, argv[0]);
string_error(BUILTIN_ERR_MISSING, argv[0]);
return BUILTIN_STRING_ERROR;
}
if (!isatty(builtin_stdin) && argc > i)
{
string_fatal_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
string_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
return BUILTIN_STRING_ERROR;
}
@ -1095,14 +1097,16 @@ static int string_split(parser_t &parser, int argc, wchar_t **argv)
{
while ((arg = string_get_arg(&i, argv)) != 0)
{
int nargsplit = 0;
if (seplen == 0)
{
// Split to individual characters
const wchar_t *cur = arg + wcslen(arg) - 1;
while (cur > arg && (max == 0 || nsplit < max))
while (cur > arg && nargsplit < max)
{
splits.push_front(wcstring(cur, 1));
cur--;
nargsplit++;
nsplit++;
}
splits.push_front(wcstring(arg, cur - arg + 1));
@ -1111,13 +1115,14 @@ static int string_split(parser_t &parser, int argc, wchar_t **argv)
{
const wchar_t *end = arg + wcslen(arg);
const wchar_t *cur = end - seplen;
while (cur >= arg && (max == 0 || nsplit < max))
while (cur >= arg && nargsplit < max)
{
if (wcsncmp(cur, sep, seplen) == 0)
{
splits.push_front(wcstring(cur + seplen, end - cur - seplen));
end = cur;
cur -= seplen;
nargsplit++;
nsplit++;
}
else
@ -1134,14 +1139,16 @@ static int string_split(parser_t &parser, int argc, wchar_t **argv)
while ((arg = string_get_arg(&i, argv)) != 0)
{
const wchar_t *cur = arg;
int nargsplit = 0;
if (seplen == 0)
{
// Split to individual characters
const wchar_t *last = arg + wcslen(arg) - 1;
while (cur < last && (max == 0 || nsplit < max))
while (cur < last && nargsplit < max)
{
splits.push_back(wcstring(cur, 1));
cur++;
nargsplit++;
nsplit++;
}
splits.push_back(cur);
@ -1150,7 +1157,7 @@ static int string_split(parser_t &parser, int argc, wchar_t **argv)
{
while (cur != 0)
{
const wchar_t *ptr = (max > 0 && nsplit >= max) ? 0 : wcsstr(cur, sep);
const wchar_t *ptr = (nargsplit < max) ? wcsstr(cur, sep) : 0;
if (ptr == 0)
{
splits.push_back(cur);
@ -1160,6 +1167,7 @@ static int string_split(parser_t &parser, int argc, wchar_t **argv)
{
splits.push_back(wcstring(cur, ptr - cur));
cur = ptr + seplen;
nargsplit++;
nsplit++;
}
}
@ -1178,7 +1186,7 @@ static int string_split(parser_t &parser, int argc, wchar_t **argv)
}
}
return (nsplit > 0) ? 0 : 1;
return (nsplit > 0) ? BUILTIN_STRING_OK : BUILTIN_STRING_NONE;
}
static int string_sub(parser_t &parser, int argc, wchar_t **argv)
@ -1196,6 +1204,7 @@ static int string_sub(parser_t &parser, int argc, wchar_t **argv)
int length = -1;
bool quiet = false;
wgetopter_t w;
wchar_t *endptr = 0;
for (;;)
{
int c = w.wgetopt_long(argc, argv, short_options, long_options, 0);
@ -1210,10 +1219,16 @@ static int string_sub(parser_t &parser, int argc, wchar_t **argv)
break;
case 'l':
length = int(wcstol(w.woptarg, 0, 10));
errno = 0;
length = int(wcstol(w.woptarg, &endptr, 10));
if (*endptr != L'\0' || errno != 0)
{
string_error(BUILTIN_ERR_NOT_NUMBER, argv[0], w.woptarg);
return BUILTIN_STRING_ERROR;
}
if (length < 0)
{
string_fatal_error(_(L"%ls: Invalid length value\n"), argv[0]);
string_error(_(L"%ls: Invalid length value '%d'\n"), argv[0], length);
return BUILTIN_STRING_ERROR;
}
break;
@ -1223,20 +1238,26 @@ static int string_sub(parser_t &parser, int argc, wchar_t **argv)
break;
case 's':
start = int(wcstol(w.woptarg, 0, 10));
errno = 0;
start = int(wcstol(w.woptarg, &endptr, 10));
if (*endptr != L'\0' || errno != 0)
{
string_error(BUILTIN_ERR_NOT_NUMBER, argv[0], w.woptarg);
return BUILTIN_STRING_ERROR;
}
if (start == 0)
{
string_fatal_error(_(L"%ls: Invalid start value\n"), argv[0]);
string_error(_(L"%ls: Invalid start value '%d'\n"), argv[0], start);
return BUILTIN_STRING_ERROR;
}
break;
case ':':
string_fatal_error(BUILTIN_ERR_MISSING, argv[0]);
string_error(BUILTIN_ERR_MISSING, argv[0]);
return BUILTIN_STRING_ERROR;
case '?':
builtin_unknown_option(parser, argv[0], argv[w.woptind - 1]);
string_unknown_option(parser, argv[0], argv[w.woptind - 1]);
return BUILTIN_STRING_ERROR;
}
}
@ -1244,7 +1265,7 @@ static int string_sub(parser_t &parser, int argc, wchar_t **argv)
int i = w.woptind;
if (!isatty(builtin_stdin) && argc > i)
{
string_fatal_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
string_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
return BUILTIN_STRING_ERROR;
}
@ -1286,7 +1307,7 @@ static int string_sub(parser_t &parser, int argc, wchar_t **argv)
nsub++;
}
return (nsub > 0) ? 0 : 1;
return (nsub > 0) ? BUILTIN_STRING_OK : BUILTIN_STRING_NONE;
}
static int string_trim(parser_t &parser, int argc, wchar_t **argv)
@ -1335,11 +1356,11 @@ static int string_trim(parser_t &parser, int argc, wchar_t **argv)
break;
case ':':
string_fatal_error(BUILTIN_ERR_MISSING, argv[0]);
string_error(BUILTIN_ERR_MISSING, argv[0]);
return BUILTIN_STRING_ERROR;
case '?':
builtin_unknown_option(parser, argv[0], argv[w.woptind - 1]);
string_unknown_option(parser, argv[0], argv[w.woptind - 1]);
return BUILTIN_STRING_ERROR;
}
}
@ -1347,7 +1368,7 @@ static int string_trim(parser_t &parser, int argc, wchar_t **argv)
int i = w.woptind;
if (!isatty(builtin_stdin) && argc > i)
{
string_fatal_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
string_error(BUILTIN_ERR_TOO_MANY_ARGUMENTS, argv[0]);
return BUILTIN_STRING_ERROR;
}
@ -1380,7 +1401,7 @@ static int string_trim(parser_t &parser, int argc, wchar_t **argv)
}
}
return (ntrim > 0) ? 0 : 1;
return (ntrim > 0) ? BUILTIN_STRING_OK : BUILTIN_STRING_NONE;
}
static const struct string_subcommand
@ -1409,7 +1430,7 @@ string_subcommands[] =
int argc = builtin_count_args(argv);
if (argc <= 1)
{
string_fatal_error(BUILTIN_ERR_MISSING, argv[0]);
string_error(BUILTIN_ERR_MISSING, argv[0]);
builtin_print_help(parser, L"string", stderr_buffer);
return BUILTIN_STRING_ERROR;
}
@ -1427,7 +1448,7 @@ string_subcommands[] =
}
if (subcmd->handler == 0)
{
string_fatal_error(_(L"%ls: Unknown subcommand '%ls'"), argv[0], argv[1]);
string_error(_(L"%ls: Unknown subcommand '%ls'\n"), argv[0], argv[1]);
builtin_print_help(parser, L"string", stderr_buffer);
return BUILTIN_STRING_ERROR;
}
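
The hunks above replace the literal `0`/`1` returns with named status codes and send every error path to `BUILTIN_STRING_ERROR`, following grep's exit-status convention. Below is a minimal sketch of that convention. It assumes the numeric values 0, 1 and 2, which are inferred from the updated tests and the documentation change in this commit rather than copied from the source. The helper `string_exit_status` is hypothetical and does not exist in fish.

```cpp
#include <cassert>
#include <cstddef>

// Assumed values: inferred from the tests and docs in this commit,
// not copied from the actual definitions in the source.
enum
{
    BUILTIN_STRING_OK = 0,    // at least one operation was performed
    BUILTIN_STRING_NONE = 1,  // ran cleanly, but nothing was matched or changed
    BUILTIN_STRING_ERROR = 2  // bad option, missing argument, or invalid value
};

// Hypothetical helper (not part of fish): map a subcommand's outcome
// to its exit status. An error always wins over the operation count.
static int string_exit_status(bool had_error, std::size_t nops)
{
    if (had_error)
    {
        return BUILTIN_STRING_ERROR;
    }
    return (nops > 0) ? BUILTIN_STRING_OK : BUILTIN_STRING_NONE;
}
```

Under these assumptions a script can tell "nothing matched" (status 1) apart from "the command was malformed" (status 2), which the previous 0/1 scheme could not express.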

@ -1106,7 +1106,7 @@ static wint_t string_last_char(const wcstring &str)
}
/* Given a null terminated string starting with a backslash, read the escape as if it is unquoted, appending to result. Return the number of characters consumed, or 0 on error */
static size_t read_unquoted_escape(const wchar_t *input, wcstring *result, bool allow_incomplete, bool unescape_special)
size_t read_unquoted_escape(const wchar_t *input, wcstring *result, bool allow_incomplete, bool unescape_special)
{
if (input[0] != L'\\')
{

@ -825,6 +825,9 @@ wcstring escape_string(const wcstring &in, escape_flags_t flags);
character set.
*/
/** Given a null terminated string starting with a backslash, read the escape as if it is unquoted, appending to result. Return the number of characters consumed, or 0 on error */
size_t read_unquoted_escape(const wchar_t *input, wcstring *result, bool allow_incomplete, bool unescape_special);
/** Unescapes a string in-place. A true result indicates the string was unescaped, a false result indicates the string was unmodified. */
bool unescape_string_in_place(wcstring *str, unescape_flags_t escape_special);

@ -4064,7 +4064,7 @@ static void test_string(void)
{ {L"string", L"escape", L"hello", L"world", 0}, 0, L"hello\nworld\n" },
{ {L"string", L"escape", L"-n", L"~", 0}, 0, L"\\~\n" },
{ {L"string", L"join", 0}, 1, L"" },
{ {L"string", L"join", 0}, 2, L"" },
{ {L"string", L"join", L"", 0}, 1, L"" },
{ {L"string", L"join", L"", L"", L"", L"", 0}, 0, L"\n" },
{ {L"string", L"join", L"", L"a", L"b", L"c", 0}, 0, L"abc\n" },
@ -4072,7 +4072,7 @@ static void test_string(void)
{ {L"string", L"join", L"/", L"usr", 0}, 1, L"usr\n" },
{ {L"string", L"join", L"/", L"usr", L"local", L"bin", 0}, 0, L"usr/local/bin\n" },
{ {L"string", L"join", L"...", L"3", L"2", L"1", 0}, 0, L"3...2...1\n" },
{ {L"string", L"join", L"-q", 0}, 1, L"" },
{ {L"string", L"join", L"-q", 0}, 2, L"" },
{ {L"string", L"join", L"-q", L".", 0}, 1, L"" },
{ {L"string", L"join", L"-q", L".", L".", 0}, 1, L"" },
@ -4087,7 +4087,7 @@ static void test_string(void)
{ {L"string", L"length", L"-q", L"", 0}, 1, L"" },
{ {L"string", L"length", L"-q", L"a", 0}, 0, L"" },
{ {L"string", L"match", 0}, 1, L"" },
{ {L"string", L"match", 0}, 2, L"" },
{ {L"string", L"match", L"", 0}, 1, L"" },
{ {L"string", L"match", L"", L"", 0}, 0, L"\n" },
{ {L"string", L"match", L"?", L"a", 0}, 0, L"a\n" },
@ -4131,12 +4131,13 @@ static void test_string(void)
{ {L"string", L"match", L"0x[0-9a-fA-F][0-9a-fA-F]", L"0xbad", 0}, 1, L"" },
{ {L"string", L"match", L"-a", L"*", L"ab", L"cde", 0}, 0, L"ab\ncde\n" },
{ {L"string", L"match", L"*", L"ab", L"cde", 0}, 0, L"ab\n" },
{ {L"string", L"match", L"-n", L"*d*", L"cde", 0}, 0, L"1\n" },
{ {L"string", L"match", L"*", L"ab", L"cde", 0}, 0, L"ab\ncde\n" },
{ {L"string", L"match", L"-n", L"*d*", L"cde", 0}, 0, L"1 3\n" },
{ {L"string", L"match", L"-n", L"*x*", L"cde", 0}, 1, L"" },
{ {L"string", L"match", L"-q", L"a*", L"b", L"c", 0}, 1, L"" },
{ {L"string", L"match", L"-q", L"a*", L"b", L"a", 0}, 0, L"" },
{ {L"string", L"match", L"-r", 0}, 1, L"" },
{ {L"string", L"match", L"-r", 0}, 2, L"" },
{ {L"string", L"match", L"-r", L"", 0}, 1, L"" },
{ {L"string", L"match", L"-r", L"", L"", 0}, 0, L"\n" },
{ {L"string", L"match", L"-r", L".", L"a", 0}, 0, L"a\n" },
@ -4145,8 +4146,8 @@ static void test_string(void)
{ {L"string", L"match", L"-r", L"a*b", L"aab", 0}, 0, L"aab\n" },
{ {L"string", L"match", L"-r", L"-i", L"a*b", L"Aab", 0}, 0, L"Aab\n" },
{ {L"string", L"match", L"-r", L"-a", L"a[bc]", L"abadac", 0}, 0, L"ab\nac\n" },
{ {L"string", L"match", L"-r", L"-a", L"a", L"x", L"a", L"x", L"a", 0}, 0, L"a\na\n" },
{ {L"string", L"match", L"-r", L"a", L"x", L"a", L"x", L"a", 0}, 0, L"a\n" },
{ {L"string", L"match", L"-r", L"a", L"xaxa", L"axax", 0}, 0, L"a\na\n" },
{ {L"string", L"match", L"-r", L"-a", L"a", L"xaxa", L"axax", 0}, 0, L"a\na\na\na\n" },
{ {L"string", L"match", L"-r", L"a[bc]", L"abadac", 0}, 0, L"ab\n" },
{ {L"string", L"match", L"-r", L"-q", L"a[bc]", L"abadac", 0}, 0, L"" },
{ {L"string", L"match", L"-r", L"-q", L"a[bc]", L"ad", 0}, 1, L"" },
@ -4154,26 +4155,27 @@ static void test_string(void)
{ {L"string", L"match", L"-r", L"-a", L"(a)b(c)", L"abcabc", 0}, 0, L"abc\na\nc\nabc\na\nc\n" },
{ {L"string", L"match", L"-r", L"(a)b(c)", L"abcabc", 0}, 0, L"abc\na\nc\n" },
{ {L"string", L"match", L"-r", L"(a|(z))(bc)", L"abc", 0}, 0, L"abc\na\nbc\n" },
{ {L"string", L"match", L"-r", L"-n", L"a", L"a", 0}, 0, L"1\n" },
{ {L"string", L"match", L"-r", L"-n", L"-a", L"a", L"bacadae", 0}, 0, L"2\n4\n6\n" },
{ {L"string", L"match", L"-r", L"-n", L"(a).*(b)", L"a---b", 0}, 0, L"1\n1\n5\n" },
{ {L"string", L"match", L"-r", L"-n", L"(a)(b)", L"ab", 0}, 0, L"1\n1\n2\n" },
{ {L"string", L"match", L"-r", L"-n", L"(a)(b)", L"abab", 0}, 0, L"1\n1\n2\n" },
{ {L"string", L"match", L"-r", L"-n", L"-a", L"(a)(b)", L"abab", 0}, 0, L"1\n1\n2\n3\n3\n4\n" },
{ {L"string", L"match", L"-r", L"*", L"", 0}, 1, L"" },
{ {L"string", L"match", L"-r", L"-n", L"a", L"ada", L"dad", 0}, 0, L"1 1\n2 1\n" },
{ {L"string", L"match", L"-r", L"-n", L"-a", L"a", L"bacadae", 0}, 0, L"2 1\n4 1\n6 1\n" },
{ {L"string", L"match", L"-r", L"-n", L"(a).*(b)", L"a---b", 0}, 0, L"1 5\n1 1\n5 1\n" },
{ {L"string", L"match", L"-r", L"-n", L"(a)(b)", L"ab", 0}, 0, L"1 2\n1 1\n2 1\n" },
{ {L"string", L"match", L"-r", L"-n", L"(a)(b)", L"abab", 0}, 0, L"1 2\n1 1\n2 1\n" },
{ {L"string", L"match", L"-r", L"-n", L"-a", L"(a)(b)", L"abab", 0}, 0, L"1 2\n1 1\n2 1\n3 2\n3 1\n4 1\n" },
{ {L"string", L"match", L"-r", L"*", L"", 0}, 2, L"" },
{ {L"string", L"match", L"-r", L"foo\\Kbar", L"foobar", 0}, 0, L"bar\n" },
{ {L"string", L"match", L"-r", L"(foo)\\Kbar", L"foobar", 0}, 0, L"bar\nfoo\n" },
{ {L"string", L"match", L"-r", L"(?=ab\\K)", L"ab", 0}, 0, L"\n" },
{ {L"string", L"match", L"-r", L"(?=ab\\K)..(?=cd\\K)", L"abcd", 0}, 0, L"\n" },
{ {L"string", L"replace", 0}, 1, L"" },
{ {L"string", L"replace", L"", 0}, 1, L"" },
{ {L"string", L"replace", 0}, 2, L"" },
{ {L"string", L"replace", L"", 0}, 2, L"" },
{ {L"string", L"replace", L"", L"", 0}, 1, L"" },
{ {L"string", L"replace", L"", L"", L"", 0}, 1, L"\n" },
{ {L"string", L"replace", L"", L"", L" ", 0}, 1, L" \n" },
{ {L"string", L"replace", L"a", L"b", L"", 0}, 1, L"\n" },
{ {L"string", L"replace", L"a", L"b", L"a", 0}, 0, L"b\n" },
{ {L"string", L"replace", L"a", L"b", L"xax", 0}, 0, L"xbx\n" },
{ {L"string", L"replace", L"a", L"b", L"xax", L"axa", 0}, 0, L"xbx\nbxa\n" },
{ {L"string", L"replace", L"bar", L"x", L"red barn", 0}, 0, L"red xn\n" },
{ {L"string", L"replace", L"x", L"bar", L"red xn", 0}, 0, L"red barn\n" },
{ {L"string", L"replace", L"--", L"x", L"-", L"xyz", 0}, 0, L"-yz\n" },
@ -4186,9 +4188,10 @@ static void test_string(void)
{ {L"string", L"replace", L"-a", L"x", L"", L"xxx", 0}, 0, L"\n" },
{ {L"string", L"replace", L"-a", L"***", L"_", L"*****", 0}, 0, L"_**\n" },
{ {L"string", L"replace", L"-a", L"***", L"***", L"******", 0}, 0, L"******\n" },
{ {L"string", L"replace", L"-a", L"a", L"b", L"xax", L"axa", 0}, 0, L"xbx\nbxb\n" },
{ {L"string", L"replace", L"-r", 0}, 1, L"" },
{ {L"string", L"replace", L"-r", L"", 0}, 1, L"" },
{ {L"string", L"replace", L"-r", 0}, 2, L"" },
{ {L"string", L"replace", L"-r", L"", 0}, 2, L"" },
{ {L"string", L"replace", L"-r", L"", L"", 0}, 1, L"" },
{ {L"string", L"replace", L"-r", L"", L"", L"", 0}, 0, L"\n" }, // pcre2 behavior
{ {L"string", L"replace", L"-r", L"", L"", L" ", 0}, 0, L" \n" }, // pcre2 behavior
@ -4202,22 +4205,24 @@ static void test_string(void)
{ {L"string", L"replace", L"-r", L"-a", L"(\\w)", L"$1$1", L"ab", 0}, 0, L"aabb\n" },
{ {L"string", L"replace", L"-r", L"-a", L".", L"", L"abc", 0}, 0, L"\n" },
{ {L"string", L"replace", L"-r", L"a", L"x", L"bc", L"cd", L"de", 0}, 1, L"bc\ncd\nde\n" },
{ {L"string", L"replace", L"-r", L"a", L"x", L"bc", L"ca", L"ab", 0}, 0, L"bc\ncx\nab\n" },
{ {L"string", L"replace", L"-r", L"-a", L"a", L"x", L"bc", L"ca", L"ab", 0}, 0, L"bc\ncx\nxb\n" },
{ {L"string", L"replace", L"-r", L"a", L"x", L"aba", L"caa", 0}, 0, L"xba\ncxa\n" },
{ {L"string", L"replace", L"-r", L"-a", L"a", L"x", L"aba", L"caa", 0}, 0, L"xbx\ncxx\n" },
{ {L"string", L"replace", L"-r", L"-i", L"A", L"b", L"xax", 0}, 0, L"xbx\n" },
{ {L"string", L"replace", L"-r", L"-i", L"[a-z]", L".", L"1A2B", 0}, 0, L"1.2B\n" },
{ {L"string", L"replace", L"-r", L"A", L"b", L"xax", 0}, 1, L"xax\n" },
{ {L"string", L"replace", L"-r", L"a", L"$1", L"a", 0}, 1, L"" },
{ {L"string", L"replace", L"-r", L"(a)", L"$2", L"a", 0}, 1, L"" },
{ {L"string", L"replace", L"-r", L"*", L".", L"a", 0}, 1, L"" },
{ {L"string", L"replace", L"-r", L"a", L"$1", L"a", 0}, 2, L"" },
{ {L"string", L"replace", L"-r", L"(a)", L"$2", L"a", 0}, 2, L"" },
{ {L"string", L"replace", L"-r", L"*", L".", L"a", 0}, 2, L"" },
{ {L"string", L"replace", L"-r", L"^(.)", L"\t$1", L"abc", L"x", 0}, 0, L"\tabc\n\tx\n" },
{ {L"string", L"split", 0}, 1, L"" },
{ {L"string", L"split", 0}, 2, L"" },
{ {L"string", L"split", L":", 0}, 1, L"" },
{ {L"string", L"split", L".", L"www.ch.ic.ac.uk", 0}, 0, L"www\nch\nic\nac\nuk\n" },
{ {L"string", L"split", L"..", L"....", 0}, 0, L"\n\n\n" },
{ {L"string", L"split", L"-m", L"x", L"..", L"....", 0}, 2, L"" },
{ {L"string", L"split", L"-m1", L"..", L"....", 0}, 0, L"\n..\n" },
{ {L"string", L"split", L"-m0", L"/", L"/usr/local/bin/fish", 0}, 0, L"\nusr\nlocal\nbin\nfish\n" },
{ {L"string", L"split", L"-m2", L":", L"a:b", L"c:d", L"e:f", 0}, 0, L"a\nb\nc\nd\ne:f\n" },
{ {L"string", L"split", L"-m0", L"/", L"/usr/local/bin/fish", 0}, 1, L"/usr/local/bin/fish\n" },
{ {L"string", L"split", L"-m2", L":", L"a:b:c:d", L"e:f:g:h", 0}, 0, L"a\nb\nc:d\ne\nf\ng:h\n" },
{ {L"string", L"split", L"-m1", L"-r", L"/", L"/usr/local/bin/fish", 0}, 0, L"/usr/local/bin\nfish\n" },
{ {L"string", L"split", L"-r", L".", L"www.ch.ic.ac.uk", 0}, 0, L"www\nch\nic\nac\nuk\n" },
{ {L"string", L"split", L"--", L"--", L"a--b---c----d", 0}, 0, L"a\nb\n-c\n\nd\n" },
@ -4233,18 +4238,20 @@ static void test_string(void)
{ {L"string", L"split", L"-r", L"", L"ab", 0}, 0, L"a\nb\n" },
{ {L"string", L"split", L"-r", L"", L"abc", 0}, 0, L"a\nb\nc\n" },
{ {L"string", L"split", L"-r", L"-m1", L"", L"abc", 0}, 0, L"ab\nc\n" },
{ {L"string", L"split", L"-q", 0}, 1, L"" },
{ {L"string", L"split", L"-q", 0}, 2, L"" },
{ {L"string", L"split", L"-q", L":", 0}, 1, L"" },
{ {L"string", L"split", L"-q", L"x", L"axbxc", 0}, 0, L"" },
{ {L"string", L"sub", 0}, 1, L"" },
{ {L"string", L"sub", L"abcde", 0}, 0, L"abcde\n"},
{ {L"string", L"sub", L"-l", L"x", L"abcde", 0}, 2, L""},
{ {L"string", L"sub", L"-s", L"x", L"abcde", 0}, 2, L""},
{ {L"string", L"sub", L"-l0", L"abcde", 0}, 0, L"\n"},
{ {L"string", L"sub", L"-l2", L"abcde", 0}, 0, L"ab\n"},
{ {L"string", L"sub", L"-l5", L"abcde", 0}, 0, L"abcde\n"},
{ {L"string", L"sub", L"-l6", L"abcde", 0}, 0, L"abcde\n"},
{ {L"string", L"sub", L"-l-1", L"abcde", 0}, 1, L""},
{ {L"string", L"sub", L"-s0", L"abcde", 0}, 1, L""},
{ {L"string", L"sub", L"-l-1", L"abcde", 0}, 2, L""},
{ {L"string", L"sub", L"-s0", L"abcde", 0}, 2, L""},
{ {L"string", L"sub", L"-s1", L"abcde", 0}, 0, L"abcde\n"},
{ {L"string", L"sub", L"-s5", L"abcde", 0}, 0, L"e\n"},
{ {L"string", L"sub", L"-s6", L"abcde", 0}, 0, L"\n"},