Improves startup time when using std-lib (#13842)
Updated summary for commit
[612e0e2](https://github.com/nushell/nushell/pull/13842/commits/612e0e21602f55092bc121bfd07f7c3bf5119e4f)
- While folks are welcome to read through the entire comments, the core
information is summarized here.
# Description
This PR drastically improves startup times of Nushell by only parsing a
single submodule of the Standard Library that provides the `banner` and
`pwd` commands. All other Standard Library commands and submodules are
parsed when imported by the user. This cuts startup times by more than
60%.
At the moment, we have stopped adding to `std-lib` because every
addition adds a small amount to the Nushell startup time.
With this change, we should once again be able to allow new
functionality to be added to the Standard Library without it impacting
`nu` startup times.
# User-Facing Changes
* Nushell now starts about 60% faster
* Breaking change: The `dirs` (Shells) aliases will return a warning
message that it will not be auto-loaded in the following release, along
with instructions on how to restore it (and disable the message)
* The `use std <submodule> *` syntax is available for convenience, but
should be avoided in scripts as it parses the entire `std` module and
all other submodules and places it in scope. The correct syntax to
*just* load a submodule is `use std/<submodule> *` (asterisk optional).
The slash is important. This will be documented.
* `use std *` can be used for convenience to load all of the library but
still incurs the full loading-time.
* `std/dirs`: Semi-breaking change. The `dirs` command replaces the
`show` command. This is more in line with the directory-stack
functionality found in other shells. Existing users will not be impacted
by this as the alias (`shells`) remains the same.
* Breaking-change: Technically a breaking change, but probably only
impacts maintainers of `std`. The virtual path for the standard library
has changed. It could previously be imported using its virtual path (and
technically, this would have been the correct way to do it):
```nu
use NU_STDLIB_VIRTUAL_DIR/std
```
The path is now simply `std/`:
```nu
use std
```
All submodules have moved accordingly.
# Timings
Comparisons below were made:
* In a temporary, clean config directory using `$env.XDG_CONFIG_HOME =
(mktemp -d)`.
* `nu` was run with a release build
* `nu` was run one time to generate the default `config.nu` (etc.) files
- Otherwise timings would include the user-prompt
* The shell was exited and then restarted several times to get timing
samples
(Note: Old timings based on 0.97 rather than 0.98, but in the range of
being accurate)
| Scenario | `$nu.startup-time` |
| --- | --- |
| 0.97.2
([aaaab8e](https://github.com/nushell/nushell/commit/aaaab8e070c644a87bbd7682099e3fe9e6a4b42a))
Without this PR | 23ms - 24ms |
| This PR with deprecated commands | 9ms - <11ms |
| This PR after deprecated commands are removed in following release |
8ms - <10ms |
| Final PR (remove deprecated), using `--no-std-lib` | 6.1ms to 6.4ms |
| Final PR (remove deprecated), using `--no-config-file` | 3.1ms - 3.6ms
|
| Final PR (remove deprecated), using `--no-config-file --no-std-lib` |
1ms - 1.5ms |
*These last two timings point to the opportunity for further
optimization (see comment in thread below (will link once I write it).*
# Implementation details for future maintenance
* `use std banner` is a ridiculously deceptive call. That call parses
and imports *all* of `std` into scope. Simply replacing it with `use
std/core *` is essentially what saves ~14-15ms. This *only* imports the
submodule with the `banner` and `pwd` commands.
* From the code-comments, the reason that `NU_STDLIB_VIRTUAL_DIR` was
used as a prefix was so that there wouldn't be an issue if a user had a
`./std/mod.nu` in the current directory. This does **not** appear to be
an issue. After removing the prefix, I tested with both a relative
module as well as one in the `$env.NU_LIB_DIRS` path, and in all cases
the *internal* `std` still took precedence.
* By removing the prefix, users can now `use std` (and variants) without
requiring that it already be parsed and in scope.
* In the next release, we'll stop autoloading the `dirs` (shells)
functionality. While this only costs an additional 1-1.5ms, I think it's
better moved to the `config.nu` where the user can optionally remove it.
The main reason is its use of aliases (which have also caused issues) -
The `n`, `p`, and `g` short-commands are valuable real-estate, and users
may want to map these to something else.
For this release, there's an `deprecated_dirs` module that is still
autoloaded. As with the top-level commands, use of these will give a
deprecation warning with instructions on how to handle going forward.
To help with this, moved the aliases to their own submodule inside the
`dirs` module.
* Also sneaks in a small change where the top-level `dirs` command is
now the replacement for `dirs show`
* Fixed a double-import of `assert` in `dirs.nu`
* The `show_banner` step is replaced with simply `banner` rather than
re-importing it.
* A `virtual_path` may now be referenced with either a forward-slash or
a backward-slash on Windows. This allows `use std/<submodule>` to work
on all platforms.
# Performance side-notes:
* Future parsing and/or IR improvements should improve performance even
further.
* While the existing load time penalty of `std-lib` was not noticeable
on many systems, Nushell runs on a wide-variety of hardware and OS
platforms. Slower platforms will naturally see a bigger jump in
performance here. For users starting multiple Nushell sessions
frequently (e.g., `tmux`, Zellij, `screen`, et. al.) it is recommended
to keep total startup time (including user configuration) under ~250ms.
# Tests + Formatting
* All tests are green
* Updated tests:
- Removed the test that confirmed that `std` was loaded (since we
don't).
- Removed the `shells` test since it is not autoloaded. Main `dirs.nu`
functionality is tested through `stdlib-test`.
- Many tests assumed that the library was fully loaded, because it was
(even though we didn't intend for it to be). Fixed those tests.
- Tests now import only the necessary submodules (e.g., `use
std/assert`, rather than `use std assert`)
- Some tests *thought* they were loading `std/log`, but were doing so
improperly. This was masked by the now-fixed "load-everything-into-scope
bug". Local CI would pass due the `$env.NU_LOG_<...>` variables being
inherited from the calling process, but would fail in the "clean" GitHub
CI environment. These tests have also been fixed.
* Added additional tests for the changes
# After Submitting
Will update the Standard Library doc page
2024-10-03 11:28:22 +00:00
use std/log
export-env {
# Place NU_FORMAT... environment variables in module-scope
export use std/log *
}
2023-06-10 18:16:17 +00:00
2023-07-08 09:27:56 +00:00
def "nu-complete threads" [] {
2024-05-06 23:20:27 +00:00
seq 1 (sys cpu | length)
2023-07-08 09:27:56 +00:00
}
2023-06-10 18:16:17 +00:00
2023-07-02 08:41:33 +00:00
# Here we store the map of annotations internal names and the annotation actually used during test creation
# The reason we do that is to allow annotations to be easily renamed without modifying rest of the code
# Functions with no annotations or with annotations not on the list are rejected during module evaluation
# test and test-skip annotations may be used multiple times throughout the module as the function names are stored in a list
# Other annotations should only be used once within a module file
# If you find yourself in need of multiple before- or after- functions it's a sign your test suite probably needs redesign
def valid-annotations [] {
{
"#[test]": "test",
"#[ignore]": "test-skip",
"#[before-each]": "before-each"
"#[before-all]": "before-all"
"#[after-each]": "after-each"
"#[after-all]": "after-all"
}
}
# Returns a table containing the list of function names together with their annotations (comments above the declaration)
def get-annotated [
file: path
Add run-time type checking for command pipeline input (#14741)
<!--
if this PR closes one or more issues, you can automatically link the PR
with
them by using one of the [*linking
keywords*](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword),
e.g.
- this PR should close #xxxx
- fixes #xxxx
you can also mention related issues, PRs or discussions!
-->
# Description
<!--
Thank you for improving Nushell. Please, check our [contributing
guide](../CONTRIBUTING.md) and talk to the core team before making major
changes.
Description of your pull request goes here. **Provide examples and/or
screenshots** if your changes affect the user experience.
-->
This PR adds type checking of all command input types at run-time.
Generally, these errors should be caught by the parser, but sometimes we
can't know the type of a value at parse-time. The simplest example is
using the `echo` command, which has an output type of `any`, so
prefixing a literal with `echo` will bypass parse-time type checking.
Before this PR, each command has to individually check its input types.
This can result in scenarios where the input/output types don't match
the actual command behavior. This can cause valid usage with an
non-`any` type to become a parse-time error if a command is missing that
type in its pipeline input/output (`drop nth` and `history import` do
this before this PR). Alternatively, a command may not list a type in
its input/output types, but doesn't actually reject that type in its
code, which can have unintended side effects (`get` does this on an
empty pipeline input, and `sort` used to before #13154).
After this PR, the type of the pipeline input is checked to ensure it
matches one of the input types listed in the proceeding command's
input/output types. While each of the issues in the "before this PR"
section could be addressed with each command individually, this PR
solves this issue for _all_ commands.
**This will likely cause some breakage**, as some commands have
incorrect input/output types, and should be adjusted. Also, some scripts
may have erroneous usage of commands. In writing this PR, I discovered
that `toolkit.nu` was passing `null` values to `str join`, which doesn't
accept nothing types (if folks think it should, we can adjust it in this
PR or in a different PR). I found some issues in the standard library
and its tests. I also found that carapace's vendor script had an
incorrect chaining of `get -i`:
```nushell
let expanded_alias = (scope aliases | where name == $spans.0 | get -i 0 | get -i expansion)
```
Before this PR, if the `get -i 0` ever actually did evaluate to `null`,
the second `get` invocation would error since `get` doesn't operate on
`null` values. After this PR, this is immediately a run-time error,
alerting the user to the problematic code. As a side note, we'll need to
PR this fix (`get -i 0 | get -i expansion` -> `get -i 0.expansion`) to
carapace.
A notable exception to the type checking is commands with input type of
`nothing -> <type>`. In this case, any input type is allowed. This
allows piping values into the command without an error being thrown. For
example, `123 | echo $in` would be an error without this exception.
Additionally, custom types bypass type checking (I believe this also
happens during parsing, but not certain)
I added a `is_subtype` method to `Value` and `PipelineData`. It
functions slightly differently than `get_type().is_subtype()`, as noted
in the doccomments. Notably, it respects structural typing of lists and
tables. For example, the type of a value `[{a: 123} {a: 456, b: 789}]`
is a subtype of `table<a: int>`, whereas the type returned by
`Value::get_type` is a `list<any>`. Similarly, `PipelineData` has some
special handling for `ListStream`s and `ByteStream`s. The latter was
needed for this PR to work properly with external commands.
Here's some examples.
Before:
```nu
1..2 | drop nth 1
Error: nu::parser::input_type_mismatch
× Command does not support range input.
╭─[entry #9:1:8]
1 │ 1..2 | drop nth 1
· ────┬───
· ╰── command doesn't support range input
╰────
echo 1..2 | drop nth 1
# => ╭───┬───╮
# => │ 0 │ 1 │
# => ╰───┴───╯
```
After this PR, I've adjusted `drop nth`'s input/output types to accept
range input.
Before this PR, zip accepted any value despite not being listed in its
input/output types. This caused different behavior depending on if you
triggered a parse error or not:
```nushell
1 | zip [2]
# => Error: nu::parser::input_type_mismatch
# =>
# => × Command does not support int input.
# => ╭─[entry #3:1:5]
# => 1 │ 1 | zip [2]
# => · ─┬─
# => · ╰── command doesn't support int input
# => ╰────
echo 1 | zip [2]
# => ╭───┬───────────╮
# => │ 0 │ ╭───┬───╮ │
# => │ │ │ 0 │ 1 │ │
# => │ │ │ 1 │ 2 │ │
# => │ │ ╰───┴───╯ │
# => ╰───┴───────────╯
```
After this PR, it works the same in both cases. For cases like this, if
we do decide we want `zip` or other commands to accept any input value,
then we should explicitly add that to the input types.
```nushell
1 | zip [2]
# => Error: nu::parser::input_type_mismatch
# =>
# => × Command does not support int input.
# => ╭─[entry #3:1:5]
# => 1 │ 1 | zip [2]
# => · ─┬─
# => · ╰── command doesn't support int input
# => ╰────
echo 1 | zip [2]
# => Error: nu::shell::only_supports_this_input_type
# =>
# => × Input type not supported.
# => ╭─[entry #14:2:6]
# => 2 │ echo 1 | zip [2]
# => · ┬ ─┬─
# => · │ ╰── only list<any> and range input data is supported
# => · ╰── input type: int
# => ╰────
```
# User-Facing Changes
<!-- List of all changes that impact the user experience here. This
helps us keep track of breaking changes. -->
**Breaking change**: The type of a command's input is now checked
against the input/output types of that command at run-time. While these
errors should mostly be caught at parse-time, in cases where they can't
be detected at parse-time they will be caught at run-time instead. This
applies to both internal commands and custom commands.
Example function and corresponding parse-time error (same before and
after PR):
```nushell
def foo []: int -> nothing {
print $"my cool int is ($in)"
}
1 | foo
# => my cool int is 1
"evil string" | foo
# => Error: nu::parser::input_type_mismatch
# =>
# => × Command does not support string input.
# => ╭─[entry #16:1:17]
# => 1 │ "evil string" | foo
# => · ─┬─
# => · ╰── command doesn't support string input
# => ╰────
# =>
```
Before:
```nu
echo "evil string" | foo
# => my cool int is evil string
```
After:
```nu
echo "evil string" | foo
# => Error: nu::shell::only_supports_this_input_type
# =>
# => × Input type not supported.
# => ╭─[entry #17:1:6]
# => 1 │ echo "evil string" | foo
# => · ──────┬────── ─┬─
# => · │ ╰── only int input data is supported
# => · ╰── input type: string
# => ╰────
```
Known affected internal commands which erroneously accepted any type:
* `str join`
* `zip`
* `reduce`
# Tests + Formatting
<!--
Don't forget to add tests that cover your changes.
Make sure you've run and fixed any issues with these commands:
- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the
tests for the standard library
> **Note**
> from `nushell` you can also use the `toolkit` as follows
> ```bash
> use toolkit.nu # or use an `env_change` hook to activate it
automatically
> toolkit check pr
> ```
-->
- :green_circle: `toolkit fmt`
- :green_circle: `toolkit clippy`
- :green_circle: `toolkit test`
- :green_circle: `toolkit test stdlib`
# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->
* Play whack-a-mole with the commands and scripts this will inevitably
break
2025-01-08 22:09:47 +00:00
]: nothing -> table<function_name: string, annotation: string> {
2023-10-01 13:37:51 +00:00
let raw_file = (
open $file
| lines
2023-07-02 08:41:33 +00:00
| enumerate
| flatten
)
2023-10-01 13:37:51 +00:00
$raw_file
| where item starts-with def and index > 0
2023-07-02 08:41:33 +00:00
| insert annotation {|x|
2023-10-01 13:37:51 +00:00
$raw_file
| get ($x.index - 1)
| get item
| str trim
2023-07-02 08:41:33 +00:00
}
2023-10-01 13:37:51 +00:00
| where annotation in (valid-annotations|columns)
| reject index
| update item {
2023-10-19 20:08:09 +00:00
split column --collapse-empty ' '
2023-10-01 13:37:51 +00:00
| get column2.0
}
| rename function_name
2023-07-02 08:41:33 +00:00
}
# Takes table of function names and their annotations such as the one returned by get-annotated
#
# Returns a record where keys are internal names of valid annotations and values are corresponding function names
# Annotations that allow multiple functions are of type list<string>
# Other annotations are of type string
# Result gets merged with the template record so that the output shape remains consistent regardless of the table content
Fix silent failure of parsing input output types (#14510)
- This PR should fix/close:
- #11266
- #12893
- #13736
- #13748
- #14170
- It doesn't fix #13736 though unfortunately. The issue there is at a
different level to this fix (I think probably in the lexing somewhere,
which I haven't touched).
# The Problem
The linked issues have many examples of the problem and the related
confusion it causes, but I'll give some more examples here for
illustration. It boils down to the following:
This doesn't type check (good):
```nu
def foo []: string -> int { false }
```
This does (bad):
```nu
def foo [] : string -> int { false }
```
Because the parser is completely ignoring all the characters. This also
compiles in 0.100.0:
```nu
def blue [] Da ba Dee da Ba da { false }
```
And this also means commands which have a completely fine type, but an
extra space before `:`, lose that type information and end up as `any ->
any`, e.g.
```nu
def foo [] : int -> int {$in + 3}
```
```bash
$ foo --help
Input/output types:
╭───┬───────┬────────╮
│ # │ input │ output │
├───┼───────┼────────┤
│ 0 │ any │ any │
╰───┴───────┴────────╯
```
# The Fix
Special thank you to @texastoland whose draft PR (#12358) I referenced
heavily while making this fix.
That PR seeks to fix the invalid parsing by disallowing whitespace
between `[]` and `:` in declarations, e.g. `def foo [] : int -> any {}`
This PR instead allows the whitespace while properly parsing the type
signature. I think this is the better choice for a few reasons:
- The parsing is still straightforward and the information is all there
anyway,
- It's more consistent with type annotations in other places, e.g. `do
{|nums : list<int>| $nums | describe} [ 1 2 3 ]` from the [Type
Signatures doc
page](https://www.nushell.sh/lang-guide/chapters/types/type_signatures.html)
- It's more consistent with the new nu parser, which allows `let x :
bool = false` (current nu doesn't, but this PR doesn't change that)
- It will be less disruptive and should only break code where the types
are actually wrong (if your types were correct, but you had a space
before the `:`, those declarations will still compile and now have more
type information vs. throwing an error in all cases and requiring spaces
to be deleted)
- It's the more intuitive syntax for most functional programmers like
myself (haskell/lean/coq/agda and many more either allow or require
whitespace for type annotations)
I don't use Rust a lot, so I tried to keep most things the same and the
rest I wrote as if it was Haskell (if you squint a bit). Code
review/suggestions very welcome. I added all the tests I could think of
and `toolkit check pr` gives it the all-clear.
# User-Facing Changes
This PR meets part of the goal of #13849, but doesn't do anything about
parsing signatures twice and doesn't do much to improve error messages,
it just enforces the existing errors and error messages.
This will no doubt be a breaking change, mostly because the code is
already broken and users don't realise yet (one of my personal scripts
stopped compiling after this fix because I thought `def foo [] -> string
{}` was valid syntax). It shouldn't break any type-correct code though.
2024-12-07 15:55:15 +00:00
def create-test-record []: nothing -> record<before-each: string, after-each: string, before-all: string, after-all: string, test: list<string>, test-skip: list<string>> {
2023-07-02 08:41:33 +00:00
let input = $in
let template_record = {
before-each: '',
before-all: '',
after-each: '',
after-all: '',
test-skip: []
}
let test_record = (
$input
| update annotation {|x|
valid-annotations
| get $x.annotation
}
2023-10-28 19:15:14 +00:00
| group-by --to-table annotation
| update items {|x|
$x.items.function_name
add multiple grouper support to `group-by` (#14337)
- closes #14330
Related:
- #2607
- #14019
- #14316
# Description
This PR changes `group-by` to support grouping by multiple `grouper`
arguments.
# Changes
- No grouper: no change in behavior
- Single grouper
- `--to-table=false`: no change in behavior
- `--to-table=true`:
- closure grouper: named group0
- cell-path grouper: named after the cell-path
- Multiple groupers:
- `--to-table=false`: nested groups
- `--to-table=true`: one column for each grouper argument, followed by
the `items` column
- columns corresponding to cell-paths are named after them
- columns corresponding to closure groupers are named `group{i}` where
`i` is the index of the grouper argument
# Examples
```nushell
> [1 3 1 3 2 1 1] | group-by
╭───┬───────────╮
│ │ ╭───┬───╮ │
│ 1 │ │ 0 │ 1 │ │
│ │ │ 1 │ 1 │ │
│ │ │ 2 │ 1 │ │
│ │ │ 3 │ 1 │ │
│ │ ╰───┴───╯ │
│ │ ╭───┬───╮ │
│ 3 │ │ 0 │ 3 │ │
│ │ │ 1 │ 3 │ │
│ │ ╰───┴───╯ │
│ │ ╭───┬───╮ │
│ 2 │ │ 0 │ 2 │ │
│ │ ╰───┴───╯ │
╰───┴───────────╯
> [1 3 1 3 2 1 1] | group-by --to-table
╭─#─┬─group─┬───items───╮
│ 0 │ 1 │ ╭───┬───╮ │
│ │ │ │ 0 │ 1 │ │
│ │ │ │ 1 │ 1 │ │
│ │ │ │ 2 │ 1 │ │
│ │ │ │ 3 │ 1 │ │
│ │ │ ╰───┴───╯ │
│ 1 │ 3 │ ╭───┬───╮ │
│ │ │ │ 0 │ 3 │ │
│ │ │ │ 1 │ 3 │ │
│ │ │ ╰───┴───╯ │
│ 2 │ 2 │ ╭───┬───╮ │
│ │ │ │ 0 │ 2 │ │
│ │ │ ╰───┴───╯ │
╰─#─┴─group─┴───items───╯
> [1 3 1 3 2 1 1] | group-by { $in >= 2 }
╭───────┬───────────╮
│ │ ╭───┬───╮ │
│ false │ │ 0 │ 1 │ │
│ │ │ 1 │ 1 │ │
│ │ │ 2 │ 1 │ │
│ │ │ 3 │ 1 │ │
│ │ ╰───┴───╯ │
│ │ ╭───┬───╮ │
│ true │ │ 0 │ 3 │ │
│ │ │ 1 │ 3 │ │
│ │ │ 2 │ 2 │ │
│ │ ╰───┴───╯ │
╰───────┴───────────╯
> [1 3 1 3 2 1 1] | group-by { $in >= 2 } --to-table
╭─#─┬─group0─┬───items───╮
│ 0 │ false │ ╭───┬───╮ │
│ │ │ │ 0 │ 1 │ │
│ │ │ │ 1 │ 1 │ │
│ │ │ │ 2 │ 1 │ │
│ │ │ │ 3 │ 1 │ │
│ │ │ ╰───┴───╯ │
│ 1 │ true │ ╭───┬───╮ │
│ │ │ │ 0 │ 3 │ │
│ │ │ │ 1 │ 3 │ │
│ │ │ │ 2 │ 2 │ │
│ │ │ ╰───┴───╯ │
╰─#─┴─group0─┴───items───╯
```
```nushell
let data = [
[name, lang, year];
[andres, rb, "2019"],
[jt, rs, "2019"],
[storm, rs, "2021"]
]
> $data
╭─#─┬──name──┬─lang─┬─year─╮
│ 0 │ andres │ rb │ 2019 │
│ 1 │ jt │ rs │ 2019 │
│ 2 │ storm │ rs │ 2021 │
╰─#─┴──name──┴─lang─┴─year─╯
```
```nushell
> $data | group-by lang
╭────┬──────────────────────────────╮
│ │ ╭─#─┬──name──┬─lang─┬─year─╮ │
│ rb │ │ 0 │ andres │ rb │ 2019 │ │
│ │ ╰─#─┴──name──┴─lang─┴─year─╯ │
│ │ ╭─#─┬─name──┬─lang─┬─year─╮ │
│ rs │ │ 0 │ jt │ rs │ 2019 │ │
│ │ │ 1 │ storm │ rs │ 2021 │ │
│ │ ╰─#─┴─name──┴─lang─┴─year─╯ │
╰────┴──────────────────────────────╯
```
Group column is now named after the grouper, to allow multiple groupers.
```nushell
> $data | group-by lang --to-table # column names changed!
╭─#─┬─lang─┬────────────items─────────────╮
│ 0 │ rb │ ╭─#─┬──name──┬─lang─┬─year─╮ │
│ │ │ │ 0 │ andres │ rb │ 2019 │ │
│ │ │ ╰─#─┴──name──┴─lang─┴─year─╯ │
│ 1 │ rs │ ╭─#─┬─name──┬─lang─┬─year─╮ │
│ │ │ │ 0 │ jt │ rs │ 2019 │ │
│ │ │ │ 1 │ storm │ rs │ 2021 │ │
│ │ │ ╰─#─┴─name──┴─lang─┴─year─╯ │
╰─#─┴─lang─┴────────────items─────────────╯
```
Grouping by multiple columns makes finer grained aggregations possible.
```nushell
> $data | group-by lang year --to-table
╭─#─┬─lang─┬─year─┬────────────items─────────────╮
│ 0 │ rb │ 2019 │ ╭─#─┬──name──┬─lang─┬─year─╮ │
│ │ │ │ │ 0 │ andres │ rb │ 2019 │ │
│ │ │ │ ╰─#─┴──name──┴─lang─┴─year─╯ │
│ 1 │ rs │ 2019 │ ╭─#─┬─name─┬─lang─┬─year─╮ │
│ │ │ │ │ 0 │ jt │ rs │ 2019 │ │
│ │ │ │ ╰─#─┴─name─┴─lang─┴─year─╯ │
│ 2 │ rs │ 2021 │ ╭─#─┬─name──┬─lang─┬─year─╮ │
│ │ │ │ │ 0 │ storm │ rs │ 2021 │ │
│ │ │ │ ╰─#─┴─name──┴─lang─┴─year─╯ │
╰─#─┴─lang─┴─year─┴────────────items─────────────╯
```
Grouping by multiple columns, without `--to-table` returns a nested
structure.
This is equivalent to `$data | group-by year | split-by lang`, making
`split-by` obsolete.
```nushell
> $data | group-by lang year
╭────┬─────────────────────────────────────────╮
│ │ ╭──────┬──────────────────────────────╮ │
│ rb │ │ │ ╭─#─┬──name──┬─lang─┬─year─╮ │ │
│ │ │ 2019 │ │ 0 │ andres │ rb │ 2019 │ │ │
│ │ │ │ ╰─#─┴──name──┴─lang─┴─year─╯ │ │
│ │ ╰──────┴──────────────────────────────╯ │
│ │ ╭──────┬─────────────────────────────╮ │
│ rs │ │ │ ╭─#─┬─name─┬─lang─┬─year─╮ │ │
│ │ │ 2019 │ │ 0 │ jt │ rs │ 2019 │ │ │
│ │ │ │ ╰─#─┴─name─┴─lang─┴─year─╯ │ │
│ │ │ │ ╭─#─┬─name──┬─lang─┬─year─╮ │ │
│ │ │ 2021 │ │ 0 │ storm │ rs │ 2021 │ │ │
│ │ │ │ ╰─#─┴─name──┴─lang─┴─year─╯ │ │
│ │ ╰──────┴─────────────────────────────╯ │
╰────┴─────────────────────────────────────────╯
```
From #2607:
> Here's a couple more examples without much explanation. This one shows
adding two grouping keys. I'm always wanting to add more columns when
using group-by and it just-work:tm: `gb.exe -f movies-2.csv -k 3,2 -s 7
--skip_header`
>
> ```
> k:3 | k:2 | count | sum:7
> -----------------------+-----------+-------+--------------------
> 20th Century Fox | Drama | 1 | 117.09
> 20th Century Fox | Romance | 1 | 39.66
> CBS | Comedy | 1 | 77.09
> Disney | Animation | 4 | 1264.23
> Disney | Comedy | 4 | 950.27
> Fox | Comedy | 5 | 661.85
> Independent | Comedy | 7 | 399.07
> Independent | Drama | 4 | 69.75
> Independent | Romance | 7 | 1048.75
> Independent | romance | 1 | 29.37
> ...
> ```
This example can be achieved like this:
```nushell
> open movies-2.csv
| group-by "Lead Studio" Genre --to-table
| insert count {get items | length}
| insert sum { get items."Worldwide Gross" | math sum}
| reject items
| sort-by "Lead Studio" Genre
╭─#──┬──────Lead Studio──────┬───Genre───┬─count─┬───sum───╮
│ 0 │ 20th Century Fox │ Drama │ 1 │ 117.09 │
│ 1 │ 20th Century Fox │ Romance │ 1 │ 39.66 │
│ 2 │ CBS │ Comedy │ 1 │ 77.09 │
│ 3 │ Disney │ Animation │ 4 │ 1264.23 │
│ 4 │ Disney │ Comedy │ 4 │ 950.27 │
│ 5 │ Fox │ Comedy │ 5 │ 661.85 │
│ 6 │ Fox │ comedy │ 1 │ 60.72 │
│ 7 │ Independent │ Comedy │ 7 │ 399.07 │
│ 8 │ Independent │ Drama │ 4 │ 69.75 │
│ 9 │ Independent │ Romance │ 7 │ 1048.75 │
│ 10 │ Independent │ romance │ 1 │ 29.37 │
...
```
2024-11-15 12:40:49 +00:00
| if $x.annotation in ["test", "test-skip"] {
2023-07-02 08:41:33 +00:00
$in
} else {
get 0
}
}
2023-10-08 11:12:46 +00:00
| transpose --ignore-titles -r -d
2023-07-02 08:41:33 +00:00
)
$template_record
| merge $test_record
}
2023-06-10 18:16:17 +00:00
def throw-error [error: record] {
error make {
msg: $"(ansi red)($error.msg)(ansi reset)"
label: {
text: ($error.label)
2023-11-03 15:09:33 +00:00
span: $error.span
2023-06-10 18:16:17 +00:00
}
}
}
# show a test record in a pretty way
#
# `$in` must be a `record<file: string, module: string, name: string, pass: bool>`.
#
# the output would be like
# - "<indentation> x <module> <test>" all in red if failed
# - "<indentation> s <module> <test>" all in yellow if skipped
# - "<indentation> <module> <test>" all in green if passed
def show-pretty-test [indent: int = 4] {
let test = $in
[
2023-09-14 18:11:08 +00:00
(1..$indent | each {" "} | str join)
2023-06-10 18:16:17 +00:00
(match $test.result {
"pass" => { ansi green },
"skip" => { ansi yellow },
_ => { ansi red }
})
(match $test.result {
"pass" => " ",
"skip" => "s",
_ => { char failed }
})
" "
$"($test.name) ($test.test)"
(ansi reset)
] | str join
}
2023-07-02 08:41:33 +00:00
# Takes a test record and returns the execution result
# Test is executed via following steps:
# * Public function with random name is generated that runs specified test in try/catch block
# * Module file is opened
# * Random public function is appended to the end of the file
# * Modified file is saved under random name
# * Nu subprocess is spawned
# * Inside subprocess the modified file is imported and random function called
# * Output of the random function is serialized into nuon and returned to parent process
# * Modified file is removed
2023-06-10 18:16:17 +00:00
def run-test [
test: record
] {
2023-10-19 20:04:33 +00:00
let test_file_name = (random chars --length 10)
let test_function_name = (random chars --length 10)
2023-06-10 18:16:17 +00:00
let rendered_module_path = ({parent: ($test.file|path dirname), stem: $test_file_name, extension: nu}| path join)
let test_function = $"
export def ($test_function_name) [] {
($test.before-each)
try {
$context | ($test.test)
($test.after-each)
} catch { |err|
($test.after-each)
2023-07-02 08:41:33 +00:00
$err | get raw
2023-06-10 18:16:17 +00:00
}
}
"
open $test.file
| lines
| append ($test_function)
| str join (char nl)
| save $rendered_module_path
let result = (
2023-07-07 07:20:36 +00:00
^$nu.current-exe --no-config-file -c $"use ($rendered_module_path) *; ($test_function_name)|to nuon"
2023-06-10 18:16:17 +00:00
| complete
)
rm $rendered_module_path
return $result
}
2023-07-02 08:41:33 +00:00
# Takes a module record and returns a table with following columns:
#
# * file - path to file under test
# * name - name of the module under test
# * test - name of specific test
# * result - test execution result
2023-06-10 18:16:17 +00:00
def run-tests-for-module [
2023-07-02 08:41:33 +00:00
module: record<file: path name: string before-each: string after-each: string before-all: string after-all: string test: list test-skip: list>
2023-07-08 09:27:56 +00:00
threads: int
Fix silent failure of parsing input output types (#14510)
- This PR should fix/close:
- #11266
- #12893
- #13736
- #13748
- #14170
- It doesn't fix #13736 though unfortunately. The issue there is at a
different level to this fix (I think probably in the lexing somewhere,
which I haven't touched).
# The Problem
The linked issues have many examples of the problem and the related
confusion it causes, but I'll give some more examples here for
illustration. It boils down to the following:
This doesn't type check (good):
```nu
def foo []: string -> int { false }
```
This does (bad):
```nu
def foo [] : string -> int { false }
```
Because the parser is completely ignoring all the characters. This also
compiles in 0.100.0:
```nu
def blue [] Da ba Dee da Ba da { false }
```
And this also means commands which have a completely fine type, but an
extra space before `:`, lose that type information and end up as `any ->
any`, e.g.
```nu
def foo [] : int -> int {$in + 3}
```
```bash
$ foo --help
Input/output types:
╭───┬───────┬────────╮
│ # │ input │ output │
├───┼───────┼────────┤
│ 0 │ any │ any │
╰───┴───────┴────────╯
```
# The Fix
Special thank you to @texastoland whose draft PR (#12358) I referenced
heavily while making this fix.
That PR seeks to fix the invalid parsing by disallowing whitespace
between `[]` and `:` in declarations, e.g. `def foo [] : int -> any {}`
This PR instead allows the whitespace while properly parsing the type
signature. I think this is the better choice for a few reasons:
- The parsing is still straightforward and the information is all there
anyway,
- It's more consistent with type annotations in other places, e.g. `do
{|nums : list<int>| $nums | describe} [ 1 2 3 ]` from the [Type
Signatures doc
page](https://www.nushell.sh/lang-guide/chapters/types/type_signatures.html)
- It's more consistent with the new nu parser, which allows `let x :
bool = false` (current nu doesn't, but this PR doesn't change that)
- It will be less disruptive and should only break code where the types
are actually wrong (if your types were correct, but you had a space
before the `:`, those declarations will still compile and now have more
type information vs. throwing an error in all cases and requiring spaces
to be deleted)
- It's the more intuitive syntax for most functional programmers like
myself (haskell/lean/coq/agda and many more either allow or require
whitespace for type annotations)
I don't use Rust a lot, so I tried to keep most things the same and the
rest I wrote as if it was Haskell (if you squint a bit). Code
review/suggestions very welcome. I added all the tests I could think of
and `toolkit check pr` gives it the all-clear.
# User-Facing Changes
This PR meets part of the goal of #13849, but doesn't do anything about
parsing signatures twice and doesn't do much to improve error messages,
it just enforces the existing errors and error messages.
This will no doubt be a breaking change, mostly because the code is
already broken and users don't realise yet (one of my personal scripts
stopped compiling after this fix because I thought `def foo [] -> string
{}` was valid syntax). It shouldn't break any type-correct code though.
2024-12-07 15:55:15 +00:00
]: nothing -> table<file: path, name: string, test: string, result: string> {
2023-07-02 08:41:33 +00:00
let global_context = if not ($module.before-all|is-empty) {
2023-06-10 18:16:17 +00:00
log info $"Running before-all for module ($module.name)"
run-test {
file: $module.file,
before-each: 'let context = {}',
after-each: '',
2023-07-02 08:41:33 +00:00
test: $module.before-all
2023-06-10 18:16:17 +00:00
}
| if $in.exit_code == 0 {
$in.stdout
} else {
throw-error {
msg: "Before-all failed"
label: "Failure in test setup"
span: (metadata $in | get span)
}
}
} else {
{}
}
2023-07-02 08:41:33 +00:00
# since tests are skipped based on their annotation and never actually executed we can generate their list in advance
let skipped_tests = (
if not ($module.test-skip|is-empty) {
$module
| update test $module.test-skip
| reject test-skip
| flatten
| insert result 'skip'
} else {
[]
}
)
2023-06-10 18:16:17 +00:00
let tests = (
$module
2023-07-02 08:41:33 +00:00
| reject test-skip
| flatten test
2023-06-10 18:16:17 +00:00
| update before-each {|x|
2023-07-02 08:41:33 +00:00
if not ($module.before-each|is-empty) {
$"let context = \(($global_context)|merge \(($module.before-each)\)\)"
2023-06-10 18:16:17 +00:00
} else {
$"let context = ($global_context)"
}
}
| update after-each {|x|
2023-07-02 08:41:33 +00:00
if not ($module.after-each|is-empty) {
$"$context | ($module.after-each)"
2023-06-10 18:16:17 +00:00
} else {
''
}
}
2023-07-08 09:27:56 +00:00
| par-each --threads $threads {|test|
2023-07-02 08:41:33 +00:00
log info $"Running ($test.test) in module ($module.name)"
log debug $"Global context is ($global_context)"
2023-06-10 18:16:17 +00:00
$test|insert result {|x|
run-test $test
2023-07-02 08:41:33 +00:00
| if $in.exit_code == 0 {
'pass'
} else {
'fail'
2023-06-10 18:16:17 +00:00
}
}
}
2023-07-02 08:41:33 +00:00
| append $skipped_tests
| select file name test result
2023-06-10 18:16:17 +00:00
)
2023-07-02 08:41:33 +00:00
if not ($module.after-all|is-empty) {
2023-06-10 18:16:17 +00:00
log info $"Running after-all for module ($module.name)"
run-test {
file: $module.file,
before-each: $"let context = ($global_context)",
after-each: '',
2023-07-02 08:41:33 +00:00
test: $module.after-all
2023-06-10 18:16:17 +00:00
}
}
return $tests
}
2023-07-02 08:41:33 +00:00
# Run tests for nushell code
#
# By default all detected tests are executed
# Test list can be filtered out by specifying either path to search for, name of the module to run tests for or specific test name
# In order for a function to be recognized as a test by the test runner it needs to be annotated with # test
# Following annotations are supported by the test runner:
# * test - test case to be executed during test run
# * test-skip - test case to be skipped during test run
# * before-all - function to run at the beginning of test run. Returns a global context record that is piped into every test function
# * before-each - function to run before every test case. Returns a per-test context record that is merged with global context and piped into test functions
# * after-each - function to run after every test case. Receives the context record just like the test cases
# * after-all - function to run after all test cases have been executed. Receives the global context record
2023-06-10 18:16:17 +00:00
export def run-tests [
2023-07-08 09:27:56 +00:00
--path: path, # Path to look for tests. Default: current directory.
--module: string, # Test module to run. Default: all test modules found.
--test: string, # Pattern to use to include tests. Default: all tests found in the files.
--exclude: string, # Pattern to use to exclude tests. Default: no tests are excluded
--exclude-module: string, # Pattern to use to exclude test modules. Default: No modules are excluded
--list, # list the selected tests without running them.
--threads: int@"nu-complete threads", # Amount of threads to use for parallel execution. Default: All threads are utilized
2023-06-10 18:16:17 +00:00
] {
2024-05-13 13:45:44 +00:00
let available_threads = (sys cpu | length)
2023-07-08 09:27:56 +00:00
# Can't use pattern matching here due to https://github.com/nushell/nushell/issues/9198
let threads = (if $threads == null {
$available_threads
} else if $threads < 1 {
1
} else if $threads <= $available_threads {
$threads
} else {
$available_threads
})
2023-06-10 18:16:17 +00:00
let module_search_pattern = ('**' | path join ({
stem: ($module | default "*")
extension: nu
} | path join))
let path = if $path == null {
$env.PWD
} else {
if not ($path | path exists) {
throw-error {
msg: "directory_not_found"
label: "no such directory"
span: (metadata $path | get span)
}
}
$path
}
if not ($module | is-empty) {
try { ls ($path | path join $module_search_pattern) | null } catch {
throw-error {
msg: "module_not_found"
label: $"no such module in ($path)"
span: (metadata $module | get span)
}
}
}
let modules = (
`open`, `rm`, `umv`, `cp`, `rm` and `du`: Don't globs if inputs are variables or string interpolation (#11886)
# Description
This is a follow up to
https://github.com/nushell/nushell/pull/11621#issuecomment-1937484322
Also Fixes: #11838
## About the code change
It applys the same logic when we pass variables to external commands:
https://github.com/nushell/nushell/blob/0487e9ffcbc57c2d5feca606e10c3f8221ff5e00/crates/nu-command/src/system/run_external.rs#L162-L170
That is: if user input dynamic things(like variables, sub-expression, or
string interpolation), it returns a quoted `NuPath`, then user input
won't be globbed
# User-Facing Changes
Given two input files: `a*c.txt`, `abc.txt`
* `let f = "a*c.txt"; rm $f` will remove one file: `a*c.txt`.
~* `let f = "a*c.txt"; rm --glob $f` will remove `a*c.txt` and
`abc.txt`~
* `let f: glob = "a*c.txt"; rm $f` will remove `a*c.txt` and `abc.txt`
## Rules about globbing with *variable*
Given two files: `a*c.txt`, `abc.txt`
| Cmd Type | example | Result |
| ----- | ------------------ | ------ |
| builtin | let f = "a*c.txt"; rm $f | remove `a*c.txt` |
| builtin | let f: glob = "a*c.txt"; rm $f | remove `a*c.txt` and
`abc.txt`
| builtin | let f = "a*c.txt"; rm ($f \| into glob) | remove `a*c.txt`
and `abc.txt`
| custom | def crm [f: glob] { rm $f }; let f = "a*c.txt"; crm $f |
remove `a*c.txt` and `abc.txt`
| custom | def crm [f: glob] { rm ($f \| into string) }; let f =
"a*c.txt"; crm $f | remove `a*c.txt`
| custom | def crm [f: string] { rm $f }; let f = "a*c.txt"; crm $f |
remove `a*c.txt`
| custom | def crm [f: string] { rm $f }; let f = "a*c.txt"; crm ($f \|
into glob) | remove `a*c.txt` and `abc.txt`
In general, if a variable is annotated with `glob` type, nushell will
expand glob pattern. Or else, we need to use `into | glob` to expand
glob pattern
# Tests + Formatting
Done
# After Submitting
I think `str glob-escape` command will be no-longer required. We can
remove it.
2024-02-23 01:17:09 +00:00
ls ($path | path join $module_search_pattern | into glob)
2023-10-01 13:37:51 +00:00
| par-each --threads $threads {|row|
{
file: $row.name
name: ($row.name | path parse | get stem)
commands: (get-annotated $row.name)
}
2023-07-02 08:41:33 +00:00
}
2023-10-01 13:37:51 +00:00
| filter {|x| ($x.commands|length) > 0}
2023-06-10 18:16:17 +00:00
| upsert commands {|module|
2023-07-02 08:41:33 +00:00
$module.commands
| create-test-record
2023-06-10 18:16:17 +00:00
}
2023-07-02 08:41:33 +00:00
| flatten
| filter {|x| ($x.test|length) > 0}
2023-07-07 07:20:36 +00:00
| filter {|x| if ($exclude_module|is-empty) {true} else {$x.name !~ $exclude_module}}
| filter {|x| if ($test|is-empty) {true} else {$x.test|any {|y| $y =~ $test}}}
2023-06-10 18:16:17 +00:00
| filter {|x| if ($module|is-empty) {true} else {$module == $x.name}}
2023-07-02 08:41:33 +00:00
| update test {|x|
$x.test
2023-07-07 07:20:36 +00:00
| filter {|y| if ($test|is-empty) {true} else {$y =~ $test}}
| filter {|y| if ($exclude|is-empty) {true} else {$y !~ $exclude}}
2023-06-10 18:16:17 +00:00
}
)
if $list {
return $modules
}
if ($modules | is-empty) {
error make --unspanned {msg: "no test to run"}
}
let results = (
$modules
2023-07-08 09:27:56 +00:00
| par-each --threads $threads {|module|
run-tests-for-module $module $threads
2023-06-10 18:16:17 +00:00
}
| flatten
)
2023-07-02 08:41:33 +00:00
if ($results | any {|x| $x.result == fail}) {
2023-06-10 18:16:17 +00:00
let text = ([
$"(ansi purple)some tests did not pass (char lparen)see complete errors below(char rparen):(ansi reset)"
""
2023-07-08 09:27:56 +00:00
($results | par-each --threads $threads {|test| ($test | show-pretty-test 4)} | str join "\n")
2023-06-10 18:16:17 +00:00
""
] | str join "\n")
error make --unspanned { msg: $text }
}
}