nushell/crates/nu-std/tests/test_iter.nu
132ikl 214714e0ab
Add run-time type checking for command pipeline input (#14741)
<!--
if this PR closes one or more issues, you can automatically link the PR
with
them by using one of the [*linking
keywords*](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword),
e.g.
- this PR should close #xxxx
- fixes #xxxx

you can also mention related issues, PRs or discussions!
-->

# Description
<!--
Thank you for improving Nushell. Please, check our [contributing
guide](../CONTRIBUTING.md) and talk to the core team before making major
changes.

Description of your pull request goes here. **Provide examples and/or
screenshots** if your changes affect the user experience.
-->

This PR adds type checking of all command input types at run-time.
Generally, these errors should be caught by the parser, but sometimes we
can't know the type of a value at parse-time. The simplest example is
using the `echo` command, which has an output type of `any`, so
prefixing a literal with `echo` will bypass parse-time type checking.

Before this PR, each command has to individually check its input types.
This can result in scenarios where the input/output types don't match
the actual command behavior. This can cause valid usage with an
non-`any` type to become a parse-time error if a command is missing that
type in its pipeline input/output (`drop nth` and `history import` do
this before this PR). Alternatively, a command may not list a type in
its input/output types, but doesn't actually reject that type in its
code, which can have unintended side effects (`get` does this on an
empty pipeline input, and `sort` used to before #13154).

After this PR, the type of the pipeline input is checked to ensure it
matches one of the input types listed in the proceeding command's
input/output types. While each of the issues in the "before this PR"
section could be addressed with each command individually, this PR
solves this issue for _all_ commands.

**This will likely cause some breakage**, as some commands have
incorrect input/output types, and should be adjusted. Also, some scripts
may have erroneous usage of commands. In writing this PR, I discovered
that `toolkit.nu` was passing `null` values to `str join`, which doesn't
accept nothing types (if folks think it should, we can adjust it in this
PR or in a different PR). I found some issues in the standard library
and its tests. I also found that carapace's vendor script had an
incorrect chaining of `get -i`:

```nushell
let expanded_alias = (scope aliases | where name == $spans.0 | get -i 0 | get -i expansion)
```

Before this PR, if the `get -i 0` ever actually did evaluate to `null`,
the second `get` invocation would error since `get` doesn't operate on
`null` values. After this PR, this is immediately a run-time error,
alerting the user to the problematic code. As a side note, we'll need to
PR this fix (`get -i 0 | get -i expansion` -> `get -i 0.expansion`) to
carapace.

A notable exception to the type checking is commands with input type of
`nothing -> <type>`. In this case, any input type is allowed. This
allows piping values into the command without an error being thrown. For
example, `123 | echo $in` would be an error without this exception.
Additionally, custom types bypass type checking (I believe this also
happens during parsing, but not certain)

I added a `is_subtype` method to `Value` and `PipelineData`. It
functions slightly differently than `get_type().is_subtype()`, as noted
in the doccomments. Notably, it respects structural typing of lists and
tables. For example, the type of a value `[{a: 123} {a: 456, b: 789}]`
is a subtype of `table<a: int>`, whereas the type returned by
`Value::get_type` is a `list<any>`. Similarly, `PipelineData` has some
special handling for `ListStream`s and `ByteStream`s. The latter was
needed for this PR to work properly with external commands.

Here's some examples.

Before:
```nu
1..2 | drop nth 1
Error: nu::parser::input_type_mismatch

  × Command does not support range input.
   ╭─[entry #9:1:8]
 1 │ 1..2 | drop nth 1
   ·        ────┬───
   ·            ╰── command doesn't support range input
   ╰────

echo 1..2 | drop nth 1
# => ╭───┬───╮
# => │ 0 │ 1 │
# => ╰───┴───╯
```

After this PR, I've adjusted `drop nth`'s input/output types to accept
range input.

Before this PR, zip accepted any value despite not being listed in its
input/output types. This caused different behavior depending on if you
triggered a parse error or not:
```nushell
1 | zip [2]
# => Error: nu::parser::input_type_mismatch
# => 
# =>   × Command does not support int input.
# =>    ╭─[entry #3:1:5]
# =>  1 │ 1 | zip [2]
# =>    ·     ─┬─
# =>    ·      ╰── command doesn't support int input
# =>    ╰────
echo 1 | zip [2]
# => ╭───┬───────────╮
# => │ 0 │ ╭───┬───╮ │
# => │   │ │ 0 │ 1 │ │
# => │   │ │ 1 │ 2 │ │
# => │   │ ╰───┴───╯ │
# => ╰───┴───────────╯
```

After this PR, it works the same in both cases. For cases like this, if
we do decide we want `zip` or other commands to accept any input value,
then we should explicitly add that to the input types.
```nushell
1 | zip [2]
# => Error: nu::parser::input_type_mismatch
# => 
# =>   × Command does not support int input.
# =>    ╭─[entry #3:1:5]
# =>  1 │ 1 | zip [2]
# =>    ·     ─┬─
# =>    ·      ╰── command doesn't support int input
# =>    ╰────
echo 1 | zip [2]
# => Error: nu:🐚:only_supports_this_input_type
# => 
# =>   × Input type not supported.
# =>    ╭─[entry #14:2:6]
# =>  2 │ echo 1 | zip [2]
# =>    ·      ┬   ─┬─
# =>    ·      │    ╰── only list<any> and range input data is supported
# =>    ·      ╰── input type: int
# =>    ╰────
```

# User-Facing Changes
<!-- List of all changes that impact the user experience here. This
helps us keep track of breaking changes. -->

**Breaking change**: The type of a command's input is now checked
against the input/output types of that command at run-time. While these
errors should mostly be caught at parse-time, in cases where they can't
be detected at parse-time they will be caught at run-time instead. This
applies to both internal commands and custom commands.

Example function and corresponding parse-time error (same before and
after PR):
```nushell
def foo []: int -> nothing {
  print $"my cool int is ($in)"
}

1 | foo
# => my cool int is 1

"evil string" | foo
# => Error: nu::parser::input_type_mismatch
# => 
# =>   × Command does not support string input.
# =>    ╭─[entry #16:1:17]
# =>  1 │ "evil string" | foo
# =>    ·                 ─┬─
# =>    ·                  ╰── command doesn't support string input
# =>    ╰────
# => 
```

Before:
```nu
echo "evil string" | foo
# => my cool int is evil string
```

After:
```nu
echo "evil string" | foo
# => Error: nu:🐚:only_supports_this_input_type
# => 
# =>   × Input type not supported.
# =>    ╭─[entry #17:1:6]
# =>  1 │ echo "evil string" | foo
# =>    ·      ──────┬──────   ─┬─
# =>    ·            │          ╰── only int input data is supported
# =>    ·            ╰── input type: string
# =>    ╰────
```

Known affected internal commands which erroneously accepted any type:
* `str join`
* `zip`
* `reduce`

# Tests + Formatting
<!--
Don't forget to add tests that cover your changes.

Make sure you've run and fixed any issues with these commands:

- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the
tests for the standard library

> **Note**
> from `nushell` you can also use the `toolkit` as follows
> ```bash
> use toolkit.nu # or use an `env_change` hook to activate it
automatically
> toolkit check pr
> ```
-->
- 🟢 `toolkit fmt`
- 🟢 `toolkit clippy`
- 🟢 `toolkit test`
- 🟢 `toolkit test stdlib`


# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->
* Play whack-a-mole with the commands and scripts this will inevitably
break
2025-01-08 23:09:47 +01:00

138 lines
3.3 KiB
Text

use std *
use std/assert
#[test]
def iter_find [] {
let hastack1 = [1 2 3 4 5 6 7]
let hastack2 = [nushell rust shell iter std]
let hastack3 = [nu 69 2023-04-20 "std"]
let res = ($hastack1 | iter find {|it| $it mod 2 == 0})
assert equal $res 2
let res = ($hastack2 | iter find {|it| $it starts-with 's'})
assert equal $res 'shell'
let res = ($hastack2 | iter find {|it| ($it | length) == 50})
assert equal $res null
let res = ($hastack3 | iter find {|it| (it | describe) == filesize})
assert equal $res null
}
#[test]
def iter_intersperse [] {
let res = ([1 2 3 4] | iter intersperse 0)
assert equal $res [1 0 2 0 3 0 4]
let res = ([] | iter intersperse x)
assert equal $res []
let res = ([1] | iter intersperse 5)
assert equal $res [1]
let res = ([a b c d e] | iter intersperse 5)
assert equal $res [a 5 b 5 c 5 d 5 e]
let res = (1..4 | iter intersperse 0)
assert equal $res [1 0 2 0 3 0 4]
let res = ([4] | iter intersperse 1)
assert equal $res [4]
}
#[test]
def iter_scan [] {
let scanned = ([1 2 3] | iter scan 0 {|x, y| $x + $y} -n)
assert equal $scanned [1, 3, 6]
let scanned = ([1 2 3] | iter scan 0 {|x, y| $x + $y})
assert equal $scanned [0, 1, 3, 6]
let scanned = ([a b c d] | iter scan "" {|it, acc| [$acc, $it] | str join} -n)
assert equal $scanned ["a" "ab" "abc" "abcd"]
let scanned = ([a b c d] | iter scan "" {|it, acc| append $it | str join} -n)
assert equal $scanned ["a" "ab" "abc" "abcd"]
}
#[test]
def iter_filter_map [] {
let res = ([2 5 "4" 7] | iter filter-map {|it| $it ** 2})
assert equal $res [4 25 49]
let res = (
["3" "42" "69" "n" "x" ""]
| iter filter-map {|it| $it | into int}
)
assert equal $res [3 42 69]
}
#[test]
def iter_find_index [] {
let res = (
["iter", "abc", "shell", "around", "nushell", "std"]
| iter find-index {|x| $x starts-with 's'}
)
assert equal $res 2
let is_even = {|x| $x mod 2 == 0}
let res = ([3 5 13 91] | iter find-index $is_even)
assert equal $res (-1)
let res = (42 | iter find-index {|x| $x == 42})
assert equal $res 0
}
#[test]
def iter_zip_with [] {
let res = (
[1 2 3] | iter zip-with [2 3 4] {|a, b| $a + $b }
)
assert equal $res [3 5 7]
let res = ([42] | iter zip-with [1 2 3] {|a, b| $a // $b})
assert equal $res [42]
let res = (2..5 | iter zip-with 4 {|a, b| $a * $b})
assert equal $res [8]
let res = (
[[name repo]; [rust github] [haskell gitlab]]
| iter zip-with 1.. {|data, num|
{ name: $data.name, repo: $data.repo position: $num }
}
)
assert equal $res [
[name repo position];
[rust github 1]
[haskell gitlab 2]
]
}
#[test]
def iter_flat_map [] {
let res = (
[[1 2 3] [2 3 4] [5 6 7]] | iter flat-map {|it| $it | math sum}
)
assert equal $res [6 9 18]
let res = ([1 2 3] | iter flat-map {|it| $it + ($it * 10)})
assert equal $res [11 22 33]
}
#[test]
def iter_zip_into_record [] {
let headers = [name repo position]
let values = [rust github 1]
let res = (
$headers | iter zip-into-record $values
)
assert equal $res [
[name repo position];
[rust github 1]
]
}