2022-11-09 22:16:51 +00:00
|
|
|
use nu_test_support::{nu, pipeline};
|
|
|
|
|
|
|
|
#[test]
|
|
|
|
fn by_invalid_types() {
|
|
|
|
let actual = nu!(
|
|
|
|
cwd: "tests/fixtures/formats", pipeline(
|
|
|
|
r#"
|
|
|
|
open cargo_sample.toml --raw
|
|
|
|
| echo ["foo" 1]
|
|
|
|
| sort
|
|
|
|
| to json -r
|
|
|
|
"#
|
|
|
|
));
|
|
|
|
|
|
|
|
let json_output = r#"[1,"foo"]"#;
|
|
|
|
assert_eq!(actual.out, json_output);
|
|
|
|
}
|
|
|
|
|
|
|
|
#[test]
|
|
|
|
fn sort_primitive_values() {
|
|
|
|
let actual = nu!(
|
|
|
|
cwd: "tests/fixtures/formats", pipeline(
|
2023-07-21 15:32:37 +00:00
|
|
|
"
|
2022-11-09 22:16:51 +00:00
|
|
|
open cargo_sample.toml --raw
|
|
|
|
| lines
|
|
|
|
| skip 1
|
|
|
|
| first 6
|
|
|
|
| sort
|
|
|
|
| first
|
2023-07-21 15:32:37 +00:00
|
|
|
"
|
2022-11-09 22:16:51 +00:00
|
|
|
));
|
|
|
|
|
|
|
|
assert_eq!(actual.out, "authors = [\"The Nushell Project Developers\"]");
|
|
|
|
}
|
|
|
|
|
Rework sorting and add cell path and closure comparators to `sort-by` (#13154)
# Description
Closes #12535
Implements sort-by functionality of #8322
Fixes sort-by part of #8667
This PR does two main things: add a new cell path and closure parameter
to `sort-by`, and attempt to make Nushell's sorting behavior
well-defined.
## `sort-by` features
The `columns` parameter is replaced with a `comparator` parameter, which
can be a cell path or a closure. Examples are from docs PR.
1. Cell paths
The basic interactive usage of `sort-by` is the same. For example, `ls |
sort-by modified` still works the same as before. It is not quite a
drop-in replacement, see [behavior changes](#behavior-changes).
Here's an example of how the cell path comparator might be useful:
```nu
> let cities = [
{name: 'New York', info: { established: 1624, population: 18_819_000 } }
{name: 'Kyoto', info: { established: 794, population: 37_468_000 } }
{name: 'São Paulo', info: { established: 1554, population: 21_650_000 }
}
]
> $cities | sort-by info.established
╭───┬───────────┬────────────────────────────╮
│ # │ name │ info │
├───┼───────────┼────────────────────────────┤
│ 0 │ Kyoto │ ╭─────────────┬──────────╮ │
│ │ │ │ established │ 794 │ │
│ │ │ │ population │ 37468000 │ │
│ │ │ ╰─────────────┴──────────╯ │
│ 1 │ São Paulo │ ╭─────────────┬──────────╮ │
│ │ │ │ established │ 1554 │ │
│ │ │ │ population │ 21650000 │ │
│ │ │ ╰─────────────┴──────────╯ │
│ 2 │ New York │ ╭─────────────┬──────────╮ │
│ │ │ │ established │ 1624 │ │
│ │ │ │ population │ 18819000 │ │
│ │ │ ╰─────────────┴──────────╯ │
╰───┴───────────┴────────────────────────────╯
```
2. Key closures
You can supply a closure which will transform each value into a sorting
key (without changing the underlying data). Here's an example of a key
closure, where we want to sort a list of assignments by their average
grade:
```nu
> let assignments = [
{name: 'Homework 1', grades: [97 89 86 92 89] }
{name: 'Homework 2', grades: [91 100 60 82 91] }
{name: 'Exam 1', grades: [78 88 78 53 90] }
{name: 'Project', grades: [92 81 82 84 83] }
]
> $assignments | sort-by { get grades | math avg }
╭───┬────────────┬───────────────────────╮
│ # │ name │ grades │
├───┼────────────┼───────────────────────┤
│ 0 │ Exam 1 │ [78, 88, 78, 53, 90] │
│ 1 │ Project │ [92, 81, 82, 84, 83] │
│ 2 │ Homework 2 │ [91, 100, 60, 82, 91] │
│ 3 │ Homework 1 │ [97, 89, 86, 92, 89] │
╰───┴────────────┴───────────────────────╯
```
3. Custom sort closure
The `--custom`, or `-c`, flag will tell `sort-by` to interpret closures
as custom sort closures. A custom sort closure has two parameters, and
returns a boolean. The closure should return `true` if the first
parameter comes _before_ the second parameter in the sort order.
For a simple example, we could rewrite a cell path sort as a custom sort
(see
[here](https://github.com/nushell/nushell.github.io/pull/1568/files#diff-a7a233e66a361d8665caf3887eb71d4288000001f401670c72b95cc23a948e86R231)
for a more complex example):
```nu
> ls | sort-by -c {|a, b| $a.size < $b.size }
╭───┬─────────────────────┬──────┬──────────┬────────────────╮
│ # │ name │ type │ size │ modified │
├───┼─────────────────────┼──────┼──────────┼────────────────┤
│ 0 │ my-secret-plans.txt │ file │ 100 B │ 10 minutes ago │
│ 1 │ shopping_list.txt │ file │ 100 B │ 2 months ago │
│ 2 │ myscript.nu │ file │ 1.1 KiB │ 2 weeks ago │
│ 3 │ bigfile.img │ file │ 10.0 MiB │ 3 weeks ago │
╰───┴─────────────────────┴──────┴──────────┴────────────────╯
```
## Making sort more consistent
I think it's important for something as essential as `sort` to have
well-defined semantics. This PR contains some changes to try to make the
behavior of `sort` and `sort-by` consistent. In addition, after working
with the internals of sorting code, I have a much deeper understanding
of all of the edge cases. Here is my attempt to try to better define
some of the semantics of sorting (if you are just interested in changes,
skip to "User-Facing changes")
- `sort`, `sort -v`, and `sort-by` now all work the same. Each
individual sort implementation has been refactored into two functions in
`sort_utils.rs`: `sort`, and `sort_by`. These can also be used in other
parts of Nushell where values need to be sorted.
- `sort` and `sort-by` used to handle `-i` and `-n` differently.
- `sort -n` would consider all values which can't be coerced into a
string to be equal
- `sort-by -i` and `sort-by -n` would only work if all values were
strings
- In this PR, insensitive sort only affects comparison between strings,
and natural sort only applies to numbers and strings (see below).
- (not a change) Before and after this PR, `sort` and `sort-by` support
sorting mixed types. There was a lot of discussion about potentially
making `sort` and `sort-by` only work on lists of homogeneous types, but
the general consensus was that `sort` should not error just because its
input contains incompatible types.
- In order to try to make working with data containing `null` values
easier, I changed the PartialOrd order to sort `Nothing` values to the
end of a list, regardless of what other types the list contains. Before,
`null` would be sorted before `Binary`, `CellPath`, and `Custom` values.
- (not a change) When sorted, lists of mixed types will contain sorted
values of each type in order, for the most part
- (not a change) For example, `[0x[1] (date now) "a" ("yesterday" | into
datetime) "b" 0x[0]]` will be sorted as `["a", "b", a day ago, now, [0],
[1]]`, where sorted strings appear first, then sorted datetimes, etc.
- (not a change) The exception to this is `Int`s and `Float`s, which
will intermix, `Strings` and `Glob`s, which will intermix, and `None` as
described above. Additionally, natural sort will intermix strings with
ints and floats (see below).
- Natural sort no longer coerce all inputs to strings.
- I did originally make natural only apply to strings, but @fdncred
pointed out that the previous behavior also allowed you to sort numeric
strings with numbers. This seems like a useful feature if we are trying
to support sorting with mixed types, so I settled on coercing only
numbers (int, float). This can be reverted if people don't like it.
- Here is an example of this behavior in action, which is the same
before and after this PR:
```nushell
$ [1 "4" 3 "2"] | sort --natural
╭───┬───╮
│ 0 │ 1 │
│ 1 │ 2 │
│ 2 │ 3 │
│ 3 │ 4 │
╰───┴───╯
```
# User-Facing Changes
## New features
- Replaces the `columns` string parameter of `sort-by` with a cell path
or a closure.
- The cell path parameter works exactly as you would expect
- By default, the `closure` parameter acts as a "key sort"; that is,
each element is transformed by the closure into a sorting key
- With the `--custom` (`-c`) parameter, you can define a comparison
function for completely custom sorting order.
## Behavior changes
<details>
<summary><code>sort -v</code> does not coerce record values to
strings</summary>
This was a bit of a surprising behavior, and is now unified with the
behavior of `sort` and `sort-by`. Here's an example where you can
observe the values being implicitly coerced into strings for sorting, as
they are sorted like strings rather than numbers:
Old behavior:
```nushell
$ {foo: 9 bar: 10} | sort -v
╭─────┬────╮
│ bar │ 10 │
│ foo │ 9 │
╰─────┴────╯
```
New behavior:
```nushell
$ {foo: 9 bar: 10} | sort -v
╭─────┬────╮
│ foo │ 9 │
│ bar │ 10 │
╰─────┴────╯
```
</details>
<details>
<summary>Changed <code>sort-by</code> parameters from
<code>string</code> to <code>cell-path</code> or <code>closure</code>.
Typical interactive usage is the same as before, but if passing a
variable to <code>sort-by</code> it must be a cell path (or closure),
not a string</summary>
Old behavior:
```nushell
$ let sort = "modified"
$ ls | sort-by $sort
╭───┬──────┬──────┬──────┬────────────────╮
│ # │ name │ type │ size │ modified │
├───┼──────┼──────┼──────┼────────────────┤
│ 0 │ foo │ file │ 0 B │ 10 hours ago │
│ 1 │ bar │ file │ 0 B │ 35 seconds ago │
╰───┴──────┴──────┴──────┴────────────────╯
```
New behavior:
```nushell
$ let sort = "modified"
$ ls | sort-by $sort
Error: nu::shell::type_mismatch
× Type mismatch.
╭─[entry #10:1:14]
1 │ ls | sort-by $sort
· ──┬──
· ╰── Cannot sort using a value which is not a cell path or closure
╰────
$ let sort = $."modified"
$ ls | sort-by $sort
╭───┬──────┬──────┬──────┬───────────────╮
│ # │ name │ type │ size │ modified │
├───┼──────┼──────┼──────┼───────────────┤
│ 0 │ foo │ file │ 0 B │ 10 hours ago │
│ 1 │ bar │ file │ 0 B │ 2 minutes ago │
╰───┴──────┴──────┴──────┴───────────────╯
```
</details>
<details>
<summary>Insensitve and natural sorting behavior reworked</summary>
Previously, the `-i` and `-n` worked differently for `sort` and
`sort-by` (see "Making sort more consistent"). Here are examples of how
these options result in different sorts now:
1. `sort -n`
- Old behavior (types other than numbers, strings, dates, and binary
sorted incorrectly)
```nushell
$ [2sec 1sec] | sort -n
╭───┬──────╮
│ 0 │ 2sec │
│ 1 │ 1sec │
╰───┴──────╯
```
- New behavior
```nushell
$ [2sec 1sec] | sort -n
╭───┬──────╮
│ 0 │ 1sec │
│ 1 │ 2sec │
╰───┴──────╯
```
2. `sort-by -i`
- Old behavior (uppercase words appear before lowercase words as they
would in a typical sort, indicating this is not actually an insensitive
sort)
```nushell
$ ["BAR" "bar" "foo" 2 "FOO" 1] | wrap a | sort-by -i a
╭───┬─────╮
│ # │ a │
├───┼─────┤
│ 0 │ 1 │
│ 1 │ 2 │
│ 2 │ BAR │
│ 3 │ FOO │
│ 4 │ bar │
│ 5 │ foo │
╰───┴─────╯
```
- New behavior (strings are sorted stably, indicating this is an
insensitive sort)
```nushell
$ ["BAR" "bar" "foo" 2 "FOO" 1] | wrap a | sort-by -i a
╭───┬─────╮
│ # │ a │
├───┼─────┤
│ 0 │ 1 │
│ 1 │ 2 │
│ 2 │ BAR │
│ 3 │ bar │
│ 4 │ foo │
│ 5 │ FOO │
╰───┴─────╯
```
3. `sort-by -n`
- Old behavior (natural sort does not work when data contains non-string
values)
```nushell
$ ["10" 8 "9"] | wrap a | sort-by -n a
╭───┬────╮
│ # │ a │
├───┼────┤
│ 0 │ 8 │
│ 1 │ 10 │
│ 2 │ 9 │
╰───┴────╯
```
- New behavior
```nushell
$ ["10" 8 "9"] | wrap a | sort-by -n a
╭───┬────╮
│ # │ a │
├───┼────┤
│ 0 │ 8 │
│ 1 │ 9 │
│ 2 │ 10 │
╰───┴────╯
```
</details>
<details>
<summary>
Sorting a list of non-record values with a non-existent column/path now
errors instead of sorting the values directly (<code>sort</code> should
be used for this, not <code>sort-by</code>)
</summary>
Old behavior:
```nushell
$ [2 1] | sort-by foo
╭───┬───╮
│ 0 │ 1 │
│ 1 │ 2 │
╰───┴───╯
```
New behavior:
```nushell
$ [2 1] | sort-by foo
Error: nu::shell::incompatible_path_access
× Data cannot be accessed with a cell path
╭─[entry #29:1:17]
1 │ [2 1] | sort-by foo
· ─┬─
· ╰── int doesn't support cell paths
╰────
```
</details>
<details>
<summary><code>sort</code> and <code>sort-by</code> output
<code>List</code> instead of <code>ListStream</code> </summary>
This isn't a meaningful change (unless I misunderstand the purpose of
ListStream), since `sort` and `sort-by` both need to collect in order to
do the sorting anyway, but is user observable.
Old behavior:
```nushell
$ ls | sort | describe -d
╭──────────┬───────────────────╮
│ type │ stream │
│ origin │ nushell │
│ subtype │ {record 3 fields} │
│ metadata │ {record 1 field} │
╰──────────┴───────────────────╯
```
```nushell
$ ls | sort-by name | describe -d
╭──────────┬───────────────────╮
│ type │ stream │
│ origin │ nushell │
│ subtype │ {record 3 fields} │
│ metadata │ {record 1 field} │
╰──────────┴───────────────────╯
```
New behavior:
```nushell
ls | sort | describe -d
╭────────┬─────────────────╮
│ type │ list │
│ length │ 22 │
│ values │ [table 22 rows] │
╰────────┴─────────────────╯
```
```nushell
$ ls | sort-by name | describe -d
╭────────┬─────────────────╮
│ type │ list │
│ length │ 22 │
│ values │ [table 22 rows] │
╰────────┴─────────────────╯
```
</details>
- `sort` now errors when nothing is piped in (`sort-by` already did
this)
# Tests + Formatting
I added lots of unit tests on the new sort implementation to enforce new
sort behaviors and prevent regressions.
# After Submitting
See [docs PR](https://github.com/nushell/nushell.github.io/pull/1568),
which is ~2/3 finished.
---------
Co-authored-by: NotTheDr01ds <32344964+NotTheDr01ds@users.noreply.github.com>
Co-authored-by: Ian Manske <ian.manske@pm.me>
2024-10-10 02:18:16 +00:00
|
|
|
#[test]
|
|
|
|
fn sort_table() {
|
|
|
|
// if a table's records are compared directly rather than holistically as a table,
|
|
|
|
// [100, 10, 5] will come before [100, 5, 8] because record comparison
|
|
|
|
// compares columns by alphabetical order, so price will be checked before quantity
|
|
|
|
let actual =
|
|
|
|
nu!("[[id, quantity, price]; [100, 10, 5], [100, 5, 8], [100, 5, 1]] | sort | to nuon");
|
|
|
|
|
|
|
|
assert_eq!(
|
|
|
|
actual.out,
|
|
|
|
r#"[[id, quantity, price]; [100, 5, 1], [100, 5, 8], [100, 10, 5]]"#
|
|
|
|
);
|
|
|
|
}
|
|
|
|
|
2022-11-09 22:16:51 +00:00
|
|
|
#[test]
|
|
|
|
fn sort_different_types() {
|
2023-04-28 11:25:44 +00:00
|
|
|
let actual = nu!("[a, 1, b, 2, c, 3, [4, 5, 6], d, 4, [1, 2, 3]] | sort | to json --raw");
|
2022-11-09 22:16:51 +00:00
|
|
|
|
|
|
|
let json_output = r#"[1,2,3,4,"a","b","c","d",[1,2,3],[4,5,6]]"#;
|
|
|
|
assert_eq!(actual.out, json_output);
|
|
|
|
}
|
2022-12-01 13:11:30 +00:00
|
|
|
|
|
|
|
#[test]
|
|
|
|
fn sort_natural() {
|
2023-04-28 11:25:44 +00:00
|
|
|
let actual = nu!("['1' '2' '3' '4' '5' '10' '100'] | sort -n | to nuon");
|
2022-12-01 13:11:30 +00:00
|
|
|
|
|
|
|
assert_eq!(actual.out, r#"["1", "2", "3", "4", "5", "10", "100"]"#);
|
|
|
|
}
|
|
|
|
|
|
|
|
#[test]
|
|
|
|
fn sort_record_natural() {
|
2023-04-28 11:25:44 +00:00
|
|
|
let actual = nu!("{10:0,99:0,1:0,9:0,100:0} | sort -n | to nuon");
|
2022-12-01 13:11:30 +00:00
|
|
|
|
|
|
|
assert_eq!(
|
|
|
|
actual.out,
|
|
|
|
r#"{"1": 0, "9": 0, "10": 0, "99": 0, "100": 0}"#
|
|
|
|
);
|
|
|
|
}
|
|
|
|
|
|
|
|
#[test]
|
|
|
|
fn sort_record_insensitive() {
|
2023-04-28 11:25:44 +00:00
|
|
|
let actual = nu!("{abe:1,zed:2,ABE:3} | sort -i | to nuon");
|
2022-12-01 13:11:30 +00:00
|
|
|
|
|
|
|
assert_eq!(actual.out, r#"{abe: 1, ABE: 3, zed: 2}"#);
|
|
|
|
}
|
|
|
|
|
|
|
|
#[test]
|
|
|
|
fn sort_record_insensitive_reverse() {
|
2023-04-28 11:25:44 +00:00
|
|
|
let actual = nu!("{abe:1,zed:2,ABE:3} | sort -ir | to nuon");
|
2022-12-01 13:11:30 +00:00
|
|
|
|
|
|
|
assert_eq!(actual.out, r#"{zed: 2, ABE: 3, abe: 1}"#);
|
|
|
|
}
|
|
|
|
|
|
|
|
#[test]
|
|
|
|
fn sort_record_values_natural() {
|
2023-04-28 11:25:44 +00:00
|
|
|
let actual = nu!(r#"{1:"1",2:"2",4:"100",3:"10"} | sort -vn | to nuon"#);
|
2022-12-01 13:11:30 +00:00
|
|
|
|
|
|
|
assert_eq!(actual.out, r#"{"1": "1", "2": "2", "3": "10", "4": "100"}"#);
|
|
|
|
}
|
|
|
|
|
|
|
|
#[test]
|
|
|
|
fn sort_record_values_insensitive() {
|
2023-04-28 11:25:44 +00:00
|
|
|
let actual = nu!("{1:abe,2:zed,3:ABE} | sort -vi | to nuon");
|
2022-12-01 13:11:30 +00:00
|
|
|
|
|
|
|
assert_eq!(actual.out, r#"{"1": abe, "3": ABE, "2": zed}"#);
|
|
|
|
}
|
|
|
|
|
|
|
|
#[test]
|
|
|
|
fn sort_record_values_insensitive_reverse() {
|
2023-04-28 11:25:44 +00:00
|
|
|
let actual = nu!("{1:abe,2:zed,3:ABE} | sort -vir | to nuon");
|
2022-12-01 13:11:30 +00:00
|
|
|
|
|
|
|
assert_eq!(actual.out, r#"{"2": zed, "3": ABE, "1": abe}"#);
|
|
|
|
}
|
Support passing an empty list to sort, uniq, sort-by, and uniq-by (issue #5957) (#8669)
# Description
Currently, all four of these commands return a (rather-confusing)
spanless error when passed an empty list:
```
> [] | sort
Error:
× no values to work with
help: no values to work with
```
This PR changes these commands to always output `[]` if the input is
`[]`.
```
> [] | sort
╭────────────╮
│ empty list │
╰────────────╯
> [] | uniq-by foo
╭────────────╮
│ empty list │
╰────────────╯
```
I'm not sure what the original logic was here, but in the case of `sort`
and `uniq`, I think the current behavior is straightforwardly wrong.
`sort-by` and `uniq-by` are a bit more complicated, since they currently
try to perform some validation that the specified column name is present
in the input (see #8667 for problems with this validation, where a
possible outcome is removing the validation entirely). When passed `[]`,
it's not possible to do any validation because there are no records.
This opens up the possibility for situations like the following:
```
> [[foo]; [5] [6]] | where foo < 3 | sort-by bar
╭────────────╮
│ empty list │
╰────────────╯
```
I think there's a strong argument that `[]` is the best output for these
commands as well, since it makes pipelines like `$table | filter
$condition | sort-by $column` more predictable. Currently, this pipeline
will throw an error if `filter` evaluates to `[]`, but work fine
otherwise. This makes it difficult to write reliable code, especially
since users are not likely to encounter the `filter -> []` case in
testing (issue #5957). The only workaround is to insert manual checks
for an empty result. IMO, this is significantly worse than the "you can
typo a column name without getting an error" problem shown above.
Other commands that take column arguments (`get`, `select`, `rename`,
etc) already have `[] -> []`, so there's existing precedent for this
behavior.
The core question here is "what columns does `[]` have"? The current
behavior of `sort-by` is "no columns", while the current behavior of
`select` is "all possible columns". Both answers lead to accepting some
likely-buggy code without throwing on error, but in order to do better
here we would need something like `Value::Table` that tracks columns on
empty tables.
If other people disagree with this logic, I'm happy to split out the
`sort-by` and `uniq-by` changes into another PR.
# User-Facing Changes
`sort`, `uniq`, `sort-by`, and `uniq-by` now return `[]` instead of
throwing an error when input is `[]`.
# After Submitting
> If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
The existing behavior was not documented, and the new behavior is what
you would expect by default, so I don't think we need to update
documentation.
---------
Co-authored-by: Reilly Wood <reilly.wood@icloud.com>
2023-03-30 02:55:38 +00:00
|
|
|
|
|
|
|
#[test]
|
|
|
|
fn sort_empty() {
|
|
|
|
let actual = nu!("[] | sort | to nuon");
|
|
|
|
|
|
|
|
assert_eq!(actual.out, "[]");
|
|
|
|
}
|