nushell/crates/nu-cmd-extra/tests/commands/bytes/starts_with.rs

special-case ExternalStream in bytes starts-with (#8203) # Description `bytes starts-with` converts the input into a `Value` before running .starts_with to find if the binary matches. This has two side effects: it makes the code simpler, only dealing in whole values, and simplifying a lot of input pipeline handling and value transforming it would otherwise have to do. _Especially_ in the presence of a cell path to drill into. It also makes buffers the entire input into memory, which can take up a lot of memory when dealing with large files, especially if you only want to check the first few bytes (like for a magic number). This PR adds a special branch on PipelineData::ExternalStream with a streaming version of starts_with. # User-Facing Changes Opening large files and running bytes starts-with on them will not take a long time. # Tests + Formatting Don't forget to add tests that cover your changes. Make sure you've run and fixed any issues with these commands: - `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used -A clippy::needless_collect` to check that you're using the standard code style - `cargo test --workspace` to check that all tests pass # Drawbacks Streaming checking is more complicated, and there may be bugs. I tested it with multiple chunks with string data and binary data and it seems to work alright up to 8k and over bytes, though. The existing `operate` method still exists because the way it handles cell paths and values is complicated. This causes some "code duplication", or at least some intent duplication, between the value code and the streaming code. This might be worthwhile considering the performance gains (approaching infinity on larger inputs). Another thing to consider is that my ExternalStream branch considers string data as valid input. The operate branch only parses Binary values, so it would fail. 
`open` is kind of unpredictable on whether it returns string data or binary data, even when passing `--raw`. I think this can be a problem but not really one I'm trying to tackle in this PR, so, it's worth considering.
2023-02-26 14:17:44 +00:00
use nu_test_support::nu;

#[test]
fn basic_binary_starts_with() {
    let actual = nu!(
        cwd: ".",
        r#"
            "hello world" | into binary | bytes starts-with 0x[68 65 6c 6c 6f]
        "#
    );

    assert_eq!(actual.out, "true");
}

#[test]
fn basic_string_fails() {
    let actual = nu!(
        cwd: ".",
        r#"
            "hello world" | bytes starts-with 0x[68 65 6c 6c 6f]
        "#
    );

    assert!(actual.err.contains("command doesn't support"));
    assert_eq!(actual.out, "");
}

#[test]
fn short_stream_binary() {
    let actual = nu!(
        cwd: ".",
        r#"
            nu --testbin repeater (0x[01]) 5 | bytes starts-with 0x[010101]
        "#
    );

    assert_eq!(actual.out, "true");
}

#[test]
fn short_stream_mismatch() {
    let actual = nu!(
        cwd: ".",
        r#"
            nu --testbin repeater (0x[010203]) 5 | bytes starts-with 0x[010204]
        "#
    );

    assert_eq!(actual.out, "false");
}

#[test]
fn short_stream_binary_overflow() {
    let actual = nu!(
        cwd: ".",
        r#"
            nu --testbin repeater (0x[01]) 5 | bytes starts-with 0x[010101010101]
        "#
    );

    assert_eq!(actual.out, "false");
}

#[test]
fn long_stream_binary() {
    let actual = nu!(
        cwd: ".",
        r#"
            nu --testbin repeater (0x[01]) 32768 | bytes starts-with 0x[010101]
        "#
    );

    assert_eq!(actual.out, "true");
}

#[test]
fn long_stream_binary_overflow() {
    // `..` ranges are inclusive on both ends, so `0..32768` already yields 32769 items,
    // one more than the stream; no `+ 1` is needed to check for an overflow
    let actual = nu!(
        cwd: ".",
        r#"
            nu --testbin repeater (0x[01]) 32768 | bytes starts-with (0..32768 | each {|| 0x[01] } | bytes collect)
        "#
    );

    assert_eq!(actual.out, "false");
}

#[test]
fn long_stream_binary_exact() {
    // `0..<8192` excludes its upper bound, so the pattern has exactly 8192 repetitions,
    // the same length as the stream
    let actual = nu!(
        cwd: ".",
        r#"
            nu --testbin repeater (0x[01020304]) 8192 | bytes starts-with (0..<8192 | each {|| 0x[01020304] } | bytes collect)
        "#
    );

    assert_eq!(actual.out, "true");
}

#[test]
fn long_stream_string_exact() {
    // `0..<8192` excludes its upper bound, so the pattern has exactly 8192 repetitions,
    // the same length as the stream
    let actual = nu!(
        cwd: ".",
        r#"
            nu --testbin repeater hell 8192 | bytes starts-with (0..<8192 | each {|| "hell" | into binary } | bytes collect)
        "#
    );

    assert_eq!(actual.out, "true");
}

#[test]
fn long_stream_mixed_exact() {
    // `0..<2048` excludes its upper bound, so each segment has exactly 2048 repetitions,
    // matching what `repeat_bytes` writes to the stream
    let actual = nu!(
        cwd: ".",
        r#"
            let binseg = (0..<2048 | each {|| 0x[003d9fbf] } | bytes collect)
            let strseg = (0..<2048 | each {|| "hell" | into binary } | bytes collect)

            nu --testbin repeat_bytes 003d9fbf 2048 68656c6c 2048 | bytes starts-with (bytes build $binseg $strseg)
        "#
    );

    assert_eq!(
        actual.err, "",
        "invocation failed. command line limit likely reached"
    );
    assert_eq!(actual.out, "true");
}

#[test]
fn long_stream_mixed_overflow() {
    // the `..<` ranges exclude their upper bound, so the two segments match the stream exactly;
    // the trailing 0x[01] makes the pattern one byte longer than the stream
    let actual = nu!(
        cwd: ".",
        r#"
            let binseg = (0..<2048 | each {|| 0x[003d9fbf] } | bytes collect)
            let strseg = (0..<2048 | each {|| "hell" | into binary } | bytes collect)

            nu --testbin repeat_bytes 003d9fbf 2048 68656c6c 2048 | bytes starts-with (bytes build $binseg $strseg 0x[01])
        "#
    );

    assert_eq!(
        actual.err, "",
        "invocation failed. command line limit likely reached"
    );
    assert_eq!(actual.out, "false");
}
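
The commit description at the top of this file view says the `ExternalStream` branch uses a streaming version of `starts_with` instead of buffering the whole input. The following is a minimal sketch of that idea, not the code from the PR: `stream_starts_with` is a hypothetical helper, and it assumes the stream arrives as an iterator of byte chunks. It compares each chunk against the part of the pattern that has not been matched yet and stops as soon as the answer is known.

```rust
/// Hypothetical helper (not part of nushell): returns true if the bytes
/// produced by `chunks`, taken in order, begin with `pattern`.
fn stream_starts_with<I, C>(chunks: I, pattern: &[u8]) -> bool
where
    I: IntoIterator<Item = C>,
    C: AsRef<[u8]>,
{
    // The part of the pattern that still has to be matched.
    let mut remaining = pattern;

    for chunk in chunks {
        if remaining.is_empty() {
            // Whole pattern matched; the rest of the stream is irrelevant.
            break;
        }
        let chunk = chunk.as_ref();
        // Compare only as many bytes as both sides have available.
        let n = chunk.len().min(remaining.len());
        if chunk[..n] != remaining[..n] {
            return false;
        }
        remaining = &remaining[n..];
    }

    // If the stream ended before the pattern did, this is not a match.
    remaining.is_empty()
}
```

With chunks `[1, 1]` and `[1, 1, 1]` and the pattern `[1, 1, 1]` the helper returns `true`; with a single five-byte chunk and a six-byte pattern it returns `false`, which mirrors what the `short_stream_binary` and `short_stream_binary_overflow` tests expect from the real command.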