nushell/crates/nu-command/src
alex-tdrn 40e629beb1
Fix multibyte codepoint handling in detect columns --guess (#13272)

# Description
This PR fixes #13269. The splitting code in `guess_width.rs` was
creating slices from char indices instead of byte indices. This works
for 1-byte code points, but panics or returns wrong results as soon as
multibyte code points appear in the input. I originally discovered this
by piping `winget list` into `detect columns --guess`, since winget
sometimes uses the Unicode ellipsis symbol (`…`), which is 3 bytes long
when encoded in UTF-8.
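
To illustrate the class of bug described above, here is a minimal sketch. It is not the code from `guess_width.rs`; `slice_by_chars` and the sample line are hypothetical and only show why a char position has to be converted to a byte offset before slicing a `&str`.

```rust
/// Hypothetical helper (not from `guess_width.rs`): returns the part of `s`
/// between char positions `start` (inclusive) and `end` (exclusive).
/// Assumes `start <= end`.
fn slice_by_chars(s: &str, start: usize, end: usize) -> &str {
    // Convert a char position into a byte offset; positions past the end
    // of the string clamp to `s.len()`.
    let byte_at = |pos: usize| s.char_indices().nth(pos).map_or(s.len(), |(b, _)| b);
    let (from, to) = (byte_at(start), byte_at(end));
    &s[from..to]
}

fn main() {
    // Made-up sample line: '…' is one char but three bytes in UTF-8.
    let line = "Name… Id";
    assert_eq!(line.chars().count(), 8);
    assert_eq!(line.len(), 10);

    // Using char position 5 directly as a byte index lands inside '…'.
    // `&line[0..5]` would panic; `get` reports the problem as `None`.
    assert!(!line.is_char_boundary(5));
    assert!(line.get(0..5).is_none());

    // Converting char positions to byte offsets first gives the expected slices.
    assert_eq!(slice_by_chars(line, 0, 5), "Name…");
    assert_eq!(slice_by_chars(line, 6, 8), "Id");
}
```

When a char index happens to fall on a char boundary, the naive slice does not panic but silently returns shifted text, which matches the "wrong results" case mentioned above.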

# User-Facing Changes
`detect columns --guess` should no longer crash on input that contains
multibyte Unicode code points.

before:

![image](https://github.com/nushell/nushell/assets/20356389/833cd732-be3b-4158-97f7-0ca2616ce23f)

after:

![image](https://github.com/nushell/nushell/assets/20356389/15358b40-4083-4a33-9f2c-87e63f39d985)


# Tests + Formatting
- Added tests to `guess_width.rs` covering multibyte code points as
well as combining diacritical marks
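
As a rough illustration of why combining diacritical marks are worth testing separately (this is a sketch, not one of the tests added in this PR; `char_and_byte_len` is a hypothetical helper): a combining mark is a code point of its own, so a single visible character can span several chars and even more bytes.

```rust
/// Hypothetical helper: returns (number of chars, number of UTF-8 bytes).
fn char_and_byte_len(s: &str) -> (usize, usize) {
    (s.chars().count(), s.len())
}

fn main() {
    // "é" written as 'e' followed by U+0301 (combining acute accent).
    let word = "cafe\u{301}";

    // Four base letters plus one combining mark: 5 chars, 6 bytes,
    // because U+0301 takes two bytes in UTF-8.
    assert_eq!(char_and_byte_len(word), (5, 6));

    // Byte 4 is a valid char boundary (between 'e' and the mark),
    // byte 5 is in the middle of the combining mark.
    assert!(word.is_char_boundary(4));
    assert!(!word.is_char_boundary(5));

    // Slicing at byte 4 is valid UTF-8 but drops the accent.
    assert_eq!(&word[..4], "cafe");
}
```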

2024-06-29 16:12:17 -05:00
| Name | Last commit | Date |
| --- | --- | --- |
| bytes | Improves commands that support range input (#13113) | 2024-06-18 07:19:13 -05:00 |
| charting | Add derive macros for FromValue and IntoValue to ease the use of Values in Rust code (#13031) | 2024-06-17 16:05:11 -07:00 |
| conversions | Add derive macros for FromValue and IntoValue to ease the use of Values in Rust code (#13031) | 2024-06-17 16:05:11 -07:00 |
| database | Replace ExternalStream with new ByteStream type (#12774) | 2024-05-16 07:11:18 -07:00 |
| date | Make get_full_help take &dyn Command (#12903) | 2024-05-19 19:56:33 +02:00 |
| debug | Small improvements to debug profile (#12930) | 2024-05-22 19:56:51 +03:00 |
| env | Define keywords (#13213) | 2024-06-25 18:32:54 -07:00 |
| experimental | Add command_prelude module (#12291) | 2024-03-26 21:17:30 +00:00 |
| filesystem | Update and add ls examples (#13222) | 2024-06-26 17:49:52 -05:00 |
| filters | Fix find command output bug in the case of taking ByteStream input. (#13246) | 2024-06-27 09:46:10 -05:00 |
| formats | Make the subcommands (from {csv, tsv, ssv}) 0-based for consistency (#13209) | 2024-06-26 17:51:47 -05:00 |
| generators | Suppress column index for default cal output (#13188) | 2024-06-22 07:41:29 -05:00 |
| hash | Make get_full_help take &dyn Command (#12903) | 2024-05-19 19:56:33 +02:00 |
| help | Fix display formatting for command type in help commands (#12996) | 2024-06-07 08:03:31 -05:00 |
| math | Make get_full_help take &dyn Command (#12903) | 2024-05-19 19:56:33 +02:00 |
| misc | Use CommandType in more places (#12832) | 2024-05-18 23:37:31 +00:00 |
| network | Add string/binary type color to ByteStream (#12897) | 2024-05-20 00:35:32 +00:00 |
| path | path type error and not found changes (#13007) | 2024-06-11 05:40:09 +08:00 |
| platform | Add Span merging functions (#12511) | 2024-05-16 22:34:49 +00:00 |
| random | Make get_full_help take &dyn Command (#12903) | 2024-05-19 19:56:33 +02:00 |
| removed | Add command_prelude module (#12291) | 2024-03-26 21:17:30 +00:00 |
| shells | Add command_prelude module (#12291) | 2024-03-26 21:17:30 +00:00 |
| stor | Allow stor insert and stor update to accept pipeline input (#12882) | 2024-06-06 10:30:06 -05:00 |
| strings | Fix multibyte codepoint handling in detect columns --guess (#13272) | 2024-06-29 16:12:17 -05:00 |
| system | Mitigate the poor interaction between ndots expansion and non-path strings (#13218) | 2024-06-24 16:39:01 -07:00 |
| viewers | Table help rendering (#13182) | 2024-06-19 20:12:25 -05:00 |
| default_context.rs | Make which-support feature non-optional (#13125) | 2024-06-12 20:04:12 -05:00 |
| example_test.rs | Initial --params implementation (#12249) | 2024-03-24 15:40:21 -05:00 |
| lib.rs | Initial --params implementation (#12249) | 2024-03-24 15:40:21 -05:00 |
| progress_bar.rs | Replace ExternalStream with new ByteStream type (#12774) | 2024-05-16 07:11:18 -07:00 |
| sort_utils.rs | Add derive macros for FromValue and IntoValue to ease the use of Values in Rust code (#13031) | 2024-06-17 16:05:11 -07:00 |