Optimize `escape_ascii` using a lookup table
Based upon my suggestion here: https://github.com/rust-lang/rust/pull/125340#issuecomment-2130441817
Effectively, we can take advantage of the fact that ASCII only needs 7 bits to make the eighth bit store whether the value should be escaped or not. This adds a 256-byte lookup table, but 256 bytes *should* be small enough that very few people will mind, according to my probably not incontrovertible opinion.
The generated assembly isn't clearly better (although has fewer branches), so, I decided to benchmark on three inputs: first on a random 200KiB, then on `/bin/cat`, then on `Cargo.toml` for this repo. In all cases, the generated code ran faster on my machine. (an old i7-8700)
But, if you want to try my benchmarking code for yourself:
<details><summary>Criterion code below. Replace <code>/home/ltdk/rustsrc</code> with the appropriate directory.</summary>
```rust
#![feature(ascii_char)]
#![feature(ascii_char_variants)]
#![feature(const_option)]
#![feature(let_chains)]
use core::ascii;
use core::ops::Range;
use criterion::{criterion_group, criterion_main, Criterion};
use rand::{thread_rng, Rng};
const HEX_DIGITS: [ascii::Char; 16] = *b"0123456789abcdef".as_ascii().unwrap();
#[inline]
const fn backslash<const N: usize>(a: ascii::Char) -> ([ascii::Char; N], Range<u8>) {
const { assert!(N >= 2) };
let mut output = [ascii::Char::Null; N];
output[0] = ascii::Char::ReverseSolidus;
output[1] = a;
(output, 0..2)
}
#[inline]
const fn hex_escape<const N: usize>(byte: u8) -> ([ascii::Char; N], Range<u8>) {
const { assert!(N >= 4) };
let mut output = [ascii::Char::Null; N];
let hi = HEX_DIGITS[(byte >> 4) as usize];
let lo = HEX_DIGITS[(byte & 0xf) as usize];
output[0] = ascii::Char::ReverseSolidus;
output[1] = ascii::Char::SmallX;
output[2] = hi;
output[3] = lo;
(output, 0..4)
}
#[inline]
const fn verbatim<const N: usize>(a: ascii::Char) -> ([ascii::Char; N], Range<u8>) {
const { assert!(N >= 1) };
let mut output = [ascii::Char::Null; N];
output[0] = a;
(output, 0..1)
}
/// Escapes an ASCII character.
///
/// Returns a buffer and the length of the escaped representation.
const fn escape_ascii_old<const N: usize>(byte: u8) -> ([ascii::Char; N], Range<u8>) {
const { assert!(N >= 4) };
match byte {
b'\t' => backslash(ascii::Char::SmallT),
b'\r' => backslash(ascii::Char::SmallR),
b'\n' => backslash(ascii::Char::SmallN),
b'\\' => backslash(ascii::Char::ReverseSolidus),
b'\'' => backslash(ascii::Char::Apostrophe),
b'\"' => backslash(ascii::Char::QuotationMark),
0x00..=0x1F => hex_escape(byte),
_ => match ascii::Char::from_u8(byte) {
Some(a) => verbatim(a),
None => hex_escape(byte),
},
}
}
/// Escapes an ASCII character.
///
/// Returns a buffer and the length of the escaped representation.
const fn escape_ascii_new<const N: usize>(byte: u8) -> ([ascii::Char; N], Range<u8>) {
/// Lookup table helps us determine how to display character.
///
/// Since ASCII characters will always be 7 bits, we can exploit this to store the 8th bit to
/// indicate whether the result is escaped or unescaped.
///
/// We additionally use 0x80 (escaped NUL character) to indicate hex-escaped bytes, since
/// escaped NUL will not occur.
const LOOKUP: [u8; 256] = {
let mut arr = [0; 256];
let mut idx = 0;
loop {
arr[idx as usize] = match idx {
// use 8th bit to indicate escaped
b'\t' => 0x80 | b't',
b'\r' => 0x80 | b'r',
b'\n' => 0x80 | b'n',
b'\\' => 0x80 | b'\\',
b'\'' => 0x80 | b'\'',
b'"' => 0x80 | b'"',
// use NUL to indicate hex-escaped
0x00..=0x1F | 0x7F..=0xFF => 0x80 | b'\0',
_ => idx,
};
if idx == 255 {
break;
}
idx += 1;
}
arr
};
let lookup = LOOKUP[byte as usize];
// 8th bit indicates escape
let lookup_escaped = lookup & 0x80 != 0;
// SAFETY: We explicitly mask out the eighth bit to get a 7-bit ASCII character.
let lookup_ascii = unsafe { ascii::Char::from_u8_unchecked(lookup & 0x7F) };
if lookup_escaped {
// NUL indicates hex-escaped
if matches!(lookup_ascii, ascii::Char::Null) {
hex_escape(byte)
} else {
backslash(lookup_ascii)
}
} else {
verbatim(lookup_ascii)
}
}
fn escape_bytes(bytes: &[u8], f: impl Fn(u8) -> ([ascii::Char; 4], Range<u8>)) -> Vec<ascii::Char> {
let mut vec = Vec::new();
for b in bytes {
let (buf, range) = f(*b);
vec.extend_from_slice(&buf[range.start as usize..range.end as usize]);
}
vec
}
pub fn criterion_benchmark(c: &mut Criterion) {
let mut group = c.benchmark_group("escape_ascii");
group.sample_size(1000);
let rand_200k = &mut [0; 200 * 1024];
thread_rng().fill(&mut rand_200k[..]);
let cat = include_bytes!("/bin/cat");
let cargo_toml = include_bytes!("/home/ltdk/rustsrc/Cargo.toml");
group.bench_function("old_rand", |b| {
b.iter(|| escape_bytes(rand_200k, escape_ascii_old));
});
group.bench_function("new_rand", |b| {
b.iter(|| escape_bytes(rand_200k, escape_ascii_new));
});
group.bench_function("old_bin", |b| {
b.iter(|| escape_bytes(cat, escape_ascii_old));
});
group.bench_function("new_bin", |b| {
b.iter(|| escape_bytes(cat, escape_ascii_new));
});
group.bench_function("old_cargo_toml", |b| {
b.iter(|| escape_bytes(cargo_toml, escape_ascii_old));
});
group.bench_function("new_cargo_toml", |b| {
b.iter(|| escape_bytes(cargo_toml, escape_ascii_new));
});
group.finish();
}
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
```
</details>
My benchmark results:
```
escape_ascii/old_rand time: [1.6965 ms 1.7006 ms 1.7053 ms]
Found 22 outliers among 1000 measurements (2.20%)
4 (0.40%) high mild
18 (1.80%) high severe
escape_ascii/new_rand time: [1.6749 ms 1.6953 ms 1.7158 ms]
Found 38 outliers among 1000 measurements (3.80%)
38 (3.80%) high mild
escape_ascii/old_bin time: [224.59 µs 225.40 µs 226.33 µs]
Found 39 outliers among 1000 measurements (3.90%)
17 (1.70%) high mild
22 (2.20%) high severe
escape_ascii/new_bin time: [164.86 µs 165.63 µs 166.58 µs]
Found 107 outliers among 1000 measurements (10.70%)
43 (4.30%) high mild
64 (6.40%) high severe
escape_ascii/old_cargo_toml
time: [23.397 µs 23.699 µs 24.014 µs]
Found 204 outliers among 1000 measurements (20.40%)
21 (2.10%) high mild
183 (18.30%) high severe
escape_ascii/new_cargo_toml
time: [16.404 µs 16.438 µs 16.483 µs]
Found 88 outliers among 1000 measurements (8.80%)
56 (5.60%) high mild
32 (3.20%) high severe
```
Random: 1.7006ms => 1.6953ms (<1% speedup)
Binary: 225.40µs => 165.63µs (26% speedup)
Text: 23.699µs => 16.438µs (30% speedup)
Run subprocesses async in vscode extension
Extensions should not block the vscode extension host. Replace uses of `spawnSync` with `spawnAsync`, a convenience wrapper around `spawn`.
These `spawnSync`s are unlikely to cause a real issue in practice, because they spawn very short-lived processes, so we aren't blocking for very long. That said, blocking the extension host is poor practice, and if they _do_ block for too long for whatever reason, vscode becomes useless.
Retire the `unnamed_fields` feature for now
`#![feature(unnamed_fields)]` was implemented in part in #115131 and #115367, however work on that feature has (afaict) stalled and in the mean time there have been some concerns raised (e.g.[^1][^2]) about whether `unnamed_fields` is worthwhile to have in the language, especially in its current desugaring. Because it represents a compiler implementation burden including a new kind of anonymous ADT and additional complication to field selection, and is quite prone to bugs today, I'm choosing to remove the feature.
However, since I'm not one to really write a bunch of words, I'm specifically *not* going to de-RFC this feature. This PR essentially *rolls back* the state of this feature to "RFC accepted but not yet implemented"; however if anyone wants to formally unapprove the RFC from the t-lang side, then please be my guest. I'm just not totally willing to summarize the various language-facing reasons for why this feature is or is not worthwhile, since I'm coming from the compiler side mostly.
Fixes#117942Fixes#121161Fixes#121263Fixes#121299Fixes#121722Fixes#121799Fixes#126969Fixes#131041
Tracking:
* https://github.com/rust-lang/rust/issues/49804
[^1]: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Unnamed.20struct.2Funion.20fields
[^2]: https://github.com/rust-lang/rust/issues/49804#issuecomment-1972619108
disable `download-rustc` if LLVM submodule has changes in CI
We can't use CI rustc while using in-tree LLVM (which happens in LLVM submodule update PRs) and this PR handles that by ignoring CI-rustc in CI and failing in non-CI environments.
Decouple WASIp2 sockets from WasiFd
This is a follow up to #129638, decoupling WASIp2's socket implementation from WASIp1's `WasiFd` as discussed with `@alexcrichton.`
Quite a few trait implementations in `std::os::fd` rely on the fact that there is an additional layer of abstraction between `Socket` and `OwnedFd`. I thus had to add a thin `WasiSocket` wrapper struct that just "forwards" to `OwnedFd`. Alternatively, I could have added a lot of conditional compilation to `std::os::fd`, which feels even worse.
Since `WasiFd::sock_accept` is no longer accessible from `TcpListener` and since WASIp2 has proper support for accepting sockets through `Socket::accept`, the `std::os::wasi::net` module has been removed from WASIp2, which only contains a single `TcpListenerExt` trait with a `sock_accept` method as well as an implementation for `TcpListener`. Let me know if this is an acceptable solution.
Fix needless_lifetimes in rustc_serialize
Hi,
This PR fixes the following clipy warnings:
```
warning: the following explicit lifetimes could be elided: 'a
--> compiler/rustc_serialize/src/serialize.rs:328:6
|
328 | impl<'a, S: Encoder, T: Encodable<S>> Encodable<S> for Cow<'a, [T]>
| ^^ ^^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_lifetimes
= note: `#[warn(clippy::needless_lifetimes)]` on by default
help: elide the lifetimes
|
328 - impl<'a, S: Encoder, T: Encodable<S>> Encodable<S> for Cow<'a, [T]>
328 + impl<S: Encoder, T: Encodable<S>> Encodable<S> for Cow<'_, [T]>
|
warning: the following explicit lifetimes could be elided: 'a
--> compiler/rustc_serialize/src/serialize.rs:348:6
|
348 | impl<'a, S: Encoder> Encodable<S> for Cow<'a, str> {
| ^^ ^^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_lifetimes
help: elide the lifetimes
|
348 - impl<'a, S: Encoder> Encodable<S> for Cow<'a, str> {
348 + impl<S: Encoder> Encodable<S> for Cow<'_, str> {
|
warning: the following explicit lifetimes could be elided: 'a
--> compiler/rustc_serialize/src/serialize.rs:355:6
|
355 | impl<'a, D: Decoder> Decodable<D> for Cow<'a, str> {
| ^^ ^^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_lifetimes
help: elide the lifetimes
|
355 - impl<'a, D: Decoder> Decodable<D> for Cow<'a, str> {
355 + impl<D: Decoder> Decodable<D> for Cow<'_, str> {
```
Best regards,
Michal
Reserve guarded string literals (RFC 3593)
Implementation for RFC 3593, including:
- lexer / parser changes
- diagnostics
- migration lint
- tests
We reserve `#"`, `##"`, `###"`, `####`, and any other string of four or more repeated `#`. This avoids infinite lookahead in the lexer, though we still use infinite lookahead in the parser to provide better forward compatibility diagnostics.
This PR does not implement any special lexing of the string internals:
- strings preceded by one or more `#` are denied
- regardless of the number of trailing `#`
- string contents are lexed as if it was just a bare `"string"`
Tracking issue: #123735
RFC: rust-lang/rfcs#3593
fix: include description in label details when detail field is marked for …
Fixes https://github.com/rust-lang/rust-analyzer/issues/18231.
When omitting the autocomplete detail field, the autocomplete label details can still be returned. Currently the label details are missing the description field if the detail field is included in resolveSupport since it is being overwritten as None and opted to be sent with `completionItem/resolve`.
Example completion capabilities.
```
completion = {
completionItem = {
commitCharactersSupport = true,
deprecatedSupport = true,
documentationFormat = { "markdown", "plaintext" },
insertReplaceSupport = true,
insertTextModeSupport = {
valueSet = { 1, 2 }
},
labelDetailsSupport = true,
preselectSupport = true,
resolveSupport = {
properties = { "documentation", "detail", "additionalTextEdits", "sortText", "filterText", "insertText", "textEdit", "insertTextFormat", "insertTextMode" }
},
snippetSupport = true,
tagSupport = {
valueSet = { 1 }
}
}
```
lsp: fix completion_item something_to_resolve not being a latch to true
while looking at #18245 i noticed that `something_to_resolve` could technically flap between true -> false if some subsequent fields that were requested to be resolved were empty.
this fixes that by using `|=` instead of `=` when assigning to `something_to_resolve` which will prevent it from going back to false once set.
although some cases it's simply assigning to `true` i opted to continue to use `|=` there for uniformity sake. but happy to change those back to `=`'s.
cc `@SomeoneToIgnore`
hir-ty: change struct + enum variant constructor formatting.
before, when formatting struct constructor for `struct S(usize, usize)` it would format as:
extern "rust-call" S(usize, usize) -> S
but after this change, we'll format as:
fn S(usize, usize) -> S
likewise the second commit, also makes this uniform for enum variants as well.
fixes#18259
Prevent building cargo from invalidating build cache of other tools due to conditionally applied `-Zon-broken-pipe=kill` via tracked `RUSTFLAGS`
This PR fixes#130980 where building cargo invalidated the tool build caches of other tools (such as rustdoc) because `-Zon-broken-pipe=kill` was conditionally passed via `RUSTFLAGS` for other tools *except* for cargo. The differing `RUSTFLAGS` triggered tool build cache invalidation as `RUSTFLAGS` is a tracked env var -- any changes in `RUSTFLAGS` requires a rebuild.
`-Zon-broken-pipe=kill` is load-bearing for rustc and rustdoc to not ICE on broken pipes due to usages of raw std `println!` that panics without the flag being set, which manifests in ICEs.
I can't say I like the changes here, but it is what it is...
See detailed discussions and history of `-Zon-broken-pipe=kill` usage in https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Internal.20lint.20for.20raw.20.60print!.60.20and.20.60println!.60.3F/near/474593815.
## Approach
This PR fixes the tool build cache invalidation by informing the `rustc` binary shim when to apply `-Zon-broken-pipe=kill` (i.e. when the rustc binary shim is not used to build cargo). This information is not communicated by `RUSTFLAGS`, which is an env var tracked by cargo, and instead uses an untracked env var `UNTRACKED_BROKEN_PIPE_FLAG` so we won't trigger tool build cache invalidation. We preserve bootstrap's behavior of not setting that flag for cargo by conditionally omitting setting `UNTRACKED_BROKEN_PIPE_FLAG` when building cargo.
Notably, the `-Zon-broken-pipe=kill` instance in 1e5719bdc4/src/bootstrap/src/core/build_steps/compile.rs (L1058) is not modified because that is used to build rustc only and not cargo itself.
Thanks to `@cuviper` for the idea!
## Testing
### Integration testing
This PR introduces a run-make test for rustc and rustdoc that checks that when they do not ICE/panic when they encounter a broken pipe of the stdout stream.
I checked this test will catch the broken pipe ICE regression for rustc on Linux (at least) by commenting out 1e5719bdc4/src/bootstrap/src/core/build_steps/compile.rs (L1058), and the test failed because rustc ICE'd.
### Manual testing
I have manually tried:
1. `./x clean && `./x test build --stage 1` -> `rustc +stage1 --print=sysroot | false`: no ICE.
2. `./x clean` -> `./x test run-make` twice: no stage 1 cargo rebuilds.
3. `./x clean` -> `./x build rustdoc` -> `rustdoc +stage1 --version | false`: no panics.
4. `./x test src/tools/cargo`: tests pass, notably `build::close_output` and `cargo_command::closed_output_ok` do not fail which would fail if cargo was built with `-Zon-broken-pipe=kill`.
## Related discussions
Thanks to everyone who helped!
- https://rust-lang.zulipchat.com/#narrow/stream/246057-t-cargo/topic/Applying.20.60-Zon-broken-pipe.3Dkill.60.20flags.20in.20bootstrap.3F
- https://rust-lang.zulipchat.com/#narrow/stream/326414-t-infra.2Fbootstrap/topic/Modifying.20run-make.20tests.20unnecessarily.20rebuild.20stage.201.20.2E.2E.2E
- https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Internal.20lint.20for.20raw.20.60print!.60.20and.20.60println!.60.3F
Fixes https://github.com/rust-lang/rust/issues/130980
Closes https://github.com/rust-lang/rust/issues/131059
---
try-job: aarch64-apple
try-job: x86_64-msvc
try-job: x86_64-mingw
use precompiled rustc for non-dist builders
Makes non-dist builders to use precompiled CI rustc by default if they are available for the target triple.
As we are going to make `rust.download-rustc=if-unchanged` default option with https://github.com/rust-lang/rust/pull/119899, we need to make sure `if-unchanged` logic never breaks and works as expected.
As an addition, this will significantly improve the build times on CI when there's no change on the compiler.
blocker for #119899
try-job: x86_64-gnu-nopt
try-job: aarch64-apple
before, when formatting struct constructor for `struct S(usize, usize)` it would format as:
extern "rust-call" S(usize, usize) -> S
but after this change, we'll format as:
fn S(usize, usize) -> S
Use external stack in borrowck DFS
Because damnit, it can crash r-a. Why do people make this stupid DFSes anyway (I get it, it's easier until it blows).
Fixes#18223 (who thought DFS will be the problem).