Optimize large array creation in const-eval
This changes repeated memcpy's to a memset for the case that we're propagating a single byte into a region of memory. It also optimizes the element-by-element copies to have a tighter loop; I'm pretty sure the old code was actually doing a multiply within each loop iteration.
For an 8GB array (`static SLICE: [u8; SIZE] = [0u8; 1 << 33];`) this takes us from ~23 seconds to ~6 seconds locally, which is spent roughly 50/50 in (a) memset to zero and (b) memcpy of the original place into a new place, when popping stack frame. The latter seems hard to avoid but is a big memcpy (since we're copying the type rather than initializing a region, so it's pretty fast), and the first is as good as it's going to get without special casing constant-valued arrays.
Closes https://github.com/rust-lang/rust/issues/55795. (That issue's references to lint checking don't appear true anymore, but I think this closes that case as something that is slow due to *time* pretty fully. An 8GB array taking only 6 seconds feels reasonable enough to not merit further tracking).
Get rid of the hir_owner query.
This query was meant as a firewall between `hir_owner_nodes` which is supposed to change often, and the queries that only depend on the item signature. That firewall was inefficient, leaking the contents of the HIR body through `HirId`s.
`hir_owner` incurs a significant cost, as we need to hash HIR twice in multiple modes. This PR proposes to remove it, and simplify the hashing scheme.
For the future, `def_kind`, `def_span`... are much more efficient for incremental decoupling, and should be preferred.
[rustc_data_structures] Use partition_point to find slice range end.
This PR uses approach introduced in https://github.com/rust-lang/rust/pull/114152 to find
the end of the range. It's much easier to understand and reason about invariants of such
implementation.
Technically it's possible to make it even shorter by returning `&[start..end]` unconditionally
because even if searched item is not present in the slice, `start` and `end` would point at
the same index, so the range would be empty. The reason I decided not to use this shorter
implementation is because it would involve more comparisons in case there are no elements
in the slice with key equal to `key`.
Also, not that it matters much, but this implementation also improves perf according to the
benchmark below:
https://gist.github.com/ttsugriy/63c0ed39ae132b131931fa1f8a3dea55
The results on my M1 macbook air are:
```
Running benches/bin_search_slice_benchmark.rs (target/release/deps/bin_search_slice_benchmark-90fa6d68c3bd1298)
Benchmarking multiply add/binary_search_slice: Collecting 100 samples in estimated 5.0002 s (1
multiply add/binary_search_slice
time: [44.719 ns 44.918 ns 45.158 ns]
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
Benchmarking multiply add/binary_search_slice_new: Collecting 100 samples in estimated 5.0001
multiply add/binary_search_slice_new
time: [36.955 ns 37.060 ns 37.221 ns]
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
```
Rollup of 8 pull requests
Successful merges:
- #119172 (Detect `NulInCStr` error earlier.)
- #119833 (Make tcx optional from StableMIR run macro and extend it to accept closures)
- #119967 (Add `PatKind::Err` to AST/HIR)
- #119978 (Move async closure parameters into the resultant closure's future eagerly)
- #120021 (don't store const var origins for known vars)
- #120038 (Don't create a separate "basename" when naming and opening a MIR dump file)
- #120057 (Don't ICE when deducing future output if other errors already occurred)
- #120073 (Remove spastorino from users_on_vacation)
r? `@ghost`
`@rustbot` modify labels: rollup
Don't ICE when deducing future output if other errors already occurred
The situation can't really happen outside of erroneous code. What was interesting is that it ICEd before emitting any other diagnostics. This was because the other errors were silenced due to cycle_delay_bug cycle errors.
r? ```@compiler-errors```
fixes#119890
Don't create a separate "basename" when naming and opening a MIR dump file
These functions were split up by #77080, in order to support passing the dump file's “basename” (filename without extension) to the implementation of `-Zdump-mir-spanview`, so that it could be used as a page title.
That flag has since been removed (#119566), so now there's no particular reason for this code to handle the basename separately from the filename or full path.
This PR therefore restores things to (roughly) how they were before #77080.
Move async closure parameters into the resultant closure's future eagerly
Move async closure parameters into the closure's resultant future eagerly.
Before, we used to desugar `async |p1, p2, ..| { body }` as `|p1, p2, ..| { || async { body } }`. Now, we desugar the above like `|p1, p2, ..| { async move { let p1 = p1; let p2 = p2; ... body } }`. This mirrors the same desugaring that `async fn` does with its parameter types, and the compiler literally uses the same code via a shared helper function.
This removes the necessity for E0708, since now expressions like `async |x: i32| { x }` will not give you confusing borrow errors.
This does *not* fix the case where async closures have self-borrows. This will come with a general implementation of async closures, which is still in the works.
r? oli-obk
Make tcx optional from StableMIR run macro and extend it to accept closures
Change `run` macro to avoid sometimes unnecessary dependency on `TyCtxt`, and introduce `run_with_tcx` to capture use cases where `tcx` is required. Additionally, extend both macros to accept closures that may capture variables.
I've also modified the `internal()` method to make it safer, by accepting the type context to force the `'tcx` lifetime to match the context lifetime.
These are non-backward compatible changes, but they only affect internal APIs which are provided today as helper functions until we have a stable API to start the compiler.
Detect `NulInCStr` error earlier.
By making it an `EscapeError` instead of a `LitError`. This makes it like the other errors produced when checking string literals contents, e.g. for invalid escape sequences or bare CR chars.
NOTE: this means these errors are issued earlier, before expansion, which changes behaviour. It will be possible to move the check back to the later point if desired. If that happens, it's likely that all the string literal contents checks will be delayed together.
One nice thing about this: the old approach had some code in `report_lit_error` to calculate the span of the nul char from a range. This code used a hardwired `+2` to account for the `c"` at the start of a C string literal, but this should have changed to a `+3` for raw C string literals to account for the `cr"`, which meant that the caret in `cr"` nul error messages was one short of where it should have been. The new approach doesn't need any of this and avoids the off-by-one error.
r? ```@fee1-dead```
Add way to express that no values are expected with check-cfg
This PR adds way to express no-values (no values expected) with `--check-cfg` by making empty `values()` no longer mean `values(none())` (internal: `&[None]`) and now be an empty list (internal: `&[]`).
### Context
Currently `--check-cfg` has a way to express that _any value is expected_ with `values(any())`, but has no way to do the inverse and say that _no value is expected_.
This would be particularly useful for build systems that control a config name and it's values as they could always declare a config name as expected and if in the current state they have values pass them and if not pass an empty list.
To give a more concrete example, Cargo `--check-cfg` currently needs to generate:
- `--check-cfg=cfg(feature, values(...))` for the case with declared features
- and `--check-cfg=cfg()` for the case without any features declared
This means that when there are no features declared, users will get an `unexpected config name` but from the point of view of Cargo the config name `feature` is expected, it's just that for now there aren't any values for it.
See [Cargo `check_cfg_args` function](92395d9010/src/cargo/core/compiler/mod.rs (L1263-L1281)) for more details.
### De-specializing *empty* `values()`
To solve this issue I propose that we "de-specialize" empty `values()` to no longer mean `values(none())` but to actually mean empty set/list. This is one of the last source of confusion for my-self and others with the `--check-cfg` syntax.
> The confusing part here is that an empty `values()` currently means the same as `values(none())`, i.e. an expected list of values with the _none_ variant (as in `#[cfg(name)]` where the value is none) instead of meaning an empty set.
Before the new `cfg()` syntax, defining the _none_ variant was only possible under certain circumstances, so in https://github.com/rust-lang/rust/pull/111068 I decided to make `values()` to mean the _none_ variant, but it is no longer necessary since https://github.com/rust-lang/rust/pull/119473 which introduced the `none()` syntax.
A simplified representation of the proposed "de-specialization" would be:
| Syntax | List/set of expected values |
|-----------------------------------------|-----------------------------|
| `cfg(name)`/`cfg(name, values(none()))` | `&[None]` |
| `cfg(name, values())` | `&[]` |
Note that I have my-self made the mistake of using an empty `values()` as meaning empty set, see https://github.com/rust-lang/cargo/pull/13011.
`@rustbot` label +F-check-cfg
r? `@petrochenkov`
cc `@epage`
Rework how diagnostic lints are stored.
`Diagnostic::code` has the type `DiagnosticId`, which has `Error` and
`Lint` variants. Plus `Diagnostic::is_lint` is a bool, which should be
redundant w.r.t. `Diagnostic::code`.
Seems simple. Except it's possible for a lint to have an error code, in
which case its `code` field is recorded as `Error`, and `is_lint` is
required to indicate that it's a lint. This is what happens with
`derive(LintDiagnostic)` lints. Which means those lints don't have a
lint name or a `has_future_breakage` field because those are stored in
the `DiagnosticId::Lint`.
It's all a bit messy and confused and seems unintentional.
This commit:
- removes `DiagnosticId`;
- changes `Diagnostic::code` to `Option<String>`, which means both
errors and lints can straightforwardly have an error code;
- changes `Diagnostic::is_lint` to `Option<IsLint>`, where `IsLint` is a
new type containing a lint name and a `has_future_breakage` bool, so
all lints can have those, error code or not.
r? `@oli-obk`
Cache local DefId-keyed queries without hashing
This caches local DefId-keyed queries using just an IndexVec. This costs ~5% extra max-rss at most but brings significant runtime improvement, up to 13% cycle counts (mean: 4%) on primary benchmarks. It's possible that further tweaks could reduce the memory overhead further but this win seems worth landing despite the increased memory, particularly with regards to eliminating the present set in non-incr or storing it inline (skip list?) with the main data.
We tried applying this scheme to all keys in the [first perf run] but found that it carried a significant memory hit (50%). instructions/cycle counts were also much more mixed, though that may have been due to the lack of the present set optimization (needed for fast iter() calls in incremental scenarios).
Closes https://github.com/rust-lang/rust/issues/45275
[first perf run]: https://perf.rust-lang.org/compare.html?start=30dfb9e046aeb878db04332c74de76e52fb7db10&end=6235575300d8e6e2cc6f449cb9048722ef43f9c7&stat=instructions:u
Make sure to instantiate placeholders correctly in old solver
When creating the query substitution guess for an input placeholder type like `!1_T` (in universe 1), we were guessing the response substitution with something like `!0_T`. This failed to unify with `!1_T`, causing an ICE.
This PR reworks the query substitution guess code to work a bit more like the new solver. I'm *pretty* sure this is correct, though I'd really appreciate some scrutiny from someone (*cough* lcnr) who knows a bit more about query instantiation :)
Fixes#119941
r? lcnr
Sandwich MIR optimizations between DSE.
This PR reorders MIR optimization passes in an attempt to increase their efficiency.
- Stop running CopyProp before GVN, it's useless as GVN will do the same thing anyway. Instead, we perform CopyProp at the end of the pipeline, to ensure we do not emit copy/move chains.
- Run DSE before GVN, as it increases the probability to have single-assignment locals.
- Run DSE after the final CopyProp to turn copies into moves.
r? `@ghost`
Avoid some redundant work in GVN
The first 2 commits are about reducing the perf effect.
Third commit avoids doing redundant work: is a local is SSA, it already has been simplified, and the resulting value is in `self.locals`. No need to call any code on it.
The last commit avoids removing some storage statements.
r? wg-mir-opt
never patterns: Check bindings wrt never patterns
Never patterns:
- Shouldn't contain bindings since they never match anything;
- Don't count when checking that or-patterns have consistent bindings.
r? `@compiler-errors`
Fix unused_parens issue when cast is followed LT
Fixes#117142
The original check only checks `a as (i32) < 0`, this fix extends it to handle `b + a as (i32) < 0`.
A better way is maybe we suggest `(a as i32) < 0` instead of suppressing the warning, maybe following PR could improve it.
next solver: provisional cache
this adds the cache removed in #115843. However, it should now correctly track whether a provisional result depends on an inductive or coinductive stack.
While working on this, I was using the following doc: https://hackmd.io/VsQPjW3wSTGUSlmgwrDKOA. I don't think it's too helpful to understanding this, but am somewhat hopeful that the inline comments are more useful.
There are quite a few future perf improvements here. Given that this is already very involved I don't believe it is worth it (for now). While working on this PR one of my few attempts to significantly improve perf ended up being unsound again because I was not careful enough ✨
r? `@compiler-errors`
By making it an `EscapeError` instead of a `LitError`. This makes it
like the other errors produced when checking string literals contents,
e.g. for invalid escape sequences or bare CR chars.
NOTE: this means these errors are issued earlier, before expansion,
which changes behaviour. It will be possible to move the check back to
the later point if desired. If that happens, it's likely that all the
string literal contents checks will be delayed together.
One nice thing about this: the old approach had some code in
`report_lit_error` to calculate the span of the nul char from a range.
This code used a hardwired `+2` to account for the `c"` at the start of
a C string literal, but this should have changed to a `+3` for raw C
string literals to account for the `cr"`, which meant that the caret in
`cr"` nul error messages was one short of where it should have been. The
new approach doesn't need any of this and avoids the off-by-one error.
libtest: Fix padding of benchmarks run as tests
### Summary
The first commit adds regression tests for libtest padding.
The second commit fixes padding for benches run as tests and updates the blessed output of the regression tests to make it clear what effect the fix has on padding.
Closes#104092 which is **E-help-wanted** and **regression-from-stable-to-stable**
### More details
Before this fix we applied padding _before_ manually doing what `convert_benchmarks_to_tests()` does which affects padding calculations. Instead use `convert_benchmarks_to_tests()` first if applicable and then apply padding afterwards so it becomes correct.
Benches should only be padded when run as benches to make it easy to compare the benchmark numbers. Not when run as tests.
r? `@ghost` until CI passes.
internal: Consider all kinds of explicit private imports in find_path
Builds on top of https://github.com/rust-lang/rust-analyzer/pull/16265 to make things a bit more general, now we consider all explicit private imports.
bump bootstrap dependencies
This PR removes hard-coded patch versions, updates bootstrap's dependency stack to recent versions (some of the versions were released 3-4 years ago), and removes a few dependencies from bootstrap.
Removed dependencies:
![image](https://github.com/rust-lang/rust/assets/39852038/95e86325-aea0-4055-bee5-245c144f662e)
fix: Acknowledge `pub(crate)` imports in import suggestions
rust-analyzer has logic that discounts suggesting `use`s for private imports, but that logic is unnecessarily strict - for instance given this code:
```rust
mod foo {
pub struct Foo;
}
pub(crate) use self::foo::*;
mod bar {
fn main() {
Foo$0;
}
}
```
... RA will suggest to add `use crate::foo::Foo;`, which not only makes the code overly verbose (especially in larger code bases), but also is disjoint with what rustc itself suggests.
This commit adjusts the logic, so that `pub(crate)` imports are taken into account when generating the suggestions; considering rustc's behavior, I think this change doesn't warrant any extra configuration flag.
Note that this is my first commit to RA, so I guess the approach taken here might be suboptimal - certainly feels somewhat hacky, maybe there's some better way of finding out the optimal import path 😅
Reorder early post-inlining passes.
`RemoveZsts`, `RemoveUnneededDrops` and `UninhabitedEnumBranching` only depend on types, so they should be executed together early after MIR inlining introduces those types.
This does not change the end-result, but this makes the pipeline a bit more consistent.
Exhaustiveness: use an `Option` instead of allocating fictitious patterns
In the process of exhaustiveness checking, `Matrix` stores a 2D array of patterns. Those are subpatterns of the patterns we were provided as input, _except_ sometimes we allocate some extra wildcard patterns to fill a hole during specialization.
Morally though, we could store `Option<&'p DeconstructedPat>` in the matrix, where `None` signifies a wildcard. That way we'd only have "real" patterns in the matrix and we wouldn't need the arena to allocate these wildcards. This is what this PR does.
This is part of me splitting up https://github.com/rust-lang/rust/pull/119581 for ease of review.
r? `@compiler-errors`
rust-analyzer has logic that discounts suggesting `use`s for private
imports, but that logic is unnecessarily strict - for instance given
this code:
```rust
mod foo {
pub struct Foo;
}
pub(crate) use self::foo::*;
mod bar {
fn main() {
Foo$0;
}
}
```
... RA will suggest to add `use crate::foo::Foo;`, which not only makes
the code overly verbose (especially in larger code bases), but also is
disjoint with what rustc itself suggests.
This commit adjusts the logic, so that `pub(crate)` imports are taken
into account when generating the suggestions; considering rustc's
behavior, I think this change doesn't warrant any extra configuration
flag.
Note that this is my first commit to RA, so I guess the approach taken
here might be suboptimal - certainly feels somewhat hacky, maybe there's
some better way of finding out the optimal import path 😅
Replace SourceRootCrates hashset output with slice for deterministic order
We only iterate over the result, and its pretty small in general so no point for the `HashSet` (additionally this way we get a more defined iteration order).
minor: Mark unresolved associated item diagnostic as experimental
Per #16327 unresolved associated item has false positives. Mark the diagnostic as experimental until this is more dependable.
Preserve comments for extracted block expression in 'extract_function'
Fix#14371
Preserve comments for extracted block expression in 'extract_function'.
In the original implementation, `block.statements()` was used to construct a new function, removing the comments within the block. In the updated implementation, we use manual traversal of nodes and `hacky_block_expr` to generate a new block, thereby preserving the comments.