Use edition of `macro_rules` when compiling the macro
This changes the edition assigned to a macro_rules macro when it is compiled to use the edition of where the macro came from instead of the local crate's edition.
This fixes a problem when a macro_rules macro is created by a proc-macro. Previously that macro would be tagged with the local edition, which would cause problems with using the correct edition behavior inside the macro. For example, the check for unsafe attributes would cause errors in 2024 when using proc-macros from older editions.
This is partially related to https://github.com/rust-lang/rust/issues/132906. Unfortunately this is only a half fix for that issue. It fixes the error that happens in 2024, but does not fix the lint firing in 2021. I'm still trying to think of some way to fix that, but I'm running low on ideas.
Reduce integer `Display` implementation size
I was thinking about #128204 and how we could reduce the size of the code and just realized that we didn't need the `_fmt` method to be implemented on signed integers, which in turns allow to simplify greatly the macro call.
r? `@workingjubilee`
rustc: Fail fast when compiling a source file larger than 4 GiB
Currently if you try to compile a file that is larger than 4 GiB, `rustc` will first read the whole into memory before failing.
If we can read the metadata of the file, we can fail before reading the file.
Add `AsyncFn*` to the prelude in all editions
The general vibe is that we will most likely stabilize the `feature(async_closure)` *without* the `async Fn()` trait bound modifier.
Without `async Fn()` bound syntax, this necessitates users to spell the bound like `AsyncFn()`. Since `core::ops::AsyncFn` is not in the prelude, users will need to import these any time they actually want to use the trait. This seems annoying, so let's add these traits to the prelude unstably.
We're trying to work on the general vision of `async` trait bound modifier in general in: https://github.com/rust-lang/rfcs/pull/3710, however that RFC still needs more time for consensus to converge, and we've decided that the value that users get from calling the bound `async Fn()` is *not really* worth blocking landing async closures in general.
btree: don't leak value if destructor of key panics
This PR fixes a regression from https://github.com/rust-lang/rust/pull/84904.
The `BTreeMap` already attempts to handle panicking destructors of the key-value pairs by continuing to execute the remaining destructors after one destructor panicked. However, after #84904 the destructor of a value in a key-value pair gets skipped if the destructor of the key panics, only continuing with the next key-value pair. This PR reverts to the behavior before #84904 to also drop the corresponding value if the destructor of a key panics.
This avoids potential memory leaks and can fix the soundness of programs that rely on the destructors being executed (even though this should not be relied upon, because the std collections currently do not guarantee that the remaining elements are dropped after a panic in a destructor).
cc `@Amanieu` because you had opinions on panicking destructors
Stabilize `Ipv6Addr::is_unique_local` and `Ipv6Addr::is_unicast_link_local`
Make `Ipv6Addr::is_unique_local` and `Ipv6Addr::is_unicast_link_local` stable (+const).
Newly stable API:
```rust
impl Ipv6Addr {
// Newly stable under `ipv6_is_unique_local`
const fn is_unique_local(&self) -> bool;
// Newly stable under `ipv6_is_unique_local`
const fn is_unicast_link_local(&self) -> bool;
}
```
These stabilise a subset of the following tracking issue:
- #27709
I have looked and could not find any issues with `is_unique_local` and `is_unicast_link_local`. There is a well received comment calling for stabilisation of the latter function.
Both functions are well defined and consistent with implementations in other languages:
- [Go](https://cs.opensource.google/go/go/+/refs/tags/go1.23.0:src/net/netip/netip.go;l=518)
- [Python](e9d1bf353c/Lib/ipaddress.py (L2319-L2321))
- [Ruby (unique local)](https://ruby-doc.org/stdlib-2.5.1/libdoc/ipaddr/rdoc/IPAddr.html#private-3F-source)
- [Ruby (unicast link local)](https://ruby-doc.org/stdlib-2.5.1/libdoc/ipaddr/rdoc/IPAddr.html#link_local-3F-source)
cc implementor `@little-dude`
(I can't find the original PR for `is_unqiue_local`)
r? libs-api
`@rustbot` label +T-libs-api +needs-fcp
Update cargo
5 commits in 69e595908e2c420e7f0d1be34e6c5b984c8cfb84..66221abdeca2002d318fde6efff516aab091df0e
2024-11-16 01:26:11 +0000 to 2024-11-19 21:30:02 +0000
- Docs for optional registry JSON fields (rust-lang/cargo#14839)
- Allow registries to omit empty/default fields in JSON (rust-lang/cargo#14838)
- docs(unstable): Link to -Zwarnings issue, tracking issue (rust-lang/cargo#14836)
- fix(): error context for git_fetch refspec not found (rust-lang/cargo#14806)
- you we distinction (rust-lang/cargo#14829)
Drop debug info instead of panicking if we exceed LLVM's capability to represent it
Recapping a bit of history here:
In #128861 I made debug info correctly represent parameters to inline functions by removing a fake lexical block that had been inserted to suppress LLVM assertions and by deduplicating those parameters.
LLVM, however, expects to see a single parameter _with distinct locations_, particularly distinct inlinedAt values on the DILocations. This generally worked because no matter how deep the chain of inlines it takes two different call sites in the original function to result in the same function being present multiple times, and a function call requires a non-zero number of characters, but macros threw a wrench in that in #131944. At the time I thought the issue there was limited to proc-macros, where an arbitrary amount of code can be generated at a single point in the source text.
In #132613 I added discriminators to DILocations that would otherwise be the same to repair #131944[^1]. This works, but LLVM's capacity for discriminators is not infinite (LLVM actually only allocates 12 bits for this internally). At the time I thought it would be very rare for anyone to hit the limit, but #132900 proved me wrong. In the relatively-minimized test case it also became clear to me that the issue affects regular macros too, because the call to the inlined function will (without collapse_debuginfo on the macro) be attributed to the (repeated, if the macro is used more than once) textual callsite in the macro definition.
This PR fixes the panic by dropping debug info when we exceed LLVM's maximum discriminator value. There's also a preceding commit for a related but distinct issue: macros that use collapse_debuginfo should in fact have their inlinedAts collapsed to the macro callsite and thus not need discriminators at all (and not panic/warn accordingly when the discriminator limit is exhausted).
Fixes#132900
r? `@jieyouxu`
[^1]: Editor's note: `fix` is a magic keyword in PR description that apparently will close the linked issue (it's closed already in this case, but still).
Default-enable `llvm_tools_enabled` when no `config.toml` is present
Fixes#133195. cc `@wesleywiser` could you double check if with this patch and no `config.toml` that you can run `./x test tests/ui --stage 1`?
`llvm-objcopy` is usually required by cg_ssa on macOS to workaround bad `strip`s.
cc `@bjorn3` I hope this doesn't break cg_clif...
r? bootstrap
Add `visit` methods to ast nodes that already have `walk`s on ast visitors
Some `walk` functions are called directly, because there were no correspondent visit functions.
related to #128974 & #127615
r? `@petrochenkov`
Add vec_deque::Iter::as_slices and friends
Add the following methods, that work similarly to VecDeque::as_slices:
- alloc::collections::vec_deque::Iter::as_slices
- alloc::collections::vec_deque::IterMut::into_slices
- alloc::collections::vec_deque::IterMut::as_slices
- alloc::collections::vec_deque::IterMut::as_mut_slices
Obtaining slices from a VecDeque iterator was not previously possible.
It was added in #123752 to handle some cases involving emoji, but it
isn't necessary because it's always treated the same as
`TokenKind::InvalidIdent`. This commit removes it, which makes things a
little simpler.
Improve VecCache under parallel frontend
This replaces the single Vec allocation with a series of progressively larger buckets. With the cfg for parallel enabled but with -Zthreads=1, this looks like a slight regression in i-count and cycle counts (~1%).
With the parallel frontend at -Zthreads=4, this is an improvement (-5% wall-time from 5.788 to 5.4688 on libcore) than our current Lock-based approach, likely due to reducing the bouncing of the cache line holding the lock. At -Zthreads=32 it's a huge improvement (-46%: 8.829 -> 4.7319 seconds).
try-job: i686-gnu-nopt
try-job: dist-x86_64-linux
Use `TypingMode` throughout the compiler instead of `ParamEnv`
Hopefully the biggest single PR as part of https://github.com/rust-lang/types-team/issues/128.
## `infcx.typing_env` while defining opaque types
I don't know how'll be able to correctly handle opaque types when using something taking a `TypingEnv` while defining opaque types. To correctly handle the opaques we need to be able to pass in the current `opaque_type_storage` and return constraints, i.e. we need to use a proper canonical query. We should migrate all the queries used during HIR typeck and borrowck where this matters to proper canonical queries. This is
## `layout_of` and `Reveal::All`
We convert the `ParamEnv` to `Reveal::All` right at the start of the `layout_of` query, so I've changed callers of `layout_of` to already use a post analysis `TypingEnv` when encountering it.
ca87b535a0/compiler/rustc_ty_utils/src/layout.rs (L51)
## `Ty::is_[unpin|sized|whatever]`
I haven't migrated `fn is_item_raw` to use `TypingEnv`, will do so in a followup PR, this should significantly reduce the amount of `typing_env.param_env`. At some point there will probably be zero such uses as using the type system while ignoring the `typing_mode` is incorrect.
## `MirPhase` and phase-transitions
When inside of a MIR-body, we can mostly use its `MirPhase` to figure out the right `typing_mode`. This does not work during phase transitions, most notably when transitioning from `Analysis` to `Runtime`:
dae7ac133b/compiler/rustc_mir_transform/src/lib.rs (L606-L625)
All these passes still run with `MirPhase::Analysis`, but we should only use `Reveal::All` once we're run the `RevealAll` pass. This required me to manually construct the right `TypingEnv` in all these passes. Given that it feels somewhat easy to accidentally miss this going forward, I would maybe like to change `Body::phase` to an `Option` and replace it at the start of phase transitions. This then makes it clear that the MIR is currently in a weird state.
r? `@ghost`
Rwlock downgrade
Tracking Issue: #128203
This PR adds a `downgrade` method for `RwLock` / `RwLockWriteGuard` on all currently supported platforms.
Outstanding questions:
- [x] ~~Does the `futex.rs` change affect performance at all? It doesn't seem like it will but we can't be certain until we bench it...~~
- [x] ~~Should the SOLID platform implementation [be ported over](https://github.com/rust-lang/rust/pull/128219#discussion_r1693470090) to the `queue.rs` implementation to allow it to support downgrades?~~
Liberate `aarch64-gnu-debug` from the shackles of `--test-args=clang`
### Changes
- Drop `--test-args=clang` from `aarch64-gnu-debug` so run-make tests that are `//@ needs-force-clang-based-tests` no longer only run if their test name contains `clang` (which is a very cool footgun).
- Reorganize run-make-suport library slightly to accommodate a raw gcc invocation.
- Fix `tests/run-make/mte-ffi/rmake.rs` to use `gcc` instead of *a* c compiler.
try-job: aarch64-gnu
try-job: aarch64-gnu-debug
improve codegen of fmt_num to delete unreachable panic
it seems LLVM doesn't realize that `curr` is always decremented at least once in either loop formatting characters of the input string by their appropriate radix, and so the later `&buf[curr..]` generates a check for out-of-bounds access and panic. this is unreachable in reality as even for `x == T::zero()` we'll produce at least the character `Self::digit(T::zero())`, yielding at least one character output, and `curr` will always be at least one below `buf.len()`.
adjust `fmt_int` to make this fact more obvious to the compiler, which fortunately (or unfortunately) results in a measurable performance improvement for workloads heavy on formatting integers.
in the program i'd noticed this in, you can see the `cmp $0x80,%rdi; ja 7c` here, which branches to a slice index fail helper:
<img width="660" alt="before" src="https://github.com/rust-lang/rust/assets/4615790/ac482d54-21f8-494b-9c83-4beadc3ca0ef">
where after this change the function is broadly similar, but smaller, with one fewer registers updated in each pass through the loop in addition the never-taken `cmp/ja` being gone:
<img width="646" alt="after" src="https://github.com/rust-lang/rust/assets/4615790/1bee1d76-b674-43ec-9b21-4587364563aa">
this represents a ~2-3% difference in runtime in my [admittedly comically i32-formatting-bound](https://github.com/athre0z/disas-bench/blob/master/bench/yaxpeax/src/main.rs#L58-L67) use case (printing x86 instructions, including i32 displacements and immediates) as measured on a ryzen 9 3950x.
the impact on `<impl LowerHex for i8>::fmt` is both more dramatic and less impactful: it continues to have a loop that is evaluated at most twice, though the compiler doesn't know that to unroll it. the generated code there is identical to the impl for `i32`. there, the smaller loop body has less effect on runtime, and removing the never-taken slice bounds check is offset by whatever address recalculation is happening with the `lea/add/neg` at the end of the loop. it behaves about the same before and after.
---
i initially measured slightly better outcomes using `unreachable_unchecked()` here instead, but that was hacking on std and rebuilding with `-Z build-std` on an older rustc (nightly 5b377cece, 2023-06-30). it does not yield better outcomes now, so i see no reason to proceed with that approach at all.
<details>
<summary>initial notes about that, seemingly irrelevant on modern rustc</summary>
i went through a few tries at getting llvm to understand the bounds check isn't necessary, but i should mention the _best_ i'd seen here was actually from the existing `fmt_int` with a diff like
```diff
if x == zero {
// No more digits left to accumulate.
break;
};
}
}
+
+ if curr >= buf.len() {
+ unsafe { core::hint::unreachable_unchecked(); }
+ }
let buf = &buf[curr..];
```
posting a random PR to `rust-lang/rust` to do that without a really really compelling reason seemed a bit absurd, so i tried to work that into something that seems more palatable at a glance. but if you're interested, that certainly produced better (x86_64) code through LLVM. in that case with `buf.iter_mut().rev()` as the iterator, `<impl LowerHex for i8>::fmt` actually unrolls into something like
```
put_char(x & 0xf);
let mut len = 1;
if x > 0xf {
put_char((x >> 4) & 0xf);
len = 2;
}
pad_integral(buf[buf.len() - len..]);
```
it's pretty cool! `<impl LowerHex for i32>::fmt` also was slightly better. that all resulted in closer to an 6% difference in my use case.
</details>
---
i have not looked at formatters other than LowerHex/UpperHex with this change, though i'd be a bit shocked if any were _worse_.
(i have absolutely _no_ idea how you'd regression test this, but that might be just my not knowing what the right tool for that would be in rust-lang/rust. i'm of half a mind that this is small and fiddly enough to not be worth landing lest it quietly regress in the future anyway. but i didn't want to discard the idea without at least offering it upstream here)
[perf] rustdoc: Perform less work when cleaning middle::ty parenthesized generic args
CC #132697. I presume the perf regression it caused (if real) boils down to query invocation overhead, namely of `def_kind` & `trait_def` as we don't seem to be decoding more often from the crate metadata.
I won't try the obvious and reduce the amount of query calls by threading information via params as that would render the code awkward.
So instead I'm simply trying to attack some low-hanging fruits in the vicinity.
---
Previously, we would `clean_middle_generic_args` *unconditionally* inside `clean_middle_generic_args_with_constraints` even though we didn't actually use its result for parenthesized generic args (`Trait(...) -> ...`).
Now, we only call `clean_middle_generic_args` when necessary. Lastly, I've simplified `clean_middle_generic_args_with_constraints`.
---
r? ghost
Delete the `cfg(not(parallel))` serial compiler
Since it's inception a long time ago, the parallel compiler and its cfgs have been a maintenance burden. This was a necessary evil the allow iteration while not degrading performance because of synchronization overhead.
But this time is over. Thanks to the amazing work by the parallel working group (and the dyn sync crimes), the parallel compiler has now been fast enough to be shipped by default in nightly for quite a while now.
Stable and beta have still been on the serial compiler, because they can't use `-Zthreads` anyways.
But this is quite suboptimal:
- the maintenance burden still sucks
- we're not testing the serial compiler in nightly
Because of these reasons, it's time to end it. The serial compiler has served us well in the years since it was split from the parallel one, but it's over now.
Let the knight slay one head of the two-headed dragon!
#113349
Note that the default is still 1 thread, as more than 1 thread is still fairly broken.
cc `@onur-ozkan` to see if i did the bootstrap field removal correctly, `@SparrowLii` on the sync parts
move all mono-time checks into their own folder, and their own query
The mono item collector currently also drives two mono-time checks: the lint for "large moves", and the check whether function calls are done with all the required target features.
Instead of doing this "inside" the collector, this PR refactors things so that we have a new `rustc_monomorphize::mono_checks` module providing a per-instance query that does these checks. We already have a per-instance query for the ABI checks, so this should be "free" for incremental builds. Non-incremental builds might do a bit more work now since we now have two separate MIR visits (in the collector and the mono-time checks) -- but one of them is cached in case the MIR doesn't change, which is nice.
This slightly changes behavior of the large-move check since the "move_size_spans" deduplication logic now only works per-instance, not globally across the entire collector.
Cc `@saethlin` since you're also doing some work related to queries and caching and monomorphization, though I don't know if there's any interaction here.