mirror of
https://github.com/bevyengine/bevy
synced 2024-11-10 07:04:33 +00:00
713b1d8fa4
(This is my first PR here, so I've probably missed some things. Please let me know what else I should do to help you as a reviewer!) # Objective Due to https://github.com/rust-lang/rust/issues/117800, the `derive`'d `PartialEq::eq` on `Entity` isn't as good as it could be. Since that's used in hashtable lookup, let's improve it. ## Solution The derived `PartialEq::eq` short-circuits if the generation doesn't match. However, having a branch there is sub-optimal, especially on 64-bit systems like x64 that could just load the whole `Entity` in one load anyway. Due to complications around `poison` in LLVM and the exact details of what unsafe code is allowed to do with reference in Rust (https://github.com/rust-lang/unsafe-code-guidelines/issues/346), LLVM isn't allowed to completely remove the short-circuiting. `&Entity` is marked `dereferencable(8)` so LLVM knows it's allowed to *load* all 8 bytes -- and does so -- but it has to assume that the `index` might be undef/poison if the `generation` doesn't match, and thus while it finds a way to do it without needing a branch, it has to do something slightly more complicated than optimal to combine the results. (LLVM is allowed to change non-short-circuiting code to use branches, but not the other way around.) Here's a link showing the codegen today: <https://rust.godbolt.org/z/9WzjxrY7c> ```rust #[no_mangle] pub fn demo_eq_ref(a: &Entity, b: &Entity) -> bool { a == b } ``` ends up generating the following assembly: ```asm demo_eq_ref: movq xmm0, qword ptr [rdi] movq xmm1, qword ptr [rsi] pcmpeqd xmm1, xmm0 pshufd xmm0, xmm1, 80 movmskpd eax, xmm0 cmp eax, 3 sete al ret ``` (It's usually not this bad in real uses after inlining and LTO, but it makes a strong demo.) This PR manually implements `PartialEq::eq` *without* short-circuiting, and because that tells LLVM that neither the generations nor the index can be poison, it doesn't need to be so careful and can generate the "just compare the two 64-bit values" code you'd have probably already expected: ```asm demo_eq_ref: mov rax, qword ptr [rsi] cmp qword ptr [rdi], rax sete al ret ``` Since this doesn't change the representation of `Entity`, if it's instead passed by *value*, then each `Entity` is two `u32` registers, and the old and the new code do exactly the same thing. (Other approaches, like changing `Entity` to be `[u32; 2]` or `u64`, affect this case.) This should hopefully merge easily with changes like https://github.com/bevyengine/bevy/pull/9907 that also want to change `Entity`. ## Benchmarks I'm not super-confident that I got my machine fully consistent for benchmarking, but whether I run the old or the new one first I get reasonably consistent results. Here's a fairly typical example of the benchmarks I added in this PR: ![image](https://github.com/bevyengine/bevy/assets/18526288/24226308-4616-4082-b0ff-88fc06285ef1) Building the sets seems to be basically the same. It's usually reported as noise, but sometimes I see a few percent slower or faster. But lookup hits in particular -- since a hit checks that the key is equal -- consistently shows around 10% improvement. `cargo run --example many_cubes --features bevy/trace_tracy --release -- --benchmark` showed as slightly faster with this change, though if I had to bet I'd probably say it's more noise than meaningful (but at least it's not worse either): ![image](https://github.com/bevyengine/bevy/assets/18526288/58bb8c96-9c45-487f-a5ab-544bbfe9fba0) This is my first PR here -- and my first time running Tracy -- so please let me know what else I should run, or run things on your own more reliable machines to double-check. --- ## Changelog (probably not worth including) Changed: micro-optimized `Entity::eq` to help LLVM slightly. ## Migration Guide (I really hope nobody was using this on uninitialized entities where sufficiently tortured `unsafe` could could technically notice that this has changed.) |
||
---|---|---|
.. | ||
benches | ||
Cargo.toml |