bevy/benches/Cargo.toml

[package]
name = "benches"
edition = "2021"
description = "Benchmarks that test Bevy's performance"
publish = false
license = "MIT OR Apache-2.0"
# Do not automatically discover benchmarks; we specify them manually instead.
autobenches = false
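#
# With discovery disabled, a suite only runs if it has an explicit `[[bench]]`
# table like the ones at the bottom of this file. A new entry follows the same
# shape (the name and path here are purely illustrative, not a real target):
#
#     [[bench]]
#     name = "my_suite"
#     path = "benches/my_suite/main.rs"
#     harness = false
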
[dev-dependencies]
# Bevy crates
bevy_app = { path = "../crates/bevy_app" }
bevy_ecs = { path = "../crates/bevy_ecs", features = ["multi_threaded"] }
bevy_hierarchy = { path = "../crates/bevy_hierarchy" }
bevy_math = { path = "../crates/bevy_math" }
bevy_picking = { path = "../crates/bevy_picking", features = [
"bevy_mesh_picking_backend",
] }
bevy_reflect = { path = "../crates/bevy_reflect", features = ["functions"] }
bevy_render = { path = "../crates/bevy_render" }
bevy_tasks = { path = "../crates/bevy_tasks" }
bevy_utils = { path = "../crates/bevy_utils" }
# Other crates
criterion = { version = "0.5.1", features = ["html_reports"] }
glam = "0.29"
rand = "0.8"
rand_chacha = "0.3"

# Make `bevy_render` compile on Linux with x11 windowing. x11 vs. Wayland does not matter here
# because the benches do not actually open any windows.
[target.'cfg(target_os = "linux")'.dev-dependencies]
bevy_winit = { path = "../crates/bevy_winit", features = ["x11"] }
[profile.release]
opt-level = 3
lto = true

[[bench]]
name = "ecs"
path = "benches/bevy_ecs/main.rs"
harness = false
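
# `harness = false` turns off the default libtest harness, so each `main.rs`
# referenced by these `[[bench]]` tables is expected to provide its own entry
# point via Criterion's `criterion_main!` macro. A minimal sketch of such a
# file (module and group names are illustrative, not the ones used in this
# repository):
#
#     use criterion::{criterion_group, criterion_main, Criterion};
#
#     fn spawn(c: &mut Criterion) {
#         c.bench_function("spawn_noop", |b| b.iter(|| ()));
#     }
#
#     criterion_group!(benches, spawn);
#     criterion_main!(benches);
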
[[bench]]
name = "math"
path = "benches/bevy_math/main.rs"
harness = false

[[bench]]
name = "picking"
path = "benches/bevy_picking/main.rs"
harness = false
[[bench]]
name = "reflect"
path = "benches/bevy_reflect/main.rs"
harness = false
[[bench]]
name = "render"
path = "benches/bevy_render/main.rs"
harness = false
[[bench]]
name = "tasks"
path = "benches/bevy_tasks/main.rs"
harness = false
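
# The suites above are run by name rather than discovered from files. For
# example, from this directory (shown as a guide; any Criterion-style filter
# after `--` narrows the run to matching benchmark IDs):
#
#     cargo bench --bench ecs
#     cargo bench --bench reflect -- <filter>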