## Objective
Currently, events are dropped after two frames. This cadence wasn't
*chosen* for a specific reason, double buffering just lets events
persist for at least two frames. Events only need to be dropped at a
predictable point so that the event queues don't grow forever (i.e.
events should never cause a memory leak).
Events (and especially input events) need to be observable by systems in
`FixedUpdate`, but as-is events are dropped before those systems even
get a chance to see them.
## Solution
Instead of unconditionally dropping events in `First`, require
`FixedUpdate` to first queue the buffer swap (if the `TimePlugin` has
been installed). This way, events are only dropped after a frame that
runs `FixedUpdate`.
## Future Work
In the same way we have independent copies of `Time` for tracking time
in `Main` and `FixedUpdate`, we will need independent copies of `Input`
for tracking press/release status correctly in `Main` and `FixedUpdate`.
--
Every run of `FixedUpdate` covers a specific timespan. For example, if
the fixed timestep `Δt` is 10ms, the first three `FixedUpdate` runs
cover `[0ms, 10ms)`, `[10ms, 20ms)`, and `[20ms, 30ms)`.
`FixedUpdate` can run many times in one frame. For truly
framerate-independent behavior, each `FixedUpdate` should only see the
events that occurred in its covered timespan, but what happens right now
is the first step in the frame reads all pending events.
Fixing that will require timestamped events.
---------
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
forward for bevy user to consume
# Objective
- since winit 0.27 an event WindowOccluded will be produced:
https://docs.rs/winit/latest/winit/event/enum.WindowEvent.html#variant.Occluded
- on ios this is currently the only way to know if an app comes back
from the background which is an important time to to things (like
resetting the badge)
## Solution
- pick up the winit event and forward it to a new `EventWriter`
---
## Changelog
### Added
- new Event `WindowOccluded` added allowing to hook into
`WindowEvent::Occluded` of winit
# Objective
I tried setting `ClearColorConfig` in my app via `Color::FOO.into()`
expecting it to work, but the impl was missing.
## Solution
- Add `impl From<Color> for ClearColorConfig`
- Change examples to use this impl
## Changelog
### Added
- `ClearColorConfig` can be constructed via `.into()` on a `Color`
---------
Signed-off-by: Torstein Grindvik <torstein.grindvik@muybridge.com>
Co-authored-by: Torstein Grindvik <torstein.grindvik@muybridge.com>
# Objective
`wgpu` has a helper method `texture.as_image_copy()` for a common
pattern when making a default-like `ImageCopyTexture` from a texture.
This is used in various places in Bevy for texture copy operations, but
it was not used where `write_texture` is called.
## Solution
- Replace struct `ImageCopyTexture` initialization with
`texture.as_image_copy()` where appropriate
Signed-off-by: Torstein Grindvik <torstein.grindvik@muybridge.com>
Co-authored-by: Torstein Grindvik <torstein.grindvik@muybridge.com>
# Objective
- Window size, scale and position are not correct on the first execution
of the systems
- Fixes#10407, fixes#10642
## Solution
- Don't run `update` before we get a chance to create the window in
winit
- Finish reverting #9826 after #10389
# Objective
The `generate_composite_uuid` utility function hidden in
`bevy_reflect::__macro_exports` could be generally useful to users.
For example, I previously relied on `Hash` to generate a `u64` to create
a deterministic `HandleId`. In v0.12, `HandleId` has been replaced by
`AssetId` which now requires a `Uuid`, which I could generate with this
function.
## Solution
Relocate `generate_composite_uuid` from `bevy_reflect::__macro_exports`
to `bevy_utils::uuid`.
It is still re-exported under `bevy_reflect::__macro_exports` so there
should not be any breaking changes (although, users should generally not
rely on pseudo-private/hidden modules like `__macro_exports`).
I chose to keep it in `bevy_reflect::__macro_exports` so as to not
clutter up our public API and to reduce the number of changes in this
PR. We could have also marked the export as `#[doc(hidden)]`, but
personally I like that we have a dedicated module for this (makes it
clear what is public and what isn't when just looking at the macro
code).
---
## Changelog
- Moved `generate_composite_uuid` to `bevy_utils::uuid` and made it
public
- Note: it was technically already public, just hidden
# Objective
The `nondeterministic_system_order` example doesn't actually detect and
log its deliberate order ambiguities! It should, tho.
## Solution
Update the schedule label, and explain in a comment that you can't turn
it on for the whole `Main` schedule in one go (alas, that would be nice,
but it makes sense that it doesn't work that way).
# Objective
It is currently impossible to control the relative ordering of two 2D
materials at the same depth. This is required to implement wireframes
for 2D meshes correctly
(https://github.com/bevyengine/bevy/issues/5881).
## Solution
Add a `Material2d::depth_bias` function that mirrors the existing 3D
`Material::depth_bias` function.
(this is pulled out of https://github.com/bevyengine/bevy/pull/10489)
---
## Changelog
### Added
- Added `Material2d::depth_bias`
## Migration Guide
`PreparedMaterial2d` has a new `depth_bias` field. A value of 0.0 can be
used to get the previous behavior.
# Objective
Resolves #10727.
`outline.width` was being assigned to `node.outline_offset` instead of
`outline.offset`.
## Solution
Changed `.width` to `.offset` in line 413.
# Objective
Bevy_hierarchy is very useful for ECS only usages of bevy_ecs, but it
currently pulls in bevy_reflect, bevy_app and bevy_core with no way to
opt out.
## Solution
This PR provides features `bevy_app` and `reflect` that are enabled by
default. If disabled, they should remove these dependencies from
bevy_hierarchy.
---
## Changelog
Added features `bevy_app` and `reflect` to bevy_hierarchy.
# Objective
Updates [`futures-lite`](https://github.com/smol-rs/futures-lite) in
bevy_tasks to the next major version (1 -> 2).
Also removes the duplication of `futures-lite`, as `async-fs` requires v
2, so there are currently 2 copies of futures-lite in the dependency
tree.
Futures-lite has received [a number of
updates](https://github.com/smol-rs/futures-lite/blob/master/CHANGELOG.md)
since bevy's current version `1.4`.
---------
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: Mike <mike.hsu@gmail.com>
# Objective
- Fix#10499
## Solution
- Use `.get_represented_type_info()` module path and type ident instead
of `.reflect_*` module path and type ident when serializing the `Option`
enum
---
## Changelog
- Fix serialization bug
- Add simple test
- Add `serde_json` dev dependency
- Add `serde` with `derive` feature dev dependency (wouldn't compile for
me without it)
---------
Co-authored-by: hank <hank@hank.co.in>
Co-authored-by: Gino Valente <49806985+MrGVSV@users.noreply.github.com>
Explain https://github.com/bevyengine/bevy/issues/10625.
This might be obvious to those familiar with Bevy internals, but it
surprised me.
---------
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
I incorrectly assumed that moving a system from `Update` to
`FixedUpdate` would simplify logic without hurting performance.
But this is not the case: if there's single-threaded long computation in
the `FixedUpdate`, the machine won't do anything else in parallel with
it. Which might be not what users expect.
So this PR adds a note. But maybe it is obvious, I don't know.
# Objective
fix webgpu+chrome(119) textureSample in non-uniform control flow error
## Solution
modify view transmission texture sampling to use textureSampleLevel.
there are no mips for the view transmission texture, so this doesn't
change the result, but it removes the need for the samples to be in
uniform control flow.
note: in future we may add a mipchain to the transmission texture to
improve the blur effect. if uniformity analysis hasn't improved, this
would require switching to manual derivative calculations (which is
something we plan to do anyway).
## Objective
- Split different types of gizmos into their own files
## Solution
- Move `arc_2d` and `Arc2dBuilder` into `arcs.rs`
- turns out there's no 3d arc function! I'll add one Soon(TM) in another
MR
## Changelog
- Changed
- moved `gizmos::Arc2dBuilder` to `gizmos::arcs::Arc2dBuilder`
## Migration Guide
- `gizmos::Arc2dBuilder` -> `gizmos::arcs::Arc2dBuilder`
# Objective
Problems:
* The clipped, non-visible regions of UI nodes are interactive.
* `RelativeCursorPostion` is set relative to the visible part of the
node. It should be relative to the whole node.
* The `RelativeCursorPostion::mouse_over` method returns `true` when the
mouse is over a clipped part of a node.
fixes#10470
## Solution
Intersect a node's bounding rect with its clipping rect before checking
if it contains the cursor.
Added the field `normalized_visible_node_rect` to
`RelativeCursorPosition`. This is set to the bounds of the unclipped
area of the node rect by `ui_focus_system` expressed in normalized
coordinates relative to the entire node.
Instead of checking if the normalized cursor position lies within a unit
square, it instead checks if it is contained by
`normalized_visible_node_rect`.
Added outlines to the `overflow` example that appear when the cursor is
over the visible part of the images, but not the clipped area.
---
## Changelog
* `ui_focus_system` intersects a node's bounding rect with its clipping
rect before checking if mouse over.
* Added the field `normalized_visible_node_rect` to
`RelativeCursorPosition`. This is set to the bounds of the unclipped
area of the node rect by `ui_focus_system` expressed in normalized
coordinates relative to the entire node.
* `RelativeCursorPostion` is calculated relative to the whole node's
position and size, not only the visible part.
* `RelativeCursorPosition::mouse_over` only returns true when the mouse
is over an unclipped region of the UI node.
* Removed the `Deref` and `DerefMut` derives from
`RelativeCursorPosition` as it is no longer a single field struct.
* Added some outlines to the `overflow` example that respond to
`Interaction` changes.
## Migration Guide
The clipped areas of UI nodes are no longer interactive.
`RelativeCursorPostion` is now calculated relative to the whole node's
position and size, not only the visible part. Its `mouse_over` method
only returns true when the cursor is over an unclipped part of the node.
`RelativeCursorPosition` no longer implements `Deref` and `DerefMut`.
# Objective
- Fixes#10695
## Solution
Fixed obvious blunder in `PartialEq` implementation for
`UntypedAssetId`'s where the `TypeId` was not included in the
comparison. This was not picked up in the unit tests I added because
they only tested over a single asset type.
# Objective
- I've been experimenting with different patterns to try and make async
tasks more convenient. One of the better ones I've found is to return a
command queue to allow for deferred &mut World access. It can be
convenient to check for task completion in a normal system, but it is
hard to do something with the command queue after getting it back. This
pr adds a `append` to Commands. This allows appending the returned
command queue onto the system's commands.
## Solution
- I edited the async compute example to use the new `append`, but not
sure if I should keep the example changed as this might be too
opinionated.
## Future Work
- It would be very easy to pull the pattern used in the example out into
a plugin or a external crate, so users wouldn't have to add the checking
system.
---
## Changelog
- add `append` to `Commands` and `CommandQueue`
# Objective
Enables warning on `clippy::undocumented_unsafe_blocks` across the
workspace rather than only in `bevy_ecs`, `bevy_transform` and
`bevy_utils`. This adds a little awkwardness in a few areas of code that
have trivial safety or explain safety for multiple unsafe blocks with
one comment however automatically prevents these comments from being
missed.
## Solution
This adds `undocumented_unsafe_blocks = "warn"` to the workspace
`Cargo.toml` and fixes / adds a few missed safety comments. I also added
`#[allow(clippy::undocumented_unsafe_blocks)]` where the safety is
explained somewhere above.
There are a couple of safety comments I added I'm not 100% sure about in
`bevy_animation` and `bevy_render/src/view` and I'm not sure about the
use of `#[allow(clippy::undocumented_unsafe_blocks)]` compared to adding
comments like `// SAFETY: See above`.
# Objective
First, some terminology:
- **Minor radius**: The radius of the tube of a torus, i.e. the
"half-thickness"
- **Major radius**: The distance from the center of the tube to the
center of the torus
- **Inner radius**: The radius of the hole (if it exists), `major_radius
- minor_radius`
- **Outer radius**: The radius of the overall shape, `major_radius +
minor_radius`
- **Ring torus**: The familiar donut shape with a hole in the center,
`major_radius > minor_radius`
- **Horn torus**: A torus that doesn't have a hole but also isn't
self-intersecting, `major_radius == minor_radius`
- **Spindle torus**: A self-intersecting torus, `major_radius <
minor_radius`
Different tori from [Wikipedia](https://en.wikipedia.org/wiki/Torus),
where *R* is the major radius and *r* is the minor radius:
![kuva](https://github.com/bevyengine/bevy/assets/57632562/53ead786-2402-43a7-ae8a-5720e6e54dcc)
Currently, Bevy's torus is represented by a `radius` and `ring_radius`.
I believe these correspond to the outer radius and minor radius, but
they are rather confusing and inconsistent names, and they make the
assumption that the torus always has a ring.
I also couldn't find any other big engines using this representation;
[Godot](https://docs.godotengine.org/en/stable/classes/class_torusmesh.html)
and [Unity
ProBuilder](https://docs.unity3d.com/Packages/com.unity.probuilder@4.0/manual/Torus.html)
use the inner and outer radii, while
[Unreal](https://docs.unrealengine.com/5.3/en-US/BlueprintAPI/GeometryScript/Primitives/AppendTorus/)
uses the minor and major radii.
[Blender](https://docs.blender.org/manual/en/latest/modeling/meshes/primitives.html#torus)
supports both, but defaults to minor/major.
Bevy's `Torus` primitive should have an efficient, consistent, clear and
flexible representation, and the current `radius` and `ring_radius`
properties are not ideal for that.
## Solution
Change `Torus` to be represented by a `minor_radius` and `major_radius`.
- Mathematically correct and consistent
- Flexible, not restricted to ring tori
- Computations and conversions are efficient
- `inner_radius = major_radius - minor_radius`
- `outer_radius = major_radius + minor_radius`
- Mathematical formulae for things like area and volume rely on the
minor and major radii, no conversion needed
Perhaps the primary issue with this representation is that "minor
radius" and "major radius" are rather mathematical, and an inner/outer
radius can be more intuitive in some cases. However, this can be
mitigated with constructors and helpers.
# Objective
Make the impl block for RemovedSystem generic so that the methods can be
called for systems that have inputs or outputs.
## Solution
Simply adding generics to the impl block.
# Objective
Adds `.entry` to `EntityWorldMut` with `Entry`, `OccupiedEntry` and
`VacantEntry` for easier in-situ modification, based on `HashMap.entry`.
Fixes#10635
## Solution
This adds the `entry` method to `EntityWorldMut` which returns an
`Entry`. This is an enum of `OccupiedEntry` and `VacantEntry` and has
the methods `and_modify`, `insert_entry`, `or_insert`, `or_insert_with`
and `or_default`. The only difference between `OccupiedEntry` and
`VacantEntry` is the type, they are both a mutable reference to the
`EntityWorldMut` and a marker for the component type, `HashMap` also
stores things to make it quicker to access the data in `OccupiedEntry`
but I wasn't sure if we had anything it would be logical to store to
make accessing/modifying the component faster? As such, the differences
are that `OccupiedEntry` assumes the entity has the component (because
nothing else can have an `EntityWorldMut` so it can't be changed outside
the entry api) and has different methods.
All the methods are based very closely off `hashbrown::HashMap` (because
its easier to read the source of) with a couple of quirks like
`OccupiedEntry.insert` doesn't return the old value because we don't
appear to have an api for mem::replacing components.
---
## Changelog
- Added a new function `EntityWorldMut.entry` which returns an `Entry`,
allowing easier in-situ modification of a component.
---------
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: Pascal Hertleif <killercup@gmail.com>
# Objective
Partially Addresses #10612
fix: add clippy::doc_markdown linting to cargo workspace
Rather than do all the warnings in `tools/ci/src/main.rs` in one-shot,
just wanted to have an initial pr adding the first one to get the form
correct as some may trigger build errors and require changes to get
merged more easily.
## Solution
- adding the doc_markdown and removing it from the ci check as it'll now
be a build error during normal compilation.
---------
Co-authored-by: François <mockersf@gmail.com>
# Objective
- Fixes#10629
- `UntypedAssetId` and `AssetId` (along with `UntypedHandle` and
`Handle`) do not hash to the same values when pointing to the same
`Asset`. Additionally, comparison and conversion between these types
does not follow idiomatic Rust standards.
## Solution
- Added new unit tests to validate/document expected behaviour
- Added trait implementations to make working with Un/Typed values more
ergonomic
- Ensured hashing and comparison between Un/Typed values is consistent
- Removed `From` trait implementations that panic, and replaced them
with `TryFrom`
---
## Changelog
- Ensured `Handle::<A>::hash` and `UntypedHandle::hash` will produce the
same value for the same `Asset`
- Added non-panicing `Handle::<A>::try_typed`
- Added `PartialOrd` to `UntypedHandle` to match `Handle<A>` (this will
return `None` for `UntypedHandles` for differing `Asset` types)
- Added `TryFrom<UntypedHandle>` for `Handle<A>`
- Added `From<Handle<A>>` for `UntypedHandle`
- Removed panicing `From<Untyped...>` implementations. These are
currently unused within the Bevy codebase, and shouldn't be used
externally, hence removal.
- Added cross-implementations of `PartialEq` and `PartialOrd` for
`UntypedHandle` and `Handle<A>` allowing direct comparison when
`TypeId`'s match.
- Near-identical changes to `AssetId` and `UntypedAssetId`
## Migration Guide
If you relied on any of the panicing `From<Untyped...>` implementations,
simply call the existing `typed` methods instead. Alternatively, use the
new `TryFrom` implementation instead to directly expose possible
mistakes.
## Notes
I've made these changes since `Handle` is such a fundamental type to the
entire `Asset` ecosystem within Bevy, and yet it had pretty unclear
behaviour with no direct testing. The fact that hashing untyped vs typed
versions of the same handle would produce different values is something
I expect to cause a very subtle bug for someone else one day.
I haven't included it in this PR to avoid any controversy, but I also
believe the `typed_unchecked` methods should be removed from these
types, or marked as `unsafe`. The `texture_atlas` example currently uses
it, and I believe it is a bad choice. The performance gained by not
type-checking before conversion would be entirely dwarfed by the act of
actually loading an asset and creating its handle anyway. If an end user
is in a tight loop repeatedly calling `typed_unchecked` on an
`UntypedHandle` for the entire runtime of their application, I think the
small performance drop caused by that safety check is ~~a form of cosmic
justice~~ reasonable.
# Objective
- Standardize fmt for toml files
## Solution
- Add [taplo](https://taplo.tamasfe.dev/) to CI (check for fmt and diff
for toml files), for context taplo is used by the most popular extension
in VScode [Even Better
TOML](https://marketplace.visualstudio.com/items?itemName=tamasfe.even-better-toml
- Add contribution section to explain toml fmt with taplo.
Now to pass CI you need to run `taplo fmt --option indent_string=" "` or
if you use vscode have the `Even Better TOML` extension with 4 spaces
for indent
---------
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
# Objective
- The current shader code is misleading since it makes it look like a
struct is passed to the bind group 0 but in reality only the color is
passed. They just happen to have the exact same memory layout so wgsl
doesn't complain and it works.
- The struct is defined after the `impl Material` block which is
backwards from pretty much every other usage of the `impl` block in
bevy.
## Solution
- Remove the unnecessary struct in the shader
- move the impl block
## Objective
- Give all the intuitive groups of gizmos their own file
- don't be a breaking change
- don't change Gizmos interface
- eventually do arcs too
- future types of gizmos should be in their own files
- see also https://github.com/bevyengine/bevy/issues/9400
## Solution
- Moved `gizmos.circle`, `gizmos.2d_circle`, and assorted helpers into
`circles.rs`
- Can also do arcs in this MR if y'all want; just figured I should do
one thing at a time.
## Changelog
- Changed
- `gizmos::CircleBuilder` moved to `gizmos::circles::Circle2dBuilder`
- `gizmos::Circle2dBuilder` moved to `gizmos::circles::Circle2dBuilder`
## Migration Guide
- change `gizmos::CircleBuilder` to `gizmos::circles::Circle2dBuilder`
- change `gizmos::Circle2dBuilder` to `gizmos::circles::Circle2dBuilder`
---------
Co-authored-by: François <mockersf@gmail.com>
# Objective
This PR adds some helpers for `Triangle2d` to work with its winding
order. This could also be extended to polygons (and `Triangle3d` once
it's added).
## Solution
- Add `WindingOrder` enum with `Clockwise`, `Counterclockwise` and
`Invalid` variants
- `Invalid` is for cases where the winding order can not be reliably
computed, i.e. the points lie on a single line and the area is zero
- Add `Triangle2d::winding_order` method that uses a signed surface area
to determine the winding order
- Add `Triangle2d::reverse` method that reverses the winding order by
swapping the second and third vertices
The API looks like this:
```rust
let mut triangle = Triangle2d::new(
Vec2::new(0.0, 2.0),
Vec2::new(-0.5, -1.2),
Vec2::new(-1.0, -1.0),
);
assert_eq!(triangle.winding_order(), WindingOrder::Clockwise);
// Reverse winding order
triangle.reverse();
assert_eq!(triangle.winding_order(), WindingOrder::Counterclockwise);
```
I also added tests to make sure the methods work correctly. For now,
they live in the same file as the primitives.
## Open questions
- Should it be `Counterclockwise` or `CounterClockwise`? The first one
is more correct but perhaps a bit less readable. Counter-clockwise is
also a valid spelling, but it seems to be a lot less common than
counterclockwise.
- Is `WindingOrder::Invalid` a good name? Parry uses
`TriangleOrientation::Degenerate`, but I'm not a huge fan, at least as a
non-native English speaker. Any better suggestions?
- Is `WindingOrder` fine in `bevy_math::primitives`? It's not specific
to a dimension, so I put it there for now.
# Objective
Fix the `bevy_asset/file_watcher` feature in practice depending on
multithreading, while not informing the user of it.
**As I understand it** (I didn't check it), the file watcher feature
depends on spawning a concurrent thread to receive file update events
from the `notify-debouncer-full` crate. But if multithreading is
disabled, that thread will never have time to read the events and
consume them.
- Fixes#10573
## Solution
Add a `compile_error!` causing compilation failure if `file_watcher` is
enabled while `multi-threaded` is disabled.
This is considered better than adding a dependency on `multi-threaded`
on the `file_watcher`, as (according to @mockersf) toggling on/off
`multi-threaded` has a non-zero chance of changing behavior. And we
shouldn't implicitly change behavior. A compilation failure prevents
compilation of code that is invalid, while informing the user of the
steps needed to fix it.
# Objective
- Sometimes it's very useful to know if a `Transform` contains any `NaN`
or infinite values. It's a bit boiler-plate heavy to check translation,
rotation and scale individually.
## Solution
- Add a new method `is_finite` that returns true if, and only if
translation, rotation and scale all are finite.
- It's a natural extension of `Quat::is_finite`, and `Vec3::is_finite`,
which return true if, and only if all their components' `is_finite()`
returns true.
---
## Changelog
- Added `Transform::is_finite`
# Objective
The `map_async` method involves a type `BufferAsyncError`:
https://docs.rs/bevy/latest/bevy/render/render_resource/struct.BufferSlice.html#method.map_async
This type is not re-exported in Bevy, so if a user wants to store a
struct involving this type they have to add wgpu manually to their
manifest.
## Solution
- Re-export wgpu::BufferAsyncError
---
## Changelog
### Added
- Re-export wgpu::BufferAsyncError
Signed-off-by: Torstein Grindvik <torstein.grindvik@muybridge.com>
Co-authored-by: Torstein Grindvik <torstein.grindvik@muybridge.com>
# Objective
- Fix adding `#![allow(clippy::type_complexity)]` everywhere. like #9796
## Solution
- Use the new [lints] table that will land in 1.74
(https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#lints)
- inherit lint to the workspace, crates and examples.
```
[lints]
workspace = true
```
## Changelog
- Bump rust version to 1.74
- Enable lints table for the workspace
```toml
[workspace.lints.clippy]
type_complexity = "allow"
```
- Allow type complexity for all crates and examples
```toml
[lints]
workspace = true
```
---------
Co-authored-by: Martín Maita <47983254+mnmaita@users.noreply.github.com>
# Objective
- Follow up on https://github.com/bevyengine/bevy/pull/10519, diving
deeper into optimising `Entity` due to the `derive`d `PartialOrd`
`partial_cmp` not being optimal with codegen:
https://github.com/rust-lang/rust/issues/106107
- Fixes#2346.
## Solution
Given the previous PR's solution and the other existing LLVM codegen
bug, there seemed to be a potential further optimisation possible with
`Entity`. In exploring providing manual `PartialOrd` impl, it turned out
initially that the resulting codegen was not immediately better than the
derived version. However, once `Entity` was given `#[repr(align(8)]`,
the codegen improved remarkably, even more once the fields in `Entity`
were rearranged to correspond to a `u64` layout (Rust doesn't
automatically reorder fields correctly it seems). The field order and
`align(8)` additions also improved `to_bits` codegen to be a single
`mov` op. In turn, this led me to replace the previous
"non-shortcircuiting" impl of `PartialEq::eq` to use direct `to_bits`
comparison.
The result was remarkably better codegen across the board, even for
hastable lookups.
The current baseline codegen is as follows:
https://godbolt.org/z/zTW1h8PnY
Assuming the following example struct that mirrors with the existing
`Entity` definition:
```rust
#[derive(Clone, Copy, Eq, PartialEq, PartialOrd, Ord)]
pub struct FakeU64 {
high: u32,
low: u32,
}
```
the output for `to_bits` is as follows:
```
example::FakeU64::to_bits:
shl rdi, 32
mov eax, esi
or rax, rdi
ret
```
Changing the struct to:
```rust
#[derive(Clone, Copy, Eq)]
#[repr(align(8))]
pub struct FakeU64 {
low: u32,
high: u32,
}
```
and providing manual implementations for `PartialEq`/`PartialOrd`/`Ord`,
`to_bits` now optimises to:
```
example::FakeU64::to_bits:
mov rax, rdi
ret
```
The full codegen example for this PR is here for reference:
https://godbolt.org/z/n4Mjx165a
To highlight, `gt` comparison goes from
```
example::greater_than:
cmp edi, edx
jae .LBB3_2
xor eax, eax
ret
.LBB3_2:
setne dl
cmp esi, ecx
seta al
or al, dl
ret
```
to
```
example::greater_than:
cmp rdi, rsi
seta al
ret
```
As explained on Discord by @scottmcm :
>The root issue here, as far as I understand it, is that LLVM's
middle-end is inexplicably unwilling to merge loads if that would make
them under-aligned. It leaves that entirely up to its target-specific
back-end, and thus a bunch of the things that you'd expect it to do that
would fix this just don't happen.
## Benchmarks
Before discussing benchmarks, everything was tested on the following
specs:
AMD Ryzen 7950X 16C/32T CPU
64GB 5200 RAM
AMD RX7900XT 20GB Gfx card
Manjaro KDE on Wayland
I made use of the new entity hashing benchmarks to see how this PR would
improve things there. With the changes in place, I first did an
implementation keeping the existing "non shortcircuit" `PartialEq`
implementation in place, but with the alignment and field ordering
changes, which in the benchmark is the `ord_shortcircuit` column. The
`to_bits` `PartialEq` implementation is the `ord_to_bits` column. The
main_ord column is the current existing baseline from `main` branch.
![Screenshot_20231114_132908](https://github.com/bevyengine/bevy/assets/3116268/cb9090c9-ff74-4cc5-abae-8e4561332261)
My machine is not super set-up for benchmarking, so some results are
within noise, but there's not just a clear improvement between the
non-shortcircuiting implementation, but even further optimisation taking
place with the `to_bits` implementation.
On my machine, a fair number of the stress tests were not showing any
difference (indicating other bottlenecks), but I was able to get a clear
difference with `many_foxes` with a fox count of 10,000:
Test with `cargo run --example many_foxes --features
bevy/trace_tracy,wayland --release -- --count 10000`:
![Screenshot_20231114_144217](https://github.com/bevyengine/bevy/assets/3116268/89bdc21c-7209-43c8-85ae-efbf908bfed3)
On avg, a framerate of about 28-29FPS was improved to 30-32FPS. "This
trace" represents the current PR's perf, while "External trace"
represents the `main` branch baseline.
## Changelog
Changed: micro-optimized Entity align and field ordering as well as
providing manual `PartialOrd`/`Ord` impls to help LLVM optimise further.
## Migration Guide
Any `unsafe` code relying on field ordering of `Entity` or sufficiently
cursed shenanigans should change to reflect the different internal
representation and alignment requirements of `Entity`.
Co-authored-by: james7132 <contact@jamessliu.com>
Co-authored-by: NathanW <nathansward@comcast.net>
Bevy introduced unintentional breaking behaviour along with the v0.12.0
release regarding the `App::set_runner` API. See: #10385, #10389 for
details. We weren't able to catch this before release because this API
is only used internally in one or two places (the very places which
motivated the break).
This commit adds a regression test to help guarantee some expected
behaviour for custom runners, namely that `app::update` won't be called
before the runner has a chance to initialise state.