# Objective
Improve the code quality of the multithreaded executor.
## Solution
* Remove some unused variables.
* Use `Mutex::get_mut` where applicable instead of locking.
* Use a `startup_systems` FixedBitset to pre-compute the starting
systems instead of building it bit-by-bit on startup.
* Instead of using `FixedBitset::clear` and `FixedBitset::union_with`,
use `FixedBitset::clone_from` instead, which does only a single copy and
will not allocate if the target bitset has a large enough allocation.
* Replace the `Mutex` around `Conditions` with `SyncUnsafeCell`, and add
a `Context::try_lock` that forces it to be synchronized fetched
alongside the executor lock.
This might produce minimal performance gains, but the focus here is on
the code quality improvements.
# Objective
- Attempts to solve two items from
https://github.com/bevyengine/bevy/issues/11478.
## Solution
- Moved `intern` module from `bevy_utils` into `bevy_ecs` crate and
updated all relevant imports.
- Moved `label` module from `bevy_utils` into `bevy_ecs` crate and
updated all relevant imports.
---
## Migration Guide
- Replace `bevy_utils::define_label` imports with
`bevy_ecs::define_label` imports.
- Replace `bevy_utils:🏷️:DynEq` imports with
`bevy_ecs:🏷️:DynEq` imports.
- Replace `bevy_utils:🏷️:DynHash` imports with
`bevy_ecs:🏷️:DynHash` imports.
- Replace `bevy_utils::intern::Interned` imports with
`bevy_ecs::intern::Interned` imports.
- Replace `bevy_utils::intern::Internable` imports with
`bevy_ecs::intern::Internable` imports.
- Replace `bevy_utils::intern::Interner` imports with
`bevy_ecs::intern::Interner` imports.
---------
Co-authored-by: James Liu <contact@jamessliu.com>
# Objective
Make it easy for crates.io / lib.rs users or automated tools to find the
repository of `bevy_utils_proc_macros`
## Solution
Add the `repository` field to the `Cargo.toml` of
`bevy_utils_proc_macros`
# Objective
- Fixes#12712
## Solution
- Move the `float_ord.rs` file to `bevy_math`
- Change any `bevy_utils::FloatOrd` statements to `bevy_math::FloatOrd`
---
## Changelog
- Moved `FloatOrd` from `bevy_utils` to `bevy_math`
## Migration Guide
- References to `bevy_utils::FloatOrd` should be changed to
`bevy_math::FloatOrd`
# Objective
Resolves#3824. `unsafe` code should be the exception, not the norm in
Rust. It's obviously needed for various use cases as it's interfacing
with platforms and essentially running the borrow checker at runtime in
the ECS, but the touted benefits of Bevy is that we are able to heavily
leverage Rust's safety, and we should be holding ourselves accountable
to that by minimizing our unsafe footprint.
## Solution
Deny `unsafe_code` workspace wide. Add explicit exceptions for the
following crates, and forbid it in almost all of the others.
* bevy_ecs - Obvious given how much unsafe is needed to achieve
performant results
* bevy_ptr - Works with raw pointers, even more low level than bevy_ecs.
* bevy_render - due to needing to integrate with wgpu
* bevy_window - due to needing to integrate with raw_window_handle
* bevy_utils - Several unsafe utilities used by bevy_ecs. Ideally moved
into bevy_ecs instead of made publicly usable.
* bevy_reflect - Required for the unsafe type casting it's doing.
* bevy_transform - for the parallel transform propagation
* bevy_gizmos - For the SystemParam impls it has.
* bevy_assets - To support reflection. Might not be required, not 100%
sure yet.
* bevy_mikktspace - due to being a conversion from a C library. Pending
safe rewrite.
* bevy_dynamic_plugin - Inherently unsafe due to the dynamic loading
nature.
Several uses of unsafe were rewritten, as they did not need to be using
them:
* bevy_text - a case of `Option::unchecked` could be rewritten as a
normal for loop and match instead of an iterator.
* bevy_color - the Pod/Zeroable implementations were replaceable with
bytemuck's derive macros.
# Objective
- `FloatOrd` currently has a different comparison behavior between its
derived `PartialOrd` impl and manually implemented `Ord` impl (The
[`Ord` doc](https://doc.rust-lang.org/std/cmp/trait.Ord.html) says this
is a logic error). This might be a problem for some `std`
containers/algorithms if they rely on both matching, and a footgun for
Bevy users.
## Solution
- Replace the `PartialEq` and `Ord` impls of `FloatOrd` with some
equivalent ones producing [better
assembly.](https://godbolt.org/z/jaWbjnMKx)
- Manually derive `PartialOrd` with the same behavior as `Ord`,
implement the comparison operators.
- Add some tests.
I first tried using a match-based implementation similar to the
`PartialOrd` impl [of the
std](https://doc.rust-lang.org/src/core/cmp.rs.html#1457) (with added
NaN ordering) but I couldn't get it to produce non-branching assembly.
The current implementation is based on [the one from the `ordered_float`
crate](3641f59e31/src/lib.rs (L121)),
adapted since it uses a different ordering. Should this be mentionned
somewhere in the code?
---
## Changelog
### Fixed
- `FloatOrd` now uses the same ordering for its `PartialOrd` and `Ord`
implementations.
## Migration Guide
- If you were depending on the `PartialOrd` behaviour of `FloatOrd`, it
has changed from matching `f32` to matching `FloatOrd`'s `Ord` ordering,
never returning `None`.
# Objective
Currently the built docs only shows the logo and favicon for the top
level `bevy` crate. This makes views like
https://docs.rs/bevy_ecs/latest/bevy_ecs/ look potentially unrelated to
the project at first glance.
## Solution
Reproduce the docs attributes for every crate that Bevy publishes.
Ideally this would be done with some workspace level Cargo.toml control,
but AFAICT, such support does not exist.
# Objective
Simplify implementing some asset traits without Box::pin(async move{})
shenanigans.
Fixes (in part) https://github.com/bevyengine/bevy/issues/11308
## Solution
Use async-fn in traits when possible in all traits. Traits with return
position impl trait are not object safe however, and as AssetReader and
AssetWriter are both used with dynamic dispatch, you need a Boxed
version of these futures anyway.
In the future, Rust is [adding
](https://blog.rust-lang.org/2023/12/21/async-fn-rpit-in-traits.html)proc
macros to generate these traits automatically, and at some point in the
future dyn traits should 'just work'. Until then.... this seemed liked
the right approach given more ErasedXXX already exist, but, no clue if
there's plans here! Especially since these are public now, it's a bit of
an unfortunate API, and means this is a breaking change.
In theory this saves some performance when these traits are used with
static dispatch, but, seems like most code paths go through dynamic
dispatch, which boxes anyway.
I also suspect a bunch of the lifetime annotations on these function
could be simplified now as the BoxedFuture was often the only thing
returned which needed a lifetime annotation, but I'm not touching that
for now as traits + lifetimes can be so tricky.
This is a revival of
[pull/11362](https://github.com/bevyengine/bevy/pull/11362) after a
spectacular merge f*ckup, with updates to the latest Bevy. Just to recap
some discussion:
- Overall this seems like a win for code quality, especially when
implementing these traits, but a loss for having to deal with ErasedXXX
variants.
- `ConditionalSend` was the preferred name for the trait that might be
Send, to deal with wasm platforms.
- When reviewing be sure to disable whitespace difference, as that's 95%
of the PR.
## Changelog
- AssetReader, AssetWriter, AssetLoader, AssetSaver and Process now use
async-fn in traits rather than boxed futures.
## Migration Guide
- Custom implementations of AssetReader, AssetWriter, AssetLoader,
AssetSaver and Process should switch to async fn rather than returning a
bevy_utils::BoxedFuture.
- Simultaniously, to use dynamic dispatch on these traits you should
instead use dyn ErasedXXX.
# Objective
Make bevy_utils less of a compilation bottleneck. Tackle #11478.
## Solution
* Move all of the directly reexported dependencies and move them to
where they're actually used.
* Remove the UUID utilities that have gone unused since `TypePath` took
over for `TypeUuid`.
* There was also a extraneous bytemuck dependency on `bevy_core` that
has not been used for a long time (since `encase` became the primary way
to prepare GPU buffers).
* Remove the `all_tuples` macro reexport from bevy_ecs since it's
accessible from `bevy_utils`.
---
## Changelog
Removed: Many of the reexports from bevy_utils (petgraph, uuid, nonmax,
smallvec, and thiserror).
Removed: bevy_core's reexports of bytemuck.
## Migration Guide
bevy_utils' reexports of petgraph, uuid, nonmax, smallvec, and thiserror
have been removed.
bevy_core' reexports of bytemuck's types has been removed.
Add them as dependencies in your own crate instead.
# Objective
`bevy_utils::Entry` is only useful when using
`BuildHasherDefault<AHasher>`. It would be great if we didn't have to
write out `bevy_utils::hashbrown::hash_map::Entry` whenever we want to
use a different `BuildHasher`, such as when working with
`bevy_utils::TypeIdMap`.
## Solution
Give `bevy_utils::Entry` a new optional type parameter for defining a
custom `BuildHasher`, such as `NoOpHash`. This parameter defaults to
`BuildHasherDefault<AHasher>`— the `BuildHasher` used by
`bevy_utils::HashMap`.
---
## Changelog
- Added an optional third type parameter to `bevy_utils::Entry` to
specify a custom `BuildHasher`
Fixes#12016.
Bump version after release
This PR has been auto-generated
Co-authored-by: Bevy Auto Releaser <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: François <mockersf@gmail.com>
# Objective
There's a repeating pattern of `ThreadLocal<Cell<Vec<T>>>` which is very
useful for low overhead, low contention multithreaded queues that have
cropped up in a few places in the engine. This pattern is surprisingly
useful when building deferred mutation across multiple threads, as noted
by it's use in `ParallelCommands`.
However, `ThreadLocal<Cell<Vec<T>>>` is not only a mouthful, it's also
hard to ensure the thread-local queue is replaced after it's been
temporarily removed from the `Cell`.
## Solution
Wrap the pattern into `bevy_utils::Parallel<T>` which codifies the
entire pattern and ensures the user follows the contract. Instead of
fetching indivdual cells, removing the value, mutating it, and replacing
it, `Parallel::get` returns a `ParRef<'a, T>` which contains the
temporarily removed value and a reference back to the cell, and will
write the mutated value back to the cell upon being dropped.
I would like to use this to simplify the remaining part of #4899 that
has not been adopted/merged.
---
## Changelog
TODO
---------
Co-authored-by: Joseph <21144246+JoJoJet@users.noreply.github.com>
# Objective
Bevy's animation system currently does tree traversals based on `Name`
that aren't necessary. Not only do they require in unsafe code because
tree traversals are awkward with parallelism, but they are also somewhat
slow, brittle, and complex, which manifested itself as way too many
queries in #11670.
# Solution
Divide animation into two phases: animation *advancement* and animation
*evaluation*, which run after one another. *Advancement* operates on the
`AnimationPlayer` and sets the current animation time to match the game
time. *Evaluation* operates on all animation bones in the scene in
parallel and sets the transforms and/or morph weights based on the time
and the clip.
To do this, we introduce a new component, `AnimationTarget`, which the
asset loader places on every bone. It contains the ID of the entity
containing the `AnimationPlayer`, as well as a UUID that identifies
which bone in the animation the target corresponds to. In the case of
glTF, the UUID is derived from the full path name to the bone. The rule
that `AnimationTarget`s are descendants of the entity containing
`AnimationPlayer` is now just a convention, not a requirement; this
allows us to eliminate the unsafe code.
# Migration guide
* `AnimationClip` now uses UUIDs instead of hierarchical paths based on
the `Name` component to refer to bones. This has several consequences:
- A new component, `AnimationTarget`, should be placed on each bone that
you wish to animate, in order to specify its UUID and the associated
`AnimationPlayer`. The glTF loader automatically creates these
components as necessary, so most uses of glTF rigs shouldn't need to
change.
- Moving a bone around the tree, or renaming it, no longer prevents an
`AnimationPlayer` from affecting it.
- Dynamically changing the `AnimationPlayer` component will likely
require manual updating of the `AnimationTarget` components.
* Entities with `AnimationPlayer` components may now possess descendants
that also have `AnimationPlayer` components. They may not, however,
animate the same bones.
* As they aren't specific to `TypeId`s,
`bevy_reflect::utility::NoOpTypeIdHash` and
`bevy_reflect::utility::NoOpTypeIdHasher` have been renamed to
`bevy_reflect::utility::NoOpHash` and
`bevy_reflect::utility::NoOpHasher` respectively.
# Objective
Reduce the size of `bevy_utils`
(https://github.com/bevyengine/bevy/issues/11478)
## Solution
Move `EntityHash` related types into `bevy_ecs`. This also allows us
access to `Entity`, which means we no longer need `EntityHashMap`'s
first generic argument.
---
## Changelog
- Moved `bevy::utils::{EntityHash, EntityHasher, EntityHashMap,
EntityHashSet}` into `bevy::ecs::entity::hash` .
- Removed `EntityHashMap`'s first generic argument. It is now hardcoded
to always be `Entity`.
## Migration Guide
- Uses of `bevy::utils::{EntityHash, EntityHasher, EntityHashMap,
EntityHashSet}` now have to be imported from `bevy::ecs::entity::hash`.
- Uses of `EntityHashMap` no longer have to specify the first generic
parameter. It is now hardcoded to always be `Entity`.
# Objective
`bevy_utils` only requires aHash 0.8.3, which is broken on Rust 1.7.6:
```
error: could not compile `ahash` (lib) due to 1 previous error
error[E0635]: unknown feature `stdsimd`
```
See https://github.com/tkaitchuck/aHash/issues/200
This is fixed in aHash 0.8.7, so require at least that version
(Cargo.lock is already up to date).
# Objective
- The exported hashtypes are just re-exports from hashbrown, we want to
drop that dependency and (in the future) let the user import their own
choice.
- Fixes#11717
## Solution
- Adding a deprecated tag on the re-exports, so in future releases these
can be safely removed.
# Objective
We currently over/underpromise hash stability:
- `HashMap`/`HashSet` use `BuildHasherDefault<AHasher>` instead of
`RandomState`. As a result, the hash is stable within the same run.
- [aHash isn't stable between devices (and
versions)](https://github.com/tkaitchuck/ahash?tab=readme-ov-file#goals-and-non-goals),
yet it's used for `StableHashMap`/`StableHashSet`
- the specialized hashmaps are stable
Interestingly, `StableHashMap`/`StableHashSet` aren't used by Bevy
itself (anymore).
## Solution
Add/fix documentation
## Alternatives
For `StableHashMap`/`StableHashSet`:
- remove them
- revive #7107
---
## Changelog
- added iteration stability guarantees for different hashmaps
# Objective
- Pipeline compilation is slow and blocks the frame
- Closes https://github.com/bevyengine/bevy/issues/8224
## Solution
- Compile pipelines in a Task on the AsyncComputeTaskPool
---
## Changelog
- Render/compute pipeline compilation is now done asynchronously over
multiple frames when the multi-threaded feature is enabled and on
non-wasm and non-macOS platforms
- Added `CachedPipelineState::Creating`
- Added `PipelineCache::block_on_render_pipeline()`
- Added `bevy_utils::futures::check_ready`
- Added `bevy_render/multi-threaded` cargo feature
## Migration Guide
- Match on the new `Creating` variant for exhaustive matches of
`CachedPipelineState`
Use `TypeIdMap<T>` instead of `HashMap<TypeId, T>`
- ~~`TypeIdMap` was in `bevy_ecs`. I've kept it there because of
#11478~~
- ~~I haven't swapped `bevy_reflect` over because it doesn't depend on
`bevy_ecs`, but I'd also be happy with moving `TypeIdMap` to
`bevy_utils` and then adding a dependency to that~~
- ~~this is a slight change in the public API of
`DrawFunctionsInternal`, does this need to go in the changelog?~~
## Changelog
- moved `TypeIdMap` to `bevy_utils`
- changed `DrawFunctionsInternal::indices` to `TypeIdMap`
## Migration Guide
- `TypeIdMap` now lives in `bevy_utils`
- `DrawFunctionsInternal::indices` now uses a `TypeIdMap`.
---------
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
# Objective
Currently the `missing_docs` lint is allowed-by-default and enabled at
crate level when their documentations is complete (see #3492).
This PR proposes to inverse this logic by making `missing_docs`
warn-by-default and mark crates with imcomplete docs allowed.
## Solution
Makes `missing_docs` warn at workspace level and allowed at crate level
when the docs is imcomplete.
# Objective
- Allow `HashMap<Cow<'_, T>, _>` to use `&T` as key for `HashMap::get`
- Makes `CowArc` more like `Cow`
## Solution
Implements `Borrow<T>` and `AsRef<T>` for `CowArc<T>`.
# Objective
When working within `bevy_ecs`, we can't use the `log_once` macros due
to their placement in `bevy_log` - which depends on `bevy_ecs`. All this
create does is migrate those macros to the `bevy_utils` crate, while
still re-exporting them in `bevy_log`.
created to resolve this:
https://github.com/bevyengine/bevy/pull/11417#discussion_r1458100211
---------
Co-authored-by: François <mockersf@gmail.com>
# Objective
- We want to use `static_assertions` to perform precise compile time
checks at testing time. In this PR, we add those checks to make sure
that `EntityHashMap` and `PreHashMap` are `Clone` (and we replace the
more clumsy previous tests)
- Fixes#11181
(will need to be rebased once
https://github.com/bevyengine/bevy/pull/11178 is merged)
---------
Co-authored-by: Charles Bournhonesque <cbournhonesque@snapchat.com>
# Objective
- `EntityHashMap`, `EntityHashSet` and `PreHashMap` are currently not
Cloneable because of a missing trivial `Clone` bound for `EntityHash`
and `PreHash`. This PR makes them Cloneable.
(the parent struct `hashbrown::HashMap` requires the `HashBuilder` to be
`Clone` for the `HashMap` to be `Clone`, see:
https://github.com/rust-lang/hashbrown/blob/master/src/map.rs#L195)
## Solution
- Add a `Clone` bound to `PreHash` and `EntityHash`
---------
Co-authored-by: Charles Bournhonesque <cbournhonesque@snapchat.com>
# Objective
Fix ci hang, so we can merge pr's again.
## Solution
- switch ppa action to use mesa stable versions
https://launchpad.net/~kisak/+archive/ubuntu/turtle
- use commit from #11123
---------
Co-authored-by: Stepan Koltsov <stepan.koltsov@gmail.com>
# Objective
- Make the implementation order consistent between all sources to fit
the order in the trait.
## Solution
- Change the implementation order.
Matches versioning & features from other Cargo.toml files in the
project.
# Objective
Resolves#10932
## Solution
Added smallvec to the bevy_utils cargo.toml and added a line to
re-export the crate. Target version and features set to match what's
used in the other bevy crates.
# Objective
- Update winit dependency to 0.29
## Changelog
### KeyCode changes
- Removed `ScanCode`, as it was [replaced by
KeyCode](https://github.com/rust-windowing/winit/blob/master/CHANGELOG.md#0292).
- `ReceivedCharacter.char` is now a `SmolStr`, [relevant
doc](https://docs.rs/winit/latest/winit/event/struct.KeyEvent.html#structfield.text).
- Changed most `KeyCode` values, and added more.
KeyCode has changed meaning. With this PR, it refers to physical
position on keyboard rather than the printed letter on keyboard keys.
In practice this means:
- On QWERTY keyboard layouts, nothing changes
- On any other keyboard layout, `KeyCode` no longer reflects the label
on key.
- This is "good". In bevy 0.12, when you used WASD for movement, users
with non-QWERTY keyboards couldn't play your game! This was especially
bad for non-latin keyboards. Now, WASD represents the physical keys. A
French player will press the ZQSD keys, which are near each other,
Kyrgyz players will use "Цфыв".
- This is "bad" as well. You can't know in advance what the label of the
key for input is. Your UI says "press WASD to move", even if in reality,
they should be pressing "ZQSD" or "Цфыв". You also no longer can use
`KeyCode` for text inputs. In any case, it was a pretty bad API for text
input. You should use `ReceivedCharacter` now instead.
### Other changes
- Use `web-time` rather than `instant` crate.
(https://github.com/rust-windowing/winit/pull/2836)
- winit did split `run_return` in `run_onDemand` and `pump_events`, I
did the same change in bevy_winit and used `pump_events`.
- Removed `return_from_run` from `WinitSettings` as `winit::run` now
returns on supported platforms.
- I left the example "return_after_run" as I think it's still useful.
- This winit change is done partly to allow to create a new window after
quitting all windows: https://github.com/emilk/egui/issues/1918 ; this
PR doesn't address.
- added `width` and `height` properties in the `canvas` from wasm
example
(https://github.com/bevyengine/bevy/pull/10702#discussion_r1420567168)
## Known regressions (important follow ups?)
- Provide an API for reacting when a specific key from current layout
was released.
- possible solutions: use winit::Key from winit::KeyEvent ; mapping
between KeyCode and Key ; or .
- We don't receive characters through alt+numpad (e.g. alt + 151 = "ù")
anymore ; reproduced on winit example "ime". maybe related to
https://github.com/rust-windowing/winit/issues/2945
- (windows) Window content doesn't refresh at all when resizing. By
reading https://github.com/rust-windowing/winit/issues/2900 ; I suspect
we should just fire a `window.request_redraw();` from `AboutToWait`, and
handle actual redrawing within `RedrawRequested`. I'm not sure how to
move all that code so I'd appreciate it to be a follow up.
- (windows) unreleased winit fix for using set_control_flow in
AboutToWait https://github.com/rust-windowing/winit/issues/3215 ; ⚠️ I'm
not sure what the implications are, but that feels bad 🤔
## Follow up
I'd like to avoid bloating this PR, here are a few follow up tasks
worthy of a separate PR, or new issue to track them once this PR is
closed, as they would either complicate reviews, or at risk of being
controversial:
- remove CanvasParentResizePlugin
(https://github.com/bevyengine/bevy/pull/10702#discussion_r1417068856)
- avoid mentionning explicitly winit in docs from bevy_window ?
- NamedKey integration on bevy_input:
https://github.com/rust-windowing/winit/pull/3143 introduced a new
NamedKey variant. I implemented it only on the converters but we'd
benefit making the same changes to bevy_input.
- Add more info in KeyboardInput
https://github.com/bevyengine/bevy/pull/10702#pullrequestreview-1748336313
- https://github.com/bevyengine/bevy/pull/9905 added a workaround on a
bug allegedly fixed by winit 0.29. We should check if it's still
necessary.
- update to raw_window_handle 0.6
- blocked by wgpu
- Rename `KeyCode` to `PhysicalKeyCode`
https://github.com/bevyengine/bevy/pull/10702#discussion_r1404595015
- remove `instant` dependency, [replaced
by](https://github.com/rust-windowing/winit/pull/2836) `web_time`), we'd
need to update to :
- fastrand >= 2.0
- [`async-executor`](https://github.com/smol-rs/async-executor) >= 1.7
- [`futures-lite`](https://github.com/smol-rs/futures-lite) >= 2.0
- Verify license, see
[discussion](https://github.com/bevyengine/bevy/pull/8745#discussion_r1402439800)
- we might be missing a short notice or description of changes made
- Consider using https://github.com/rust-windowing/cursor-icon directly
rather than vendoring it in bevy.
- investigate [this
unwrap](https://github.com/bevyengine/bevy/pull/8745#discussion_r1387044986)
(`winit_window.canvas().unwrap();`)
- Use more good things about winit's update
- https://github.com/bevyengine/bevy/pull/10689#issuecomment-1823560428
## Migration Guide
This PR should have one.
# Objective
`Instant` and `Duration` from the `instant` crate are exposed in
`bevy_utils` to have a single abstraction for native/wasm.
It would be useful to have the same thing for
[`SystemTime`](https://doc.rust-lang.org/std/time/struct.SystemTime.html).
---
## Changelog
### Added
- `bevy_utils` now re-exposes the `instant::SystemTime` struct
Co-authored-by: Charles Bournhonesque <cbournhonesque@snapchat.com>
# Objective
- Shorten paths by removing unnecessary prefixes
## Solution
- Remove the prefixes from many paths which do not need them. Finding
the paths was done automatically using built-in refactoring tools in
Jetbrains RustRover.
# Objective
Keep essentially the same structure of `EntityHasher` from #9903, but
rephrase the multiplication slightly to save an instruction.
cc @superdump
Discord thread:
https://discord.com/channels/691052431525675048/1172033156845674507/1174969772522356756
## Solution
Today, the hash is
```rust
self.hash = i | (i.wrapping_mul(FRAC_U64MAX_PI) << 32);
```
with `i` being `(generation << 32) | index`.
Expanding things out, we get
```rust
i | ( (i * CONST) << 32 )
= (generation << 32) | index | ((((generation << 32) | index) * CONST) << 32)
= (generation << 32) | index | ((index * CONST) << 32) // because the generation overflowed
= (index * CONST | generation) << 32 | index
```
What if we do the same thing, but with `+` instead of `|`? That's almost
the same thing, except that it has carries, which are actually often
better in a hash function anyway, since it doesn't saturate. (`|` can be
dangerous, since once something becomes `-1` it'll stay that, and
there's no mixing available.)
```rust
(index * CONST + generation) << 32 + index
= (CONST << 32 + 1) * index + generation << 32
= (CONST << 32 + 1) * index + (WHATEVER << 32 + generation) << 32 // because the extra overflows and thus can be anything
= (CONST << 32 + 1) * index + ((CONST * generation) << 32 + generation) << 32 // pick "whatever" to be something convenient
= (CONST << 32 + 1) * index + ((CONST << 32 + 1) * generation) << 32
= (CONST << 32 + 1) * index +((CONST << 32 + 1) * (generation << 32)
= (CONST << 32 + 1) * (index + generation << 32)
= (CONST << 32 + 1) * (generation << 32 | index)
= (CONST << 32 + 1) * i
```
So we can do essentially the same thing using a single multiplication
instead of doing multiply-shift-or.
LLVM was already smart enough to merge the shifting into a
multiplication, but this saves the extra `or`:
![image](https://github.com/bevyengine/bevy/assets/18526288/d9396614-2326-4730-abbe-4908c01b5ace)
<https://rust.godbolt.org/z/MEvbz4eo4>
It's a very small change, and often will disappear in load latency
anyway, but it's a couple percent faster in lookups:
![image](https://github.com/bevyengine/bevy/assets/18526288/c365ec85-6adc-4f6d-8fa6-a65146f55a75)
(There was more of an improvement here before #10558, but with `to_bits`
being a single `qword` load now, keeping things mostly as it is turned
out to be better than the bigger changes I'd tried in #10605.)
---
## Changelog
(Probably skip it)
## Migration Guide
(none needed)
# Objective
The `generate_composite_uuid` utility function hidden in
`bevy_reflect::__macro_exports` could be generally useful to users.
For example, I previously relied on `Hash` to generate a `u64` to create
a deterministic `HandleId`. In v0.12, `HandleId` has been replaced by
`AssetId` which now requires a `Uuid`, which I could generate with this
function.
## Solution
Relocate `generate_composite_uuid` from `bevy_reflect::__macro_exports`
to `bevy_utils::uuid`.
It is still re-exported under `bevy_reflect::__macro_exports` so there
should not be any breaking changes (although, users should generally not
rely on pseudo-private/hidden modules like `__macro_exports`).
I chose to keep it in `bevy_reflect::__macro_exports` so as to not
clutter up our public API and to reduce the number of changes in this
PR. We could have also marked the export as `#[doc(hidden)]`, but
personally I like that we have a dedicated module for this (makes it
clear what is public and what isn't when just looking at the macro
code).
---
## Changelog
- Moved `generate_composite_uuid` to `bevy_utils::uuid` and made it
public
- Note: it was technically already public, just hidden
# Objective
Enables warning on `clippy::undocumented_unsafe_blocks` across the
workspace rather than only in `bevy_ecs`, `bevy_transform` and
`bevy_utils`. This adds a little awkwardness in a few areas of code that
have trivial safety or explain safety for multiple unsafe blocks with
one comment however automatically prevents these comments from being
missed.
## Solution
This adds `undocumented_unsafe_blocks = "warn"` to the workspace
`Cargo.toml` and fixes / adds a few missed safety comments. I also added
`#[allow(clippy::undocumented_unsafe_blocks)]` where the safety is
explained somewhere above.
There are a couple of safety comments I added I'm not 100% sure about in
`bevy_animation` and `bevy_render/src/view` and I'm not sure about the
use of `#[allow(clippy::undocumented_unsafe_blocks)]` compared to adding
comments like `// SAFETY: See above`.
# Objective
- Fix adding `#![allow(clippy::type_complexity)]` everywhere. like #9796
## Solution
- Use the new [lints] table that will land in 1.74
(https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#lints)
- inherit lint to the workspace, crates and examples.
```
[lints]
workspace = true
```
## Changelog
- Bump rust version to 1.74
- Enable lints table for the workspace
```toml
[workspace.lints.clippy]
type_complexity = "allow"
```
- Allow type complexity for all crates and examples
```toml
[lints]
workspace = true
```
---------
Co-authored-by: Martín Maita <47983254+mnmaita@users.noreply.github.com>
Preparing next release
This PR has been auto-generated
---------
Co-authored-by: Bevy Auto Releaser <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: François <mockersf@gmail.com>
# Objective
First of all, this PR took heavy inspiration from #7760 and #5715. It
intends to also fix#5569, but with a slightly different approach.
This also fixes#9335 by reexporting `DynEq`.
## Solution
The advantage of this API is that we can intern a value without
allocating for zero-sized-types and for enum variants that have no
fields. This PR does this automatically in the `SystemSet` and
`ScheduleLabel` derive macros for unit structs and fieldless enum
variants. So this should cover many internal and external use cases of
`SystemSet` and `ScheduleLabel`. In these optimal use cases, no memory
will be allocated.
- The interning returns a `Interned<dyn SystemSet>`, which is just a
wrapper around a `&'static dyn SystemSet`.
- `Hash` and `Eq` are implemented in terms of the pointer value of the
reference, similar to my first approach of anonymous system sets in
#7676.
- Therefore, `Interned<T>` does not implement `Borrow<T>`, only `Deref`.
- The debug output of `Interned<T>` is the same as the interned value.
Edit:
- `AppLabel` is now also interned and the old
`derive_label`/`define_label` macros were replaced with the new
interning implementation.
- Anonymous set ids are reused for different `Schedule`s, reducing the
amount of leaked memory.
### Pros
- `InternedSystemSet` and `InternedScheduleLabel` behave very similar to
the current `BoxedSystemSet` and `BoxedScheduleLabel`, but can be copied
without an allocation.
- Many use cases don't allocate at all.
- Very fast lookups and comparisons when using `InternedSystemSet` and
`InternedScheduleLabel`.
- The `intern` module might be usable in other areas.
- `Interned{ScheduleLabel, SystemSet, AppLabel}` does implement
`{ScheduleLabel, SystemSet, AppLabel}`, increasing ergonomics.
### Cons
- Implementors of `SystemSet` and `ScheduleLabel` still need to
implement `Hash` and `Eq` (and `Clone`) for it to work.
## Changelog
### Added
- Added `intern` module to `bevy_utils`.
- Added reexports of `DynEq` to `bevy_ecs` and `bevy_app`.
### Changed
- Replaced `BoxedSystemSet` and `BoxedScheduleLabel` with
`InternedSystemSet` and `InternedScheduleLabel`.
- Replaced `impl AsRef<dyn ScheduleLabel>` with `impl ScheduleLabel`.
- Replaced `AppLabelId` with `InternedAppLabel`.
- Changed `AppLabel` to use `Debug` for error messages.
- Changed `AppLabel` to use interning.
- Changed `define_label`/`derive_label` to use interning.
- Replaced `define_boxed_label`/`derive_boxed_label` with
`define_label`/`derive_label`.
- Changed anonymous set ids to be only unique inside a schedule, not
globally.
- Made interned label types implement their label trait.
### Removed
- Removed `define_boxed_label` and `derive_boxed_label`.
## Migration guide
- Replace `BoxedScheduleLabel` and `Box<dyn ScheduleLabel>` with
`InternedScheduleLabel` or `Interned<dyn ScheduleLabel>`.
- Replace `BoxedSystemSet` and `Box<dyn SystemSet>` with
`InternedSystemSet` or `Interned<dyn SystemSet>`.
- Replace `AppLabelId` with `InternedAppLabel` or `Interned<dyn
AppLabel>`.
- Types manually implementing `ScheduleLabel`, `AppLabel` or `SystemSet`
need to implement:
- `dyn_hash` directly instead of implementing `DynHash`
- `as_dyn_eq`
- Pass labels to `World::try_schedule_scope`, `World::schedule_scope`,
`World::try_run_schedule`. `World::run_schedule`, `Schedules::remove`,
`Schedules::remove_entry`, `Schedules::contains`, `Schedules::get` and
`Schedules::get_mut` by value instead of by reference.
---------
Co-authored-by: Joseph <21144246+JoJoJet@users.noreply.github.com>
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
# Objective
Simplify bind group creation code. alternative to (and based on) #9476
## Solution
- Add a `BindGroupEntries` struct that can transparently be used where
`&[BindGroupEntry<'b>]` is required in BindGroupDescriptors.
Allows constructing the descriptor's entries as:
```rust
render_device.create_bind_group(
"my_bind_group",
&my_layout,
&BindGroupEntries::with_indexes((
(2, &my_sampler),
(3, my_uniform),
)),
);
```
instead of
```rust
render_device.create_bind_group(
"my_bind_group",
&my_layout,
&[
BindGroupEntry {
binding: 2,
resource: BindingResource::Sampler(&my_sampler),
},
BindGroupEntry {
binding: 3,
resource: my_uniform,
},
],
);
```
or
```rust
render_device.create_bind_group(
"my_bind_group",
&my_layout,
&BindGroupEntries::sequential((&my_sampler, my_uniform)),
);
```
instead of
```rust
render_device.create_bind_group(
"my_bind_group",
&my_layout,
&[
BindGroupEntry {
binding: 0,
resource: BindingResource::Sampler(&my_sampler),
},
BindGroupEntry {
binding: 1,
resource: my_uniform,
},
],
);
```
the structs has no user facing macros, is tuple-type-based so stack
allocated, and has no noticeable impact on compile time.
- Also adds a `DynamicBindGroupEntries` struct with a similar api that
uses a `Vec` under the hood and allows extending the entries.
- Modifies `RenderDevice::create_bind_group` to take separate arguments
`label`, `layout` and `entries` instead of a `BindGroupDescriptor`
struct. The struct can't be stored due to the internal references, and
with only 3 members arguably does not add enough context to justify
itself.
- Modify the codebase to use the new api and the `BindGroupEntries` /
`DynamicBindGroupEntries` structs where appropriate (whenever the
entries slice contains more than 1 member).
## Migration Guide
- Calls to `RenderDevice::create_bind_group({BindGroupDescriptor {
label, layout, entries })` must be amended to
`RenderDevice::create_bind_group(label, layout, entries)`.
- If `label`s have been specified as `"bind_group_name".into()`, they
need to change to just `"bind_group_name"`. `Some("bind_group_name")`
and `None` will still work, but `Some("bind_group_name")` can optionally
be simplified to just `"bind_group_name"`.
---------
Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>
This adds support for **Multiple Asset Sources**. You can now register a
named `AssetSource`, which you can load assets from like you normally
would:
```rust
let shader: Handle<Shader> = asset_server.load("custom_source://path/to/shader.wgsl");
```
Notice that `AssetPath` now supports `some_source://` syntax. This can
now be accessed through the `asset_path.source()` accessor.
Asset source names _are not required_. If one is not specified, the
default asset source will be used:
```rust
let shader: Handle<Shader> = asset_server.load("path/to/shader.wgsl");
```
The behavior of the default asset source has not changed. Ex: the
`assets` folder is still the default.
As referenced in #9714
## Why?
**Multiple Asset Sources** enables a number of often-asked-for
scenarios:
* **Loading some assets from other locations on disk**: you could create
a `config` asset source that reads from the OS-default config folder
(not implemented in this PR)
* **Loading some assets from a remote server**: you could register a new
`remote` asset source that reads some assets from a remote http server
(not implemented in this PR)
* **Improved "Binary Embedded" Assets**: we can use this system for
"embedded-in-binary assets", which allows us to replace the old
`load_internal_asset!` approach, which couldn't support asset
processing, didn't support hot-reloading _well_, and didn't make
embedded assets accessible to the `AssetServer` (implemented in this pr)
## Adding New Asset Sources
An `AssetSource` is "just" a collection of `AssetReader`, `AssetWriter`,
and `AssetWatcher` entries. You can configure new asset sources like
this:
```rust
app.register_asset_source(
"other",
AssetSource::build()
.with_reader(|| Box::new(FileAssetReader::new("other")))
)
)
```
Note that `AssetSource` construction _must_ be repeatable, which is why
a closure is accepted.
`AssetSourceBuilder` supports `with_reader`, `with_writer`,
`with_watcher`, `with_processed_reader`, `with_processed_writer`, and
`with_processed_watcher`.
Note that the "asset source" system replaces the old "asset providers"
system.
## Processing Multiple Sources
The `AssetProcessor` now supports multiple asset sources! Processed
assets can refer to assets in other sources and everything "just works".
Each `AssetSource` defines an unprocessed and processed `AssetReader` /
`AssetWriter`.
Currently this is all or nothing for a given `AssetSource`. A given
source is either processed or it is not. Later we might want to add
support for "lazy asset processing", where an `AssetSource` (such as a
remote server) can be configured to only process assets that are
directly referenced by local assets (in order to save local disk space
and avoid doing extra work).
## A new `AssetSource`: `embedded`
One of the big features motivating **Multiple Asset Sources** was
improving our "embedded-in-binary" asset loading. To prove out the
**Multiple Asset Sources** implementation, I chose to build a new
`embedded` `AssetSource`, which replaces the old `load_interal_asset!`
system.
The old `load_internal_asset!` approach had a number of issues:
* The `AssetServer` was not aware of (or capable of loading) internal
assets.
* Because internal assets weren't visible to the `AssetServer`, they
could not be processed (or used by assets that are processed). This
would prevent things "preprocessing shaders that depend on built in Bevy
shaders", which is something we desperately need to start doing.
* Each "internal asset" needed a UUID to be defined in-code to reference
it. This was very manual and toilsome.
The new `embedded` `AssetSource` enables the following pattern:
```rust
// Called in `crates/bevy_pbr/src/render/mesh.rs`
embedded_asset!(app, "mesh.wgsl");
// later in the app
let shader: Handle<Shader> = asset_server.load("embedded://bevy_pbr/render/mesh.wgsl");
```
Notice that this always treats the crate name as the "root path", and it
trims out the `src` path for brevity. This is generally predictable, but
if you need to debug you can use the new `embedded_path!` macro to get a
`PathBuf` that matches the one used by `embedded_asset`.
You can also reference embedded assets in arbitrary assets, such as WGSL
shaders:
```rust
#import "embedded://bevy_pbr/render/mesh.wgsl"
```
This also makes `embedded` assets go through the "normal" asset
lifecycle. They are only loaded when they are actually used!
We are also discussing implicitly converting asset paths to/from shader
modules, so in the future (not in this PR) you might be able to load it
like this:
```rust
#import bevy_pbr::render::mesh::Vertex
```
Compare that to the old system!
```rust
pub const MESH_SHADER_HANDLE: Handle<Shader> = Handle::weak_from_u128(3252377289100772450);
load_internal_asset!(app, MESH_SHADER_HANDLE, "mesh.wgsl", Shader::from_wgsl);
// The mesh asset is the _only_ accessible via MESH_SHADER_HANDLE and _cannot_ be loaded via the AssetServer.
```
## Hot Reloading `embedded`
You can enable `embedded` hot reloading by enabling the
`embedded_watcher` cargo feature:
```
cargo run --features=embedded_watcher
```
## Improved Hot Reloading Workflow
First: the `filesystem_watcher` cargo feature has been renamed to
`file_watcher` for brevity (and to match the `FileAssetReader` naming
convention).
More importantly, hot asset reloading is no longer configured in-code by
default. If you enable any asset watcher feature (such as `file_watcher`
or `rust_source_watcher`), asset watching will be automatically enabled.
This removes the need to _also_ enable hot reloading in your app code.
That means you can replace this:
```rust
app.add_plugins(DefaultPlugins.set(AssetPlugin::default().watch_for_changes()))
```
with this:
```rust
app.add_plugins(DefaultPlugins)
```
If you want to hot reload assets in your app during development, just
run your app like this:
```
cargo run --features=file_watcher
```
This means you can use the same code for development and deployment! To
deploy an app, just don't include the watcher feature
```
cargo build --release
```
My intent is to move to this approach for pretty much all dev workflows.
In a future PR I would like to replace `AssetMode::ProcessedDev` with a
`runtime-processor` cargo feature. We could then group all common "dev"
cargo features under a single `dev` feature:
```sh
# this would enable file_watcher, embedded_watcher, runtime-processor, and more
cargo run --features=dev
```
## AssetMode
`AssetPlugin::Unprocessed`, `AssetPlugin::Processed`, and
`AssetPlugin::ProcessedDev` have been replaced with an `AssetMode` field
on `AssetPlugin`.
```rust
// before
app.add_plugins(DefaultPlugins.set(AssetPlugin::Processed { /* fields here */ })
// after
app.add_plugins(DefaultPlugins.set(AssetPlugin { mode: AssetMode::Processed, ..default() })
```
This aligns `AssetPlugin` with our other struct-like plugins. The old
"source" and "destination" `AssetProvider` fields in the enum variants
have been replaced by the "asset source" system. You no longer need to
configure the AssetPlugin to "point" to custom asset providers.
## AssetServerMode
To improve the implementation of **Multiple Asset Sources**,
`AssetServer` was made aware of whether or not it is using "processed"
or "unprocessed" assets. You can check that like this:
```rust
if asset_server.mode() == AssetServerMode::Processed {
/* do something */
}
```
Note that this refactor should also prepare the way for building "one to
many processed output files", as it makes the server aware of whether it
is loading from processed or unprocessed sources. Meaning we can store
and read processed and unprocessed assets differently!
## AssetPath can now refer to folders
The "file only" restriction has been removed from `AssetPath`. The
`AssetServer::load_folder` API now accepts an `AssetPath` instead of a
`Path`, meaning you can load folders from other asset sources!
## Improved AssetPath Parsing
AssetPath parsing was reworked to support sources, improve error
messages, and to enable parsing with a single pass over the string.
`AssetPath::new` was replaced by `AssetPath::parse` and
`AssetPath::try_parse`.
## AssetWatcher broken out from AssetReader
`AssetReader` is no longer responsible for constructing `AssetWatcher`.
This has been moved to `AssetSourceBuilder`.
## Duplicate Event Debouncing
Asset V2 already debounced duplicate filesystem events, but this was
_input_ events. Multiple input event types can produce the same _output_
`AssetSourceEvent`. Now that we have `embedded_watcher`, which does
expensive file io on events, it made sense to debounce output events
too, so I added that! This will also benefit the AssetProcessor by
preventing integrity checks for duplicate events (and helps keep the
noise down in trace logs).
## Next Steps
* **Port Built-in Shaders**: Currently the primary (and essentially
only) user of `load_interal_asset` in Bevy's source code is "built-in
shaders". I chose not to do that in this PR for a few reasons:
1. We need to add the ability to pass shader defs in to shaders via meta
files. Some shaders (such as MESH_VIEW_TYPES) need to pass shader def
values in that are defined in code.
2. We need to revisit the current shader module naming system. I think
we _probably_ want to imply modules from source structure (at least by
default). Ideally in a way that can losslessly convert asset paths
to/from shader modules (to enable the asset system to resolve modules
using the asset server).
3. I want to keep this change set minimal / get this merged first.
* **Deprecate `load_internal_asset`**: we can't do that until we do (1)
and (2)
* **Relative Asset Paths**: This PR significantly increases the need for
relative asset paths (which was already pretty high). Currently when
loading dependencies, it is assumed to be an absolute path, which means
if in an `AssetLoader` you call `context.load("some/path/image.png")` it
will assume that is the "default" asset source, _even if the current
asset is in a different asset source_. This will cause breakage for
AssetLoaders that are not designed to add the current source to whatever
paths are being used. AssetLoaders should generally not need to be aware
of the name of their current asset source, or need to think about the
"current asset source" generally. We should build apis that support
relative asset paths and then encourage using relative paths as much as
possible (both via api design and docs). Relative paths are also
important because they will allow developers to move folders around
(even across providers) without reprocessing, provided there is no path
breakage.
# Objective
- Improve rendering performance, particularly by avoiding the large
system commands costs of using the ECS in the way that the render world
does.
## Solution
- Define `EntityHasher` that calculates a hash from the
`Entity.to_bits()` by `i | (i.wrapping_mul(0x517cc1b727220a95) << 32)`.
`0x517cc1b727220a95` is something like `u64::MAX / N` for N that gives a
value close to π and that works well for hashing. Thanks for @SkiFire13
for the suggestion and to @nicopap for alternative suggestions and
discussion. This approach comes from `rustc-hash` (a.k.a. `FxHasher`)
with some tweaks for the case of hashing an `Entity`. `FxHasher` and
`SeaHasher` were also tested but were significantly slower.
- Define `EntityHashMap` type that uses the `EntityHashser`
- Use `EntityHashMap<Entity, T>` for render world entity storage,
including:
- `RenderMaterialInstances` - contains the `AssetId<M>` of the material
associated with the entity. Also for 2D.
- `RenderMeshInstances` - contains mesh transforms, flags and properties
about mesh entities. Also for 2D.
- `SkinIndices` and `MorphIndices` - contains the skin and morph index
for an entity, respectively
- `ExtractedSprites`
- `ExtractedUiNodes`
## Benchmarks
All benchmarks have been conducted on an M1 Max connected to AC power.
The tests are run for 1500 frames. The 1000th frame is captured for
comparison to check for visual regressions. There were none.
### 2D Meshes
`bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d`
#### `--ordered-z`
This test spawns the 2D meshes with z incrementing back to front, which
is the ideal arrangement allocation order as it matches the sorted
render order which means lookups have a high cache hit rate.
<img width="1112" alt="Screenshot 2023-09-27 at 07 50 45"
src="https://github.com/bevyengine/bevy/assets/302146/e140bc98-7091-4a3b-8ae1-ab75d16d2ccb">
-39.1% median frame time.
#### Random
This test spawns the 2D meshes with random z. This not only makes the
batching and transparent 2D pass lookups get a lot of cache misses, it
also currently means that the meshes are almost certain to not be
batchable.
<img width="1108" alt="Screenshot 2023-09-27 at 07 51 28"
src="https://github.com/bevyengine/bevy/assets/302146/29c2e813-645a-43ce-982a-55df4bf7d8c4">
-7.2% median frame time.
### 3D Meshes
`many_cubes --benchmark`
<img width="1112" alt="Screenshot 2023-09-27 at 07 51 57"
src="https://github.com/bevyengine/bevy/assets/302146/1a729673-3254-4e2a-9072-55e27c69f0fc">
-7.7% median frame time.
### Sprites
**NOTE: On `main` sprites are using `SparseSet<Entity, T>`!**
`bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite`
#### `--ordered-z`
This test spawns the sprites with z incrementing back to front, which is
the ideal arrangement allocation order as it matches the sorted render
order which means lookups have a high cache hit rate.
<img width="1116" alt="Screenshot 2023-09-27 at 07 52 31"
src="https://github.com/bevyengine/bevy/assets/302146/bc8eab90-e375-4d31-b5cd-f55f6f59ab67">
+13.0% median frame time.
#### Random
This test spawns the sprites with random z. This makes the batching and
transparent 2D pass lookups get a lot of cache misses.
<img width="1109" alt="Screenshot 2023-09-27 at 07 53 01"
src="https://github.com/bevyengine/bevy/assets/302146/22073f5d-99a7-49b0-9584-d3ac3eac3033">
+0.6% median frame time.
### UI
**NOTE: On `main` UI is using `SparseSet<Entity, T>`!**
`many_buttons`
<img width="1111" alt="Screenshot 2023-09-27 at 07 53 26"
src="https://github.com/bevyengine/bevy/assets/302146/66afd56d-cbe4-49e7-8b64-2f28f6043d85">
+15.1% median frame time.
## Alternatives
- Cart originally suggested trying out `SparseSet<Entity, T>` and indeed
that is slightly faster under ideal conditions. However,
`PassHashMap<Entity, T>` has better worst case performance when data is
randomly distributed, rather than in sorted render order, and does not
have the worst case memory usage that `SparseSet`'s dense `Vec<usize>`
that maps from the `Entity` index to sparse index into `Vec<T>`. This
dense `Vec` has to be as large as the largest Entity index used with the
`SparseSet`.
- I also tested `PassHashMap<u32, T>`, intending to use `Entity.index()`
as the key, but this proved to sometimes be slower and mostly no
different.
- The only outstanding approach that has not been implemented and tested
is to _not_ clear the render world of its entities each frame. That has
its own problems, though they could perhaps be solved.
- Performance-wise, if the entities and their component data were not
cleared, then they would incur table moves on spawn, and should not
thereafter, rather just their component data would be overwritten.
Ideally we would have a neat way of either updating data in-place via
`&mut T` queries, or inserting components if not present. This would
likely be quite cumbersome to have to remember to do everywhere, but
perhaps it only needs to be done in the more performance-sensitive
systems.
- The main problem to solve however is that we want to both maintain a
mapping between main world entities and render world entities, be able
to run the render app and world in parallel with the main app and world
for pipelined rendering, and at the same time be able to spawn entities
in the render world in such a way that those Entity ids do not collide
with those spawned in the main world. This is potentially quite
solvable, but could well be a lot of ECS work to do it in a way that
makes sense.
---
## Changelog
- Changed: Component data for entities to be drawn are no longer stored
on entities in the render world. Instead, data is stored in a
`EntityHashMap<Entity, T>` in various resources. This brings significant
performance benefits due to the way the render app clears entities every
frame. Resources of most interest are `RenderMeshInstances` and
`RenderMaterialInstances`, and their 2D counterparts.
## Migration Guide
Previously the render app extracted mesh entities and their component
data from the main world and stored them as entities and components in
the render world. Now they are extracted into essentially
`EntityHashMap<Entity, T>` where `T` are structs containing an
appropriate group of data. This means that while extract set systems
will continue to run extract queries against the main world they will
store their data in hash maps. Also, systems in later sets will either
need to look up entities in the available resources such as
`RenderMeshInstances`, or maintain their own `EntityHashMap<Entity, T>`
for their own data.
Before:
```rust
fn queue_custom(
material_meshes: Query<(Entity, &MeshTransforms, &Handle<Mesh>), With<InstanceMaterialData>>,
) {
...
for (entity, mesh_transforms, mesh_handle) in &material_meshes {
...
}
}
```
After:
```rust
fn queue_custom(
render_mesh_instances: Res<RenderMeshInstances>,
instance_entities: Query<Entity, With<InstanceMaterialData>>,
) {
...
for entity in &instance_entities {
let Some(mesh_instance) = render_mesh_instances.get(&entity) else { continue; };
// The mesh handle in `AssetId<Mesh>` form, and the `MeshTransforms` can now
// be found in `mesh_instance` which is a `RenderMeshInstance`
...
}
}
```
---------
Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
# Objective
- Implement the foundations of automatic batching/instancing of draw
commands as the next step from #89
- NOTE: More performance improvements will come when more data is
managed and bound in ways that do not require rebinding such as mesh,
material, and texture data.
## Solution
- The core idea for batching of draw commands is to check whether any of
the information that has to be passed when encoding a draw command
changes between two things that are being drawn according to the sorted
render phase order. These should be things like the pipeline, bind
groups and their dynamic offsets, index/vertex buffers, and so on.
- The following assumptions have been made:
- Only entities with prepared assets (pipelines, materials, meshes) are
queued to phases
- View bindings are constant across a phase for a given draw function as
phases are per-view
- `batch_and_prepare_render_phase` is the only system that performs this
batching and has sole responsibility for preparing the per-object data.
As such the mesh binding and dynamic offsets are assumed to only vary as
a result of the `batch_and_prepare_render_phase` system, e.g. due to
having to split data across separate uniform bindings within the same
buffer due to the maximum uniform buffer binding size.
- Implement `GpuArrayBuffer` for `Mesh2dUniform` to store Mesh2dUniform
in arrays in GPU buffers rather than each one being at a dynamic offset
in a uniform buffer. This is the same optimisation that was made for 3D
not long ago.
- Change batch size for a range in `PhaseItem`, adding API for getting
or mutating the range. This is more flexible than a size as the length
of the range can be used in place of the size, but the start and end can
be otherwise whatever is needed.
- Add an optional mesh bind group dynamic offset to `PhaseItem`. This
avoids having to do a massive table move just to insert
`GpuArrayBufferIndex` components.
## Benchmarks
All tests have been run on an M1 Max on AC power. `bevymark` and
`many_cubes` were modified to use 1920x1080 with a scale factor of 1. I
run a script that runs a separate Tracy capture process, and then runs
the bevy example with `--features bevy_ci_testing,trace_tracy` and
`CI_TESTING_CONFIG=../benchmark.ron` with the contents of
`../benchmark.ron`:
```rust
(
exit_after: Some(1500)
)
```
...in order to run each test for 1500 frames.
The recent changes to `many_cubes` and `bevymark` added reproducible
random number generation so that with the same settings, the same rng
will occur. They also added benchmark modes that use a fixed delta time
for animations. Combined this means that the same frames should be
rendered both on main and on the branch.
The graphs compare main (yellow) to this PR (red).
### 3D Mesh `many_cubes --benchmark`
<img width="1411" alt="Screenshot 2023-09-03 at 23 42 10"
src="https://github.com/bevyengine/bevy/assets/302146/2088716a-c918-486c-8129-090b26fd2bc4">
The mesh and material are the same for all instances. This is basically
the best case for the initial batching implementation as it results in 1
draw for the ~11.7k visible meshes. It gives a ~30% reduction in median
frame time.
The 1000th frame is identical using the flip tool:
![flip many_cubes-main-mesh3d many_cubes-batching-mesh3d 67ppd
ldr](https://github.com/bevyengine/bevy/assets/302146/2511f37a-6df8-481a-932f-706ca4de7643)
```
Mean: 0.000000
Weighted median: 0.000000
1st weighted quartile: 0.000000
3rd weighted quartile: 0.000000
Min: 0.000000
Max: 0.000000
Evaluation time: 0.4615 seconds
```
### 3D Mesh `many_cubes --benchmark --material-texture-count 10`
<img width="1404" alt="Screenshot 2023-09-03 at 23 45 18"
src="https://github.com/bevyengine/bevy/assets/302146/5ee9c447-5bd2-45c6-9706-ac5ff8916daf">
This run uses 10 different materials by varying their textures. The
materials are randomly selected, and there is no sorting by material
bind group for opaque 3D so any batching is 'random'. The PR produces a
~5% reduction in median frame time. If we were to sort the opaque phase
by the material bind group, then this should be a lot faster. This
produces about 10.5k draws for the 11.7k visible entities. This makes
sense as randomly selecting from 10 materials gives a chance that two
adjacent entities randomly select the same material and can be batched.
The 1000th frame is identical in flip:
![flip many_cubes-main-mesh3d-mtc10 many_cubes-batching-mesh3d-mtc10
67ppd
ldr](https://github.com/bevyengine/bevy/assets/302146/2b3a8614-9466-4ed8-b50c-d4aa71615dbb)
```
Mean: 0.000000
Weighted median: 0.000000
1st weighted quartile: 0.000000
3rd weighted quartile: 0.000000
Min: 0.000000
Max: 0.000000
Evaluation time: 0.4537 seconds
```
### 3D Mesh `many_cubes --benchmark --vary-per-instance`
<img width="1394" alt="Screenshot 2023-09-03 at 23 48 44"
src="https://github.com/bevyengine/bevy/assets/302146/f02a816b-a444-4c18-a96a-63b5436f3b7f">
This run varies the material data per instance by randomly-generating
its colour. This is the worst case for batching and that it performs
about the same as `main` is a good thing as it demonstrates that the
batching has minimal overhead when dealing with ~11k visible mesh
entities.
The 1000th frame is identical according to flip:
![flip many_cubes-main-mesh3d-vpi many_cubes-batching-mesh3d-vpi 67ppd
ldr](https://github.com/bevyengine/bevy/assets/302146/ac5f5c14-9bda-4d1a-8219-7577d4aac68c)
```
Mean: 0.000000
Weighted median: 0.000000
1st weighted quartile: 0.000000
3rd weighted quartile: 0.000000
Min: 0.000000
Max: 0.000000
Evaluation time: 0.4568 seconds
```
### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode
mesh2d`
<img width="1412" alt="Screenshot 2023-09-03 at 23 59 56"
src="https://github.com/bevyengine/bevy/assets/302146/cb02ae07-237b-4646-ae9f-fda4dafcbad4">
This spawns 160 waves of 1000 quad meshes that are shaded with
ColorMaterial. Each wave has a different material so 160 waves currently
should result in 160 batches. This results in a 50% reduction in median
frame time.
Capturing a screenshot of the 1000th frame main vs PR gives:
![flip bevymark-main-mesh2d bevymark-batching-mesh2d 67ppd
ldr](https://github.com/bevyengine/bevy/assets/302146/80102728-1217-4059-87af-14d05044df40)
```
Mean: 0.001222
Weighted median: 0.750432
1st weighted quartile: 0.453494
3rd weighted quartile: 0.969758
Min: 0.000000
Max: 0.990296
Evaluation time: 0.4255 seconds
```
So they seem to produce the same results. I also double-checked the
number of draws. `main` does 160000 draws, and the PR does 160, as
expected.
### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode
mesh2d --material-texture-count 10`
<img width="1392" alt="Screenshot 2023-09-04 at 00 09 22"
src="https://github.com/bevyengine/bevy/assets/302146/4358da2e-ce32-4134-82df-3ab74c40849c">
This generates 10 textures and generates materials for each of those and
then selects one material per wave. The median frame time is reduced by
50%. Similar to the plain run above, this produces 160 draws on the PR
and 160000 on `main` and the 1000th frame is identical (ignoring the fps
counter text overlay).
![flip bevymark-main-mesh2d-mtc10 bevymark-batching-mesh2d-mtc10 67ppd
ldr](https://github.com/bevyengine/bevy/assets/302146/ebed2822-dce7-426a-858b-b77dc45b986f)
```
Mean: 0.002877
Weighted median: 0.964980
1st weighted quartile: 0.668871
3rd weighted quartile: 0.982749
Min: 0.000000
Max: 0.992377
Evaluation time: 0.4301 seconds
```
### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode
mesh2d --vary-per-instance`
<img width="1396" alt="Screenshot 2023-09-04 at 00 13 53"
src="https://github.com/bevyengine/bevy/assets/302146/b2198b18-3439-47ad-919a-cdabe190facb">
This creates unique materials per instance by randomly-generating the
material's colour. This is the worst case for 2D batching. Somehow, this
PR manages a 7% reduction in median frame time. Both main and this PR
issue 160000 draws.
The 1000th frame is the same:
![flip bevymark-main-mesh2d-vpi bevymark-batching-mesh2d-vpi 67ppd
ldr](https://github.com/bevyengine/bevy/assets/302146/a2ec471c-f576-4a36-a23b-b24b22578b97)
```
Mean: 0.001214
Weighted median: 0.937499
1st weighted quartile: 0.635467
3rd weighted quartile: 0.979085
Min: 0.000000
Max: 0.988971
Evaluation time: 0.4462 seconds
```
### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode
sprite`
<img width="1396" alt="Screenshot 2023-09-04 at 12 21 12"
src="https://github.com/bevyengine/bevy/assets/302146/8b31e915-d6be-4cac-abf5-c6a4da9c3d43">
This just spawns 160 waves of 1000 sprites. There should be and is no
notable difference between main and the PR.
### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode
sprite --material-texture-count 10`
<img width="1389" alt="Screenshot 2023-09-04 at 12 36 08"
src="https://github.com/bevyengine/bevy/assets/302146/45fe8d6d-c901-4062-a349-3693dd044413">
This spawns the sprites selecting a texture at random per instance from
the 10 generated textures. This has no significant change vs main and
shouldn't.
### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode
sprite --vary-per-instance`
<img width="1401" alt="Screenshot 2023-09-04 at 12 29 52"
src="https://github.com/bevyengine/bevy/assets/302146/762c5c60-352e-471f-8dbe-bbf10e24ebd6">
This sets the sprite colour as being unique per instance. This can still
all be drawn using one batch. There should be no difference but the PR
produces median frame times that are 4% higher. Investigation showed no
clear sources of cost, rather a mix of give and take that should not
happen. It seems like noise in the results.
### Summary
| Benchmark | % change in median frame time |
| ------------- | ------------- |
| many_cubes | 🟩 -30% |
| many_cubes 10 materials | 🟩 -5% |
| many_cubes unique materials | 🟩 ~0% |
| bevymark mesh2d | 🟩 -50% |
| bevymark mesh2d 10 materials | 🟩 -50% |
| bevymark mesh2d unique materials | 🟩 -7% |
| bevymark sprite | 🟥 2% |
| bevymark sprite 10 materials | 🟥 0.6% |
| bevymark sprite unique materials | 🟥 4.1% |
---
## Changelog
- Added: 2D and 3D mesh entities that share the same mesh and material
(same textures, same data) are now batched into the same draw command
for better performance.
---------
Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
Co-authored-by: Nicola Papale <nico@nicopap.ch>