# Objective
- Closes#15752
Calling the functions `App::observe` and `World::observe` doesn't make
sense because you're not "observing" the `App` or `World`, you're adding
an observer that listens for an event that occurs *within* the `World`.
We should rename them to better fit this.
## Solution
Renames:
- `App::observe` -> `App::add_observer`
- `World::observe` -> `World::add_observer`
- `Commands::observe` -> `Commands::add_observer`
- `EntityWorldMut::observe_entity` -> `EntityWorldMut::observe`
(Note this isn't a breaking change as the original rename was introduced
earlier this cycle.)
## Testing
Reusing current tests.
# Objective
After merging retained rendering world #15320, we now have a good way of
creating a link between worlds (*HIYAA intensifies*). This means that
`get_or_spawn` is no longer necessary for that function. Entity should
be opaque as the warning above `get_or_spawn` says. This is also part of
#15459.
I'm deprecating `get_or_spawn_batch` in a different PR in order to keep
the PR small in size.
## Solution
Deprecate `get_or_spawn` and replace it with `get_entity` in most
contexts. If it's possible to query `&RenderEntity`, then the entity is
synced and `render_entity.id()` is initialized in the render world.
## Migration Guide
If you are given an `Entity` and you want to do something with it, use
`Commands.entity(...)` or `World.entity(...)`. If instead you want to
spawn something use `Commands.spawn(...)` or `World.spawn(...)`. If you
are not sure if an entity exists, you can always use `get_entity` and
match on the `Option<...>` that is returned.
---------
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
# Objective
The current observers have some unfortunate footguns where you can end
up confused about what is actually being observed. For apps you can
chain observe like `app.observe(..).observe(..)` which works like you
would expect, but if you try the same with world the first `observe()`
will return the `EntityWorldMut` for the created observer, and the
second `observe()` will only observe on the observer entity. It took
several hours for multiple people on discord to figure this out, which
is not a great experience.
## Solution
Rename `observe` on entities to `observe_entity`. It's slightly more
verbose when you know you have an entity, but it feels right to me that
observers for specific things have more specific naming, and it prevents
this issue completely.
Another possible solution would be to unify `observe` on `App` and
`World` to have the same kind of return type, but I'm not sure exactly
what that would look like.
## Testing
Simple name change, so only concern is docs really.
---
## Migration Guide
The `observe()` method on entities has been renamed to
`observe_entity()` to prevent confusion about what is being observed in
some cases.
# Objective
Fixes#14511.
`despawn` allows you to remove entities from the world. However, if the
entity does not exist, it emits a warning. This may not be intended
behavior for many users who have use cases where they need to call
`despawn` regardless of if the entity actually exists (see the issue),
or don't care in general if the entity already doesn't exist.
(Also trying to gauge interest on if this feature makes sense, I'd
personally love to have it, but I could see arguments that this might be
a footgun. Just trying to help here 😄 If there's no contention I could
also implement this for `despawn_recursive` and `despawn_descendants` in
the same PR)
## Solution
Add `try_despawn`, `try_despawn_recursive` and
`try_despawn_descendants`.
Modify `World::despawn_with_caller` to also take in a `warn` boolean
argument, which is then considered when logging the warning. Set
`log_warning` to `true` in the case of `despawn`, and `false` in the
case of `try_despawn`.
## Testing
Ran `cargo run -p ci` on macOS, it seemed fine.
# Objective
Add despawn and despawn_recursive benchmarks in a similar vein to the
spawn benchmark.
## Testing
Ran `cargo bench` from `benches` and it compiled fine.
On my machine:
```
despawn_world/1_entities
time: [3.1495 ns 3.1574 ns 3.1652 ns]
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
despawn_world/10_entities
time: [28.629 ns 28.674 ns 28.720 ns]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
despawn_world/100_entities
time: [286.95 ns 287.41 ns 287.90 ns]
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high mild
despawn_world/1000_entities
time: [2.8739 µs 2.9001 µs 2.9355 µs]
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) high mild
6 (6.00%) high severe
despawn_world/10000_entities
time: [28.535 µs 28.617 µs 28.698 µs]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
despawn_world_recursive/1_entities
time: [5.2270 ns 5.2507 ns 5.2907 ns]
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low mild
6 (6.00%) high mild
4 (4.00%) high severe
despawn_world_recursive/10_entities
time: [57.495 ns 57.590 ns 57.691 ns]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
despawn_world_recursive/100_entities
time: [514.43 ns 518.91 ns 526.88 ns]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) high mild
3 (3.00%) high severe
despawn_world_recursive/1000_entities
time: [5.0362 µs 5.0463 µs 5.0578 µs]
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
despawn_world_recursive/10000_entities
time: [51.159 µs 51.603 µs 52.215 µs]
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe
```
As discussed in #15521
- Partial revert of #14897, reverting the change to the methods to
consume `self`
- The `insert_if` method is kept
The migration guide of #14897 should be removed
Closes#15521
---------
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
# Objective
- Fixes#6370
- Closes#6581
## Solution
- Added the following lints to the workspace:
- `std_instead_of_core`
- `std_instead_of_alloc`
- `alloc_instead_of_core`
- Used `cargo +nightly fmt` with [item level use
formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A)
to split all `use` statements into single items.
- Used `cargo clippy --workspace --all-targets --all-features --fix
--allow-dirty` to _attempt_ to resolve the new linting issues, and
intervened where the lint was unable to resolve the issue automatically
(usually due to needing an `extern crate alloc;` statement in a crate
root).
- Manually removed certain uses of `std` where negative feature gating
prevented `--all-features` from finding the offending uses.
- Used `cargo +nightly fmt` with [crate level use
formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A)
to re-merge all `use` statements matching Bevy's previous styling.
- Manually fixed cases where the `fmt` tool could not re-merge `use`
statements due to conditional compilation attributes.
## Testing
- Ran CI locally
## Migration Guide
The MSRV is now 1.81. Please update to this version or higher.
## Notes
- This is a _massive_ change to try and push through, which is why I've
outlined the semi-automatic steps I used to create this PR, in case this
fails and someone else tries again in the future.
- Making this change has no impact on user code, but does mean Bevy
contributors will be warned to use `core` and `alloc` instead of `std`
where possible.
- This lint is a critical first step towards investigating `no_std`
options for Bevy.
---------
Co-authored-by: François Mockers <francois.mockers@vleue.com>
# Objective
Updating ``glam`` to 0.29, ``encase`` to 0.10.
## Solution
Update the necessary ``Cargo.toml`` files.
## Testing
Ran ``cargo run -p ci`` on Windows; no issues came up.
---------
Co-authored-by: aecsocket <aecsocket@tutanota.com>
# Objective
Fixes#14331
## Solution
- Make `Traversal` a subtrait of `ReadOnlyQueryData`
- Update implementations and usages
## Testing
- Updated unit tests
## Migration Guide
Update implementations of `Traversal`.
---------
Co-authored-by: Christian Hughes <9044780+ItsDoot@users.noreply.github.com>
# Objective
While #13152 added function reflection, it didn't really make functions
reflectable. Instead, it made it so that they can be called with
reflected arguments and return reflected data. But functions themselves
cannot be reflected.
In other words, we can't go from `DynamicFunction` to `dyn
PartialReflect`.
## Solution
Allow `DynamicFunction` to actually be reflected.
This PR adds the `Function` reflection subtrait (and corresponding
`ReflectRef`, `ReflectKind`, etc.). With this new trait, we're able to
implement `PartialReflect` on `DynamicFunction`.
### Implementors
`Function` is currently only implemented for `DynamicFunction<'static>`.
This is because we can't implement it generically over all
functions—even those that implement `IntoFunction`.
What about `DynamicFunctionMut`? Well, this PR does **not** implement
`Function` for `DynamicFunctionMut`.
The reasons for this are a little complicated, but it boils down to
mutability. `DynamicFunctionMut` requires `&mut self` to be invoked
since it wraps a `FnMut`. However, we can't really model this well with
`Function`. And if we make `DynamicFunctionMut` wrap its internal
`FnMut` in a `Mutex` to allow for `&self` invocations, then we run into
either concurrency issues or recursion issues (or, in the worst case,
both).
So for the time-being, we won't implement `Function` for
`DynamicFunctionMut`. It will be better to evaluate it on its own. And
we may even consider the possibility of removing it altogether if it
adds too much complexity to the crate.
### Dynamic vs Concrete
One of the issues with `DynamicFunction` is the fact that it's both a
dynamic representation (like `DynamicStruct` or `DynamicList`) and the
only way to represent a function.
Because of this, it's in a weird middle ground where we can't easily
implement full-on `Reflect`. That would require `Typed`, but what static
`TypeInfo` could it provide? Just that it's a `DynamicFunction`? None of
the other dynamic types implement `Typed`.
However, by not implementing `Reflect`, we lose the ability to downcast
back to our `DynamicStruct`. Our only option is to call
`Function::clone_dynamic`, which clones the data rather than by simply
downcasting. This works in favor of the `PartialReflect::try_apply`
implementation since it would have to clone anyways, but is definitely
not ideal. This is also the reason I had to add `Debug` as a supertrait
on `Function`.
For now, this PR chooses not to implement `Reflect` for
`DynamicFunction`. We may want to explore this in a followup PR (or even
this one if people feel strongly that it's strictly required).
The same is true for `FromReflect`. We may decide to add an
implementation there as well, but it's likely out-of-scope of this PR.
## Testing
You can test locally by running:
```
cargo test --package bevy_reflect --all-features
```
---
## Showcase
You can now pass around a `DynamicFunction` as a `dyn PartialReflect`!
This also means you can use it as a field on a reflected type without
having to ignore it (though you do need to opt out of `FromReflect`).
```rust
#[derive(Reflect)]
#[reflect(from_reflect = false)]
struct ClickEvent {
callback: DynamicFunction<'static>,
}
let event: Box<dyn Struct> = Box::new(ClickEvent {
callback: (|| println!("Clicked!")).into_function(),
});
// We can access our `DynamicFunction` as a `dyn PartialReflect`
let callback: &dyn PartialReflect = event.field("callback").unwrap();
// And access function-related methods via the new `Function` trait
let ReflectRef::Function(callback) = callback.reflect_ref() else {
unreachable!()
};
// Including calling the function
callback.reflect_call(ArgList::new()).unwrap(); // Prints: Clicked!
```
# Objective
- Fixes#15106
## Solution
- Trivial refactor to rename the method. The duplicate method `push` was
removed as well. This will simpify the API and make the semantics more
clear. `Add` implies that the action happens immediately, whereas in
reality, the command is queued to be run eventually.
- `ChildBuilder::add_command` has similarly been renamed to
`queue_command`.
## Testing
Unit tests should suffice for this simple refactor.
---
## Migration Guide
- `Commands::add` and `Commands::push` have been replaced with
`Commnads::queue`.
- `ChildBuilder::add_command` has been renamed to
`ChildBuilder::queue_command`.
# Objective
- fix#12853
- Make `Table::allocate` faster
## Solution
The PR consists of multiple steps:
1) For the component data: create a new data-structure that's similar to
`BlobVec` but doesn't store `len` & `capacity` inside of it: "BlobArray"
(name suggestions welcome)
2) For the `Tick` data: create a new data-structure that's similar to
`ThinSlicePtr` but supports dynamic reallocation: "ThinArrayPtr" (name
suggestions welcome)
3) Create a new data-structure that's very similar to `Column` that
doesn't store `len` & `capacity` inside of it: "ThinColumn"
4) Adjust the `Table` implementation to use `ThinColumn` instead of
`Column`
The result is that only one set of `len` & `capacity` is stored in
`Table`, in `Table::entities`
### Notes Regarding Performance
Apart from shaving off some excess memory in `Table`, the changes have
also brought noteworthy performance improvements:
The previous implementation relied on `Vec::reserve` &
`BlobVec::reserve`, but that redundantly repeated the same if statement
(`capacity` == `len`). Now that check could be made at the `Table` level
because the capacity and length of all the columns are synchronized;
saving N branches per allocation. The result is a respectable
performance improvement per every `Table::reserve` (and subsequently
`Table::allocate`) call.
I'm hesitant to give exact numbers because I don't have a lot of
experience in profiling and benchmarking, but these are the results I
got so far:
*`add_remove_big/table` benchmark after the implementation:*
![after_add_remove_big_table](https://github.com/bevyengine/bevy/assets/46227443/b667da29-1212-4020-8bb0-ec0f15bb5f8a)
*`add_remove_big/table` benchmark in main branch (measured in comparison
to the implementation):*
![main_add_remove_big_table](https://github.com/bevyengine/bevy/assets/46227443/41abb92f-3112-4e01-b935-99696eb2fe58)
*`add_remove_very_big/table` benchmark after the implementation:*
![after_add_remove_very_big](https://github.com/bevyengine/bevy/assets/46227443/f268a155-295b-4f55-ab02-f8a9dcc64fc2)
*`add_remove_very_big/table` benchmark in main branch (measured in
comparison to the implementation):*
![main_add_remove_very_big](https://github.com/bevyengine/bevy/assets/46227443/78b4e3a6-b255-47c9-baee-1a24c25b9aea)
cc @james7132 to verify
---
## Changelog
- New data-structure that's similar to `BlobVec` but doesn't store `len`
& `capacity` inside of it: `BlobArray`
- New data-structure that's similar to `ThinSlicePtr` but supports
dynamic allocation:`ThinArrayPtr`
- New data-structure that's very similar to `Column` that doesn't store
`len` & `capacity` inside of it: `ThinColumn`
- Adjust the `Table` implementation to use `ThinColumn` instead of
`Column`
- New benchmark: `add_remove_very_big` to benchmark the performance of
spawning a lot of entities with a lot of components (15) each
## Migration Guide
`Table` now uses `ThinColumn` instead of `Column`. That means that
methods that previously returned `Column`, will now return `ThinColumn`
instead.
`ThinColumn` has a much more limited and low-level API, but you can
still achieve the same things in `ThinColumn` as you did in `Column`.
For example, instead of calling `Column::get_added_tick`, you'd call
`ThinColumn::get_added_ticks_slice` and index it to get the specific
added tick.
---------
Co-authored-by: James Liu <contact@jamessliu.com>
# Objective
- follow of #14049 ,we could use it on our Parallel Iterator,this pr
also unified the used function in both regular iter and parallel
iterations.
## Performance
![image](https://github.com/user-attachments/assets/cba700bc-169c-4b58-b504-823bdca8ec05)
no performance regression for regular itertaion
3.5X faster in hybrid parallel iteraion,this number is far greater than
the benefits obtained in regular iteration(~1.81) because mutable
iterations on continuous memory can effectively reduce the cost of
mataining core cache coherence
# Objective
Fixes#14883
## Solution
Pretty simple update to `EntityCommands` methods to consume `self` and
return it rather than taking `&mut self`. The things probably worth
noting:
* I added `#[allow(clippy::should_implement_trait)]` to the `add` method
because it causes a linting conflict with `std::ops::Add`.
* `despawn` and `log_components` now return `Self`. I'm not sure if
that's exactly the desired behavior so I'm happy to adjust if that seems
wrong.
## Testing
Tested with `cargo run -p ci`. I think that should be sufficient to call
things good.
## Migration Guide
The most likely migration needed is changing code from this:
```
let mut entity = commands.get_or_spawn(entity);
if depth_prepass {
entity.insert(DepthPrepass);
}
if normal_prepass {
entity.insert(NormalPrepass);
}
if motion_vector_prepass {
entity.insert(MotionVectorPrepass);
}
if deferred_prepass {
entity.insert(DeferredPrepass);
}
```
to this:
```
let mut entity = commands.get_or_spawn(entity);
if depth_prepass {
entity = entity.insert(DepthPrepass);
}
if normal_prepass {
entity = entity.insert(NormalPrepass);
}
if motion_vector_prepass {
entity = entity.insert(MotionVectorPrepass);
}
if deferred_prepass {
entity.insert(DeferredPrepass);
}
```
as can be seen in several of the example code updates here. There will
probably also be instances where mutable `EntityCommands` vars no longer
need to be mutable.
# Objective
Fixes#14782
## Solution
Enable the lint and fix all upcoming hints (`--fix`). Also tried to
figure out the false-positive (see review comment). Maybe split this PR
up into multiple parts where only the last one enables the lint, so some
can already be merged resulting in less many files touched / less
potential for merge conflicts?
Currently, there are some cases where it might be easier to read the
code with the qualifier, so perhaps remove the import of it and adapt
its cases? In the current stage it's just a plain adoption of the
suggestions in order to have a base to discuss.
## Testing
`cargo clippy` and `cargo run -p ci` are happy.
# Objective
One of the changes in #14704 made `DynamicFunction` effectively the same
as `DynamicClosure<'static>`. This change meant that the de facto
function type would likely be `DynamicClosure<'static>` instead of the
intended `DynamicFunction`, since the former is much more flexible.
We _could_ explore ways of making `DynamicFunction` implement `Copy`
using some unsafe code, but it likely wouldn't be worth it. And users
would likely still reach for the convenience of
`DynamicClosure<'static>` over the copy-ability of `DynamicFunction`.
The goal of this PR is to fix this confusion between the two types.
## Solution
Firstly, the `DynamicFunction` type was removed. Again, it was no
different than `DynamicClosure<'static>` so it wasn't a huge deal to
remove.
Secondly, `DynamicClosure<'env>` and `DynamicClosureMut<'env>` were
renamed to `DynamicFunction<'env>` and `DynamicFunctionMut<'env>`,
respectively.
Yes, we still ultimately kept the naming of `DynamicFunction`, but
changed its behavior to that of `DynamicClosure<'env>`. We need a term
to refer to both functions and closures, and "function" was the best
option.
[Originally](https://discord.com/channels/691052431525675048/1002362493634629796/1274091992162242710),
I was going to go with "callable" as the replacement term to encompass
both functions and closures (e.g. `DynamciCallable<'env>`). However, it
was
[suggested](https://discord.com/channels/691052431525675048/1002362493634629796/1274653581777047625)
by @SkiFire13 that the simpler "function" term could be used instead.
While "callable" is perhaps the better umbrella term—being truly
ambiguous over functions and closures— "function" is more familiar, used
more often, easier to discover, and is subjectively just
"better-sounding".
## Testing
Most changes are purely swapping type names or updating documentation,
but you can verify everything still works by running the following
command:
```
cargo test --package bevy_reflect
```
# Objective
Apparently #14382 broke this, but it's not a part of CI, so it wasn't
found until earlier today.
## Solution
Update the benchmark like we updated the examples.
## Testing
Running `cargo bench` actually works now.
# Objective
It would be good to have benchmarks handy for function reflection as it
continues to be worked on.
## Solution
Add some basic benchmarks for function reflection.
## Testing
To test locally, run the following in the `benches` directory:
```
cargo bench --bench reflect_function
```
## Results
Here are a couple of the results (M1 Max MacBook Pro):
<img width="936" alt="Results of benching calling functions vs closures
via reflection. Closures average about 40ns, while functions average
about 55ns"
src="https://github.com/user-attachments/assets/b9a6c585-5fbe-43db-9a7b-f57dbd3815e3">
<img width="936" alt="Results of benching converting functions vs
closures into their dynamic representations. Closures average about
34ns, while functions average about 37ns"
src="https://github.com/user-attachments/assets/4614560a-7192-4c1e-9ade-7bc5a4ca68e3">
Currently, it seems `DynamicClosure` is just a bit more performant. This
is likely due to the fact that `DynamicFunction` stores its function
object in an `Arc` instead of a `Box` so that it can be `Send + Sync`
(and also `Clone`).
We'll likely need to do the same for `DynamicClosure` so I suspect these
results to change in the near future.
Basically it's https://github.com/bevyengine/bevy/pull/13792 with the
bumped versions of `encase` and `hexasphere`.
---------
Co-authored-by: Robert Swain <robert.swain@gmail.com>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
# Objective
To implement relations we will need to add a `ComponentIndex`, which is
a map from a Component to the list of archetypes that contain this
component.
One of the reasons is that with fragmenting relations the number of
archetypes will explode, so it will become inefficient to create and
update the query caches by iterating through the list of all archetypes.
In this PR, we introduce the `ComponentIndex`, and we update the
`QueryState` to make use of it:
- if a query has at least 1 required component (i.e. something other
than `()`, `Entity` or `Option<>`, etc.): for each of the required
components we find the list of archetypes that contain it (using the
ComponentIndex). Then, we select the smallest list among these. This
gives a small subset of archetypes to iterate through compared with
iterating through all new archetypes
- if it doesn't, then we keep using the current approach of iterating
through all new archetypes
# Implementation
- This breaks query iteration order, in the sense that we are not
guaranteed anymore to return results in the order in which the
archetypes were created. I think this should be fine because this wasn't
an explicit bevy guarantee so users should not be relying on this. I
updated a bunch of unit tests that were failing because of this.
- I had an issue with the borrow checker because iterating the list of
potential archetypes requires access to `&state.component_access`, which
was conflicting with the calls to
```
if state.new_archetype_internal(archetype) {
state.update_archetype_component_access(archetype, access);
}
```
which need a mutable access to the state.
The solution I chose was to introduce a `QueryStateView` which is a
temporary view into the `QueryState` which enables a "split-borrows"
kind of approach. It is described in detail in this blog post:
https://smallcultfollowing.com/babysteps/blog/2018/11/01/after-nll-interprocedural-conflicts/
# Test
The unit tests pass.
Benchmark results:
```
❯ critcmp main pr
group main pr
----- ---- --
iter_fragmented/base 1.00 342.2±25.45ns ? ?/sec 1.02 347.5±16.24ns ? ?/sec
iter_fragmented/foreach 1.04 165.4±11.29ns ? ?/sec 1.00 159.5±4.27ns ? ?/sec
iter_fragmented/foreach_wide 1.03 3.3±0.04µs ? ?/sec 1.00 3.2±0.06µs ? ?/sec
iter_fragmented/wide 1.03 3.1±0.06µs ? ?/sec 1.00 3.0±0.08µs ? ?/sec
iter_fragmented_sparse/base 1.00 6.5±0.14ns ? ?/sec 1.02 6.6±0.08ns ? ?/sec
iter_fragmented_sparse/foreach 1.00 6.3±0.08ns ? ?/sec 1.04 6.6±0.08ns ? ?/sec
iter_fragmented_sparse/foreach_wide 1.00 43.8±0.15ns ? ?/sec 1.02 44.6±0.53ns ? ?/sec
iter_fragmented_sparse/wide 1.00 29.8±0.44ns ? ?/sec 1.00 29.8±0.26ns ? ?/sec
iter_simple/base 1.00 8.2±0.10µs ? ?/sec 1.00 8.2±0.09µs ? ?/sec
iter_simple/foreach 1.00 3.8±0.02µs ? ?/sec 1.02 3.9±0.03µs ? ?/sec
iter_simple/foreach_sparse_set 1.00 19.0±0.26µs ? ?/sec 1.01 19.3±0.16µs ? ?/sec
iter_simple/foreach_wide 1.00 17.8±0.24µs ? ?/sec 1.00 17.9±0.31µs ? ?/sec
iter_simple/foreach_wide_sparse_set 1.06 95.6±6.23µs ? ?/sec 1.00 90.6±0.59µs ? ?/sec
iter_simple/sparse_set 1.00 19.3±1.63µs ? ?/sec 1.01 19.5±0.29µs ? ?/sec
iter_simple/system 1.00 8.1±0.10µs ? ?/sec 1.00 8.1±0.09µs ? ?/sec
iter_simple/wide 1.05 37.7±2.53µs ? ?/sec 1.00 35.8±0.57µs ? ?/sec
iter_simple/wide_sparse_set 1.00 95.7±1.62µs ? ?/sec 1.00 95.9±0.76µs ? ?/sec
par_iter_simple/with_0_fragment 1.04 35.0±2.51µs ? ?/sec 1.00 33.7±0.49µs ? ?/sec
par_iter_simple/with_1000_fragment 1.00 50.4±2.52µs ? ?/sec 1.01 51.0±3.84µs ? ?/sec
par_iter_simple/with_100_fragment 1.02 40.3±2.23µs ? ?/sec 1.00 39.5±1.32µs ? ?/sec
par_iter_simple/with_10_fragment 1.14 38.8±7.79µs ? ?/sec 1.00 34.0±0.78µs ? ?/sec
```
# Objective
- currently, bevy employs sparse iteration if any of the target
components in the query are stored in a sparse set. it may lead to
increased cache misses in some cases, potentially impacting performance.
- partial fixes#12381
## Solution
- use dense iteration when an archetype and its table have the same
entity count.
- to avoid introducing complicate unsafe noise, this pr only implement
for `for_each ` style iteration.
- added a benchmark to test performance for hybrid iteration.
## Performance
![image](https://github.com/bevyengine/bevy/assets/45868716/5cce13cf-6ff2-4861-9576-e75edc63bd46)
nearly 2x win in specific scenarios, and no performance degradation in
other test cases.
---------
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: Christian Hughes <9044780+ItsDoot@users.noreply.github.com>
# Objective
- The event propagation benchmark is largely derived from
bevy_eventlistener. However, it doesn't accurately reflect performance
of bevy side, as our event bubble propagation is based on observer.
## Solution
- added several new benchmarks that focuse on observer itself rather
than event bubble
# Objective
Add basic bubbling to observers, modeled off `bevy_eventlistener`.
## Solution
- Introduce a new `Traversal` trait for components which point to other
entities.
- Provide a default `TraverseNone: Traversal` component which cannot be
constructed.
- Implement `Traversal` for `Parent`.
- The `Event` trait now has an associated `Traversal` which defaults to
`TraverseNone`.
- Added a field `bubbling: &mut bool` to `Trigger` which can be used to
instruct the runner to bubble the event to the entity specified by the
event's traversal type.
- Added an associated constant `SHOULD_BUBBLE` to `Event` which
configures the default bubbling state.
- Added logic to wire this all up correctly.
Introducing the new associated information directly on `Event` (instead
of a new `BubblingEvent` trait) lets us dispatch both bubbling and
non-bubbling events through the same api.
## Testing
I have added several unit tests to cover the common bugs I identified
during development. Running the unit tests should be enough to validate
correctness. The changes effect unsafe portions of the code, but should
not change any of the safety assertions.
## Changelog
Observers can now bubble up the entity hierarchy! To create a bubbling
event, change your `Derive(Event)` to something like the following:
```rust
#[derive(Component)]
struct MyEvent;
impl Event for MyEvent {
type Traverse = Parent; // This event will propagate up from child to parent.
const AUTO_PROPAGATE: bool = true; // This event will propagate by default.
}
```
You can dispatch a bubbling event using the normal
`world.trigger_targets(MyEvent, entity)`.
Halting an event mid-bubble can be done using
`trigger.propagate(false)`. Events with `AUTO_PROPAGATE = false` will
not propagate by default, but you can enable it using
`trigger.propagate(true)`.
If there are multiple observers attached to a target, they will all be
triggered by bubbling. They all share a bubbling state, which can be
accessed mutably using `trigger.propagation_mut()` (`trigger.propagate`
is just sugar for this).
You can choose to implement `Traversal` for your own types, if you want
to bubble along a different structure than provided by `bevy_hierarchy`.
Implementers must be careful never to produce loops, because this will
cause bevy to hang.
## Migration Guide
+ Manual implementations of `Event` should add associated type `Traverse
= TraverseNone` and associated constant `AUTO_PROPAGATE = false`;
+ `Trigger::new` has new field `propagation: &mut Propagation` which
provides the bubbling state.
+ `ObserverRunner` now takes the same `&mut Propagation` as a final
parameter.
---------
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
# Objective
- #4972 introduce a benchmark to measure chang detection performance
- However,it uses `iter_batch ` cause a lot of overhead in clone data to
each routine closure(it feels like a bug in`iter_batch `) and constructs
new query in every iter.This overhead masks the real change detection
throughput we want to measure. Instead of evaluating raw change
detection, the benchmark ends up dominated by data cloning and
allocation costs.
## Solution
- Use iter_batch_ref to reduce the benchmark overload
- Use cached query to better reflect real-world usage scenarios.
- Add more benmark
---
## Changelog
# Objective
Remove the limit of `RenderLayer` by using a growable mask using
`SmallVec`.
Changes adopted from @UkoeHB's initial PR here
https://github.com/bevyengine/bevy/pull/12502 that contained additional
changes related to propagating render layers.
Changes
## Solution
The main thing needed to unblock this is removing `RenderLayers` from
our shader code. This primarily affects `DirectionalLight`. We are now
computing a `skip` field on the CPU that is then used to skip the light
in the shader.
## Testing
Checked a variety of examples and did a quick benchmark on `many_cubes`.
There were some existing problems identified during the development of
the original pr (see:
https://discord.com/channels/691052431525675048/1220477928605749340/1221190112939872347).
This PR shouldn't change any existing behavior besides removing the
layer limit (sans the comment in migration about `all` layers no longer
being possible).
---
## Changelog
Removed the limit on `RenderLayers` by using a growable bitset that only
allocates when layers greater than 64 are used.
## Migration Guide
- `RenderLayers::all()` no longer exists. Entities expecting to be
visible on all layers, e.g. lights, should compute the active layers
that are in use.
---------
Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
# Objective
Fixes#12966
## Solution
Renaming multi_threaded feature to match snake case
## Migration Guide
Bevy feature multi-threaded should be refered to multi_threaded from now
on.
# Objective
- Update glam version requirement to latest version.
## Solution
- Updated `glam` version requirement from 0.25 to 0.27.
- Updated `encase` and `encase_derive_impl` version requirement from 0.7
to 0.8.
- Updated `hexasphere` version requirement from 10.0 to 12.0.
- Breaking changes from glam changelog:
- [0.26.0] Minimum Supported Rust Version bumped to 1.68.2 for impl
From<bool> for {f32,f64} support.
- [0.27.0] Changed implementation of vector fract method to match the
Rust implementation instead of the GLSL implementation, that is self -
self.trunc() instead of self - self.floor().
---
## Migration Guide
- When using `glam` exports, keep in mind that `vector` `fract()` method
now matches Rust implementation (that is `self - self.trunc()` instead
of `self - self.floor()`). If you want to use the GLSL implementation
you should now use `fract_gl()`.
---------
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
# Objective
- The [`version`] field in `Cargo.toml` is optional for crates not
published on <https://crates.io>.
- We have several `publish = false` tools in this repository that still
have a version field, even when it's not useful.
[`version`]:
https://doc.rust-lang.org/cargo/reference/manifest.html#the-version-field
## Solution
- Remove the [`version`] field for all crates where `publish = false`.
- Update the description on a few crates and remove extra newlines as
well.
# Objective
- Fix#7303
- bevy would spawn a lot of tasks in parallel iteration when it matchs a
large storage and many small storage ,it significantly increase the
overhead of schedule.
## Solution
- collect small storage into one task
# Objective
This is a necessary precursor to #9122 (this was split from that PR to
reduce the amount of code to review all at once).
Moving `!Send` resource ownership to `App` will make it unambiguously
`!Send`. `SubApp` must be `Send`, so it can't wrap `App`.
## Solution
Refactor `App` and `SubApp` to not have a recursive relationship. Since
`SubApp` no longer wraps `App`, once `!Send` resources are moved out of
`World` and into `App`, `SubApp` will become unambiguously `Send`.
There could be less code duplication between `App` and `SubApp`, but
that would break `App` method chaining.
## Changelog
- `SubApp` no longer wraps `App`.
- `App` fields are no longer publicly accessible.
- `App` can no longer be converted into a `SubApp`.
- Various methods now return references to a `SubApp` instead of an
`App`.
## Migration Guide
- To construct a sub-app, use `SubApp::new()`. `App` can no longer
convert into `SubApp`.
- If you implemented a trait for `App`, you may want to implement it for
`SubApp` as well.
- If you're accessing `app.world` directly, you now have to use
`app.world()` and `app.world_mut()`.
- `App::sub_app` now returns `&SubApp`.
- `App::sub_app_mut` now returns `&mut SubApp`.
- `App::get_sub_app` now returns `Option<&SubApp>.`
- `App::get_sub_app_mut` now returns `Option<&mut SubApp>.`
# Objective
- Primitive meshing is suboptimal
- Improve primitive meshing
## Solution
- Add primitive meshing benchmark
- Allows measuring future improvements
---
First of a few PRs to refactor and improve primitive meshing.
# Objective
Fixes https://github.com/bevyengine/bevy/issues/11628
## Migration Guide
`Command` and `CommandQueue` have migrated from `bevy_ecs::system` to
`bevy_ecs::world`, so `use bevy_ecs::world::{Command, CommandQueue};`
when necessary.
# Objective
We deprecated quite a few APIs in 0.13. 0.13 has shipped already. It
should be OK to remove them in 0.14's release. Fixes#4059. Fixes#9011.
## Solution
Remove them.
# Objective
* Fixes#11932 (performance impact when stepping is disabled)
## Solution
The `Option<FixedBitSet>` argument added to `ScheduleExecutor::run()` in
#8453 caused a measurable performance impact even when stepping is
disabled. This can be seen by the benchmark of running `Schedule:run()`
on an empty schedule in a tight loop
(https://github.com/bevyengine/bevy/issues/11932#issuecomment-1950970236).
I was able to get the same performance results as on 0.12.1 by changing
the argument
`ScheduleExecutor::run()` from `Option<FixedBitSet>` to
`Option<&FixedBitSet>`. The down-side of this change is that
`Schedule::run()` now takes about 6% longer (3.7319 ms vs 3.9855ns) when
stepping is enabled
---
## Changelog
* Change `ScheduleExecutor::run()` `_skipped_systems` from
`Option<FixedBitSet>` to `Option<&FixedBitSet>`
* Added a few benchmarks to measure `Schedule::run()` performance with
various executors
# Objective
Reduce the size of `bevy_utils`
(https://github.com/bevyengine/bevy/issues/11478)
## Solution
Move `EntityHash` related types into `bevy_ecs`. This also allows us
access to `Entity`, which means we no longer need `EntityHashMap`'s
first generic argument.
---
## Changelog
- Moved `bevy::utils::{EntityHash, EntityHasher, EntityHashMap,
EntityHashSet}` into `bevy::ecs::entity::hash` .
- Removed `EntityHashMap`'s first generic argument. It is now hardcoded
to always be `Entity`.
## Migration Guide
- Uses of `bevy::utils::{EntityHash, EntityHasher, EntityHashMap,
EntityHashSet}` now have to be imported from `bevy::ecs::entity::hash`.
- Uses of `EntityHashMap` no longer have to specify the first generic
parameter. It is now hardcoded to always be `Entity`.
# Objective
- The benchmarks for `bevy_ecs`' `iter_simple` group use `for` loops
instead of `World::spawn_batch`.
- There's a TODO comment that says to batch spawn them.
## Solution
- Replace the `for` loops with `World::spawn_batch`.
# Objective
It is unclear how to run Bevy's benchmarks
## Solution
Add a README to the benches, with documentation that tells you what the
benchmarks are, and how to run them.
---------
Co-authored-by: Rob Parrett <robparrett@gmail.com>
# Objective
- Implements change described in
https://github.com/bevyengine/bevy/issues/3022
- Goal is to allow Entity to benefit from niche optimization, especially
in the case of Option<Entity> to reduce memory overhead with structures
with empty slots
## Discussion
- First PR attempt: https://github.com/bevyengine/bevy/pull/3029
- Discord:
https://discord.com/channels/691052431525675048/1154573759752183808/1154573764240093224
## Solution
- Change `Entity::generation` from u32 to NonZeroU32 to allow for niche
optimization.
- The reason for changing generation rather than index is so that the
costs are only encountered on Entity free, instead of on Entity alloc
- There was some concern with generations being used, due to there being
some desire to introduce flags. This was more to do with the original
retirement approach, however, in reality even if generations were
reduced to 24-bits, we would still have 16 million generations available
before wrapping and current ideas indicate that we would be using closer
to 4-bits for flags.
- Additionally, another concern was the representation of relationships
where NonZeroU32 prevents us using the full address space, talking with
Joy it seems unlikely to be an issue. The majority of the time these
entity references will be low-index entries (ie. `ChildOf`, `Owes`),
these will be able to be fast lookups, and the remainder of the range
can use slower lookups to map to the address space.
- It has the additional benefit of being less visible to most users,
since generation is only ever really set through `from_bits` type
methods.
- `EntityMeta` was changed to match
- On free, generation now explicitly wraps:
- Originally, generation would panic in debug mode and wrap in release
mode due to using regular ops.
- The first attempt at this PR changed the behavior to "retire" slots
and remove them from use when generations overflowed. This change was
controversial, and likely needs a proper RFC/discussion.
- Wrapping matches current release behaviour, and should therefore be
less controversial.
- Wrapping also more easily migrates to the retirement approach, as
users likely to exhaust the exorbitant supply of generations will code
defensively against aliasing and that defensive code is less likely to
break than code assuming that generations don't wrap.
- We use some unsafe code here when wrapping generations, to avoid
branch on NonZeroU32 construction. It's guaranteed safe due to how we
perform wrapping and it results in significantly smaller ASM code.
- https://godbolt.org/z/6b6hj8PrM
## Migration
- Previous `bevy_scene` serializations have a high likelihood of being
broken, as they contain 0th generation entities.
## Current Issues
- `Entities::reserve_generations` and `EntityMapper` wrap now, even in
debug - although they technically did in release mode already so this
probably isn't a huge issue. It just depends if we need to change
anything here?
---------
Co-authored-by: Natalie Baker <natalie.baker@advancednavigation.com>
Update to `glam` 0.25, `encase` 0.7 and `hexasphere` to 10.0
## Changelog
Added the `FloatExt` trait to the `bevy_math` prelude which adds `lerp`,
`inverse_lerp` and `remap` methods to the `f32` and `f64` types.
# Objective
After #6547, `Query::for_each` has been capable of automatic
vectorization on certain queries, which is seeing a notable (>50% CPU
time improvements) for iteration. However, `Query::for_each` isn't
idiomatic Rust, and lacks the flexibility of iterator combinators.
Ideally, `Query::iter` and friends should be able to achieve the same
results. However, this does seem to blocked upstream
(rust-lang/rust#104914) by Rust's loop optimizations.
## Solution
This is an intermediate solution and refactor. This moves the
`Query::for_each` implementation onto the `Iterator::fold`
implementation for `QueryIter` instead. This should result in the same
automatic vectorization optimization on all `Iterator` functions that
internally use fold, including `Iterator::for_each`, `Iterator::count`,
etc.
With this, it should close the gap between the two completely.
Internally, this PR changes `Query::for_each` to use
`query.iter().for_each(..)` instead of the duplicated implementation.
Separately, the duplicate implementations of internal iteration (i.e.
`Query::par_for_each`) now use portions of the current `Query::for_each`
implementation factored out into their own functions.
This also massively cleans up our internal fragmentation of internal
iteration options, deduplicating the iteration code used in `for_each`
and `par_iter().for_each()`.
---
## Changelog
Changed: `Query::for_each`, `Query::for_each_mut`, `Query::for_each`,
and `Query::for_each_mut` have been moved to `QueryIter`'s
`Iterator::for_each` implementation, and still retains their performance
improvements over normal iteration. These APIs are deprecated in 0.13
and will be removed in 0.14.
---------
Co-authored-by: JoJoJet <21144246+JoJoJet@users.noreply.github.com>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
# Objective
- Format `benches` crate to match current Rust standards.
## Solution
- Ran `cargo fmt` in the `benches` crate.
## Notes
I accidentally came across this when working on the `Drop`
implementation for `CommandQueue` and rather embarrassingly let it sneak
into my PR there. I think it makes sense to ensure this crate is also
well formatted to avoid it in the future.
(This is my first PR here, so I've probably missed some things. Please
let me know what else I should do to help you as a reviewer!)
# Objective
Due to https://github.com/rust-lang/rust/issues/117800, the `derive`'d
`PartialEq::eq` on `Entity` isn't as good as it could be. Since that's
used in hashtable lookup, let's improve it.
## Solution
The derived `PartialEq::eq` short-circuits if the generation doesn't
match. However, having a branch there is sub-optimal, especially on
64-bit systems like x64 that could just load the whole `Entity` in one
load anyway.
Due to complications around `poison` in LLVM and the exact details of
what unsafe code is allowed to do with reference in Rust
(https://github.com/rust-lang/unsafe-code-guidelines/issues/346), LLVM
isn't allowed to completely remove the short-circuiting. `&Entity` is
marked `dereferencable(8)` so LLVM knows it's allowed to *load* all 8
bytes -- and does so -- but it has to assume that the `index` might be
undef/poison if the `generation` doesn't match, and thus while it finds
a way to do it without needing a branch, it has to do something slightly
more complicated than optimal to combine the results. (LLVM is allowed
to change non-short-circuiting code to use branches, but not the other
way around.)
Here's a link showing the codegen today:
<https://rust.godbolt.org/z/9WzjxrY7c>
```rust
#[no_mangle]
pub fn demo_eq_ref(a: &Entity, b: &Entity) -> bool {
a == b
}
```
ends up generating the following assembly:
```asm
demo_eq_ref:
movq xmm0, qword ptr [rdi]
movq xmm1, qword ptr [rsi]
pcmpeqd xmm1, xmm0
pshufd xmm0, xmm1, 80
movmskpd eax, xmm0
cmp eax, 3
sete al
ret
```
(It's usually not this bad in real uses after inlining and LTO, but it
makes a strong demo.)
This PR manually implements `PartialEq::eq` *without* short-circuiting,
and because that tells LLVM that neither the generations nor the index
can be poison, it doesn't need to be so careful and can generate the
"just compare the two 64-bit values" code you'd have probably already
expected:
```asm
demo_eq_ref:
mov rax, qword ptr [rsi]
cmp qword ptr [rdi], rax
sete al
ret
```
Since this doesn't change the representation of `Entity`, if it's
instead passed by *value*, then each `Entity` is two `u32` registers,
and the old and the new code do exactly the same thing. (Other
approaches, like changing `Entity` to be `[u32; 2]` or `u64`, affect
this case.)
This should hopefully merge easily with changes like
https://github.com/bevyengine/bevy/pull/9907 that also want to change
`Entity`.
## Benchmarks
I'm not super-confident that I got my machine fully consistent for
benchmarking, but whether I run the old or the new one first I get
reasonably consistent results.
Here's a fairly typical example of the benchmarks I added in this PR:
![image](https://github.com/bevyengine/bevy/assets/18526288/24226308-4616-4082-b0ff-88fc06285ef1)
Building the sets seems to be basically the same. It's usually reported
as noise, but sometimes I see a few percent slower or faster.
But lookup hits in particular -- since a hit checks that the key is
equal -- consistently shows around 10% improvement.
`cargo run --example many_cubes --features bevy/trace_tracy --release --
--benchmark` showed as slightly faster with this change, though if I had
to bet I'd probably say it's more noise than meaningful (but at least
it's not worse either):
![image](https://github.com/bevyengine/bevy/assets/18526288/58bb8c96-9c45-487f-a5ab-544bbfe9fba0)
This is my first PR here -- and my first time running Tracy -- so please
let me know what else I should run, or run things on your own more
reliable machines to double-check.
---
## Changelog
(probably not worth including)
Changed: micro-optimized `Entity::eq` to help LLVM slightly.
## Migration Guide
(I really hope nobody was using this on uninitialized entities where
sufficiently tortured `unsafe` could could technically notice that this
has changed.)
# Objective
Reduce code duplication and improve APIs of Bevy's [global
taskpools](https://github.com/bevyengine/bevy/blob/main/crates/bevy_tasks/src/usages.rs).
## Solution
- As all three of the global taskpools have identical implementations
and only differ in their identifiers, this PR moves the implementation
into a macro to reduce code duplication.
- The `init` method is renamed to `get_or_init` to more accurately
reflect what it really does.
- Add a new `try_get` method that just returns `None` when the pool is
uninitialized, to complement the other getter methods.
- Minor documentation improvements to accompany the above changes.
---
## Changelog
- Added a new `try_get` method to the global TaskPools
- The global TaskPools' `init` method has been renamed to `get_or_init`
for clarity
- Documentation improvements
## Migration Guide
- Uses of `ComputeTaskPool::init`, `AsyncComputeTaskPool::init` and
`IoTaskPool::init` should be changed to `::get_or_init`.
# Objective
It is difficult to inspect the generated assembly of benchmark systems
using a tool such as `cargo-asm`
## Solution
Mark the related functions as `#[inline(never)]`. This way, you can pass
the module name as argument to `cargo-asm` to get the generated assembly
for the given function.
It may have as side effect to make benchmarks a bit more predictable and
useful too. As it prevents inlining where in bevy no inlining could
possibly take place.
### Measurements
Following the recommendations in
<https://easyperf.net/blog/2019/08/02/Perf-measurement-environment-on-Linux>,
I
1. Put my CPU in "AMD ECO" mode, which surprisingly is the equivalent of
disabling turboboost, giving more consistent performances
2. Disabled all hyperthreading cores using `echo 0 >
/sys/devices/system/cpu/cpu{11,12…}/online`
3. Set the scaling governor to `performance`
4. Manually disabled AMD boost with `echo 0 >
/sys/devices/system/cpu/cpufreq/boost`
5. Set the nice level of the criterion benchmark using `cargo bench … &
sudo renice -n -5 -p $! ; fg`
6. Not running any other program than the benchmarks (outside of system
daemons and the X11 server)
With this setup, running multiple times the same benchmarks on `main`
gives me a lot of "regression" and "improvement" messages, which is
absurd given that no code changed.
On this branch, there is still some spurious performance change
detection, but they are much less frequent.
This only accounts for `iter_simple` and `iter_frag` benchmarks of
course.
# Objective
`no_archetype` benchmark group results were very noisy
## Solution
Use the `SingeThreaded` executor.
On my machine, this makes the `no_archetype` bench group 20 to 30 times
faster. Meaning that most of the runtime was accounted by the
multithreaded scheduler. ie: the benchmark was not testing system
archetype update, but the overhead of multithreaded scheduling.
With this change, the benchmark results are more meaningful.
The add_archetypes function is also simplified.