Commit graph

1088 commits

Author SHA1 Message Date
Doonv
ba0f8f996f
Add insert_state to App. (#11043)
# Objective

Fix #10731.

## Solution

Rename `App::add_state<T>(&mut self)` to `init_state`, and add
`App::insert_state<T>(&mut self, state: T)`. I decided on these names
because they are more similar to `init_resource` and `insert_resource`.

I also removed the `States` trait's requirement for `Default`. Instead,
`init_state` requires `FromWorld`.

---

## Changelog

- Renamed `App::add_state` to `init_state`.
- Added `App::insert_state`.
- Removed the `States` trait's requirement for `Default`.

## Migration Guide

- Renamed `App::add_state` to `init_state`.
2023-12-21 14:09:24 +00:00
Rob Parrett
7d2e6cb468
Fix typo in docs for Has (#11028)
# Objective

Fix typo

## Solution

Put the letter `i` in there.
2023-12-19 17:59:34 +00:00
Nicola Papale
d64e148e4e
Add non-existent entity behavior to Has doc (#11025)
# Objective

`Has<T>` in some niche cases may behave in an unexpected way.

Specifically, when using `Query::get` on a `Has<T>` with a despawned
entity.

## Solution

Add precision about cases wehre `Query::get` could return an `Err`.
2023-12-19 16:34:55 +00:00
Stepan Koltsov
0c2df27930
Rename some lifetimes (ResMut etc) for clarity (#11021)
Use `'w` for world lifetime consistently.

When implementing system params, useful to look at how other params are
implemented. `'w` makes it clear it is world, not state.
2023-12-19 15:22:25 +00:00
JMS55
9a89fc44f4
Add is_resource_changed_by_id + is_resource_added_by_id (#11012)
# Objective

- Allow checking if a resource has changed by its ComponentId

---

## Changelog
- Added `World::is_resource_changed_by_id()` and
`World::is_resource_added_by_id()`.
2023-12-18 01:44:33 +00:00
Tygyh
63d17e8494
Simplify equality assertions (#10988)
# Objective

- Shorten assertions.

## Solution

- Replace '==' assertions with 'assert_eq()' and '!=' assertions with
'assert_ne()' .
2023-12-16 23:58:41 +00:00
Tygyh
696af48416
Remove unnecessary parentheses (#10990)
# Objective

- Increase readability.

## Solution

- Remove unnecessary parentheses.
2023-12-16 02:26:18 +00:00
Joseph
11065974d4
Simplify lifetimes in QueryState methods (#10937)
# Objective

The definition of several `QueryState` methods use unnecessary explicit
lifetimes, which adds to visual noise.

## Solution

Elide the lifetimes.
2023-12-14 17:26:03 +00:00
Mike
6b84ba97a3
Auto insert sync points (#9822)
# Objective

- Users are often confused when their command effects are not visible in
the next system. This PR auto inserts sync points if there are deferred
buffers on a system and there are dependents on that system (systems
with after relationships).
- Manual sync points can lead to users adding more than needed and it's
hard for the user to have a global understanding of their system graph
to know which sync points can be merged. However we can easily calculate
which sync points can be merged automatically.

## Solution

1. Add new edge types to allow opting out of new behavior
2. Insert an sync point for each edge whose initial node has deferred
system params.
3. Reuse nodes if they're at the number of sync points away.

* add opt outs for specific edges with `after_ignore_deferred`,
`before_ignore_deferred` and `chain_ignore_deferred`. The
`auto_insert_apply_deferred` boolean on `ScheduleBuildSettings` can be
set to false to opt out for the whole schedule.

## Perf
This has a small negative effect on schedule build times.
```text
group                                           auto-sync                              main-for-auto-sync
-----                                           -----------                            ------------------
build_schedule/1000_schedule                    1.06       2.8±0.15s        ? ?/sec    1.00       2.7±0.06s        ? ?/sec
build_schedule/1000_schedule_noconstraints      1.01     26.2±0.88ms        ? ?/sec    1.00     25.8±0.36ms        ? ?/sec
build_schedule/100_schedule                     1.02     13.1±0.33ms        ? ?/sec    1.00     12.9±0.28ms        ? ?/sec
build_schedule/100_schedule_noconstraints       1.08   505.3±29.30µs        ? ?/sec    1.00   469.4±12.48µs        ? ?/sec
build_schedule/500_schedule                     1.00    485.5±6.29ms        ? ?/sec    1.00    485.5±9.80ms        ? ?/sec
build_schedule/500_schedule_noconstraints       1.00      6.8±0.10ms        ? ?/sec    1.02      6.9±0.16ms        ? ?/sec
```
---

## Changelog

- Auto insert sync points and added `after_ignore_deferred`,
`before_ignore_deferred`, `chain_no_deferred` and
`auto_insert_apply_deferred` APIs to opt out of this behavior

## Migration Guide

- `apply_deferred` points are added automatically when there is ordering
relationship with a system that has deferred parameters like `Commands`.
If you want to opt out of this you can switch from `after`, `before`,
and `chain` to the corresponding `ignore_deferred` API,
`after_ignore_deferred`, `before_ignore_deferred` or
`chain_ignore_deferred` for your system/set ordering.
- You can also set `ScheduleBuildSettings::auto_insert_sync_points` to
`false` if you want to do it for the whole schedule. Note that in this
mode you can still add `apply_deferred` points manually.
- For most manual insertions of `apply_deferred` you should remove them
as they cannot be merged with the automatically inserted points and
might reduce parallelizability of the system graph.

## TODO
- [x] remove any apply_deferred used in the engine
- [x] ~~decide if we should deprecate manually using apply_deferred.~~
We'll still allow inserting manual sync points for now for whatever edge
cases users might have.
- [x] Update migration guide
- [x] rerun schedule build benchmarks

---------

Co-authored-by: Joseph <21144246+JoJoJet@users.noreply.github.com>
2023-12-14 16:34:01 +00:00
Tygyh
b2661ea73d
Reorder impl to be the same as the trait (#10964)
# Objective

- Make the implementation order consistent between all sources to fit
the order in the trait.

## Solution

- Change the implementation order.
2023-12-13 21:19:49 +00:00
Federico Rinaldi
9c78128e8f
Rename Q type parameter to D when referring to WorldQueryData (#10782)
# Objective

Since #10776 split `WorldQuery` to `WorldQueryData` and
`WorldQueryFilter`, it should be clear that the query is actually
composed of two parts. It is not factually correct to call "query" only
the data part. Therefore I suggest to rename the `Q` parameter to `D` in
`Query` and related items.

As far as I know, there shouldn't be breaking changes from renaming
generic type parameters.

## Solution

I used a combination of rust-analyzer go to reference and `Ctrl-F`ing
various patterns to catch as many cases as possible. Hopefully I got
them all. Feel free to check if you're concerned of me having missed
some.

## Notes

This and #10779 have many lines in common, so merging one will cause a
lot of merge conflicts to the other.

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2023-12-13 18:50:46 +00:00
Stepan Koltsov
12a11e2fd0
Actually check alignment in BlobVec test aligned_zst (#10885)
Do not rely on Miri.

---------

Co-authored-by: James Liu <contact@jamessliu.com>
2023-12-13 02:36:49 +00:00
Sludge
41db723c5c
Fix soundness of UnsafeWorldCell usage example (#10941)
# Objective

- The example in the docs is unsound.

Demo:

```rust
#[derive(Resource)]
struct MyRes(u32);

fn main() {
    let mut w = World::new();
    w.insert_resource(MyRes(0));

    let (mut res, comp) = split_world_access(&mut w);
    let mut r1 = res.get_resource_mut::<MyRes>().unwrap();
    let mut r2 = res.get_resource_mut::<MyRes>().unwrap();

    *r1 = MyRes(1);
    *r2 = MyRes(2);
}
```

The API in the example allows aliasing mutable references to the same
resource. Miri also complains when running this.

## Solution

- Change the example API to make the returned `Mut` borrow from the
`OnlyResourceAccessWorld` instead of borrowing from the world via `'w`.
This prevents obtaining more than one `Mut` at the same time from it.
2023-12-13 00:55:30 +00:00
Mantas
5af2f022d8
Rename WorldQueryData & WorldQueryFilter to QueryData & QueryFilter (#10779)
# Rename `WorldQueryData` & `WorldQueryFilter` to `QueryData` &
`QueryFilter`

Fixes #10776 

## Solution

Traits `WorldQueryData` & `WorldQueryFilter` were renamed to `QueryData`
and `QueryFilter`, respectively. Related Trait types were also renamed.

---

## Changelog

- Trait `WorldQueryData` has been renamed to `QueryData`. Derive macro's
`QueryData` attribute `world_query_data` has been renamed to
`query_data`.
- Trait `WorldQueryFilter` has been renamed to `QueryFilter`. Derive
macro's `QueryFilter` attribute `world_query_filter` has been renamed to
`query_filter`.
- Trait's `ExtractComponent` type `Query` has been renamed to `Data`.
- Trait's `GetBatchData` types `Query` & `QueryFilter` has been renamed
to `Data` & `Filter`, respectively.
- Trait's `ExtractInstance` type `Query` has been renamed to `Data`.
- Trait's `ViewNode` type `ViewQuery` has been renamed to `ViewData`.
- Trait's `RenderCommand` types `ViewWorldQuery` & `ItemWorldQuery` has
been renamed to `ViewData` & `ItemData`, respectively.

## Migration Guide

Note: if merged before 0.13 is released, this should instead modify the
migration guide of #10776 with the updated names.

- Rename `WorldQueryData` & `WorldQueryFilter` trait usages to
`QueryData` & `QueryFilter` and their respective derive macro attributes
`world_query_data` & `world_query_filter` to `query_data` &
`query_filter`.
- Rename the following trait type usages:
  - Trait's `ExtractComponent` type `Query` to `Data`.
  - Trait's `GetBatchData` type `Query` to `Data`.
  - Trait's `ExtractInstance` type `Query` to `Data`.
  - Trait's `ViewNode` type `ViewQuery` to `ViewData`'
- Trait's `RenderCommand` types `ViewWolrdQuery` & `ItemWorldQuery` to
`ViewData` & `ItemData`, respectively.

```rust
// Before
#[derive(WorldQueryData)]
#[world_query_data(derive(Debug))]
struct EmptyQuery {
    empty: (),
}

// After
#[derive(QueryData)]
#[query_data(derive(Debug))]
struct EmptyQuery {
    empty: (),
}

// Before
#[derive(WorldQueryFilter)]
struct CustomQueryFilter<T: Component, P: Component> {
    _c: With<ComponentC>,
    _d: With<ComponentD>,
    _or: Or<(Added<ComponentC>, Changed<ComponentD>, Without<ComponentZ>)>,
    _generic_tuple: (With<T>, With<P>),
}

// After
#[derive(QueryFilter)]
struct CustomQueryFilter<T: Component, P: Component> {
    _c: With<ComponentC>,
    _d: With<ComponentD>,
    _or: Or<(Added<ComponentC>, Changed<ComponentD>, Without<ComponentZ>)>,
    _generic_tuple: (With<T>, With<P>),
}

// Before
impl ExtractComponent for ContrastAdaptiveSharpeningSettings {
    type Query = &'static Self;
    type Filter = With<Camera>;
    type Out = (DenoiseCAS, CASUniform);

    fn extract_component(item: QueryItem<Self::Query>) -> Option<Self::Out> {
        //...
    }
}

// After
impl ExtractComponent for ContrastAdaptiveSharpeningSettings {
    type Data = &'static Self;
    type Filter = With<Camera>;
    type Out = (DenoiseCAS, CASUniform);

    fn extract_component(item: QueryItem<Self::Data>) -> Option<Self::Out> {
        //...
    }
}

// Before
impl GetBatchData for MeshPipeline {
    type Param = SRes<RenderMeshInstances>;
    type Query = Entity;
    type QueryFilter = With<Mesh3d>;
    type CompareData = (MaterialBindGroupId, AssetId<Mesh>);
    type BufferData = MeshUniform;

    fn get_batch_data(
        mesh_instances: &SystemParamItem<Self::Param>,
        entity: &QueryItem<Self::Query>,
    ) -> (Self::BufferData, Option<Self::CompareData>) {
        // ....
    }
}

// After
impl GetBatchData for MeshPipeline {
    type Param = SRes<RenderMeshInstances>;
    type Data = Entity;
    type Filter = With<Mesh3d>;
    type CompareData = (MaterialBindGroupId, AssetId<Mesh>);
    type BufferData = MeshUniform;

    fn get_batch_data(
        mesh_instances: &SystemParamItem<Self::Param>,
        entity: &QueryItem<Self::Data>,
    ) -> (Self::BufferData, Option<Self::CompareData>) {
        // ....
    }
}

// Before
impl<A> ExtractInstance for AssetId<A>
where
    A: Asset,
{
    type Query = Read<Handle<A>>;
    type Filter = ();

    fn extract(item: QueryItem<'_, Self::Query>) -> Option<Self> {
        Some(item.id())
    }
}

// After
impl<A> ExtractInstance for AssetId<A>
where
    A: Asset,
{
    type Data = Read<Handle<A>>;
    type Filter = ();

    fn extract(item: QueryItem<'_, Self::Data>) -> Option<Self> {
        Some(item.id())
    }
}

// Before
impl ViewNode for PostProcessNode {
    type ViewQuery = (
        &'static ViewTarget,
        &'static PostProcessSettings,
    );

    fn run(
        &self,
        _graph: &mut RenderGraphContext,
        render_context: &mut RenderContext,
        (view_target, _post_process_settings): QueryItem<Self::ViewQuery>,
        world: &World,
    ) -> Result<(), NodeRunError> {
        // ...
    }
}

// After
impl ViewNode for PostProcessNode {
    type ViewData = (
        &'static ViewTarget,
        &'static PostProcessSettings,
    );

    fn run(
        &self,
        _graph: &mut RenderGraphContext,
        render_context: &mut RenderContext,
        (view_target, _post_process_settings): QueryItem<Self::ViewData>,
        world: &World,
    ) -> Result<(), NodeRunError> {
        // ...
    }
}

// Before
impl<P: CachedRenderPipelinePhaseItem> RenderCommand<P> for SetItemPipeline {
    type Param = SRes<PipelineCache>;
    type ViewWorldQuery = ();
    type ItemWorldQuery = ();
    #[inline]
    fn render<'w>(
        item: &P,
        _view: (),
        _entity: (),
        pipeline_cache: SystemParamItem<'w, '_, Self::Param>,
        pass: &mut TrackedRenderPass<'w>,
    ) -> RenderCommandResult {
        // ...
    }
}

// After
impl<P: CachedRenderPipelinePhaseItem> RenderCommand<P> for SetItemPipeline {
    type Param = SRes<PipelineCache>;
    type ViewData = ();
    type ItemData = ();
    #[inline]
    fn render<'w>(
        item: &P,
        _view: (),
        _entity: (),
        pipeline_cache: SystemParamItem<'w, '_, Self::Param>,
        pass: &mut TrackedRenderPass<'w>,
    ) -> RenderCommandResult {
        // ...
    }
}
```
2023-12-12 19:45:50 +00:00
Stepan Koltsov
79641c7f08
Reorder fields in SystemSchedule (#10764)
Make more clear what is going on there.
2023-12-12 19:45:44 +00:00
Ben Frankel
9c8576996f
Add a doc note about despawn footgun (#10889)
# Objective

The `Despawn` command breaks the hierarchy whenever you use it if the
despawned entity has a parent or any children. This is a serious footgun
because the `Despawn` command has the shortest name, the behavior is
unexpected and not likely to be what you want, and the crash that it
causes can be very difficult to track down.

## Solution

Until this can be fixed by relations, add a note mentioning the footgun
in the documentation.
2023-12-12 19:44:05 +00:00
TheBigCheese
6a15ed564d
Improve EntityWorldMut.remove, retain and despawn docs by linking to more detail (#10943)
## Solution

`Commands.remove` and `.retain` (because I copied `remove`s doc)
referenced `EntityWorldMut.remove` and `retain` for more detail but the
`Commands` docs are much more detailed (which makes sense because it is
the most common api), so I have instead inverted this so that
`EntityWorldMut` docs link to `Commands`.

I also made `EntityWorldMut.despawn` reference `World.despawn` for more
details, like `Commands.despawn` does.
2023-12-12 19:27:11 +00:00
Nicola Papale
d2614f2d80
Add a couple assertions for system types (#10893)
# Objective

Test more complex function signatures for exclusive systems, and test
that `StaticSystemParam` is indeed a `SystemParam`.

I mean, it currently works, but might as well add a test for it.
2023-12-06 20:35:46 +00:00
TheBigCheese
9da65b10b4
Add EntityCommands.retain and EntityWorldMut.retain (#10873)
# Objective
Adds `EntityCommands.retain` and `EntityWorldMut.retain` to remove all
components except the given bundle from the entity.
Fixes #10865.

## Solution

I added a private unsafe function in `EntityWorldMut` called
`remove_bundle_info` which performs the shared behaviour of `remove` and
`retain`, namely taking a `BundleInfo` of components to remove, and
removing them from the given entity. Then `retain` simply gets all the
components on the entity and filters them by whether they are in the
bundle it was passed, before passing this `BundleInfo` into
`remove_bundle_info`.

`EntityCommands.retain` just creates a new type `Retain` which runs
`EntityWorldMut.retain` when run.

---

## Changelog

Added `EntityCommands.retain` and `EntityWorldMut.retain`, which remove
all components except the given bundle from the entity, they can also be
used to remove all components by passing `()` as the bundle.
2023-12-05 15:37:33 +00:00
Zachary Harrold
72adf2ae2a
Reduced TableRow as Casting (#10811)
# Objective

- Fixes #10806

## Solution

Replaced `new` and `index` methods for both `TableRow` and `TableId`
with `from_*` and `as_*` methods. These remove the need to perform
casting at call sites, reducing the total number of casts in the Bevy
codebase. Within these methods, an appropriate `debug_assertion` ensures
the cast will behave in an expected manner (no wrapping, etc.). I am
using a `debug_assertion` instead of an `assert` to reduce any possible
runtime overhead, however minimal. This choice is something I am open to
changing (or leaving up to another PR) if anyone has any strong
arguments for it.

---

## Changelog

- `ComponentSparseSet::sparse` stores a `TableRow` instead of a `u32`
(private change)
- Replaced `TableRow::new` and `TableRow::index` methods with
`TableRow::from_*` and `TableRow::as_*`, with `debug_assertions`
protecting any internal casting.
- Replaced `TableId::new` and `TableId::index` methods with
`TableId::from_*` and `TableId::as_*`, with `debug_assertions`
protecting any internal casting.
- All `TableId` methods are now `const`

## Migration Guide

- `TableRow::new` -> `TableRow::from_usize`
- `TableRow::index` -> `TableRow::as_usize`
- `TableId::new` -> `TableId::from_usize`
- `TableId::index` -> `TableId::as_usize`

---

## Notes

I have chosen to remove the `index` and `new` methods for the following
chain of reasoning:

- Across the codebase, `new` was called with a mixture of `u32` and
`usize` values. Likewise for `index`.
- Choosing `new` to either be `usize` or `u32` would break half of these
call-sites, requiring `as` casting at the site.
- Adding a second method `new_u32` or `new_usize` avoids the above, bu
looks visually inconsistent.
- Therefore, they should be replaced with `from_*` and `as_*` methods
instead.

Worth noting is that by updating `ComponentSparseSet`, there are now
zero instances of interacting with the inner value of `TableRow` as a
`u32`, it is exclusively used as a `usize` value (due to interactions
with methods like `len` and slice indexing). I have left the `as_u32`
and `from_u32` methods as the "proper" constructors/getters.
2023-12-05 02:44:33 +00:00
Stepan Koltsov
2653adf5d8
Make ComponentId typed in Components (#10770)
Trying to understand code. Types help.
2023-12-05 01:54:27 +00:00
Nathan
24c6a7df05
Clarifying Commands' purpose (#10837)
# Objective
As described in [Issue
#10805](https://github.com/bevyengine/bevy/issues/10805) I have changed
"impactful changes" to "structural changes"

## Solution
Updated the text "impactful" to "structural"
2023-12-02 10:07:19 +00:00
AxiomaticSemantics
b4c33da149
Fix typos in safety comment (#10827)
# Objective

- Minor fix for typos in safety comment

- Fix the typos.

Co-authored-by: ebola <dev@axiomatic>
2023-12-01 18:07:16 +00:00
Aldrich Suratos
cbf39b7eab
Deprecate QueryState::for_each_unchecked (#10815)
# Objective

Resolves Issue #10772.

## Solution

Added the deprecated warning for QueryState::for_each_unchecked, as
noted in the comments of PR #6773.
Followed the wording in the deprecation messages for `for_each` and
`for_each_mut`
2023-12-01 09:48:16 +00:00
James Liu
2148518758
Override QueryIter::fold to port Query::for_each perf gains to select Iterator combinators (#6773)
# Objective
After #6547, `Query::for_each` has been capable of automatic
vectorization on certain queries, which is seeing a notable (>50% CPU
time improvements) for iteration. However, `Query::for_each` isn't
idiomatic Rust, and lacks the flexibility of iterator combinators.

Ideally, `Query::iter` and friends should be able to achieve the same
results. However, this does seem to blocked upstream
(rust-lang/rust#104914) by Rust's loop optimizations.

## Solution
This is an intermediate solution and refactor. This moves the
`Query::for_each` implementation onto the `Iterator::fold`
implementation for `QueryIter` instead. This should result in the same
automatic vectorization optimization on all `Iterator` functions that
internally use fold, including `Iterator::for_each`, `Iterator::count`,
etc.

With this, it should close the gap between the two completely.
Internally, this PR changes `Query::for_each` to use
`query.iter().for_each(..)` instead of the duplicated implementation.

Separately, the duplicate implementations of internal iteration (i.e.
`Query::par_for_each`) now use portions of the current `Query::for_each`
implementation factored out into their own functions.

This also massively cleans up our internal fragmentation of internal
iteration options, deduplicating the iteration code used in `for_each`
and `par_iter().for_each()`.

---

## Changelog
Changed: `Query::for_each`, `Query::for_each_mut`, `Query::for_each`,
and `Query::for_each_mut` have been moved to `QueryIter`'s
`Iterator::for_each` implementation, and still retains their performance
improvements over normal iteration. These APIs are deprecated in 0.13
and will be removed in 0.14.

---------

Co-authored-by: JoJoJet <21144246+JoJoJet@users.noreply.github.com>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2023-12-01 09:09:55 +00:00
Stepan Koltsov
d04a2204f5
Make IntoSystemConfigs::into_configs public API (visible in docs) (#10624)
`IntoSystemConfigs::into_configs` function is public, but hidden from
documentation. This PR makes it visible.

Fixes #10622.
2023-11-29 14:38:37 +00:00
tygyh
fd308571c4
Remove unnecessary path prefixes (#10749)
# Objective

- Shorten paths by removing unnecessary prefixes

## Solution

- Remove the prefixes from many paths which do not need them. Finding
the paths was done automatically using built-in refactoring tools in
Jetbrains RustRover.
2023-11-28 23:43:40 +00:00
scottmcm
a902ea6f85
Save an instruction in EntityHasher (#10648)
# Objective

Keep essentially the same structure of `EntityHasher` from #9903, but
rephrase the multiplication slightly to save an instruction.

cc @superdump 
Discord thread:
https://discord.com/channels/691052431525675048/1172033156845674507/1174969772522356756

## Solution

Today, the hash is
```rust
        self.hash = i | (i.wrapping_mul(FRAC_U64MAX_PI) << 32);
```
with `i` being `(generation << 32) | index`.

Expanding things out, we get
```rust
i | ( (i * CONST) << 32 )
= (generation << 32) | index | ((((generation << 32) | index) * CONST) << 32)
= (generation << 32) | index | ((index * CONST) << 32)  // because the generation overflowed
= (index * CONST | generation) << 32 | index
```

What if we do the same thing, but with `+` instead of `|`? That's almost
the same thing, except that it has carries, which are actually often
better in a hash function anyway, since it doesn't saturate. (`|` can be
dangerous, since once something becomes `-1` it'll stay that, and
there's no mixing available.)

```rust
(index * CONST + generation) << 32 + index
= (CONST << 32 + 1) * index + generation << 32
= (CONST << 32 + 1) * index + (WHATEVER << 32 + generation) << 32 // because the extra overflows and thus can be anything
= (CONST << 32 + 1) * index + ((CONST * generation) << 32 + generation) << 32 // pick "whatever" to be something convenient
= (CONST << 32 + 1) * index + ((CONST << 32 + 1) * generation) << 32
= (CONST << 32 + 1) * index +((CONST << 32 + 1) * (generation << 32)
= (CONST << 32 + 1) * (index + generation << 32)
= (CONST << 32 + 1) * (generation << 32 | index)
= (CONST << 32 + 1) * i
```

So we can do essentially the same thing using a single multiplication
instead of doing multiply-shift-or.

LLVM was already smart enough to merge the shifting into a
multiplication, but this saves the extra `or`:

![image](https://github.com/bevyengine/bevy/assets/18526288/d9396614-2326-4730-abbe-4908c01b5ace)
<https://rust.godbolt.org/z/MEvbz4eo4>

It's a very small change, and often will disappear in load latency
anyway, but it's a couple percent faster in lookups:

![image](https://github.com/bevyengine/bevy/assets/18526288/c365ec85-6adc-4f6d-8fa6-a65146f55a75)

(There was more of an improvement here before #10558, but with `to_bits`
being a single `qword` load now, keeping things mostly as it is turned
out to be better than the bigger changes I'd tried in #10605.)

---

## Changelog

(Probably skip it)

## Migration Guide

(none needed)
2023-11-28 12:37:30 +00:00
Kanabenki
0e9f6e92ea
Add clippy::manual_let_else at warn level to lints (#10684)
# Objective

Related to #10612.

Enable the
[`clippy::manual_let_else`](https://rust-lang.github.io/rust-clippy/master/#manual_let_else)
lint as a warning. The `let else` form seems more idiomatic to me than a
`match`/`if else` that either match a pattern or diverge, and from the
clippy doc, the lint doesn't seem to have any possible false positive.

## Solution

Add the lint as warning in `Cargo.toml`, refactor places where the lint
triggers.
2023-11-28 04:15:27 +00:00
Mark Wainwright
f0a8994f55
Split WorldQuery into WorldQueryData and WorldQueryFilter (#9918)
# Objective

- Fixes #7680
- This is an updated for https://github.com/bevyengine/bevy/pull/8899
which had the same objective but fell a long way behind the latest
changes


## Solution

The traits `WorldQueryData : WorldQuery` and `WorldQueryFilter :
WorldQuery` have been added and some of the types and functions from
`WorldQuery` has been moved into them.

`ReadOnlyWorldQuery` has been replaced with `ReadOnlyWorldQueryData`. 

`WorldQueryFilter` is safe (as long as `WorldQuery` is implemented
safely).

`WorldQueryData` is unsafe - safely implementing it requires that
`Self::ReadOnly` is a readonly version of `Self` (this used to be a
safety requirement of `WorldQuery`)

The type parameters `Q` and `F` of `Query` must now implement
`WorldQueryData` and `WorldQueryFilter` respectively.

This makes it impossible to accidentally use a filter in the data
position or vice versa which was something that could lead to bugs.
~~Compile failure tests have been added to check this.~~

It was previously sometimes useful to use `Option<With<T>>` in the data
position. Use `Has<T>` instead in these cases.

The `WorldQuery` derive macro has been split into separate derive macros
for `WorldQueryData` and `WorldQueryFilter`.

Previously it was possible to derive both `WorldQuery` for a struct that
had a mixture of data and filter items. This would not work correctly in
some cases but could be a useful pattern in others. *This is no longer
possible.*

---

## Notes

- The changes outside of `bevy_ecs` are all changing type parameters to
the new types, updating the macro use, or replacing `Option<With<T>>`
with `Has<T>`.

- All `WorldQueryData` types always returned `true` for `IS_ARCHETYPAL`
so I moved it to `WorldQueryFilter` and
replaced all calls to it with `true`. That should be the only logic
change outside of the macro generation code.

- `Changed<T>` and `Added<T>` were being generated by a macro that I
have expanded. Happy to revert that if desired.

- The two derive macros share some functions for implementing
`WorldQuery` but the tidiest way I could find to implement them was to
give them a ton of arguments and ask clippy to ignore that.

## Changelog

### Changed
- Split `WorldQuery` into `WorldQueryData` and `WorldQueryFilter` which
now have separate derive macros. It is not possible to derive both for
the same type.
- `Query` now requires that the first type argument implements
`WorldQueryData` and the second implements `WorldQueryFilter`

## Migration Guide

- Update derives

```rust
// old
#[derive(WorldQuery)]
#[world_query(mutable, derive(Debug))]
struct CustomQuery {
    entity: Entity,
    a: &'static mut ComponentA
}

#[derive(WorldQuery)]
struct QueryFilter {
    _c: With<ComponentC>
}

// new 
#[derive(WorldQueryData)]
#[world_query_data(mutable, derive(Debug))]
struct CustomQuery {
    entity: Entity,
    a: &'static mut ComponentA,
}

#[derive(WorldQueryFilter)]
struct QueryFilter {
    _c: With<ComponentC>
}
```
- Replace `Option<With<T>>` with `Has<T>`

```rust
/// old
fn my_system(query: Query<(Entity, Option<With<ComponentA>>)>)
{
  for (entity, has_a_option) in query.iter(){
    let has_a:bool = has_a_option.is_some();
    //todo!()
  }
}

/// new
fn my_system(query: Query<(Entity, Has<ComponentA>)>)
{
  for (entity, has_a) in query.iter(){
    //todo!()
  }
}
```

- Fix queries which had filters in the data position or vice versa.

```rust
// old
fn my_system(query: Query<(Entity, With<ComponentA>)>)
{
  for (entity, _) in query.iter(){
  //todo!()
  }
}

// new
fn my_system(query: Query<Entity, With<ComponentA>>)
{
  for entity in query.iter(){
  //todo!()
  }
}

// old
fn my_system(query: Query<AnyOf<(&ComponentA, With<ComponentB>)>>)
{
  for (entity, _) in query.iter(){
  //todo!()
  }
}

// new
fn my_system(query: Query<Option<&ComponentA>, Or<(With<ComponentA>, With<ComponentB>)>>)
{
  for entity in query.iter(){
  //todo!()
  }
}

```

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2023-11-28 03:56:07 +00:00
Helix
879893c30a
fix insert_reflect panic caused by clone_value (#10627)
# Objective

- `insert_reflect` relies on `reflect_type_path`, which doesn't gives
the actual type path for object created by `clone_value`, leading to an
unexpected panic. This is a workaround for it.
- Fix #10590 

## Solution

- Tries to get type path from `get_represented_type_info` if get failed
from `reflect_type_path`.

---

## Defect remaining

- `get_represented_type_info` implies a shortage on performance than
using `TypeRegistry`.
2023-11-28 00:17:10 +00:00
Zachary Harrold
6e871ab919
Implement Drop for CommandQueue (#10746)
# Objective

- Fixes #10676, preventing a possible memory leak for commands which
owned resources.

## Solution

Implemented `Drop` for `CommandQueue`. This has been done entirely in
the private API of `CommandQueue`, ensuring no breaking changes. Also
added a unit test, `test_command_queue_inner_drop_early`, based on the
reproduction steps as outlined in #10676.

## Notes

I believe this can be applied to `0.12.1` as well, but I am uncertain of
the process to make that kind of change. Please let me know if there's
anything I can do to help with the back-porting of this change.

---------

Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2023-11-27 22:35:36 +00:00
Aldrich Suratos
1e2132672e
Copy over docs for Condition trait from PR #10718 (#10748)
# Objective

Resolves #10743.

## Solution

Copied over the documentation written by @stepancheng from PR #10718.

I left out the lines from the doctest where `<()>` is removed, as that
seemed to be the part people couldn't decide on whether to keep or not.
2023-11-27 16:33:22 +00:00
Cameron
53919c3e70
Wait until FixedUpdate can see events before dropping them (#10077)
## Objective
 
Currently, events are dropped after two frames. This cadence wasn't
*chosen* for a specific reason, double buffering just lets events
persist for at least two frames. Events only need to be dropped at a
predictable point so that the event queues don't grow forever (i.e.
events should never cause a memory leak).
 
Events (and especially input events) need to be observable by systems in
`FixedUpdate`, but as-is events are dropped before those systems even
get a chance to see them.
 
## Solution
 
Instead of unconditionally dropping events in `First`, require
`FixedUpdate` to first queue the buffer swap (if the `TimePlugin` has
been installed). This way, events are only dropped after a frame that
runs `FixedUpdate`.
 
## Future Work

In the same way we have independent copies of `Time` for tracking time
in `Main` and `FixedUpdate`, we will need independent copies of `Input`
for tracking press/release status correctly in `Main` and `FixedUpdate`.

--
 
Every run of `FixedUpdate` covers a specific timespan. For example, if
the fixed timestep `Δt` is 10ms, the first three `FixedUpdate` runs
cover `[0ms, 10ms)`, `[10ms, 20ms)`, and `[20ms, 30ms)`.
 
`FixedUpdate` can run many times in one frame. For truly
framerate-independent behavior, each `FixedUpdate` should only see the
events that occurred in its covered timespan, but what happens right now
is the first step in the frame reads all pending events.

Fixing that will require timestamped events.

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2023-11-26 23:04:41 +00:00
Mike
11b1b3a24f
delete methods deprecated in 0.12 (#10693)
## Changelog

- delete methods deprecated in 0.12
2023-11-24 16:15:47 +00:00
Stepan Koltsov
48af029f7b
Warn that Added/Changed filters do not see deferred changes (#10681)
Explain https://github.com/bevyengine/bevy/issues/10625.

This might be obvious to those familiar with Bevy internals, but it
surprised me.

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2023-11-23 14:04:07 +00:00
Stepan Koltsov
b0cc6bfd7e
Rustdoc example for Ref (#10682) 2023-11-22 20:20:39 +00:00
Mike
208ecb53dc
Append commands (#10400)
# Objective

- I've been experimenting with different patterns to try and make async
tasks more convenient. One of the better ones I've found is to return a
command queue to allow for deferred &mut World access. It can be
convenient to check for task completion in a normal system, but it is
hard to do something with the command queue after getting it back. This
pr adds a `append` to Commands. This allows appending the returned
command queue onto the system's commands.

## Solution

- I edited the async compute example to use the new `append`, but not
sure if I should keep the example changed as this might be too
opinionated.

## Future Work

- It would be very easy to pull the pattern used in the example out into
a plugin or a external crate, so users wouldn't have to add the checking
system.

---

## Changelog

- add `append` to `Commands` and `CommandQueue`
2023-11-22 00:04:37 +00:00
TheBigCheese
e67cfdf82b
Enable clippy::undocumented_unsafe_blocks warning across the workspace (#10646)
# Objective

Enables warning on `clippy::undocumented_unsafe_blocks` across the
workspace rather than only in `bevy_ecs`, `bevy_transform` and
`bevy_utils`. This adds a little awkwardness in a few areas of code that
have trivial safety or explain safety for multiple unsafe blocks with
one comment however automatically prevents these comments from being
missed.

## Solution

This adds `undocumented_unsafe_blocks = "warn"` to the workspace
`Cargo.toml` and fixes / adds a few missed safety comments. I also added
`#[allow(clippy::undocumented_unsafe_blocks)]` where the safety is
explained somewhere above.

There are a couple of safety comments I added I'm not 100% sure about in
`bevy_animation` and `bevy_render/src/view` and I'm not sure about the
use of `#[allow(clippy::undocumented_unsafe_blocks)]` compared to adding
comments like `// SAFETY: See above`.
2023-11-21 02:06:24 +00:00
Andrew Safigan
a22020bf5c
Make impl block for RemovedSystem generic (#10651)
# Objective

Make the impl block for RemovedSystem generic so that the methods can be
called for systems that have inputs or outputs.

## Solution

Simply adding generics to the impl block.
2023-11-21 01:27:29 +00:00
TheBigCheese
865041de74
Add an Entry api to EntityWorldMut. (#10650)
# Objective

Adds `.entry` to `EntityWorldMut` with `Entry`, `OccupiedEntry` and
`VacantEntry` for easier in-situ modification, based on `HashMap.entry`.

Fixes #10635

## Solution

This adds the `entry` method to `EntityWorldMut` which returns an
`Entry`. This is an enum of `OccupiedEntry` and `VacantEntry` and has
the methods `and_modify`, `insert_entry`, `or_insert`, `or_insert_with`
and `or_default`. The only difference between `OccupiedEntry` and
`VacantEntry` is the type, they are both a mutable reference to the
`EntityWorldMut` and a marker for the component type, `HashMap` also
stores things to make it quicker to access the data in `OccupiedEntry`
but I wasn't sure if we had anything it would be logical to store to
make accessing/modifying the component faster? As such, the differences
are that `OccupiedEntry` assumes the entity has the component (because
nothing else can have an `EntityWorldMut` so it can't be changed outside
the entry api) and has different methods.

All the methods are based very closely off `hashbrown::HashMap` (because
its easier to read the source of) with a couple of quirks like
`OccupiedEntry.insert` doesn't return the old value because we don't
appear to have an api for mem::replacing components.

---

## Changelog

- Added a new function `EntityWorldMut.entry` which returns an `Entry`,
allowing easier in-situ modification of a component.

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: Pascal Hertleif <killercup@gmail.com>
2023-11-21 01:26:06 +00:00
Konstantin Kostiuk
eeb0c2f2e4
Allow #[derive(Bundle)] on tuple structs (take 3) (#10561)
- rework of old @Veykril's work in
[2499](https://github.com/bevyengine/bevy/pull/2499)
- Fixes [3537](https://github.com/bevyengine/bevy/issues/3537)
2023-11-21 01:09:16 +00:00
Ame
951c9bb1a2
Add [lints] table, fix adding #![allow(clippy::type_complexity)] everywhere (#10011)
# Objective

- Fix adding `#![allow(clippy::type_complexity)]` everywhere. like #9796

## Solution

- Use the new [lints] table that will land in 1.74
(https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#lints)
- inherit lint to the workspace, crates and examples.
```
[lints]
workspace = true
```

## Changelog

- Bump rust version to 1.74
- Enable lints table for the workspace
```toml
[workspace.lints.clippy]
type_complexity = "allow"
```
- Allow type complexity for all crates and examples
```toml
[lints]
workspace = true
```

---------

Co-authored-by: Martín Maita <47983254+mnmaita@users.noreply.github.com>
2023-11-18 20:58:48 +00:00
Gonçalo Rica Pais da Silva
9c0fca072d
Optimise Entity with repr align & manual PartialOrd/Ord (#10558)
# Objective

- Follow up on https://github.com/bevyengine/bevy/pull/10519, diving
deeper into optimising `Entity` due to the `derive`d `PartialOrd`
`partial_cmp` not being optimal with codegen:
https://github.com/rust-lang/rust/issues/106107
- Fixes #2346.

## Solution

Given the previous PR's solution and the other existing LLVM codegen
bug, there seemed to be a potential further optimisation possible with
`Entity`. In exploring providing manual `PartialOrd` impl, it turned out
initially that the resulting codegen was not immediately better than the
derived version. However, once `Entity` was given `#[repr(align(8)]`,
the codegen improved remarkably, even more once the fields in `Entity`
were rearranged to correspond to a `u64` layout (Rust doesn't
automatically reorder fields correctly it seems). The field order and
`align(8)` additions also improved `to_bits` codegen to be a single
`mov` op. In turn, this led me to replace the previous
"non-shortcircuiting" impl of `PartialEq::eq` to use direct `to_bits`
comparison.

The result was remarkably better codegen across the board, even for
hastable lookups.

The current baseline codegen is as follows:
https://godbolt.org/z/zTW1h8PnY

Assuming the following example struct that mirrors with the existing
`Entity` definition:

```rust
#[derive(Clone, Copy, Eq, PartialEq, PartialOrd, Ord)]
pub struct FakeU64 {
    high: u32,
    low: u32,
}
```

the output for `to_bits` is as follows:

```
example::FakeU64::to_bits:
        shl     rdi, 32
        mov     eax, esi
        or      rax, rdi
        ret
```

Changing the struct to:
```rust
#[derive(Clone, Copy, Eq)]
#[repr(align(8))]
pub struct FakeU64 {
    low: u32,
    high: u32,
}
```
and providing manual implementations for `PartialEq`/`PartialOrd`/`Ord`,
`to_bits` now optimises to:
```
example::FakeU64::to_bits:
        mov     rax, rdi
        ret
```
The full codegen example for this PR is here for reference:
https://godbolt.org/z/n4Mjx165a

To highlight, `gt` comparison goes from
```
example::greater_than:
        cmp     edi, edx
        jae     .LBB3_2
        xor     eax, eax
        ret
.LBB3_2:
        setne   dl
        cmp     esi, ecx
        seta    al
        or      al, dl
        ret
```
to
```
example::greater_than:
        cmp     rdi, rsi
        seta    al
        ret
```

As explained on Discord by @scottmcm :

>The root issue here, as far as I understand it, is that LLVM's
middle-end is inexplicably unwilling to merge loads if that would make
them under-aligned. It leaves that entirely up to its target-specific
back-end, and thus a bunch of the things that you'd expect it to do that
would fix this just don't happen.

## Benchmarks

Before discussing benchmarks, everything was tested on the following
specs:

AMD Ryzen 7950X 16C/32T CPU
64GB 5200 RAM
AMD RX7900XT 20GB Gfx card
Manjaro KDE on Wayland

I made use of the new entity hashing benchmarks to see how this PR would
improve things there. With the changes in place, I first did an
implementation keeping the existing "non shortcircuit" `PartialEq`
implementation in place, but with the alignment and field ordering
changes, which in the benchmark is the `ord_shortcircuit` column. The
`to_bits` `PartialEq` implementation is the `ord_to_bits` column. The
main_ord column is the current existing baseline from `main` branch.


![Screenshot_20231114_132908](https://github.com/bevyengine/bevy/assets/3116268/cb9090c9-ff74-4cc5-abae-8e4561332261)

My machine is not super set-up for benchmarking, so some results are
within noise, but there's not just a clear improvement between the
non-shortcircuiting implementation, but even further optimisation taking
place with the `to_bits` implementation.

On my machine, a fair number of the stress tests were not showing any
difference (indicating other bottlenecks), but I was able to get a clear
difference with `many_foxes` with a fox count of 10,000:

Test with `cargo run --example many_foxes --features
bevy/trace_tracy,wayland --release -- --count 10000`:


![Screenshot_20231114_144217](https://github.com/bevyengine/bevy/assets/3116268/89bdc21c-7209-43c8-85ae-efbf908bfed3)

On avg, a framerate of about 28-29FPS was improved to 30-32FPS. "This
trace" represents the current PR's perf, while "External trace"
represents the `main` branch baseline.

## Changelog

Changed: micro-optimized Entity align and field ordering as well as
providing manual `PartialOrd`/`Ord` impls to help LLVM optimise further.

## Migration Guide

Any `unsafe` code relying on field ordering of `Entity` or sufficiently
cursed shenanigans should change to reflect the different internal
representation and alignment requirements of `Entity`.

Co-authored-by: james7132 <contact@jamessliu.com>
Co-authored-by: NathanW <nathansward@comcast.net>
2023-11-18 20:04:37 +00:00
Stepan Koltsov
0c9f265423
Link to In in pipe documentation (#10596) 2023-11-17 15:34:58 +00:00
Stepan Koltsov
17e509748d
Some docs for IntoSystemSet (#10563)
Co-authored-by: Pascal Hertleif <killercup@gmail.com>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2023-11-16 17:44:42 +00:00
Zachary Harrold
3c689b9ca8
Update Event send methods to return EventId (#10551)
# Objective

- Fixes #10532

## Solution

I've updated the various `Event` send methods to return the sent
`EventId`(s). Since these methods previously returned nothing, and this
information is cheap to copy, there should be minimal negative
consequences to providing this additional information. In the case of
`send_batch`, an iterator is returned built from `Range` and `Map`,
which only consumes 16 bytes on the stack with no heap allocations for
all batch sizes. As such, the cost of this information is negligible.

These changes are reflected for `EventWriter` and `World`. For `World`,
the return types are optional to account for the possible lack of an
`Events` resource. Again, these methods previously returned no
information, so its inclusion should only be a benefit.

## Usage

Now when sending events, the IDs of those events is available for
immediate use:

```rust
// Example of a request-response system where the requester can track handled requests.

/// A system which can make and track requests
fn requester(
    mut requests: EventWriter<Request>,
    mut handled: EventReader<Handled>,
    mut pending: Local<HashSet<EventId<Request>>>,
) {
    // Check status of previous requests
    for Handled(id) in handled.read() {
        pending.remove(&id);
    }

    if !pending.is_empty() {
        error!("Not all my requests were handled on the previous frame!");
        pending.clear();
    }

    // Send a new request and remember its ID for later
    let request_id = requests.send(Request::MyRequest { /* ... */ });

    pending.insert(request_id);
}

/// A system which handles requests
fn responder(
    mut requests: EventReader<Request>,
    mut handled: EventWriter<Handled>,
) {
    for (request, id) in requests.read_with_id() {
        if handle(request).is_ok() {
            handled.send(Handled(id));
        }
    }
}
```

In the above example, a `requester` system can send request events, and
keep track of which ones are currently pending by `EventId`. Then, a
`responder` system can act on that event, providing the ID as a
reference that the `requester` can use. Before this PR, it was not
trivial for a system sending events to keep track of events by ID. This
is unfortunate, since for a system reading events, it is trivial to
access the ID of a event.

---

## Changelog

- Updated `Events`:
  - Added `send_batch`
  - Modified `send` to return the sent `EventId`
  - Modified `send_default` to return the sent `EventId`
- Updated `EventWriter`
  - Modified `send_batch` to return all sent `EventId`s
  - Modified `send` to return the sent `EventId`
  - Modified `send_default` to return the sent `EventId`
- Updated `World`
- Modified `send_event` to return the sent `EventId` if sent, otherwise
`None`.
- Modified `send_event_default` to return the sent `EventId` if sent,
otherwise `None`.
- Modified `send_event_batch` to return all sent `EventId`s if sent,
otherwise `None`.
- Added unit test `test_send_events_ids` to ensure returned `EventId`s
match the sent `Event`s
- Updated uses of modified methods.

## Migration Guide

### `send` / `send_default` / `send_batch`

For the following methods:

- `Events::send`
- `Events::send_default`
- `Events::send_batch`
- `EventWriter::send`
- `EventWriter::send_default`
- `EventWriter::send_batch`
- `World::send_event`
- `World::send_event_default`
- `World::send_event_batch`

Ensure calls to these methods either handle the returned value, or
suppress the result with `;`.

```rust
// Now fails to compile due to mismatched return type
fn send_my_event(mut events: EventWriter<MyEvent>) {
    events.send_default()
}

// Fix
fn send_my_event(mut events: EventWriter<MyEvent>) {
    events.send_default();
}
```

This will most likely be noticed within `match` statements:

```rust
// Before
match is_pressed {
    true => events.send(PlayerAction::Fire),
//                 ^--^ No longer returns ()
    false => {}
}

// After
match is_pressed {
    true => {
        events.send(PlayerAction::Fire);
    },
    false => {}
}
```

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: Nicola Papale <nicopap@users.noreply.github.com>
2023-11-16 17:20:43 +00:00
Nathan Fenner
1f05e1e2ab
Add 'World::run_system_with_input' function + allow World::run_system to get system output (#10380)
# Objective

Allows chained systems taking an `In<_>` input parameter to be run as
one-shot systems. This API was mentioned in #8963.

In addition, `run_system(_with_input)` returns the system output, for
any `'static` output type.

## Solution

A new function, `World::run_system_with_input` allows a `SystemId<I, O>`
to be run by providing an `I` value as input and producing `O` as an
output.

`SystemId<I, O>` is now generic over the input type `I` and output type
`O`, along with the related functions and types `RegisteredSystem`,
`RemovedSystem`, `register_system`, `remove_system`, and
`RegisteredSystemError`. These default to `()`, preserving the existing
API, for all of the public types.

---

## Changelog

- Added `World::run_system_with_input` function to allow one-shot
systems that take `In<_>` input parameters
- Changed `World::run_system` and `World::register_system` to support
systems with return types beyond `()`
- Added `Commands::run_system_with_input` command that schedules a
one-shot system with an `In<_>` input parameter
2023-11-15 13:44:44 +00:00
scottmcm
713b1d8fa4
Optimize Entity::eq (#10519)
(This is my first PR here, so I've probably missed some things. Please
let me know what else I should do to help you as a reviewer!)

# Objective

Due to https://github.com/rust-lang/rust/issues/117800, the `derive`'d
`PartialEq::eq` on `Entity` isn't as good as it could be. Since that's
used in hashtable lookup, let's improve it.

## Solution

The derived `PartialEq::eq` short-circuits if the generation doesn't
match. However, having a branch there is sub-optimal, especially on
64-bit systems like x64 that could just load the whole `Entity` in one
load anyway.

Due to complications around `poison` in LLVM and the exact details of
what unsafe code is allowed to do with reference in Rust
(https://github.com/rust-lang/unsafe-code-guidelines/issues/346), LLVM
isn't allowed to completely remove the short-circuiting. `&Entity` is
marked `dereferencable(8)` so LLVM knows it's allowed to *load* all 8
bytes -- and does so -- but it has to assume that the `index` might be
undef/poison if the `generation` doesn't match, and thus while it finds
a way to do it without needing a branch, it has to do something slightly
more complicated than optimal to combine the results. (LLVM is allowed
to change non-short-circuiting code to use branches, but not the other
way around.)

Here's a link showing the codegen today:
<https://rust.godbolt.org/z/9WzjxrY7c>
```rust
#[no_mangle]
pub fn demo_eq_ref(a: &Entity, b: &Entity) -> bool {
    a == b
}
```
ends up generating the following assembly:
```asm
demo_eq_ref:
        movq    xmm0, qword ptr [rdi]
        movq    xmm1, qword ptr [rsi]
        pcmpeqd xmm1, xmm0
        pshufd  xmm0, xmm1, 80
        movmskpd        eax, xmm0
        cmp     eax, 3
        sete    al
        ret
```
(It's usually not this bad in real uses after inlining and LTO, but it
makes a strong demo.)

This PR manually implements `PartialEq::eq` *without* short-circuiting,
and because that tells LLVM that neither the generations nor the index
can be poison, it doesn't need to be so careful and can generate the
"just compare the two 64-bit values" code you'd have probably already
expected:
```asm
demo_eq_ref:
        mov     rax, qword ptr [rsi]
        cmp     qword ptr [rdi], rax
        sete    al
        ret
```

Since this doesn't change the representation of `Entity`, if it's
instead passed by *value*, then each `Entity` is two `u32` registers,
and the old and the new code do exactly the same thing. (Other
approaches, like changing `Entity` to be `[u32; 2]` or `u64`, affect
this case.)

This should hopefully merge easily with changes like
https://github.com/bevyengine/bevy/pull/9907 that also want to change
`Entity`.

## Benchmarks

I'm not super-confident that I got my machine fully consistent for
benchmarking, but whether I run the old or the new one first I get
reasonably consistent results.

Here's a fairly typical example of the benchmarks I added in this PR:

![image](https://github.com/bevyengine/bevy/assets/18526288/24226308-4616-4082-b0ff-88fc06285ef1)

Building the sets seems to be basically the same. It's usually reported
as noise, but sometimes I see a few percent slower or faster.

But lookup hits in particular -- since a hit checks that the key is
equal -- consistently shows around 10% improvement.

`cargo run --example many_cubes --features bevy/trace_tracy --release --
--benchmark` showed as slightly faster with this change, though if I had
to bet I'd probably say it's more noise than meaningful (but at least
it's not worse either):

![image](https://github.com/bevyengine/bevy/assets/18526288/58bb8c96-9c45-487f-a5ab-544bbfe9fba0)

This is my first PR here -- and my first time running Tracy -- so please
let me know what else I should run, or run things on your own more
reliable machines to double-check.

---

## Changelog

(probably not worth including)

Changed: micro-optimized `Entity::eq` to help LLVM slightly.

## Migration Guide

(I really hope nobody was using this on uninitialized entities where
sufficiently tortured `unsafe` could could technically notice that this
has changed.)
2023-11-14 02:06:21 +00:00
Giacomo Stevanato
e75c2f8b16
Remove a ptr-to-int cast in CommandQueue::apply (#10475)
# Objective

- `CommandQueue::apply` calculates the address of the end of the
internal buffer as a `usize` rather than as a pointer, requiring two
casts of `cursor` to `usize`. Casting pointers to integers is generally
discouraged and may also prevent optimizations. It's also unnecessary
here.

## Solution

- Calculate the end address as a pointer rather than a `usize`.

Small note:

A trivial translation of the old code to use pointers would have
computed `end_addr` as `cursor.add(self.bytes.len())`, which is not
wrong but is an additional `unsafe` operation that also needs to be
properly documented and proven correct. However this operation is
already implemented in the form of the safe `as_mut_ptr_range`, so I
just used that.
2023-11-09 19:32:33 +00:00