bevy/crates/bevy_ecs/src/bundle.rs


//! Types for handling [`Bundle`]s.
//!
//! This module contains the [`Bundle`] trait and some other helper types.
Bevy ECS V2 (#1525)

# Bevy ECS V2

This is a rewrite of Bevy ECS (basically everything but the new executor/schedule, which are already awesome). The overall goal was to improve the performance and versatility of Bevy ECS. Here is a quick bulleted list of changes before we dive into the details:

* Complete World rewrite
* Multiple component storage types:
  * Tables: fast cache friendly iteration, slower add/removes (previously called Archetypes)
  * Sparse Sets: fast add/remove, slower iteration
* Stateful Queries (caches query results for faster iteration. fragmented iteration is _fast_ now)
* Stateful System Params (caches expensive operations. inspired by @DJMcNab's work in #1364)
* Configurable System Params (users can set configuration when they construct their systems. once again inspired by @DJMcNab's work)
* Archetypes are now "just metadata", component storage is separate
* Archetype Graph (for faster archetype changes)
* Component Metadata
  * Configure component storage type
  * Retrieve information about component size/type/name/layout/send-ness/etc
  * Components are uniquely identified by a densely packed ComponentId
  * TypeIds are now totally optional (which should make implementing scripting easier)
* Super fast "for_each" query iterators
* Merged Resources into World.
  Resources are now just a special type of component
* EntityRef/EntityMut builder apis (more efficient and more ergonomic)
* Fast bitset-backed `Access<T>` replaces old hashmap-based approach everywhere
* Query conflicts are determined by component access instead of archetype component access (to avoid random failures at runtime)
  * With/Without are still taken into account for conflicts, so this should still be comfy to use
* Much simpler `IntoSystem` impl
* Significantly reduced the amount of hashing throughout the ecs in favor of Sparse Sets (indexed by densely packed ArchetypeId, ComponentId, BundleId, and TableId)
* Safety Improvements
  * Entity reservation uses a normal world reference instead of unsafe transmute
  * QuerySets no longer transmute lifetimes
  * Made traits "unsafe" where relevant
  * More thorough safety docs
* WorldCell
  * Exposes safe mutable access to multiple resources at a time in a World
* Replaced "catch all" `System::update_archetypes(world: &World)` with `System::new_archetype(archetype: &Archetype)`
* Simpler Bundle implementation
* Replaced slow "remove_bundle_one_by_one" used as fallback for Commands::remove_bundle with fast "remove_bundle_intersection"
* Removed `Mut<T>` query impl. it is better to only support one way: `&mut T`
* Removed with() from `Flags<T>` in favor of `Option<Flags<T>>`, which allows querying for flags to be "filtered" by default
* Components now have is_send property (currently only resources support non-send)
* More granular module organization
* New `RemovedComponents<T>` SystemParam that replaces `query.removed::<T>()`
* `world.resource_scope()` for mutable access to resources and world at the same time
* WorldQuery and QueryFilter traits unified. FilterFetch trait added to enable "short circuit" filtering.
  Auto-implemented for cases that don't need it
* Significantly slimmed down SystemState in favor of individual SystemParam state
* System Commands changed from `commands: &mut Commands` back to `mut commands: Commands` (to allow Commands to have a World reference)

Fixes #1320

## `World` Rewrite

This is a from-scratch rewrite of `World` that fills the niche that `hecs` used to. Yes, this means Bevy ECS is no longer a "fork" of hecs. We're going out on our own! (the only shared code between the projects is the entity id allocator, which is already basically ideal)

A huge shout out to @SanderMertens (author of [flecs](https://github.com/SanderMertens/flecs)) for sharing some great ideas with me (specifically hybrid ecs storage and archetype graphs). He also helped advise on a number of implementation details.

## Component Storage (The Problem)

Two ECS storage paradigms have gained a lot of traction over the years:

* **Archetypal ECS**:
  * Stores components in "tables" with static schemas. Each "column" stores components of a given type. Each "row" is an entity.
  * Each "archetype" has its own table. Adding/removing an entity's component changes the archetype.
  * Enables super-fast Query iteration due to its cache-friendly data layout
  * Comes at the cost of more expensive add/remove operations for an Entity's components, because all components need to be copied to the new archetype's "table"
* **Sparse Set ECS**:
  * Stores components of the same type in densely packed arrays, which are sparsely indexed by densely packed unsigned integers (Entity ids)
  * Query iteration is slower than Archetypal ECS because each entity's component could be at any position in the sparse set. This "random access" pattern isn't cache friendly. Additionally, there is an extra layer of indirection because you must first map the entity id to an index in the component array.
  * Adding/removing components is a cheap, constant time operation

Bevy ECS V1, hecs, legion, flecs, and Unity DOTS are all "archetypal ecs-es". I personally think "archetypal" storage is a good default for game engines. An entity's archetype doesn't need to change frequently in general, and it creates "fast by default" query iteration (which is a much more common operation). It is also "self optimizing". Users don't need to think about optimizing component layouts for iteration performance. It "just works" without any extra boilerplate.

Shipyard and EnTT are "sparse set ecs-es". They employ "packing" as a way to work around the "suboptimal by default" iteration performance for specific sets of components. This helps, but I didn't think this was a good choice for a general purpose engine like Bevy because:

1. "packs" conflict with each other. If bevy decides to internally pack the Transform and GlobalTransform components, users are then blocked if they want to pack some custom component with Transform.
2. users need to take manual action to optimize

Developers selecting an ECS framework are stuck with a hard choice. Select an "archetypal" framework with "fast iteration everywhere" but without the ability to cheaply add/remove components, or select a "sparse set" framework to cheaply add/remove components but with slower iteration performance.

## Hybrid Component Storage (The Solution)

In Bevy ECS V2, we get to have our cake and eat it too. It now has _both_ of the component storage types above (and more can be added later if needed):

* **Tables** (aka "archetypal" storage)
  * The default storage. If you don't configure anything, this is what you get
  * Fast iteration by default
  * Slower add/remove operations
* **Sparse Sets**
  * Opt-in
  * Slower iteration
  * Faster add/remove operations

These storage types complement each other perfectly. By default Query iteration is fast.
If developers know that they want to add/remove a component at high frequencies, they can set the storage to "sparse set":

```rust
world.register_component(
    ComponentDescriptor::new::<MyComponent>(StorageType::SparseSet)
).unwrap();
```

## Archetypes

Archetypes are now "just metadata" ... they no longer store components directly. They do store:

* The `ComponentId`s of each of the Archetype's components (and that component's storage type)
  * Archetypes are uniquely defined by their component layouts
  * For example: entities with "table" components `[A, B, C]` _and_ "sparse set" components `[D, E]` will always be in the same archetype.
* The `TableId` associated with the archetype
  * For now each archetype has exactly one table (which can have no components).
  * There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it:
    * Ex: an entity with "table storage" components `[A, B, C]` and "sparse set" components `[D, E]` will share the same `[A, B, C]` table as an entity with `[A, B, C]` table component and `[F]` sparse set components.
  * This 1->Many relationship is how we preserve fast "cache friendly" iteration performance when possible (more on this later)
* A list of entities that are in the archetype and the row id of the table they are in
* ArchetypeComponentIds
  * unique densely packed identifiers for (ArchetypeId, ComponentId) pairs
  * used by the schedule executor for cheap system access control
* "Archetype Graph Edges" (see the next section)

## The "Archetype Graph"

Archetype changes in Bevy (and a number of other archetypal ecs-es) have historically been expensive to compute. First, you need to allocate a new vector of the entity's current component ids, add or remove components based on the operation performed, sort it (to ensure it is order-independent), then hash it to find the archetype (if it exists).
And that's all before we get to the _already_ expensive full copy of all components to the new table storage.

The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If `ComponentId`s are densely packed, you can use sparse sets to cheaply jump between archetypes.

Bevy takes this one step further by using add/remove `Bundle` edges instead of `Component` edges. Bevy encourages the use of `Bundles` to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. `Bundles` now also have densely-packed `BundleId`s. This allows us to use a _single_ edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph.

As a result, an operation that used to be _heavy_ (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations.

## Stateful Queries

World queries are now stateful. This allows us to:

1. Cache archetype (and table) matches
   * This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs).
2. Cache Fetch and Filter state
   * The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed
3. Incrementally build up state
   * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes)

As a result, the direct `World` query api now looks like this:

```rust
let mut query = world.query::<(&A, &mut B)>();
for (a, mut b) in query.iter_mut(&mut world) {
}
```

Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (along with the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world).

However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam.

## Stateful SystemParams

Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources).

SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now.

Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params).

(credit goes to @DJMcNab for the initial idea and draft pr here #1364)

## Configurable SystemParams

@DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible).
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters:

```rust
fn foo(value: Local<usize>) {
}

app.add_system(foo.system().config(|c| c.0 = Some(10)));
```

## Uber Fast "for_each" Query Iterators

Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration.

```rust
// `mut query` is needed because both calls below borrow the query mutably
fn system(mut query: Query<(&A, &mut B)>) {
    // you now have the option to do this for a speed boost
    query.for_each_mut(|(a, mut b)| {
    });

    // however normal iterators are still available
    for (a, mut b) in query.iter_mut() {
    }
}
```

I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomph" is needed, it makes sense to use `for_each`.

We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr).

## Component Metadata

`World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type.

## Significantly Cheaper `Access<T>`

We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>` sources every time archetypes changed.

This pr removes TypeAccess in favor of faster bitset access everywhere.
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s.

## Merged Resources into World

Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity).

Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state.

I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally).

This pr merges Resources into World:

```rust
world.insert_resource(1);
world.insert_resource(2.0);
let a = world.get_resource::<i32>().unwrap();
let mut b = world.get_resource_mut::<f64>().unwrap();
*b = 3.0;
```

Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict with components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier.

_But_ this merge did create problems for people directly interacting with `World`.
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably!

## WorldCell

WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access:

```rust
let world_cell = world.cell();
let a = world_cell.get_resource_mut::<i32>().unwrap();
let b = world_cell.get_resource_mut::<f64>().unwrap();
```

This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped.

World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation.

WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer.

The api is currently limited to resource access, but it can and should be extended to queries / entity component access.

## Resource Scopes

WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal!

Instead developers can use a "resource scope"

```rust
world.resource_scope(|world: &mut World, a: &mut A| {
})
```

This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation.

If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty.
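
The remove, lend, re-add pattern behind a resource scope can be sketched with the standard library alone. This is a minimal illustration, not Bevy's implementation: `MiniWorld` and its `HashMap<TypeId, Box<dyn Any>>` storage are hypothetical stand-ins for `World` and its ComponentId-indexed sparse sets, and the sketch does not restore the resource if the closure panics.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Hypothetical stand-in for `World`: one boxed value per resource type.
#[derive(Default)]
struct MiniWorld {
    resources: HashMap<TypeId, Box<dyn Any>>,
}

impl MiniWorld {
    fn insert_resource<T: Any>(&mut self, value: T) {
        self.resources.insert(TypeId::of::<T>(), Box::new(value));
    }

    fn get_resource<T: Any>(&self) -> Option<&T> {
        self.resources
            .get(&TypeId::of::<T>())
            .and_then(|boxed| boxed.downcast_ref::<T>())
    }

    /// Temporarily removes the `T` resource, lends out `&mut MiniWorld` and
    /// `&mut T` at the same time, then puts the resource back.
    fn resource_scope<T: Any, R>(
        &mut self,
        f: impl FnOnce(&mut MiniWorld, &mut T) -> R,
    ) -> R {
        let boxed = self
            .resources
            .remove(&TypeId::of::<T>())
            .expect("resource does not exist");
        let mut value: Box<T> = boxed.downcast::<T>().expect("resource type mismatch");
        // While `value` lives outside the map, the two mutable borrows are disjoint.
        let result = f(&mut *self, &mut *value);
        self.resources.insert(TypeId::of::<T>(), value);
        result
    }
}
```

Because the resource is moved out of the map for the duration of the closure, the borrow checker accepts both mutable references without any `unsafe`. Looking up the same resource type through the world inside the closure returns `None` in this sketch.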
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId

For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters:

```rust
// these queries will never conflict due to their filters
fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut A, Without<B>>) {
}
```

But it also has a significant downside:

```rust
// these queries will not conflict _until_ an entity with A, B, and C is spawned
fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) {
}
```

The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing.

In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace.

To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict.

## EntityRef / EntityMut

World entity operations on `main` require that the user passes in an `entity` id to each operation:

```rust
let entity = world.spawn((A, )); // create a new entity with A
world.get::<A>(entity);
world.insert(entity, (B, C));
world.insert_one(entity, D);
```

This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required).

These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity:

```rust
// spawn now takes no inputs and returns an EntityMut
let entity = world.spawn()
    .insert(A) // insert a single component into the entity
    .insert_bundle((B, C)) // insert a bundle of components into the entity
    .id(); // id returns the Entity id

// Returns EntityMut (or panics if the entity does not exist)
world.entity_mut(entity)
    .insert(D)
    .insert_bundle(SomeBundle::default());
{
    // returns EntityRef (or panics if the entity does not exist)
    let d = world.entity(entity)
        .get::<D>() // gets the D component
        .unwrap();
    // world.get still exists for ergonomics
    let d = world.get::<D>(entity).unwrap();
}

// These variants return Options if you want to check existence instead of panicking
world.get_entity_mut(entity)
    .unwrap()
    .insert(E);

if let Some(entity_ref) = world.get_entity(entity) {
    let d = entity_ref.get::<D>().unwrap();
}
```

This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change.

## Safety Improvements

* Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute
* QuerySets no longer transmute lifetimes
* Made traits "unsafe" when implementing a trait incorrectly could cause unsafety
* More thorough safety docs

## RemovedComponents SystemParam

The old approach to querying removed components: `query.removed::<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState:

```rust
fn system(removed: RemovedComponents<T>) {
    for entity in removed.iter() {
    }
}
```

## Simpler Bundle implementation

Bundles are no longer responsible for sorting (or deduping) TypeInfo.
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used.

## Unified WorldQuery and QueryFilter types

(don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change)

WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful).

QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool.

This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit.

## More Granular Modules

World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here).

## Remaining Draft Work (to be done in this pr)

* ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~
  * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~
* ~~batch_iter / par_iter (currently stubbed out)~~
* ~~ChangedRes~~
  * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~.
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~
* ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~
* ~~Nested Bundles (if i find time)~~

## Potential Future Work

* Expand WorldCell to support queries.
* Consider not allocating in the empty archetype on `world.spawn()`
  * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op
  * this actually regressed performance last time i tried it, but in theory it should be faster
* Optimize SparseSet::insert (see `PERF` comment on insert)
* Replace SparseArray `Option<T>` with T::MAX to cut down on branching
  * would enable cheaper get_unchecked() operations
* upstream fixedbitset optimizations
  * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec)
  * fixedbitset could have a const constructor
* Consider implementing Tags (archetype-specific by-value data that affects archetype identity)
  * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different.
  * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage.
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation
  * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints)
  * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code)
* Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell
  * this is basically just "systems" so maybe it's not worth it
* Add more world ops
  * `world.clear()`
  * `world.reserve<T: Bundle>(count: usize)`
* Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But that's just a guess. I'm not an allocation perf pro :)
* Adapt Commands apis for consistency with new World apis

## Benchmarks

key:

* `bevy_old`: bevy `main` branch
* `bevy`: this branch
* `_foreach`: uses an optimized for_each iterator
* `_sparse`: uses sparse set storage (if unspecified assume table storage)
* `_system`: runs inside a system (if unspecified assume test happens via direct world ops)

### Simple Insert (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png)

### Simpler Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png)

### Fragment Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png)

### Sparse Fragmented Iter

Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes

![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png)

### Schedule (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png)

### Add Remove Component (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png)

### Add Remove Component Big

Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed

![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png)

### Get Component

Looks up a single component value a large number of times

![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
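
For readers who want a concrete picture of the sparse set storage this description leans on (densely packed values, sparsely indexed by densely packed ids), here is a minimal standard-library sketch. It is an illustration under stated assumptions, not the `SparseSet` type in `bevy_ecs`: the `usize` ids, the field names, and the method set are all hypothetical.

```rust
/// Hypothetical minimal sparse set keyed by densely packed `usize` ids.
struct SparseSet<T> {
    /// Maps an id to its index in `dense` / `ids`, or `None` if absent.
    sparse: Vec<Option<usize>>,
    /// Densely packed values; iteration walks this linearly.
    dense: Vec<T>,
    /// `ids[i]` is the id that owns `dense[i]`.
    ids: Vec<usize>,
}

impl<T> SparseSet<T> {
    fn new() -> Self {
        Self { sparse: Vec::new(), dense: Vec::new(), ids: Vec::new() }
    }

    /// Inserts or overwrites the value for `id` (amortized constant time).
    fn insert(&mut self, id: usize, value: T) {
        if id >= self.sparse.len() {
            self.sparse.resize(id + 1, None);
        }
        match self.sparse[id] {
            Some(index) => self.dense[index] = value,
            None => {
                self.sparse[id] = Some(self.dense.len());
                self.dense.push(value);
                self.ids.push(id);
            }
        }
    }

    /// Looks up a value through one extra level of indirection.
    fn get(&self, id: usize) -> Option<&T> {
        let index = (*self.sparse.get(id)?)?;
        self.dense.get(index)
    }

    /// Removes the value for `id` in constant time via swap-remove.
    fn remove(&mut self, id: usize) -> Option<T> {
        let index = self.sparse.get_mut(id)?.take()?;
        let value = self.dense.swap_remove(index);
        self.ids.swap_remove(index);
        // The former last element (if any) now lives at `index`.
        if let Some(&moved_id) = self.ids.get(index) {
            self.sparse[moved_id] = Some(index);
        }
        Some(value)
    }

    /// Iterates over `(id, value)` pairs in dense order.
    fn iter(&self) -> impl Iterator<Item = (usize, &T)> {
        self.ids.iter().copied().zip(self.dense.iter())
    }
}
```

Removal never shifts more than one element, which is why add/remove is cheap. The cost shows up in `get`, where every lookup goes through `sparse` before reaching the value.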
2021-03-05 07:54:35 +00:00
pub use bevy_ecs_macros::Bundle;
use bevy_utils::{HashMap, HashSet, TypeIdMap};
If developers know that they want to add/remove a component at high frequencies, they can set the storage to "sparse set": ```rust world.register_component( ComponentDescriptor::new::<MyComponent>(StorageType::SparseSet) ).unwrap(); ``` ## Archetypes Archetypes are now "just metadata" ... they no longer store components directly. They do store: * The `ComponentId`s of each of the Archetype's components (and that component's storage type) * Archetypes are uniquely defined by their component layouts * For example: entities with "table" components `[A, B, C]` _and_ "sparse set" components `[D, E]` will always be in the same archetype. * The `TableId` associated with the archetype * For now each archetype has exactly one table (which can have no components), * There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it: * Ex: an entity with "table storage" components `[A, B, C]` and "sparse set" components `[D, E]` will share the same `[A, B, C]` table as an entity with `[A, B, C]` table component and `[F]` sparse set components. * This 1->Many relationship is how we preserve fast "cache friendly" iteration performance when possible (more on this later) * A list of entities that are in the archetype and the row id of the table they are in * ArchetypeComponentIds * unique densely packed identifiers for (ArchetypeId, ComponentId) pairs * used by the schedule executor for cheap system access control * "Archetype Graph Edges" (see the next section) ## The "Archetype Graph" Archetype changes in Bevy (and a number of other archetypal ecs-es) have historically been expensive to compute. First, you need to allocate a new vector of the entity's current component ids, add or remove components based on the operation performed, sort it (to ensure it is order-independent), then hash it to find the archetype (if it exists). 
And thats all before we get to the _already_ expensive full copy of all components to the new table storage. The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If `ComponentId`s are densely packed, you can use sparse sets to cheaply jump between archetypes. Bevy takes this one step further by using add/remove `Bundle` edges instead of `Component` edges. Bevy encourages the use of `Bundles` to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. `Bundles` now also have densely-packed `BundleId`s. This allows us to use a _single_ edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph. As a result, an operation that used to be _heavy_ (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations. ## Stateful Queries World queries are now stateful. This allows us to: 1. Cache archetype (and table) matches * This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs). 2. Cache Fetch and Filter state * The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed 3. 
Incrementally build up state * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes) As a result, the direct `World` query api now looks like this: ```rust let mut query = world.query::<(&A, &mut B)>(); for (a, mut b) in query.iter_mut(&mut world) { } ``` Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (and the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world). However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam. ## Stateful SystemParams Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources). SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now. Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params). (credit goes to @DJMcNab for the initial idea and draft pr here #1364) ## Configurable SystemParams @DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible). 
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters: ```rust fn foo(value: Local<usize>) { } app.add_system(foo.system().config(|c| c.0 = Some(10))); ``` ## Uber Fast "for_each" Query Iterators Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration. ```rust fn system(query: Query<(&A, &mut B)>) { // you now have the option to do this for a speed boost query.for_each_mut(|(a, mut b)| { }); // however normal iterators are still available for (a, mut b) in query.iter_mut() { } } ``` I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomf" is needed, it makes sense to use `for_each`. We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr). ## Component Metadata `World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type. ## Significantly Cheaper `Access<T>` We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>`sources every time archetypes changed. This pr removes TypeAccess in favor of faster bitset access everywhere. 
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict wit components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows us the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
use crate::{
    archetype::{
Extend EntityLocation with TableId and TableRow (#6681)

# Objective

`Query::get` and other random access methods require looking up `EntityLocation` for every provided entity, then always looking up the `Archetype` to get the table ID and table row. This requires 4 total random fetches from memory: the `Entities` lookup, the `Archetype` lookup, the table row lookup, and the final fetch from table/sparse sets. If `EntityLocation` contains the table ID and table row, only the `Entities` lookup and the final storage fetch are required.

## Solution

Add `TableId` and table row to `EntityLocation`. Ensure it's updated whenever entities are moved around. To ensure `EntityMeta` does not grow bigger, both `TableId` and `ArchetypeId` have been shrunk to u32, and the archetype index and table row are stored as u32s instead of as usizes. This should shrink `EntityMeta` by 4 bytes, from 24 to 20 bytes, as there is no padding anymore due to the change in alignment.

This idea was partially concocted by @BoxyUwU.

## Performance

This should restore the `Query::get` "gains" lost to #6625 that were introduced in #4800 without being unsound, and also incorporates some of the memory usage reductions seen in #3678.

This also removes the same lookups during add/remove/spawn commands, so there may be a bit of a speedup in commands and `Entity{Ref,Mut}`.

---

## Changelog

Added: `EntityLocation::table_id`
Added: `EntityLocation::table_row`.
Changed: `World`s can now only hold a maximum of 2<sup>32</sup> - 1 archetypes.
Changed: `World`s can now only hold a maximum of 2<sup>32</sup> - 1 tables.

## Migration Guide

A `World` can only hold a maximum of 2<sup>32</sup> - 1 archetypes and tables now. If your use case requires more than this, please file an issue explaining your use case.
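The size arithmetic in the Solution section can be checked with a small standalone sketch. The struct and field names below are hypothetical stand-ins, not the actual `bevy_ecs` definitions, and the `generation: u32` field is an assumption about what else `EntityMeta` holds. On a 64-bit target the `usize`-based layout needs padding after the `u32`, while the all-`u32` layout does not.

```rust
use std::mem::size_of;

/// Hypothetical "before" location: archetype id and index stored as `usize`.
#[allow(dead_code)]
struct OldLocation {
    archetype_id: usize,
    index: usize,
}

/// Hypothetical "before" metadata (the `generation` field is an assumption).
#[allow(dead_code)]
struct OldMeta {
    generation: u32,
    location: OldLocation,
}

/// Hypothetical "after" location: four `u32` fields, including the new
/// table id and table row.
#[allow(dead_code)]
struct NewLocation {
    archetype_id: u32,
    archetype_row: u32,
    table_id: u32,
    table_row: u32,
}

/// Hypothetical "after" metadata: every field has 4-byte alignment,
/// so no padding is required.
#[allow(dead_code)]
struct NewMeta {
    generation: u32,
    location: NewLocation,
}

/// Returns the `(old, new)` sizes in bytes of the two sketched layouts.
fn entity_meta_sizes() -> (usize, usize) {
    (size_of::<OldMeta>(), size_of::<NewMeta>())
}
```

Calling `entity_meta_sizes()` on a 64-bit target should reproduce the 24 and 20 byte figures quoted above. On a 32-bit target the old layout is already padding-free, so the comparison only illustrates the 64-bit case.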
2023-01-02 21:25:04 +00:00
        Archetype, ArchetypeId, Archetypes, BundleComponentStatus, ComponentStatus,
        SpawnBundleStatus,
    },
Remove unnecessary branching from bundle insertion (#6902)

# Objective

Speed up bundle insertion and spawning from a bundle.

## Solution

Use the same technique used in #6800 to remove the branch on storage type when writing components from a `Bundle` into storage.

- Add a `StorageType` argument to the closure on `Bundle::get_components`.
- Pass `C::Storage::STORAGE_TYPE` into that argument.
- Match on that argument instead of reading from a `Vec<StorageType>` in `BundleInfo`.
- Marked all implementations of `Bundle::get_components` as inline to encourage dead code elimination.

The `Vec<StorageType>` in `BundleInfo` was also removed as it's no longer needed. If users were reliant on this, they can either use the compile time constants or fetch the information from `Components`. Should save a rather negligible amount of memory.

## Performance

Microbenchmarks show a slight improvement to inserting components into existing entities, as well as spawning from a bundle. Ranging about 8-16% faster depending on the benchmark.

```
group                         main                              soft-constant-write-components
-----                         ----                              ------------------------------
add_remove/sparse_set         1.08  1019.0±80.10µs   ? ?/sec    1.00  944.6±66.86µs    ? ?/sec
add_remove/table              1.07  1343.3±20.37µs   ? ?/sec    1.00  1257.3±18.13µs   ? ?/sec
add_remove_big/sparse_set     1.08  1132.4±263.10µs  ? ?/sec    1.00  1050.8±240.74µs  ? ?/sec
add_remove_big/table          1.02  2.6±0.05ms       ? ?/sec    1.00  2.5±0.08ms       ? ?/sec
get_or_spawn/batched          1.15  401.4±17.76µs    ? ?/sec    1.00  349.3±11.26µs    ? ?/sec
get_or_spawn/individual       1.13  732.1±43.35µs    ? ?/sec    1.00  645.6±41.44µs    ? ?/sec
insert_commands/insert        1.12  623.9±37.48µs    ? ?/sec    1.00  557.4±34.99µs    ? ?/sec
insert_commands/insert_batch  1.16  401.4±17.00µs    ? ?/sec    1.00  347.4±12.87µs    ? ?/sec
insert_simple/base            1.08  416.9±5.60µs     ? ?/sec    1.00  385.2±4.14µs     ? ?/sec
insert_simple/unbatched       1.06  934.5±44.58µs    ? ?/sec    1.00  881.3±47.86µs    ? ?/sec
spawn_commands/2000_entities  1.09  190.7±11.41µs    ? ?/sec    1.00  174.7±9.15µs     ? ?/sec
spawn_commands/4000_entities  1.10  386.5±25.33µs    ? ?/sec    1.00  352.3±18.81µs    ? ?/sec
spawn_commands/6000_entities  1.10  586.2±34.42µs    ? ?/sec    1.00  535.3±27.25µs    ? ?/sec
spawn_commands/8000_entities  1.08  778.5±45.15µs    ? ?/sec    1.00  718.0±33.66µs    ? ?/sec
spawn_world/10000_entities    1.04  1026.4±195.46µs  ? ?/sec    1.00  985.8±253.37µs   ? ?/sec
spawn_world/1000_entities     1.06  103.8±20.23µs    ? ?/sec    1.00  97.6±18.22µs     ? ?/sec
spawn_world/100_entities      1.15  11.4±4.25µs      ? ?/sec    1.00  9.9±1.87µs       ? ?/sec
spawn_world/10_entities       1.05  1030.8±229.78ns  ? ?/sec    1.00  986.2±231.12ns   ? ?/sec
spawn_world/1_entities        1.01  105.1±23.33ns    ? ?/sec    1.00  104.6±31.84ns    ? ?/sec
```

---

## Changelog

Changed: `Bundle::get_components` now takes a `FnMut(StorageType, OwningPtr)`. The provided storage type must be correct for the component being fetched.
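The technique in the Solution list can be sketched in isolation. The following is a simplified model, not the real `bevy_ecs` code: the trait shapes are reduced, a `&'static str` name stands in for `OwningPtr`, and `Storages`, `Position`, `Marker`, and `write_components` are hypothetical names introduced for illustration. The point it shows is that each component hands its storage type to the closure as an associated constant, so once `get_components` is inlined the `match` arm is known at compile time.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StorageType {
    Table,
    SparseSet,
}

/// Simplified component trait: the storage type is a compile-time constant.
trait Component: 'static {
    const STORAGE_TYPE: StorageType;
    const NAME: &'static str;
}

/// Simplified bundle trait: the closure receives the storage type
/// alongside the component (here just its name, standing in for a pointer).
trait Bundle {
    fn get_components(self, func: &mut impl FnMut(StorageType, &'static str));
}

impl<C: Component> Bundle for C {
    #[inline]
    fn get_components(self, func: &mut impl FnMut(StorageType, &'static str)) {
        // The constant is passed in, so no per-component lookup is needed.
        func(C::STORAGE_TYPE, C::NAME);
    }
}

impl<A: Bundle, B: Bundle> Bundle for (A, B) {
    #[inline]
    fn get_components(self, func: &mut impl FnMut(StorageType, &'static str)) {
        self.0.get_components(&mut *func);
        self.1.get_components(&mut *func);
    }
}

struct Position;
impl Component for Position {
    const STORAGE_TYPE: StorageType = StorageType::Table;
    const NAME: &'static str = "Position";
}

struct Marker;
impl Component for Marker {
    const STORAGE_TYPE: StorageType = StorageType::SparseSet;
    const NAME: &'static str = "Marker";
}

/// Hypothetical stand-in for the world's table and sparse-set storage.
#[derive(Default)]
struct Storages {
    table: Vec<&'static str>,
    sparse: Vec<&'static str>,
}

/// Writes every component of `bundle` into the storage its type selects,
/// matching on the argument rather than on a stored `Vec<StorageType>`.
fn write_components<B: Bundle>(bundle: B, storages: &mut Storages) {
    bundle.get_components(&mut |storage_type, name| match storage_type {
        StorageType::Table => storages.table.push(name),
        StorageType::SparseSet => storages.sparse.push(name),
    });
}
```

With this sketch, `write_components((Position, Marker), &mut storages)` routes `Position` to the table list and `Marker` to the sparse list. Whether the optimizer actually removes the branch depends on inlining, which is why the PR marks the implementations `#[inline]`.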
2022-12-11 18:46:43 +00:00
    component::{Component, ComponentId, ComponentStorage, Components, StorageType, Tick},
Spawn specific entities: spawn or insert operations, refactor spawn internals, world clearing (#2673)

This upstreams the code changes used by the new renderer to enable cross-app Entity reuse:

* Spawning at specific entities
* get_or_spawn: spawns an entity if it doesn't already exist and returns an EntityMut
* insert_or_spawn_batch: the batched equivalent to `world.get_or_spawn(entity).insert_bundle(bundle)`
* Clearing entities and storages
* Allocating Entities with "invalid" archetypes. These entities cannot be queried / are treated as "non existent". They serve as "reserved" entities that won't show up when calling `spawn()`. They must be "specifically spawned at" using apis like `get_or_spawn(entity)`.

In combination, these changes enable the "render world" to clear entities / storages each frame and reserve all "app world entities". These can then be spawned during the "render extract step".

This refactors "spawn" and "insert" code in a way that I think is a massive improvement to legibility and re-usability. It also yields marginal performance wins by reducing some duplicate lookups (less than a percentage point improvement on insertion benchmarks). There is also some potential for future unsafe reduction (by making BatchSpawner and BatchInserter generic). But for now I want to cut down generic usage to a minimum to encourage smaller binaries and faster compiles.

This is currently a draft because it needs more tests (although this code has already had some real-world testing on my custom-shaders branch).

I also fixed the benchmarks (which currently don't compile!) / added new ones to illustrate batching wins.

After these changes, Bevy ECS is basically ready to accommodate the new renderer. I think the biggest missing piece at this point is "sub apps".
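The semantics of the operations listed above can be modelled with a toy world. This is only an illustration under loud assumptions: `ToyWorld` is hypothetical, entity ids are plain `u32` values without generations, and components are represented by their names. The real `get_or_spawn` returns an `EntityMut` and the real storage is far more involved.

```rust
use std::collections::BTreeMap;

/// Toy entity id (assumption: no generation counter).
type Entity = u32;

/// Hypothetical stand-in for a world: component names per live entity.
#[derive(Default)]
struct ToyWorld {
    entities: BTreeMap<Entity, Vec<&'static str>>,
}

impl ToyWorld {
    /// Returns the component list for `entity`, spawning it first if it
    /// does not exist yet.
    fn get_or_spawn(&mut self, entity: Entity) -> &mut Vec<&'static str> {
        self.entities.entry(entity).or_default()
    }

    /// Batched equivalent of `get_or_spawn(entity)` followed by an insert,
    /// applied to every `(entity, component)` pair.
    fn insert_or_spawn_batch<I>(&mut self, batch: I)
    where
        I: IntoIterator<Item = (Entity, &'static str)>,
    {
        for (entity, component) in batch {
            self.get_or_spawn(entity).push(component);
        }
    }

    /// Drops all entities and their storage, as a render world would do
    /// at the start of each frame.
    fn clear(&mut self) {
        self.entities.clear();
    }
}
```

In this model, a render world would call `clear()` each frame and then re-create the app world's entities at the same ids with `insert_or_spawn_batch`, which is the cross-app reuse pattern the description refers to.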
2021-08-25 23:34:02 +00:00
entity::{Entities, Entity, EntityLocation},
query::DebugCheckedUnwrap,
storage::{SparseSetIndex, SparseSets, Storages, Table, TableRow},
};
use bevy_ptr::OwningPtr;
use bevy_utils::all_tuples;
use bevy_utils::HashMap for better performance. TypeId is predefined … (#7642)

…u64, so hash safety is not a concern

# Objective

- While reading the code, I noticed that BundleInfo's HashMap is std::collections::HashMap, which uses a slow but safe hasher.

## Solution

- Use bevy_utils::HashMap instead

Benchmark diff (I ran the benchmarks several times on a Linux box; the performance improvement is consistent, though the numbers vary from run to run. The output of my last run is pasted here):

```bash
cargo bench -- spawn
Compiling bevy_ecs v0.9.0 (/home/lishuo/developer/pr/bevy/crates/bevy_ecs)
Compiling bevy_app v0.9.0 (/home/lishuo/developer/pr/bevy/crates/bevy_app)
Compiling benches v0.1.0 (/home/lishuo/developer/pr/bevy/benches)
Finished bench [optimized] target(s) in 1m 17s
Running benches/bevy_ecs/change_detection.rs (/home/lishuo/developer/pr/bevy/benches/target/release/deps/change_detection-86c5445d0dc34529)
Gnuplot not found, using plotters backend
Running benches/bevy_ecs/benches.rs (/home/lishuo/developer/pr/bevy/benches/target/release/deps/ecs-e49b3abe80bfd8c0)
Gnuplot not found, using plotters backend
spawn_commands/2000_entities
    time: [153.94 µs 159.19 µs 164.37 µs]
    change: [-14.706% -11.050% -6.9633%] (p = 0.00 < 0.05)
    Performance has improved.
spawn_commands/4000_entities
    time: [328.77 µs 339.11 µs 349.11 µs]
    change: [-7.6331% -3.9932% +0.0487%] (p = 0.06 > 0.05)
    No change in performance detected.
spawn_commands/6000_entities
    time: [445.01 µs 461.29 µs 477.36 µs]
    change: [-16.639% -13.358% -10.006%] (p = 0.00 < 0.05)
    Performance has improved.
spawn_commands/8000_entities
    time: [657.94 µs 677.71 µs 696.95 µs]
    change: [-8.8708% -5.2591% -1.6847%] (p = 0.01 < 0.05)
    Performance has improved.
get_or_spawn/individual
    time: [452.02 µs 466.70 µs 482.07 µs]
    change: [-17.218% -14.041% -10.728%] (p = 0.00 < 0.05)
    Performance has improved.
get_or_spawn/batched
    time: [291.12 µs 301.12 µs 311.31 µs]
    change: [-12.281% -8.9163% -5.3660%] (p = 0.00 < 0.05)
    Performance has improved.
spawn_world/1_entities
    time: [81.668 ns 84.284 ns 86.860 ns]
    change: [-12.251% -6.7872% -1.5402%] (p = 0.02 < 0.05)
    Performance has improved.
spawn_world/10_entities
    time: [789.78 ns 821.96 ns 851.95 ns]
    change: [-19.738% -14.186% -8.0733%] (p = 0.00 < 0.05)
    Performance has improved.
spawn_world/100_entities
    time: [7.9906 µs 8.2449 µs 8.5013 µs]
    change: [-12.417% -6.6837% -0.8766%] (p = 0.02 < 0.05)
    Change within noise threshold.
spawn_world/1000_entities
    time: [81.602 µs 84.161 µs 86.833 µs]
    change: [-13.656% -8.6520% -3.0491%] (p = 0.00 < 0.05)
    Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking spawn_world/10000_entities: Warming up for 500.00 ms
Warning: Unable to complete 100 samples in 4.0s. You may wish to increase target time to 4.0s, enable flat sampling, or reduce sample count to 70.
spawn_world/10000_entities
    time: [813.02 µs 839.76 µs 865.41 µs]
    change: [-12.133% -6.1970% -0.2302%] (p = 0.05 < 0.05)
    Change within noise threshold.
```

---

## Changelog

> This section is optional. If this was a trivial fix, or has no externally-visible impact, you can delete this section.

- Use bevy_utils::HashMap for Bundles::bundle_ids

## Migration Guide

> This section is optional. If there are no breaking changes, you can delete this section.

- Not a breaking change; the hashmap is an internal implementation detail.
2023-02-15 04:19:26 +00:00
use std::any::TypeId;
/// The `Bundle` trait enables insertion and removal of [`Component`]s from an entity.
///
/// Implementors of the `Bundle` trait are called 'bundles'.
///
/// Each bundle represents a static set of [`Component`] types.
/// Currently, a bundle can contain at most one of each [`Component`] type;
/// a bundle that contains duplicates will panic when it is first initialised.
If developers know that they want to add/remove a component at high frequencies, they can set the storage to "sparse set": ```rust world.register_component( ComponentDescriptor::new::<MyComponent>(StorageType::SparseSet) ).unwrap(); ``` ## Archetypes Archetypes are now "just metadata" ... they no longer store components directly. They do store: * The `ComponentId`s of each of the Archetype's components (and that component's storage type) * Archetypes are uniquely defined by their component layouts * For example: entities with "table" components `[A, B, C]` _and_ "sparse set" components `[D, E]` will always be in the same archetype. * The `TableId` associated with the archetype * For now each archetype has exactly one table (which can have no components), * There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it: * Ex: an entity with "table storage" components `[A, B, C]` and "sparse set" components `[D, E]` will share the same `[A, B, C]` table as an entity with `[A, B, C]` table component and `[F]` sparse set components. * This 1->Many relationship is how we preserve fast "cache friendly" iteration performance when possible (more on this later) * A list of entities that are in the archetype and the row id of the table they are in * ArchetypeComponentIds * unique densely packed identifiers for (ArchetypeId, ComponentId) pairs * used by the schedule executor for cheap system access control * "Archetype Graph Edges" (see the next section) ## The "Archetype Graph" Archetype changes in Bevy (and a number of other archetypal ecs-es) have historically been expensive to compute. First, you need to allocate a new vector of the entity's current component ids, add or remove components based on the operation performed, sort it (to ensure it is order-independent), then hash it to find the archetype (if it exists). 
And thats all before we get to the _already_ expensive full copy of all components to the new table storage. The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If `ComponentId`s are densely packed, you can use sparse sets to cheaply jump between archetypes. Bevy takes this one step further by using add/remove `Bundle` edges instead of `Component` edges. Bevy encourages the use of `Bundles` to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. `Bundles` now also have densely-packed `BundleId`s. This allows us to use a _single_ edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph. As a result, an operation that used to be _heavy_ (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations. ## Stateful Queries World queries are now stateful. This allows us to: 1. Cache archetype (and table) matches * This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs). 2. Cache Fetch and Filter state * The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed 3. 
Incrementally build up state * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes) As a result, the direct `World` query api now looks like this: ```rust let mut query = world.query::<(&A, &mut B)>(); for (a, mut b) in query.iter_mut(&mut world) { } ``` Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (and the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world). However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam. ## Stateful SystemParams Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources). SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now. Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params). (credit goes to @DJMcNab for the initial idea and draft pr here #1364) ## Configurable SystemParams @DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible). 
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters: ```rust fn foo(value: Local<usize>) { } app.add_system(foo.system().config(|c| c.0 = Some(10))); ``` ## Uber Fast "for_each" Query Iterators Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration. ```rust fn system(query: Query<(&A, &mut B)>) { // you now have the option to do this for a speed boost query.for_each_mut(|(a, mut b)| { }); // however normal iterators are still available for (a, mut b) in query.iter_mut() { } } ``` I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomf" is needed, it makes sense to use `for_each`. We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr). ## Component Metadata `World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type. ## Significantly Cheaper `Access<T>` We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>`sources every time archetypes changed. This pr removes TypeAccess in favor of faster bitset access everywhere. 
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict wit components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows us the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
///
/// ## Insertion
///
/// The primary use for bundles is to add a useful collection of components to an entity.
///
/// Adding a bundle to an entity adds every component in the set the bundle
/// represents to that entity.
/// The values of these components are taken from the bundle.
/// If the entity already had one of these components, its original value
/// will be overwritten.
///
/// Importantly, bundles are only their constituent set of components.
/// You **should not** use bundles as a unit of behavior.
/// The behavior of your app can only be considered in terms of components, as systems,
/// which drive the behavior of a `bevy` application, operate on combinations of
/// components.
///
/// This rule is also important because multiple bundles may contain the same component type,
/// with its value calculated in different ways; adding both of these bundles to one entity
/// would create incoherent behavior.
/// This would be unexpected if bundles were treated as an abstraction boundary, as
/// the abstraction would be unmaintainable for these cases.
/// For example, both `Camera3dBundle` and `Camera2dBundle` contain the `CameraRenderGraph`
/// component, but each specifies a different render graph to use.
/// If both bundles were added to the same entity, only one of them would work.
///
/// For this reason, there is intentionally no [`Query`] to match whether an entity
/// contains the components of a bundle.
/// Queries should instead only select the components they logically operate on.
///
/// ## Removal
///
/// Bundles are also used when removing components from an entity.
///
/// Removing a bundle from an entity removes whichever of the bundle's components
/// are attached to that entity.
/// That is, if the entity does not have all the components of the bundle, those
/// which are present will still be removed.
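///
/// The overwrite-on-insert and remove-what-is-present rules can be sketched with a toy
/// model that uses only the standard library. `ToyEntity`, `remove_present` and `demo` are
/// hypothetical names made up for this illustration; this is not how `bevy_ecs` stores
/// or removes components.
///
/// ```rust
/// use std::any::{Any, TypeId};
/// use std::collections::HashMap;
///
/// // Toy stand-in for one entity: at most one value per component type.
/// #[derive(Default)]
/// struct ToyEntity {
///     components: HashMap<TypeId, Box<dyn Any>>,
/// }
///
/// impl ToyEntity {
///     // Inserts a component, overwriting any existing value of the same type.
///     fn insert<C: Any>(&mut self, value: C) {
///         self.components.insert(TypeId::of::<C>(), Box::new(value));
///     }
///
///     // Removes whichever of the listed component types are present and
///     // returns how many were removed; absent types are skipped.
///     fn remove_present(&mut self, bundle_types: &[TypeId]) -> usize {
///         bundle_types
///             .iter()
///             .filter(|id| self.components.remove(*id).is_some())
///             .count()
///     }
///
///     // Looks up a component by its type.
///     fn get<C: Any>(&self) -> Option<&C> {
///         let boxed = self.components.get(&TypeId::of::<C>())?;
///         boxed.downcast_ref::<C>()
///     }
/// }
///
/// struct XPosition(i32);
/// struct YPosition(i32);
///
/// fn demo() -> (Option<i32>, usize, bool) {
///     let mut entity = ToyEntity::default();
///     entity.insert(XPosition(1));
///     // Inserting the same component type again overwrites the old value.
///     entity.insert(XPosition(2));
///     let x = entity.get::<XPosition>().map(|x| x.0);
///     // "Remove" a two-component bundle although only `XPosition` is present.
///     let bundle = [TypeId::of::<XPosition>(), TypeId::of::<YPosition>()];
///     let removed = entity.remove_present(&bundle);
///     (x, removed, entity.get::<XPosition>().is_some())
/// }
/// ```
///
/// In this sketch `demo()` returns `(Some(2), 1, false)`: the second insert replaced the
/// first value, and removal took away the one component that was present without
/// failing on the missing `YPosition`.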
///
/// # Implementors
///
/// Every type which implements [`Component`] also implements `Bundle`, since
/// [`Component`] types can be added to or removed from an entity.
///
/// Additionally, [tuples](`tuple`) of up to 15 bundles are also [`Bundle`]s.
/// These bundles contain the items of the 'inner' bundles.
/// This is a convenient shorthand which is primarily used when spawning entities.
/// For example, spawning an entity using the bundle `(SpriteBundle {...}, PlayerMarker)`
/// will spawn an entity with components required for a 2d sprite, and the `PlayerMarker` component.
///
/// [`unit`], otherwise known as [`()`](`unit`), is a [`Bundle`] containing no components (since it
/// can also be considered as the empty tuple).
/// This can be useful for spawning large numbers of empty entities using
/// [`World::spawn_batch`](crate::world::World::spawn_batch).
///
/// Tuple bundles can be nested, which can be used to create an anonymous bundle with more than
/// 15 items.
/// However, in most cases where this is required, the derive macro [`derive@Bundle`] should be
/// used instead.
/// The derived `Bundle` implementation contains the items of its fields, which all must
/// implement `Bundle`.
/// As explained above, this includes any [`Component`] type, and other derived bundles.
///
/// If you want to add `PhantomData` to your `Bundle`, you have to mark it with `#[bundle(ignore)]`.
/// ```
/// # use std::marker::PhantomData;
/// use bevy_ecs::{component::Component, bundle::Bundle};
///
/// #[derive(Component)]
/// struct XPosition(i32);
/// #[derive(Component)]
/// struct YPosition(i32);
///
/// #[derive(Bundle)]
/// struct PositionBundle {
///     // A bundle can contain components
///     x: XPosition,
///     y: YPosition,
/// }
///
/// // You have to implement `Default` for ignored field types in bundle structs.
/// #[derive(Default)]
/// struct Other(f32);
///
/// #[derive(Bundle)]
/// struct NamedPointBundle<T: Send + Sync + 'static> {
///     // Or other bundles
///     a: PositionBundle,
///     // In addition to more components
///     z: PointName,
///
///     // when you need to use `PhantomData` you have to mark it as ignored
///     #[bundle(ignore)]
///     _phantom_data: PhantomData<T>
/// }
///
/// #[derive(Component)]
/// struct PointName(String);
/// ```
///
If developers know that they want to add/remove a component at high frequencies, they can set the storage to "sparse set": ```rust world.register_component( ComponentDescriptor::new::<MyComponent>(StorageType::SparseSet) ).unwrap(); ``` ## Archetypes Archetypes are now "just metadata" ... they no longer store components directly. They do store: * The `ComponentId`s of each of the Archetype's components (and that component's storage type) * Archetypes are uniquely defined by their component layouts * For example: entities with "table" components `[A, B, C]` _and_ "sparse set" components `[D, E]` will always be in the same archetype. * The `TableId` associated with the archetype * For now each archetype has exactly one table (which can have no components), * There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it: * Ex: an entity with "table storage" components `[A, B, C]` and "sparse set" components `[D, E]` will share the same `[A, B, C]` table as an entity with `[A, B, C]` table component and `[F]` sparse set components. * This 1->Many relationship is how we preserve fast "cache friendly" iteration performance when possible (more on this later) * A list of entities that are in the archetype and the row id of the table they are in * ArchetypeComponentIds * unique densely packed identifiers for (ArchetypeId, ComponentId) pairs * used by the schedule executor for cheap system access control * "Archetype Graph Edges" (see the next section) ## The "Archetype Graph" Archetype changes in Bevy (and a number of other archetypal ecs-es) have historically been expensive to compute. First, you need to allocate a new vector of the entity's current component ids, add or remove components based on the operation performed, sort it (to ensure it is order-independent), then hash it to find the archetype (if it exists). 
And thats all before we get to the _already_ expensive full copy of all components to the new table storage. The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If `ComponentId`s are densely packed, you can use sparse sets to cheaply jump between archetypes. Bevy takes this one step further by using add/remove `Bundle` edges instead of `Component` edges. Bevy encourages the use of `Bundles` to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. `Bundles` now also have densely-packed `BundleId`s. This allows us to use a _single_ edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph. As a result, an operation that used to be _heavy_ (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations. ## Stateful Queries World queries are now stateful. This allows us to: 1. Cache archetype (and table) matches * This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs). 2. Cache Fetch and Filter state * The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed 3. 
Incrementally build up state * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes) As a result, the direct `World` query api now looks like this: ```rust let mut query = world.query::<(&A, &mut B)>(); for (a, mut b) in query.iter_mut(&mut world) { } ``` Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (and the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world). However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam. ## Stateful SystemParams Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources). SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now. Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params). (credit goes to @DJMcNab for the initial idea and draft pr here #1364) ## Configurable SystemParams @DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible). 
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters: ```rust fn foo(value: Local<usize>) { } app.add_system(foo.system().config(|c| c.0 = Some(10))); ``` ## Uber Fast "for_each" Query Iterators Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration. ```rust fn system(query: Query<(&A, &mut B)>) { // you now have the option to do this for a speed boost query.for_each_mut(|(a, mut b)| { }); // however normal iterators are still available for (a, mut b) in query.iter_mut() { } } ``` I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomf" is needed, it makes sense to use `for_each`. We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr). ## Component Metadata `World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type. ## Significantly Cheaper `Access<T>` We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>`sources every time archetypes changed. This pr removes TypeAccess in favor of faster bitset access everywhere. 
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict wit components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows us the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
/// # Safety
///
/// Manual implementations of this trait are unsupported.
/// That is, there is no safe way to implement this trait, and you must not do so.
/// If you want a type to implement [`Bundle`], you must use [`derive@Bundle`](derive@Bundle).
///
/// [`Query`]: crate::system::Query
// Some safety points:
// - [`Bundle::component_ids`] must return the [`ComponentId`] for each component type in the
//   bundle, in the _exact_ order that [`DynamicBundle::get_components`] is called.
// - [`Bundle::from_components`] must call `func` exactly once for each [`ComponentId`] returned by
//   [`Bundle::component_ids`].
pub unsafe trait Bundle: DynamicBundle + Send + Sync + 'static {
    /// Gets this [`Bundle`]'s component ids, in the order of this bundle's [`Component`]s
    #[doc(hidden)]
    fn component_ids(
        components: &mut Components,
        storages: &mut Storages,
        ids: &mut impl FnMut(ComponentId),
    );
Bevy ECS V2 (#1525) # Bevy ECS V2 This is a rewrite of Bevy ECS (basically everything but the new executor/schedule, which are already awesome). The overall goal was to improve the performance and versatility of Bevy ECS. Here is a quick bulleted list of changes before we dive into the details: * Complete World rewrite * Multiple component storage types: * Tables: fast cache friendly iteration, slower add/removes (previously called Archetypes) * Sparse Sets: fast add/remove, slower iteration * Stateful Queries (caches query results for faster iteration. fragmented iteration is _fast_ now) * Stateful System Params (caches expensive operations. inspired by @DJMcNab's work in #1364) * Configurable System Params (users can set configuration when they construct their systems. once again inspired by @DJMcNab's work) * Archetypes are now "just metadata", component storage is separate * Archetype Graph (for faster archetype changes) * Component Metadata * Configure component storage type * Retrieve information about component size/type/name/layout/send-ness/etc * Components are uniquely identified by a densely packed ComponentId * TypeIds are now totally optional (which should make implementing scripting easier) * Super fast "for_each" query iterators * Merged Resources into World. 
Resources are now just a special type of component * EntityRef/EntityMut builder apis (more efficient and more ergonomic) * Fast bitset-backed `Access<T>` replaces old hashmap-based approach everywhere * Query conflicts are determined by component access instead of archetype component access (to avoid random failures at runtime) * With/Without are still taken into account for conflicts, so this should still be comfy to use * Much simpler `IntoSystem` impl * Significantly reduced the amount of hashing throughout the ecs in favor of Sparse Sets (indexed by densely packed ArchetypeId, ComponentId, BundleId, and TableId) * Safety Improvements * Entity reservation uses a normal world reference instead of unsafe transmute * QuerySets no longer transmute lifetimes * Made traits "unsafe" where relevant * More thorough safety docs * WorldCell * Exposes safe mutable access to multiple resources at a time in a World * Replaced "catch all" `System::update_archetypes(world: &World)` with `System::new_archetype(archetype: &Archetype)` * Simpler Bundle implementation * Replaced slow "remove_bundle_one_by_one" used as fallback for Commands::remove_bundle with fast "remove_bundle_intersection" * Removed `Mut<T>` query impl. it is better to only support one way: `&mut T` * Removed with() from `Flags<T>` in favor of `Option<Flags<T>>`, which allows querying for flags to be "filtered" by default * Components now have is_send property (currently only resources support non-send) * More granular module organization * New `RemovedComponents<T>` SystemParam that replaces `query.removed::<T>()` * `world.resource_scope()` for mutable access to resources and world at the same time * WorldQuery and QueryFilter traits unified. FilterFetch trait added to enable "short circuit" filtering. 
Auto impled for cases that don't need it * Significantly slimmed down SystemState in favor of individual SystemParam state * System Commands changed from `commands: &mut Commands` back to `mut commands: Commands` (to allow Commands to have a World reference) Fixes #1320 ## `World` Rewrite This is a from-scratch rewrite of `World` that fills the niche that `hecs` used to. Yes, this means Bevy ECS is no longer a "fork" of hecs. We're going out our own! (the only shared code between the projects is the entity id allocator, which is already basically ideal) A huge shout out to @SanderMertens (author of [flecs](https://github.com/SanderMertens/flecs)) for sharing some great ideas with me (specifically hybrid ecs storage and archetype graphs). He also helped advise on a number of implementation details. ## Component Storage (The Problem) Two ECS storage paradigms have gained a lot of traction over the years: * **Archetypal ECS**: * Stores components in "tables" with static schemas. Each "column" stores components of a given type. Each "row" is an entity. * Each "archetype" has its own table. Adding/removing an entity's component changes the archetype. * Enables super-fast Query iteration due to its cache-friendly data layout * Comes at the cost of more expensive add/remove operations for an Entity's components, because all components need to be copied to the new archetype's "table" * **Sparse Set ECS**: * Stores components of the same type in densely packed arrays, which are sparsely indexed by densely packed unsigned integers (Entity ids) * Query iteration is slower than Archetypal ECS because each entity's component could be at any position in the sparse set. This "random access" pattern isn't cache friendly. Additionally, there is an extra layer of indirection because you must first map the entity id to an index in the component array. 
* Adding/removing components is a cheap, constant time operation Bevy ECS V1, hecs, legion, flec, and Unity DOTS are all "archetypal ecs-es". I personally think "archetypal" storage is a good default for game engines. An entity's archetype doesn't need to change frequently in general, and it creates "fast by default" query iteration (which is a much more common operation). It is also "self optimizing". Users don't need to think about optimizing component layouts for iteration performance. It "just works" without any extra boilerplate. Shipyard and EnTT are "sparse set ecs-es". They employ "packing" as a way to work around the "suboptimal by default" iteration performance for specific sets of components. This helps, but I didn't think this was a good choice for a general purpose engine like Bevy because: 1. "packs" conflict with each other. If bevy decides to internally pack the Transform and GlobalTransform components, users are then blocked if they want to pack some custom component with Transform. 2. users need to take manual action to optimize Developers selecting an ECS framework are stuck with a hard choice. Select an "archetypal" framework with "fast iteration everywhere" but without the ability to cheaply add/remove components, or select a "sparse set" framework to cheaply add/remove components but with slower iteration performance. ## Hybrid Component Storage (The Solution) In Bevy ECS V2, we get to have our cake and eat it too. It now has _both_ of the component storage types above (and more can be added later if needed): * **Tables** (aka "archetypal" storage) * The default storage. If you don't configure anything, this is what you get * Fast iteration by default * Slower add/remove operations * **Sparse Sets** * Opt-in * Slower iteration * Faster add/remove operations These storage types complement each other perfectly. By default Query iteration is fast. 
If developers know that they want to add/remove a component at high frequencies, they can set the storage to "sparse set": ```rust world.register_component( ComponentDescriptor::new::<MyComponent>(StorageType::SparseSet) ).unwrap(); ``` ## Archetypes Archetypes are now "just metadata" ... they no longer store components directly. They do store: * The `ComponentId`s of each of the Archetype's components (and that component's storage type) * Archetypes are uniquely defined by their component layouts * For example: entities with "table" components `[A, B, C]` _and_ "sparse set" components `[D, E]` will always be in the same archetype. * The `TableId` associated with the archetype * For now each archetype has exactly one table (which can have no components), * There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it: * Ex: an entity with "table storage" components `[A, B, C]` and "sparse set" components `[D, E]` will share the same `[A, B, C]` table as an entity with `[A, B, C]` table component and `[F]` sparse set components. * This 1->Many relationship is how we preserve fast "cache friendly" iteration performance when possible (more on this later) * A list of entities that are in the archetype and the row id of the table they are in * ArchetypeComponentIds * unique densely packed identifiers for (ArchetypeId, ComponentId) pairs * used by the schedule executor for cheap system access control * "Archetype Graph Edges" (see the next section) ## The "Archetype Graph" Archetype changes in Bevy (and a number of other archetypal ecs-es) have historically been expensive to compute. First, you need to allocate a new vector of the entity's current component ids, add or remove components based on the operation performed, sort it (to ensure it is order-independent), then hash it to find the archetype (if it exists). 
And that's all before we get to the _already_ expensive full copy of all components to the new table storage. The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If `ComponentId`s are densely packed, you can use sparse sets to cheaply jump between archetypes. Bevy takes this one step further by using add/remove `Bundle` edges instead of `Component` edges. Bevy encourages the use of `Bundles` to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. `Bundles` now also have densely-packed `BundleId`s. This allows us to use a _single_ edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph. As a result, an operation that used to be _heavy_ (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations. ## Stateful Queries World queries are now stateful. This allows us to: 1. Cache archetype (and table) matches * This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs). 2. Cache Fetch and Filter state * The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed 3. 
Incrementally build up state * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes) As a result, the direct `World` query api now looks like this: ```rust let mut query = world.query::<(&A, &mut B)>(); for (a, mut b) in query.iter_mut(&mut world) { } ``` Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (and the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world). However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam. ## Stateful SystemParams Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources). SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now. Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params). (credit goes to @DJMcNab for the initial idea and draft pr here #1364) ## Configurable SystemParams @DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible). 
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters: ```rust fn foo(value: Local<usize>) { } app.add_system(foo.system().config(|c| c.0 = Some(10))); ``` ## Uber Fast "for_each" Query Iterators Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration. ```rust fn system(query: Query<(&A, &mut B)>) { // you now have the option to do this for a speed boost query.for_each_mut(|(a, mut b)| { }); // however normal iterators are still available for (a, mut b) in query.iter_mut() { } } ``` I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomf" is needed, it makes sense to use `for_each`. We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr). ## Component Metadata `World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type. ## Significantly Cheaper `Access<T>` We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>`sources every time archetypes changed. This pr removes TypeAccess in favor of faster bitset access everywhere. 
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict with components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing data structures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
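
As a rough illustration of the "temporarily remove, lend out, re-add" pattern behind resource scopes, here is a self-contained sketch that only uses the standard library. `MiniWorld`, its `HashMap<TypeId, Box<dyn Any>>` storage, and `demo` are hypothetical stand-ins for illustration (Bevy's `World` keys resources by `ComponentId` in sparse sets rather than hashing `TypeId`s); only the method name `resource_scope` mirrors the api described above.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Hypothetical stand-in for `World` that only stores resources, keyed by type.
#[derive(Default)]
pub struct MiniWorld {
    resources: HashMap<TypeId, Box<dyn Any>>,
}

impl MiniWorld {
    /// Inserts (or overwrites) the resource of type `T`.
    pub fn insert_resource<T: Any>(&mut self, value: T) {
        self.resources.insert(TypeId::of::<T>(), Box::new(value));
    }

    /// Returns a shared reference to the `T` resource, if present.
    pub fn get_resource<T: Any>(&self) -> Option<&T> {
        self.resources.get(&TypeId::of::<T>())?.downcast_ref::<T>()
    }

    /// Removes `T` from the world, passes `&mut MiniWorld` and `&mut T` to `f`,
    /// then re-adds `T`. Panics if `T` was never inserted.
    pub fn resource_scope<T: Any, R>(
        &mut self,
        f: impl FnOnce(&mut MiniWorld, &mut T) -> R,
    ) -> R {
        let boxed = self
            .resources
            .remove(&TypeId::of::<T>())
            .expect("resource does not exist");
        // The map is keyed by TypeId, so this downcast cannot fail.
        let mut value: Box<T> = match boxed.downcast::<T>() {
            Ok(value) => value,
            Err(_) => unreachable!("resource stored under the wrong TypeId"),
        };
        // `T` is out of the map here, so both borrows are disjoint.
        let result = f(&mut *self, &mut *value);
        self.resources.insert(TypeId::of::<T>(), value);
        result
    }
}

/// Mutates an `i32` resource while also mutating the world it came from.
pub fn demo() -> (i32, f64) {
    let mut world = MiniWorld::default();
    world.insert_resource(1_i32);
    world.insert_resource(2.0_f64);
    world.resource_scope(|world: &mut MiniWorld, a: &mut i32| {
        *a += 1;
        world.insert_resource(3.0_f64);
    });
    (
        *world.get_resource::<i32>().unwrap(),
        *world.get_resource::<f64>().unwrap(),
    )
}
```

Because the resource is absent from the world for the duration of the closure, looking it up inside the scope returns `None` in this sketch, and nesting a second `resource_scope` call for a different type inside the closure works the same way.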
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut A, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id(); // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicking world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmute lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed::<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But that's just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * `_sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simple Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
/// Calls `func`, which should return data for each component in the bundle, in the order of
/// this bundle's [`Component`]s
///
Bevy ECS V2 (#1525) # Bevy ECS V2 This is a rewrite of Bevy ECS (basically everything but the new executor/schedule, which are already awesome). The overall goal was to improve the performance and versatility of Bevy ECS. Here is a quick bulleted list of changes before we dive into the details: * Complete World rewrite * Multiple component storage types: * Tables: fast cache friendly iteration, slower add/removes (previously called Archetypes) * Sparse Sets: fast add/remove, slower iteration * Stateful Queries (caches query results for faster iteration. fragmented iteration is _fast_ now) * Stateful System Params (caches expensive operations. inspired by @DJMcNab's work in #1364) * Configurable System Params (users can set configuration when they construct their systems. once again inspired by @DJMcNab's work) * Archetypes are now "just metadata", component storage is separate * Archetype Graph (for faster archetype changes) * Component Metadata * Configure component storage type * Retrieve information about component size/type/name/layout/send-ness/etc * Components are uniquely identified by a densely packed ComponentId * TypeIds are now totally optional (which should make implementing scripting easier) * Super fast "for_each" query iterators * Merged Resources into World. 
Resources are now just a special type of component * EntityRef/EntityMut builder apis (more efficient and more ergonomic) * Fast bitset-backed `Access<T>` replaces old hashmap-based approach everywhere * Query conflicts are determined by component access instead of archetype component access (to avoid random failures at runtime) * With/Without are still taken into account for conflicts, so this should still be comfy to use * Much simpler `IntoSystem` impl * Significantly reduced the amount of hashing throughout the ecs in favor of Sparse Sets (indexed by densely packed ArchetypeId, ComponentId, BundleId, and TableId) * Safety Improvements * Entity reservation uses a normal world reference instead of unsafe transmute * QuerySets no longer transmute lifetimes * Made traits "unsafe" where relevant * More thorough safety docs * WorldCell * Exposes safe mutable access to multiple resources at a time in a World * Replaced "catch all" `System::update_archetypes(world: &World)` with `System::new_archetype(archetype: &Archetype)` * Simpler Bundle implementation * Replaced slow "remove_bundle_one_by_one" used as fallback for Commands::remove_bundle with fast "remove_bundle_intersection" * Removed `Mut<T>` query impl. it is better to only support one way: `&mut T` * Removed with() from `Flags<T>` in favor of `Option<Flags<T>>`, which allows querying for flags to be "filtered" by default * Components now have is_send property (currently only resources support non-send) * More granular module organization * New `RemovedComponents<T>` SystemParam that replaces `query.removed::<T>()` * `world.resource_scope()` for mutable access to resources and world at the same time * WorldQuery and QueryFilter traits unified. FilterFetch trait added to enable "short circuit" filtering. 
Auto impled for cases that don't need it * Significantly slimmed down SystemState in favor of individual SystemParam state * System Commands changed from `commands: &mut Commands` back to `mut commands: Commands` (to allow Commands to have a World reference) Fixes #1320 ## `World` Rewrite This is a from-scratch rewrite of `World` that fills the niche that `hecs` used to. Yes, this means Bevy ECS is no longer a "fork" of hecs. We're going out our own! (the only shared code between the projects is the entity id allocator, which is already basically ideal) A huge shout out to @SanderMertens (author of [flecs](https://github.com/SanderMertens/flecs)) for sharing some great ideas with me (specifically hybrid ecs storage and archetype graphs). He also helped advise on a number of implementation details. ## Component Storage (The Problem) Two ECS storage paradigms have gained a lot of traction over the years: * **Archetypal ECS**: * Stores components in "tables" with static schemas. Each "column" stores components of a given type. Each "row" is an entity. * Each "archetype" has its own table. Adding/removing an entity's component changes the archetype. * Enables super-fast Query iteration due to its cache-friendly data layout * Comes at the cost of more expensive add/remove operations for an Entity's components, because all components need to be copied to the new archetype's "table" * **Sparse Set ECS**: * Stores components of the same type in densely packed arrays, which are sparsely indexed by densely packed unsigned integers (Entity ids) * Query iteration is slower than Archetypal ECS because each entity's component could be at any position in the sparse set. This "random access" pattern isn't cache friendly. Additionally, there is an extra layer of indirection because you must first map the entity id to an index in the component array. 
* Adding/removing components is a cheap, constant time operation Bevy ECS V1, hecs, legion, flec, and Unity DOTS are all "archetypal ecs-es". I personally think "archetypal" storage is a good default for game engines. An entity's archetype doesn't need to change frequently in general, and it creates "fast by default" query iteration (which is a much more common operation). It is also "self optimizing". Users don't need to think about optimizing component layouts for iteration performance. It "just works" without any extra boilerplate. Shipyard and EnTT are "sparse set ecs-es". They employ "packing" as a way to work around the "suboptimal by default" iteration performance for specific sets of components. This helps, but I didn't think this was a good choice for a general purpose engine like Bevy because: 1. "packs" conflict with each other. If bevy decides to internally pack the Transform and GlobalTransform components, users are then blocked if they want to pack some custom component with Transform. 2. users need to take manual action to optimize Developers selecting an ECS framework are stuck with a hard choice. Select an "archetypal" framework with "fast iteration everywhere" but without the ability to cheaply add/remove components, or select a "sparse set" framework to cheaply add/remove components but with slower iteration performance. ## Hybrid Component Storage (The Solution) In Bevy ECS V2, we get to have our cake and eat it too. It now has _both_ of the component storage types above (and more can be added later if needed): * **Tables** (aka "archetypal" storage) * The default storage. If you don't configure anything, this is what you get * Fast iteration by default * Slower add/remove operations * **Sparse Sets** * Opt-in * Slower iteration * Faster add/remove operations These storage types complement each other perfectly. By default Query iteration is fast. 
If developers know that they want to add/remove a component at high frequencies, they can set the storage to "sparse set": ```rust world.register_component( ComponentDescriptor::new::<MyComponent>(StorageType::SparseSet) ).unwrap(); ``` ## Archetypes Archetypes are now "just metadata" ... they no longer store components directly. They do store: * The `ComponentId`s of each of the Archetype's components (and that component's storage type) * Archetypes are uniquely defined by their component layouts * For example: entities with "table" components `[A, B, C]` _and_ "sparse set" components `[D, E]` will always be in the same archetype. * The `TableId` associated with the archetype * For now each archetype has exactly one table (which can have no components), * There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it: * Ex: an entity with "table storage" components `[A, B, C]` and "sparse set" components `[D, E]` will share the same `[A, B, C]` table as an entity with `[A, B, C]` table component and `[F]` sparse set components. * This 1->Many relationship is how we preserve fast "cache friendly" iteration performance when possible (more on this later) * A list of entities that are in the archetype and the row id of the table they are in * ArchetypeComponentIds * unique densely packed identifiers for (ArchetypeId, ComponentId) pairs * used by the schedule executor for cheap system access control * "Archetype Graph Edges" (see the next section) ## The "Archetype Graph" Archetype changes in Bevy (and a number of other archetypal ecs-es) have historically been expensive to compute. First, you need to allocate a new vector of the entity's current component ids, add or remove components based on the operation performed, sort it (to ensure it is order-independent), then hash it to find the archetype (if it exists). 
And that's all before we get to the _already_ expensive full copy of all components to the new table storage.

The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If `ComponentId`s are densely packed, you can use sparse sets to cheaply jump between archetypes.

Bevy takes this one step further by using add/remove `Bundle` edges instead of `Component` edges. Bevy encourages the use of `Bundles` to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. `Bundles` now also have densely-packed `BundleId`s. This allows us to use a _single_ edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph.

As a result, an operation that used to be _heavy_ (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations.

## Stateful Queries

World queries are now stateful. This allows us to:

1. Cache archetype (and table) matches
    * This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs).
2. Cache Fetch and Filter state
    * The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed
3. Incrementally build up state
    * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes)

As a result, the direct `World` query api now looks like this:

```rust
let mut query = world.query::<(&A, &mut B)>();
for (a, mut b) in query.iter_mut(&mut world) {
}
```

Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (along with the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world).

However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam.

## Stateful SystemParams

Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources).

SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now.

Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params).

(credit goes to @DJMcNab for the initial idea and draft pr here #1364)

## Configurable SystemParams

@DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible).
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters:

```rust
fn foo(value: Local<usize>) {
}

app.add_system(foo.system().config(|c| c.0 = Some(10)));
```

## Uber Fast "for_each" Query Iterators

Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration.

```rust
fn system(mut query: Query<(&A, &mut B)>) {
    // you now have the option to do this for a speed boost
    query.for_each_mut(|(a, mut b)| {
    });

    // however normal iterators are still available
    for (a, mut b) in query.iter_mut() {
    }
}
```

I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomph" is needed, it makes sense to use `for_each`.

We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr).

## Component Metadata

`World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type.

## Significantly Cheaper `Access<T>`

We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>` sources every time archetypes changed.

This pr removes TypeAccess in favor of faster bitset access everywhere.
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s.

## Merged Resources into World

Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity).

Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state.

I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally).

This pr merges Resources into World:

```rust
world.insert_resource(1);
world.insert_resource(2.0);
let a = world.get_resource::<i32>().unwrap();
let mut b = world.get_resource_mut::<f64>().unwrap();
*b = 3.0;
```

Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict with components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). It also allows the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier.

_But_ this merge did create problems for people directly interacting with `World`.
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably!

## WorldCell

WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access:

```rust
let world_cell = world.cell();
let a = world_cell.get_resource_mut::<i32>().unwrap();
let b = world_cell.get_resource_mut::<f64>().unwrap();
```

This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped.

World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation.

WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer.

The api is currently limited to resource access, but it can and should be extended to queries / entity component access.

## Resource Scopes

WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal!

Instead developers can use a "resource scope":

```rust
world.resource_scope(|world: &mut World, a: &mut A| {
})
```

This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation.

If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty.
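
To make the remove / re-add idea concrete, here is a minimal sketch in plain std Rust. It is not Bevy's implementation: `MiniWorld`, its `HashMap<TypeId, Box<dyn Any>>` storage, and `nested_scope_demo` are hypothetical stand-ins invented for illustration (the real `World` uses ComponentIds and sparse sets, as described above). It only shows why handing out `&mut` world and `&mut` resource together is sound once the resource has been taken out of the world.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Hypothetical stand-in for `World`: it only stores resources, keyed by type.
#[derive(Default)]
struct MiniWorld {
    resources: HashMap<TypeId, Box<dyn Any>>,
}

impl MiniWorld {
    fn insert_resource<T: 'static>(&mut self, value: T) {
        self.resources.insert(TypeId::of::<T>(), Box::new(value));
    }

    fn get_resource<T: 'static>(&self) -> Option<&T> {
        self.resources.get(&TypeId::of::<T>())?.downcast_ref::<T>()
    }

    /// Temporarily removes the `T` resource, passes `&mut MiniWorld` and `&mut T`
    /// to `f`, then re-inserts `T`. While `f` runs, `T` is not in the world, so
    /// the two mutable references cannot alias.
    /// (Assumption of this sketch: if `f` panics, the resource is not restored.)
    fn resource_scope<T: 'static, R>(
        &mut self,
        f: impl FnOnce(&mut MiniWorld, &mut T) -> R,
    ) -> R {
        let boxed = self
            .resources
            .remove(&TypeId::of::<T>())
            .expect("resource does not exist");
        let mut value: Box<T> = boxed.downcast::<T>().expect("resource type mismatch");
        let result = f(self, &mut *value);
        self.resources.insert(TypeId::of::<T>(), value);
        result
    }
}

/// Nests two scopes to get mutable access to two resources at once.
fn nested_scope_demo() -> (i32, f64) {
    let mut world = MiniWorld::default();
    world.insert_resource(1_i32);
    world.insert_resource(2.0_f64);

    world.resource_scope(|world: &mut MiniWorld, a: &mut i32| {
        world.resource_scope(|_world: &mut MiniWorld, b: &mut f64| {
            *a += 1;
            *b += *a as f64;
        });
    });

    (
        *world.get_resource::<i32>().unwrap(),
        *world.get_resource::<f64>().unwrap(),
    )
}
```

In this sketch, looking up the scoped resource through the world from inside the closure returns `None`, which is the price paid for avoiding runtime borrow tracking.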
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId

For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters:

```rust
// these queries will never conflict due to their filters
fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut A, Without<B>>) {
}
```

But it also has a significant downside:

```rust
// these queries will not conflict _until_ an entity with A, B, and C is spawned
fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) {
}
```

The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing.

In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace.

To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict.

## EntityRef / EntityMut

World entity operations on `main` require that the user passes in an `entity` id to each operation:

```rust
let entity = world.spawn((A, )); // create a new entity with A
world.get::<A>(entity);
world.insert(entity, (B, C));
world.insert_one(entity, D);
```

This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required).
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity:

```rust
// spawn now takes no inputs and returns an EntityMut
let entity = world.spawn()
    .insert(A) // insert a single component into the entity
    .insert_bundle((B, C)) // insert a bundle of components into the entity
    .id(); // id returns the Entity id

// Returns EntityMut (or panics if the entity does not exist)
world.entity_mut(entity)
    .insert(D)
    .insert_bundle(SomeBundle::default());

{
    // returns EntityRef (or panics if the entity does not exist)
    let d = world.entity(entity)
        .get::<D>() // gets the D component
        .unwrap();
    // world.get still exists for ergonomics
    let d = world.get::<D>(entity).unwrap();
}

// These variants return Options if you want to check existence instead of panicking
world.get_entity_mut(entity)
    .unwrap()
    .insert(E);

if let Some(entity_ref) = world.get_entity(entity) {
    let d = entity_ref.get::<D>().unwrap();
}
```

This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change.

## Safety Improvements

* Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute
* QuerySets no longer transmute lifetimes
* Made traits "unsafe" when implementing a trait incorrectly could cause unsafety
* More thorough safety docs

## RemovedComponents SystemParam

The old approach to querying removed components, `query.removed::<T>()`, was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState:

```rust
fn system(removed: RemovedComponents<T>) {
    for entity in removed.iter() {
    }
}
```

## Simpler Bundle implementation

Bundles are no longer responsible for sorting (or deduping) TypeInfo.
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which I might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used.

## Unified WorldQuery and QueryFilter types

(don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change)

WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful).

QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool.

This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit.

## More Granular Modules

World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here).

## Remaining Draft Work (to be done in this pr)

* ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~
    * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~
* ~~batch_iter / par_iter (currently stubbed out)~~
* ~~ChangedRes~~
    * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there.~~
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~
* ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~
* ~~Nested Bundles (if I find time)~~

## Potential Future Work

* Expand WorldCell to support queries.
* Consider not allocating in the empty archetype on `world.spawn()`
    * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op
    * this actually regressed performance last time I tried it, but in theory it should be faster
* Optimize SparseSet::insert (see `PERF` comment on insert)
* Replace SparseArray `Option<T>` with T::MAX to cut down on branching
    * would enable cheaper get_unchecked() operations
* upstream fixedbitset optimizations
    * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec)
    * fixedbitset could have a const constructor
* Consider implementing Tags (archetype-specific by-value data that affects archetype identity)
    * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different.
    * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage.
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation
    * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints)
    * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code)
* Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell
    * this is basically just "systems" so maybe it's not worth it
* Add more world ops
    * `world.clear()`
    * `world.reserve<T: Bundle>(count: usize)`
* Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But that's just a guess. I'm not an allocation perf pro :)
* Adapt Commands apis for consistency with new World apis

## Benchmarks

key:

* `bevy_old`: bevy `main` branch
* `bevy`: this branch
* `_foreach`: uses an optimized for_each iterator
* `_sparse`: uses sparse set storage (if unspecified assume table storage)
* `_system`: runs inside a system (if unspecified assume test happens via direct world ops)

### Simple Insert (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png)

### Simple Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png)

### Fragment Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png)

### Sparse Fragmented Iter

Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes

![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png)

### Schedule (from ecs_bench_suite)
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png)

### Add Remove Component (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png)

### Add Remove Component Big

Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed

![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png)

### Get Component

Looks up a single component value a large number of times

![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
/// # Safety
/// Caller must return data for each component in the bundle, in the order of this bundle's
/// [`Component`]s
#[doc(hidden)]
unsafe fn from_components<T, F>(ctx: &mut T, func: &mut F) -> Self
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
where
// Ensure that the `OwningPtr` is used correctly
F: for<'a> FnMut(&'a mut T) -> OwningPtr<'a>,
Bevy ECS V2 (#1525) # Bevy ECS V2 This is a rewrite of Bevy ECS (basically everything but the new executor/schedule, which are already awesome). The overall goal was to improve the performance and versatility of Bevy ECS. Here is a quick bulleted list of changes before we dive into the details: * Complete World rewrite * Multiple component storage types: * Tables: fast cache friendly iteration, slower add/removes (previously called Archetypes) * Sparse Sets: fast add/remove, slower iteration * Stateful Queries (caches query results for faster iteration. fragmented iteration is _fast_ now) * Stateful System Params (caches expensive operations. inspired by @DJMcNab's work in #1364) * Configurable System Params (users can set configuration when they construct their systems. once again inspired by @DJMcNab's work) * Archetypes are now "just metadata", component storage is separate * Archetype Graph (for faster archetype changes) * Component Metadata * Configure component storage type * Retrieve information about component size/type/name/layout/send-ness/etc * Components are uniquely identified by a densely packed ComponentId * TypeIds are now totally optional (which should make implementing scripting easier) * Super fast "for_each" query iterators * Merged Resources into World. 
Resources are now just a special type of component * EntityRef/EntityMut builder apis (more efficient and more ergonomic) * Fast bitset-backed `Access<T>` replaces old hashmap-based approach everywhere * Query conflicts are determined by component access instead of archetype component access (to avoid random failures at runtime) * With/Without are still taken into account for conflicts, so this should still be comfy to use * Much simpler `IntoSystem` impl * Significantly reduced the amount of hashing throughout the ecs in favor of Sparse Sets (indexed by densely packed ArchetypeId, ComponentId, BundleId, and TableId) * Safety Improvements * Entity reservation uses a normal world reference instead of unsafe transmute * QuerySets no longer transmute lifetimes * Made traits "unsafe" where relevant * More thorough safety docs * WorldCell * Exposes safe mutable access to multiple resources at a time in a World * Replaced "catch all" `System::update_archetypes(world: &World)` with `System::new_archetype(archetype: &Archetype)` * Simpler Bundle implementation * Replaced slow "remove_bundle_one_by_one" used as fallback for Commands::remove_bundle with fast "remove_bundle_intersection" * Removed `Mut<T>` query impl. it is better to only support one way: `&mut T` * Removed with() from `Flags<T>` in favor of `Option<Flags<T>>`, which allows querying for flags to be "filtered" by default * Components now have is_send property (currently only resources support non-send) * More granular module organization * New `RemovedComponents<T>` SystemParam that replaces `query.removed::<T>()` * `world.resource_scope()` for mutable access to resources and world at the same time * WorldQuery and QueryFilter traits unified. FilterFetch trait added to enable "short circuit" filtering. 
Auto impled for cases that don't need it * Significantly slimmed down SystemState in favor of individual SystemParam state * System Commands changed from `commands: &mut Commands` back to `mut commands: Commands` (to allow Commands to have a World reference) Fixes #1320 ## `World` Rewrite This is a from-scratch rewrite of `World` that fills the niche that `hecs` used to. Yes, this means Bevy ECS is no longer a "fork" of hecs. We're going out our own! (the only shared code between the projects is the entity id allocator, which is already basically ideal) A huge shout out to @SanderMertens (author of [flecs](https://github.com/SanderMertens/flecs)) for sharing some great ideas with me (specifically hybrid ecs storage and archetype graphs). He also helped advise on a number of implementation details. ## Component Storage (The Problem) Two ECS storage paradigms have gained a lot of traction over the years: * **Archetypal ECS**: * Stores components in "tables" with static schemas. Each "column" stores components of a given type. Each "row" is an entity. * Each "archetype" has its own table. Adding/removing an entity's component changes the archetype. * Enables super-fast Query iteration due to its cache-friendly data layout * Comes at the cost of more expensive add/remove operations for an Entity's components, because all components need to be copied to the new archetype's "table" * **Sparse Set ECS**: * Stores components of the same type in densely packed arrays, which are sparsely indexed by densely packed unsigned integers (Entity ids) * Query iteration is slower than Archetypal ECS because each entity's component could be at any position in the sparse set. This "random access" pattern isn't cache friendly. Additionally, there is an extra layer of indirection because you must first map the entity id to an index in the component array. 
* Adding/removing components is a cheap, constant time operation Bevy ECS V1, hecs, legion, flec, and Unity DOTS are all "archetypal ecs-es". I personally think "archetypal" storage is a good default for game engines. An entity's archetype doesn't need to change frequently in general, and it creates "fast by default" query iteration (which is a much more common operation). It is also "self optimizing". Users don't need to think about optimizing component layouts for iteration performance. It "just works" without any extra boilerplate. Shipyard and EnTT are "sparse set ecs-es". They employ "packing" as a way to work around the "suboptimal by default" iteration performance for specific sets of components. This helps, but I didn't think this was a good choice for a general purpose engine like Bevy because: 1. "packs" conflict with each other. If bevy decides to internally pack the Transform and GlobalTransform components, users are then blocked if they want to pack some custom component with Transform. 2. users need to take manual action to optimize Developers selecting an ECS framework are stuck with a hard choice. Select an "archetypal" framework with "fast iteration everywhere" but without the ability to cheaply add/remove components, or select a "sparse set" framework to cheaply add/remove components but with slower iteration performance. ## Hybrid Component Storage (The Solution) In Bevy ECS V2, we get to have our cake and eat it too. It now has _both_ of the component storage types above (and more can be added later if needed): * **Tables** (aka "archetypal" storage) * The default storage. If you don't configure anything, this is what you get * Fast iteration by default * Slower add/remove operations * **Sparse Sets** * Opt-in * Slower iteration * Faster add/remove operations These storage types complement each other perfectly. By default Query iteration is fast. 
If developers know that they want to add/remove a component at high frequencies, they can set the storage to "sparse set": ```rust world.register_component( ComponentDescriptor::new::<MyComponent>(StorageType::SparseSet) ).unwrap(); ``` ## Archetypes Archetypes are now "just metadata" ... they no longer store components directly. They do store: * The `ComponentId`s of each of the Archetype's components (and that component's storage type) * Archetypes are uniquely defined by their component layouts * For example: entities with "table" components `[A, B, C]` _and_ "sparse set" components `[D, E]` will always be in the same archetype. * The `TableId` associated with the archetype * For now each archetype has exactly one table (which can have no components), * There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it: * Ex: an entity with "table storage" components `[A, B, C]` and "sparse set" components `[D, E]` will share the same `[A, B, C]` table as an entity with `[A, B, C]` table component and `[F]` sparse set components. * This 1->Many relationship is how we preserve fast "cache friendly" iteration performance when possible (more on this later) * A list of entities that are in the archetype and the row id of the table they are in * ArchetypeComponentIds * unique densely packed identifiers for (ArchetypeId, ComponentId) pairs * used by the schedule executor for cheap system access control * "Archetype Graph Edges" (see the next section) ## The "Archetype Graph" Archetype changes in Bevy (and a number of other archetypal ecs-es) have historically been expensive to compute. First, you need to allocate a new vector of the entity's current component ids, add or remove components based on the operation performed, sort it (to ensure it is order-independent), then hash it to find the archetype (if it exists). 
And thats all before we get to the _already_ expensive full copy of all components to the new table storage. The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If `ComponentId`s are densely packed, you can use sparse sets to cheaply jump between archetypes. Bevy takes this one step further by using add/remove `Bundle` edges instead of `Component` edges. Bevy encourages the use of `Bundles` to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. `Bundles` now also have densely-packed `BundleId`s. This allows us to use a _single_ edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph. As a result, an operation that used to be _heavy_ (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations. ## Stateful Queries World queries are now stateful. This allows us to: 1. Cache archetype (and table) matches * This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs). 2. Cache Fetch and Filter state * The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed 3. 
Incrementally build up state * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes) As a result, the direct `World` query api now looks like this: ```rust let mut query = world.query::<(&A, &mut B)>(); for (a, mut b) in query.iter_mut(&mut world) { } ``` Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (and the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world). However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam. ## Stateful SystemParams Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources). SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now. Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params). (credit goes to @DJMcNab for the initial idea and draft pr here #1364) ## Configurable SystemParams @DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible). 
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters: ```rust fn foo(value: Local<usize>) { } app.add_system(foo.system().config(|c| c.0 = Some(10))); ``` ## Uber Fast "for_each" Query Iterators Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration. ```rust fn system(query: Query<(&A, &mut B)>) { // you now have the option to do this for a speed boost query.for_each_mut(|(a, mut b)| { }); // however normal iterators are still available for (a, mut b) in query.iter_mut() { } } ``` I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomf" is needed, it makes sense to use `for_each`. We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr). ## Component Metadata `World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type. ## Significantly Cheaper `Access<T>` We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>`sources every time archetypes changed. This pr removes TypeAccess in favor of faster bitset access everywhere. 
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict wit components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows us the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId

For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters:

```rust
// these queries will never conflict due to their filters
fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut A, Without<B>>) {
}
```

But it also has a significant downside:

```rust
// these queries will not conflict _until_ an entity with A, B, and C is spawned
fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) {
}
```

The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing.

In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace.

To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict.

## EntityRef / EntityMut

World entity operations on `main` require that the user passes in an `entity` id to each operation:

```rust
let entity = world.spawn((A, )); // create a new entity with A
world.get::<A>(entity);
world.insert(entity, (B, C));
world.insert_one(entity, D);
```

This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required).
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity:

```rust
// spawn now takes no inputs and returns an EntityMut
let entity = world.spawn()
    .insert(A) // insert a single component into the entity
    .insert_bundle((B, C)) // insert a bundle of components into the entity
    .id(); // id returns the Entity id

// Returns EntityMut (or panics if the entity does not exist)
world.entity_mut(entity)
    .insert(D)
    .insert_bundle(SomeBundle::default());

{
    // returns EntityRef (or panics if the entity does not exist)
    let d = world.entity(entity)
        .get::<D>() // gets the D component
        .unwrap();
    // world.get still exists for ergonomics
    let d = world.get::<D>(entity).unwrap();
}

// These variants return Options if you want to check existence instead of panicking
world.get_entity_mut(entity)
    .unwrap()
    .insert(E);

if let Some(entity_ref) = world.get_entity(entity) {
    let d = entity_ref.get::<D>().unwrap();
}
```

This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change.

## Safety Improvements

* Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute
* QuerySets no longer transmute lifetimes
* Made traits "unsafe" when implementing a trait incorrectly could cause unsafety
* More thorough safety docs

## RemovedComponents SystemParam

The old approach to querying removed components, `query.removed::<T>()`, was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState:

```rust
fn system(removed: RemovedComponents<T>) {
    for entity in removed.iter() {
    }
}
```

## Simpler Bundle implementation

Bundles are no longer responsible for sorting (or deduping) TypeInfo.
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which I might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used.

## Unified WorldQuery and QueryFilter types

(don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change)

WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful).

QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool.

This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit.

## More Granular Modules

World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here).

## Remaining Draft Work (to be done in this pr)

* ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~
    * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~
* ~~batch_iter / par_iter (currently stubbed out)~~
* ~~ChangedRes~~
    * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~.
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~
* ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~
* ~~Nested Bundles (if i find time)~~

## Potential Future Work

* Expand WorldCell to support queries.
* Consider not allocating in the empty archetype on `world.spawn()`
    * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op
    * this actually regressed performance last time i tried it, but in theory it should be faster
* Optimize SparseSet::insert (see `PERF` comment on insert)
* Replace SparseArray `Option<T>` with T::MAX to cut down on branching
    * would enable cheaper get_unchecked() operations
* upstream fixedbitset optimizations
    * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec)
    * fixedbitset could have a const constructor
* Consider implementing Tags (archetype-specific by-value data that affects archetype identity)
    * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different.
    * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage.
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation
    * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints)
    * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code)
* Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell
    * this is basically just "systems" so maybe it's not worth it
* Add more world ops
    * `world.clear()`
    * `world.reserve<T: Bundle>(count: usize)`
* Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But that's just a guess. I'm not an allocation perf pro :)
* Adapt Commands apis for consistency with new World apis

## Benchmarks

key:

* `bevy_old`: bevy `main` branch
* `bevy`: this branch
* `_foreach`: uses an optimized for_each iterator
* `_sparse`: uses sparse set storage (if unspecified assume table storage)
* `_system`: runs inside a system (if unspecified assume test happens via direct world ops)

### Simple Insert (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png)

### Simpler Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png)

### Fragment Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png)

### Sparse Fragmented Iter

Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes

![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png)

### Schedule (from ecs_bench_suite)
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png)

### Add Remove Component (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png)

### Add Remove Component Big

Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed

![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png)

### Get Component

Looks up a single component value a large number of times

![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
Self: Sized;
}
Bevy ECS V2 (#1525) # Bevy ECS V2 This is a rewrite of Bevy ECS (basically everything but the new executor/schedule, which are already awesome). The overall goal was to improve the performance and versatility of Bevy ECS. Here is a quick bulleted list of changes before we dive into the details: * Complete World rewrite * Multiple component storage types: * Tables: fast cache friendly iteration, slower add/removes (previously called Archetypes) * Sparse Sets: fast add/remove, slower iteration * Stateful Queries (caches query results for faster iteration. fragmented iteration is _fast_ now) * Stateful System Params (caches expensive operations. inspired by @DJMcNab's work in #1364) * Configurable System Params (users can set configuration when they construct their systems. once again inspired by @DJMcNab's work) * Archetypes are now "just metadata", component storage is separate * Archetype Graph (for faster archetype changes) * Component Metadata * Configure component storage type * Retrieve information about component size/type/name/layout/send-ness/etc * Components are uniquely identified by a densely packed ComponentId * TypeIds are now totally optional (which should make implementing scripting easier) * Super fast "for_each" query iterators * Merged Resources into World. 
Resources are now just a special type of component * EntityRef/EntityMut builder apis (more efficient and more ergonomic) * Fast bitset-backed `Access<T>` replaces old hashmap-based approach everywhere * Query conflicts are determined by component access instead of archetype component access (to avoid random failures at runtime) * With/Without are still taken into account for conflicts, so this should still be comfy to use * Much simpler `IntoSystem` impl * Significantly reduced the amount of hashing throughout the ecs in favor of Sparse Sets (indexed by densely packed ArchetypeId, ComponentId, BundleId, and TableId) * Safety Improvements * Entity reservation uses a normal world reference instead of unsafe transmute * QuerySets no longer transmute lifetimes * Made traits "unsafe" where relevant * More thorough safety docs * WorldCell * Exposes safe mutable access to multiple resources at a time in a World * Replaced "catch all" `System::update_archetypes(world: &World)` with `System::new_archetype(archetype: &Archetype)` * Simpler Bundle implementation * Replaced slow "remove_bundle_one_by_one" used as fallback for Commands::remove_bundle with fast "remove_bundle_intersection" * Removed `Mut<T>` query impl. it is better to only support one way: `&mut T` * Removed with() from `Flags<T>` in favor of `Option<Flags<T>>`, which allows querying for flags to be "filtered" by default * Components now have is_send property (currently only resources support non-send) * More granular module organization * New `RemovedComponents<T>` SystemParam that replaces `query.removed::<T>()` * `world.resource_scope()` for mutable access to resources and world at the same time * WorldQuery and QueryFilter traits unified. FilterFetch trait added to enable "short circuit" filtering. 
Auto impled for cases that don't need it * Significantly slimmed down SystemState in favor of individual SystemParam state * System Commands changed from `commands: &mut Commands` back to `mut commands: Commands` (to allow Commands to have a World reference) Fixes #1320 ## `World` Rewrite This is a from-scratch rewrite of `World` that fills the niche that `hecs` used to. Yes, this means Bevy ECS is no longer a "fork" of hecs. We're going out our own! (the only shared code between the projects is the entity id allocator, which is already basically ideal) A huge shout out to @SanderMertens (author of [flecs](https://github.com/SanderMertens/flecs)) for sharing some great ideas with me (specifically hybrid ecs storage and archetype graphs). He also helped advise on a number of implementation details. ## Component Storage (The Problem) Two ECS storage paradigms have gained a lot of traction over the years: * **Archetypal ECS**: * Stores components in "tables" with static schemas. Each "column" stores components of a given type. Each "row" is an entity. * Each "archetype" has its own table. Adding/removing an entity's component changes the archetype. * Enables super-fast Query iteration due to its cache-friendly data layout * Comes at the cost of more expensive add/remove operations for an Entity's components, because all components need to be copied to the new archetype's "table" * **Sparse Set ECS**: * Stores components of the same type in densely packed arrays, which are sparsely indexed by densely packed unsigned integers (Entity ids) * Query iteration is slower than Archetypal ECS because each entity's component could be at any position in the sparse set. This "random access" pattern isn't cache friendly. Additionally, there is an extra layer of indirection because you must first map the entity id to an index in the component array. 
* Adding/removing components is a cheap, constant time operation Bevy ECS V1, hecs, legion, flec, and Unity DOTS are all "archetypal ecs-es". I personally think "archetypal" storage is a good default for game engines. An entity's archetype doesn't need to change frequently in general, and it creates "fast by default" query iteration (which is a much more common operation). It is also "self optimizing". Users don't need to think about optimizing component layouts for iteration performance. It "just works" without any extra boilerplate. Shipyard and EnTT are "sparse set ecs-es". They employ "packing" as a way to work around the "suboptimal by default" iteration performance for specific sets of components. This helps, but I didn't think this was a good choice for a general purpose engine like Bevy because: 1. "packs" conflict with each other. If bevy decides to internally pack the Transform and GlobalTransform components, users are then blocked if they want to pack some custom component with Transform. 2. users need to take manual action to optimize Developers selecting an ECS framework are stuck with a hard choice. Select an "archetypal" framework with "fast iteration everywhere" but without the ability to cheaply add/remove components, or select a "sparse set" framework to cheaply add/remove components but with slower iteration performance. ## Hybrid Component Storage (The Solution) In Bevy ECS V2, we get to have our cake and eat it too. It now has _both_ of the component storage types above (and more can be added later if needed): * **Tables** (aka "archetypal" storage) * The default storage. If you don't configure anything, this is what you get * Fast iteration by default * Slower add/remove operations * **Sparse Sets** * Opt-in * Slower iteration * Faster add/remove operations These storage types complement each other perfectly. By default Query iteration is fast. 
If developers know that they want to add/remove a component at high frequencies, they can set the storage to "sparse set": ```rust world.register_component( ComponentDescriptor::new::<MyComponent>(StorageType::SparseSet) ).unwrap(); ``` ## Archetypes Archetypes are now "just metadata" ... they no longer store components directly. They do store: * The `ComponentId`s of each of the Archetype's components (and that component's storage type) * Archetypes are uniquely defined by their component layouts * For example: entities with "table" components `[A, B, C]` _and_ "sparse set" components `[D, E]` will always be in the same archetype. * The `TableId` associated with the archetype * For now each archetype has exactly one table (which can have no components), * There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it: * Ex: an entity with "table storage" components `[A, B, C]` and "sparse set" components `[D, E]` will share the same `[A, B, C]` table as an entity with `[A, B, C]` table component and `[F]` sparse set components. * This 1->Many relationship is how we preserve fast "cache friendly" iteration performance when possible (more on this later) * A list of entities that are in the archetype and the row id of the table they are in * ArchetypeComponentIds * unique densely packed identifiers for (ArchetypeId, ComponentId) pairs * used by the schedule executor for cheap system access control * "Archetype Graph Edges" (see the next section) ## The "Archetype Graph" Archetype changes in Bevy (and a number of other archetypal ecs-es) have historically been expensive to compute. First, you need to allocate a new vector of the entity's current component ids, add or remove components based on the operation performed, sort it (to ensure it is order-independent), then hash it to find the archetype (if it exists). 
And thats all before we get to the _already_ expensive full copy of all components to the new table storage. The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If `ComponentId`s are densely packed, you can use sparse sets to cheaply jump between archetypes. Bevy takes this one step further by using add/remove `Bundle` edges instead of `Component` edges. Bevy encourages the use of `Bundles` to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. `Bundles` now also have densely-packed `BundleId`s. This allows us to use a _single_ edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph. As a result, an operation that used to be _heavy_ (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations. ## Stateful Queries World queries are now stateful. This allows us to: 1. Cache archetype (and table) matches * This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs). 2. Cache Fetch and Filter state * The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed 3. 
Incrementally build up state * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes) As a result, the direct `World` query api now looks like this: ```rust let mut query = world.query::<(&A, &mut B)>(); for (a, mut b) in query.iter_mut(&mut world) { } ``` Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (and the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world). However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam. ## Stateful SystemParams Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources). SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now. Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params). (credit goes to @DJMcNab for the initial idea and draft pr here #1364) ## Configurable SystemParams @DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible). 
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters: ```rust fn foo(value: Local<usize>) { } app.add_system(foo.system().config(|c| c.0 = Some(10))); ``` ## Uber Fast "for_each" Query Iterators Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration. ```rust fn system(query: Query<(&A, &mut B)>) { // you now have the option to do this for a speed boost query.for_each_mut(|(a, mut b)| { }); // however normal iterators are still available for (a, mut b) in query.iter_mut() { } } ``` I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomf" is needed, it makes sense to use `for_each`. We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr). ## Component Metadata `World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type. ## Significantly Cheaper `Access<T>` We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>`sources every time archetypes changed. This pr removes TypeAccess in favor of faster bitset access everywhere. 
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict wit components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows us the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used.

## Unified WorldQuery and QueryFilter types

(don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change)

WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful).

QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool.

This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit.

## More Granular Modules

World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here).

## Remaining Draft Work (to be done in this pr)

* ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~
  * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~
* ~~batch_iter / par_iter (currently stubbed out)~~
* ~~ChangedRes~~
  * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~.
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~
* ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~
* ~~Nested Bundles (if i find time)~~

## Potential Future Work

* Expand WorldCell to support queries.
* Consider not allocating in the empty archetype on `world.spawn()`
  * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op
  * this actually regressed performance last time i tried it, but in theory it should be faster
* Optimize SparseSet::insert (see `PERF` comment on insert)
* Replace SparseArray `Option<T>` with T::MAX to cut down on branching
  * would enable cheaper get_unchecked() operations
* upstream fixedbitset optimizations
  * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec)
  * fixedbitset could have a const constructor
* Consider implementing Tags (archetype-specific by-value data that affects archetype identity)
  * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different.
  * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage.
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation
  * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints)
  * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code)
* Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell
  * this is basically just "systems" so maybe it's not worth it
* Add more world ops
  * `world.clear()`
  * `world.reserve<T: Bundle>(count: usize)`
* Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But that's just a guess. I'm not an allocation perf pro :)
* Adapt Commands apis for consistency with new World apis

## Benchmarks

key:

* `bevy_old`: bevy `main` branch
* `bevy`: this branch
* `_foreach`: uses an optimized for_each iterator
* `_sparse`: uses sparse set storage (if unspecified assume table storage)
* `_system`: runs inside a system (if unspecified assume test happens via direct world ops)

### Simple Insert (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png)

### Simple Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png)

### Fragmented Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png)

### Sparse Fragmented Iter

Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes

![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png)

### Schedule (from ecs_bench_suite)
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png)

### Add Remove Component (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png)

### Add Remove Component Big

Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed

![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png)

### Get Component

Looks up a single component value a large number of times

![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
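
As a footnote to the Query Conflicts section of this message, here is a minimal sketch of how with/without filters can prove two queries disjoint. It is not the actual `FilteredAccess<T>` implementation: `FilteredAccessSketch`, `query_access`, and the integer ids are hypothetical names for illustration, and it uses `HashSet` where the message describes bitset-backed access. It assumes that a `&T` or `&mut T` query term implies a "with `T`" filter.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a densely packed component id.
type ComponentId = usize;

const A: ComponentId = 0;
const B: ComponentId = 1;
const C: ComponentId = 2;

/// Simplified model (assumption): read/write sets plus with/without filters.
struct FilteredAccessSketch {
    reads: HashSet<ComponentId>,
    writes: HashSet<ComponentId>,
    with: HashSet<ComponentId>,
    without: HashSet<ComponentId>,
}

/// Hypothetical helper that builds the access description for one query.
fn query_access(
    reads: &[ComponentId],
    writes: &[ComponentId],
    with: &[ComponentId],
    without: &[ComponentId],
) -> FilteredAccessSketch {
    FilteredAccessSketch {
        reads: reads.iter().copied().collect(),
        writes: writes.iter().copied().collect(),
        with: with.iter().copied().collect(),
        without: without.iter().copied().collect(),
    }
}

impl FilteredAccessSketch {
    /// True if this query reads or writes the given component.
    fn touches(&self, id: &ComponentId) -> bool {
        self.reads.contains(id) || self.writes.contains(id)
    }

    /// True if one side writes a component that the other side reads or writes.
    fn access_conflicts(&self, other: &Self) -> bool {
        self.writes.iter().any(|id| other.touches(id))
            || other.writes.iter().any(|id| self.touches(id))
    }

    /// True if the filters prove that no entity can match both queries:
    /// one side requires a component that the other side excludes.
    fn is_disjoint(&self, other: &Self) -> bool {
        self.with.iter().any(|id| other.without.contains(id))
            || self.without.iter().any(|id| other.with.contains(id))
    }

    /// Two queries may share a system if they do not conflict or are provably disjoint.
    fn is_compatible(&self, other: &Self) -> bool {
        !self.access_conflicts(other) || self.is_disjoint(other)
    }
}

fn main() {
    // Query<&mut A, With<B>> vs Query<&mut A, Without<B>>: disjoint, so allowed.
    let a = query_access(&[], &[A], &[A, B], &[]);
    let b = query_access(&[], &[A], &[A], &[B]);
    assert!(a.is_compatible(&b));

    // Query<(&mut A, &C)> vs Query<(&mut A, &B)>: both write A, nothing proves disjointness.
    let a = query_access(&[C], &[A], &[A, C], &[]);
    let b = query_access(&[B], &[A], &[A, B], &[]);
    assert!(!a.is_compatible(&b));
}
```

Because the check looks only at component ids and filters, and never at which archetypes currently exist, the result is the same at startup as at any later point, which is the consistency property the section argues for.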
2021-03-05 07:54:35 +00:00
/// The parts from [`Bundle`] that don't require statically knowing the components of the bundle.
pub trait DynamicBundle {
Remove unnecessary branching from bundle insertion (#6902)

# Objective

Speed up bundle insertion and spawning from a bundle.

## Solution

Use the same technique used in #6800 to remove the branch on storage type when writing components from a `Bundle` into storage.

- Add a `StorageType` argument to the closure on `Bundle::get_components`.
- Pass `C::Storage::STORAGE_TYPE` into that argument.
- Match on that argument instead of reading from a `Vec<StorageType>` in `BundleInfo`.
- Marked all implementations of `Bundle::get_components` as inline to encourage dead code elimination.

The `Vec<StorageType>` in `BundleInfo` was also removed as it's no longer needed. If users were reliant on this, they can either use the compile time constants or fetch the information from `Components`. Should save a rather negligible amount of memory.

## Performance

Microbenchmarks show a slight improvement to inserting components into existing entities, as well as spawning from a bundle. Ranging about 8-16% faster depending on the benchmark.

```
group                         main                              soft-constant-write-components
-----                         ----                              ------------------------------
add_remove/sparse_set         1.08   1019.0±80.10µs  ? ?/sec    1.00    944.6±66.86µs  ? ?/sec
add_remove/table              1.07   1343.3±20.37µs  ? ?/sec    1.00   1257.3±18.13µs  ? ?/sec
add_remove_big/sparse_set     1.08  1132.4±263.10µs  ? ?/sec    1.00  1050.8±240.74µs  ? ?/sec
add_remove_big/table          1.02       2.6±0.05ms  ? ?/sec    1.00       2.5±0.08ms  ? ?/sec
get_or_spawn/batched          1.15    401.4±17.76µs  ? ?/sec    1.00    349.3±11.26µs  ? ?/sec
get_or_spawn/individual       1.13    732.1±43.35µs  ? ?/sec    1.00    645.6±41.44µs  ? ?/sec
insert_commands/insert        1.12    623.9±37.48µs  ? ?/sec    1.00    557.4±34.99µs  ? ?/sec
insert_commands/insert_batch  1.16    401.4±17.00µs  ? ?/sec    1.00    347.4±12.87µs  ? ?/sec
insert_simple/base            1.08     416.9±5.60µs  ? ?/sec    1.00     385.2±4.14µs  ? ?/sec
insert_simple/unbatched       1.06    934.5±44.58µs  ? ?/sec    1.00    881.3±47.86µs  ? ?/sec
spawn_commands/2000_entities  1.09    190.7±11.41µs  ? ?/sec    1.00     174.7±9.15µs  ? ?/sec
spawn_commands/4000_entities  1.10    386.5±25.33µs  ? ?/sec    1.00    352.3±18.81µs  ? ?/sec
spawn_commands/6000_entities  1.10    586.2±34.42µs  ? ?/sec    1.00    535.3±27.25µs  ? ?/sec
spawn_commands/8000_entities  1.08    778.5±45.15µs  ? ?/sec    1.00    718.0±33.66µs  ? ?/sec
spawn_world/10000_entities    1.04  1026.4±195.46µs  ? ?/sec    1.00   985.8±253.37µs  ? ?/sec
spawn_world/1000_entities     1.06    103.8±20.23µs  ? ?/sec    1.00     97.6±18.22µs  ? ?/sec
spawn_world/100_entities      1.15      11.4±4.25µs  ? ?/sec    1.00       9.9±1.87µs  ? ?/sec
spawn_world/10_entities       1.05  1030.8±229.78ns  ? ?/sec    1.00   986.2±231.12ns  ? ?/sec
spawn_world/1_entities        1.01    105.1±23.33ns  ? ?/sec    1.00    104.6±31.84ns  ? ?/sec
```

---

## Changelog

Changed: `Bundle::get_components` now takes a `FnMut(StorageType, OwningPtr)`. The provided storage type must be correct for the component being fetched.
2022-12-11 18:46:43 +00:00
// SAFETY:
// The `StorageType` argument passed into [`Bundle::get_components`] must be correct for the
// component being fetched.
//
/// Calls `func` on each value, in the order of this bundle's [`Component`]s. This passes
/// ownership of the component values to `func`.
#[doc(hidden)]
fn get_components(self, func: &mut impl FnMut(StorageType, OwningPtr<'_>));
}
// SAFETY:
// - `Bundle::component_ids` calls `ids` for C's component id (and nothing else)
// - `Bundle::get_components` is called exactly once for C and passes the component's storage type based on its associated constant.
// - `Bundle::from_components` calls `func` exactly once for C, which is the exact value returned by `Bundle::component_ids`.
unsafe impl<C: Component> Bundle for C {
fn component_ids(
components: &mut Components,
storages: &mut Storages,
ids: &mut impl FnMut(ComponentId),
) {
ids(components.init_component::<C>(storages));
}
unsafe fn from_components<T, F>(ctx: &mut T, func: &mut F) -> Self
where
// Ensure that the `OwningPtr` is used correctly
F: for<'a> FnMut(&'a mut T) -> OwningPtr<'a>,
Self: Sized,
{
// Safety: The id given in `component_ids` is for `Self`
func(ctx).read()
}
}
impl<C: Component> DynamicBundle for C {
#[inline]
fn get_components(self, func: &mut impl FnMut(StorageType, OwningPtr<'_>)) {
OwningPtr::make(self, |ptr| func(C::Storage::STORAGE_TYPE, ptr));
}
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
}
macro_rules! tuple_impl {
($($name: ident),*) => {
// SAFETY:
// - `Bundle::component_ids` calls `ids` for each component type in the
// bundle, in the exact order that `DynamicBundle::get_components` is called.
// - `Bundle::from_components` calls `func` exactly once for each `ComponentId` returned by `Bundle::component_ids`.
// - `Bundle::get_components` is called exactly once for each member. Relies on the above implementation to pass the correct
// `StorageType` into the callback.
unsafe impl<$($name: Bundle),*> Bundle for ($($name,)*) {
#[allow(unused_variables)]
fn component_ids(components: &mut Components, storages: &mut Storages, ids: &mut impl FnMut(ComponentId)){
$(<$name as Bundle>::component_ids(components, storages, ids);)*
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict wit components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows us the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
}
#[allow(unused_variables, unused_mut)]
#[allow(clippy::unused_unit)]
unsafe fn from_components<T, F>(ctx: &mut T, func: &mut F) -> Self
where
F: FnMut(&mut T) -> OwningPtr<'_>
{
// Rust guarantees that tuple calls are evaluated 'left to right'.
// https://doc.rust-lang.org/reference/expressions.html#evaluation-order-of-operands
($(<$name as Bundle>::from_components(ctx, func),)*)
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict wit components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows us the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
}
}
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
impl<$($name: Bundle),*> DynamicBundle for ($($name,)*) {
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
#[allow(unused_variables, unused_mut)]
Remove unnecessary branching from bundle insertion (#6902)

# Objective

Speed up bundle insertion and spawning from a bundle.

## Solution

Use the same technique used in #6800 to remove the branch on storage type when writing components from a `Bundle` into storage.

- Add a `StorageType` argument to the closure on `Bundle::get_components`.
- Pass `C::Storage::STORAGE_TYPE` into that argument.
- Match on that argument instead of reading from a `Vec<StorageType>` in `BundleInfo`.
- Marked all implementations of `Bundle::get_components` as inline to encourage dead code elimination.

The `Vec<StorageType>` in `BundleInfo` was also removed as it's no longer needed. If users were reliant on this, they can either use the compile time constants or fetch the information from `Components`. Should save a rather negligible amount of memory.

## Performance

Microbenchmarks show a slight improvement to inserting components into existing entities, as well as spawning from a bundle. Ranging about 8-16% faster depending on the benchmark.

```
group                           main                                   soft-constant-write-components
-----                           ----                                   ------------------------------
add_remove/sparse_set           1.08  1019.0±80.10µs   ? ?/sec         1.00   944.6±66.86µs   ? ?/sec
add_remove/table                1.07  1343.3±20.37µs   ? ?/sec         1.00  1257.3±18.13µs   ? ?/sec
add_remove_big/sparse_set       1.08  1132.4±263.10µs  ? ?/sec         1.00  1050.8±240.74µs  ? ?/sec
add_remove_big/table            1.02     2.6±0.05ms    ? ?/sec         1.00     2.5±0.08ms    ? ?/sec
get_or_spawn/batched            1.15   401.4±17.76µs   ? ?/sec         1.00   349.3±11.26µs   ? ?/sec
get_or_spawn/individual         1.13   732.1±43.35µs   ? ?/sec         1.00   645.6±41.44µs   ? ?/sec
insert_commands/insert          1.12   623.9±37.48µs   ? ?/sec         1.00   557.4±34.99µs   ? ?/sec
insert_commands/insert_batch    1.16   401.4±17.00µs   ? ?/sec         1.00   347.4±12.87µs   ? ?/sec
insert_simple/base              1.08   416.9±5.60µs    ? ?/sec         1.00   385.2±4.14µs    ? ?/sec
insert_simple/unbatched         1.06   934.5±44.58µs   ? ?/sec         1.00   881.3±47.86µs   ? ?/sec
spawn_commands/2000_entities    1.09   190.7±11.41µs   ? ?/sec         1.00   174.7±9.15µs    ? ?/sec
spawn_commands/4000_entities    1.10   386.5±25.33µs   ? ?/sec         1.00   352.3±18.81µs   ? ?/sec
spawn_commands/6000_entities    1.10   586.2±34.42µs   ? ?/sec         1.00   535.3±27.25µs   ? ?/sec
spawn_commands/8000_entities    1.08   778.5±45.15µs   ? ?/sec         1.00   718.0±33.66µs   ? ?/sec
spawn_world/10000_entities      1.04  1026.4±195.46µs  ? ?/sec         1.00   985.8±253.37µs  ? ?/sec
spawn_world/1000_entities       1.06   103.8±20.23µs   ? ?/sec         1.00    97.6±18.22µs   ? ?/sec
spawn_world/100_entities        1.15    11.4±4.25µs    ? ?/sec         1.00     9.9±1.87µs    ? ?/sec
spawn_world/10_entities         1.05  1030.8±229.78ns  ? ?/sec         1.00   986.2±231.12ns  ? ?/sec
spawn_world/1_entities          1.01   105.1±23.33ns   ? ?/sec         1.00   104.6±31.84ns   ? ?/sec
```

---

## Changelog

Changed: `Bundle::get_components` now takes a `FnMut(StorageType, OwningPtr)`. The provided storage type must be correct for the component being fetched.
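The technique described in this commit message can be illustrated with a minimal, self-contained sketch. Everything here is a simplified stand-in and not Bevy's real API: the `Component`, `ComponentStorage`, and `Bundle` traits are reduced to the bare minimum, the hypothetical `NAME` constant and `&'static str` payload replace `OwningPtr`, and `Storages` and `write_bundle` are invented for illustration. The point it tries to show is that the storage type reaches the closure as an associated constant, so after inlining the `match` can plausibly be resolved at compile time instead of reading a `Vec<StorageType>` at runtime.

```rust
/// Simplified stand-in for the storage kinds named in the commit message.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StorageType {
    Table,
    SparseSet,
}

/// Hypothetical marker trait carrying the storage type as a compile-time constant.
pub trait ComponentStorage {
    const STORAGE_TYPE: StorageType;
}

pub struct TableStorage;
pub struct SparseStorage;

impl ComponentStorage for TableStorage {
    const STORAGE_TYPE: StorageType = StorageType::Table;
}

impl ComponentStorage for SparseStorage {
    const STORAGE_TYPE: StorageType = StorageType::SparseSet;
}

/// Reduced component trait; `NAME` is an illustrative payload, not part of Bevy.
pub trait Component {
    type Storage: ComponentStorage;
    const NAME: &'static str;
}

pub struct Position;
pub struct Marker;

impl Component for Position {
    type Storage = TableStorage;
    const NAME: &'static str = "Position";
}

impl Component for Marker {
    type Storage = SparseStorage;
    const NAME: &'static str = "Marker";
}

/// Reduced bundle trait: the closure receives the storage type alongside the data.
pub trait Bundle {
    fn get_components(self, func: &mut impl FnMut(StorageType, &'static str));
}

impl<A: Component, B: Component> Bundle for (A, B) {
    #[inline]
    fn get_components(self, func: &mut impl FnMut(StorageType, &'static str)) {
        // The storage type is a constant of each component type, in bundle order.
        func(A::Storage::STORAGE_TYPE, A::NAME);
        func(B::Storage::STORAGE_TYPE, B::NAME);
    }
}

/// Toy destination that records which storage each component was routed to.
#[derive(Default)]
pub struct Storages {
    pub table: Vec<&'static str>,
    pub sparse_set: Vec<&'static str>,
}

/// Matches on the closure argument rather than indexing a per-bundle list.
pub fn write_bundle<T: Bundle>(bundle: T, storages: &mut Storages) {
    bundle.get_components(&mut |storage_type: StorageType, name: &'static str| {
        match storage_type {
            StorageType::Table => storages.table.push(name),
            StorageType::SparseSet => storages.sparse_set.push(name),
        }
    });
}
```

Calling `write_bundle((Position, Marker), &mut storages)` on a default `Storages` routes `Position` to the table list and `Marker` to the sparse set list. Whether the compiler actually removes the branch depends on inlining, which is why the commit marks the implementations as inline.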
2022-12-11 18:46:43 +00:00
#[inline(always)]
fn get_components(self, func: &mut impl FnMut(StorageType, OwningPtr<'_>)) {
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
#[allow(non_snake_case)]
let ($(mut $name,)*) = self;
$(
$name.get_components(&mut *func);
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
)*
}
}
}
}
all_tuples!(tuple_impl, 0, 15, B);
/// For a specific [`World`], this stores a unique value identifying a type of a registered [`Bundle`].
///
/// [`World`]: crate::world::World
#[derive(Debug, Clone, Copy, Eq, PartialEq, Hash)]
pub struct BundleId(usize);
impl BundleId {
    /// Returns the index of the associated [`Bundle`] type.
    ///
    /// Note that this is unique per-world, and should not be reused across them.
If developers know that they want to add/remove a component at high frequencies, they can set the storage to "sparse set": ```rust world.register_component( ComponentDescriptor::new::<MyComponent>(StorageType::SparseSet) ).unwrap(); ``` ## Archetypes Archetypes are now "just metadata" ... they no longer store components directly. They do store: * The `ComponentId`s of each of the Archetype's components (and that component's storage type) * Archetypes are uniquely defined by their component layouts * For example: entities with "table" components `[A, B, C]` _and_ "sparse set" components `[D, E]` will always be in the same archetype. * The `TableId` associated with the archetype * For now each archetype has exactly one table (which can have no components), * There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it: * Ex: an entity with "table storage" components `[A, B, C]` and "sparse set" components `[D, E]` will share the same `[A, B, C]` table as an entity with `[A, B, C]` table component and `[F]` sparse set components. * This 1->Many relationship is how we preserve fast "cache friendly" iteration performance when possible (more on this later) * A list of entities that are in the archetype and the row id of the table they are in * ArchetypeComponentIds * unique densely packed identifiers for (ArchetypeId, ComponentId) pairs * used by the schedule executor for cheap system access control * "Archetype Graph Edges" (see the next section) ## The "Archetype Graph" Archetype changes in Bevy (and a number of other archetypal ecs-es) have historically been expensive to compute. First, you need to allocate a new vector of the entity's current component ids, add or remove components based on the operation performed, sort it (to ensure it is order-independent), then hash it to find the archetype (if it exists). 
And thats all before we get to the _already_ expensive full copy of all components to the new table storage. The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If `ComponentId`s are densely packed, you can use sparse sets to cheaply jump between archetypes. Bevy takes this one step further by using add/remove `Bundle` edges instead of `Component` edges. Bevy encourages the use of `Bundles` to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. `Bundles` now also have densely-packed `BundleId`s. This allows us to use a _single_ edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph. As a result, an operation that used to be _heavy_ (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations. ## Stateful Queries World queries are now stateful. This allows us to: 1. Cache archetype (and table) matches * This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs). 2. Cache Fetch and Filter state * The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed 3. 
Incrementally build up state * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes) As a result, the direct `World` query api now looks like this: ```rust let mut query = world.query::<(&A, &mut B)>(); for (a, mut b) in query.iter_mut(&mut world) { } ``` Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (and the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world). However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam. ## Stateful SystemParams Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources). SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now. Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params). (credit goes to @DJMcNab for the initial idea and draft pr here #1364) ## Configurable SystemParams @DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible). 
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters: ```rust fn foo(value: Local<usize>) { } app.add_system(foo.system().config(|c| c.0 = Some(10))); ``` ## Uber Fast "for_each" Query Iterators Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration. ```rust fn system(query: Query<(&A, &mut B)>) { // you now have the option to do this for a speed boost query.for_each_mut(|(a, mut b)| { }); // however normal iterators are still available for (a, mut b) in query.iter_mut() { } } ``` I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomf" is needed, it makes sense to use `for_each`. We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr). ## Component Metadata `World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type. ## Significantly Cheaper `Access<T>` We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>`sources every time archetypes changed. This pr removes TypeAccess in favor of faster bitset access everywhere. 
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict wit components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows us the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
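    /// Returns the raw index backing this id.
    #[inline]
    pub fn index(self) -> usize {
        self.0
    }
}

impl SparseSetIndex for BundleId {
    // `BundleId`s are densely packed, so the id's index doubles as its sparse set index.
    #[inline]
    fn sparse_set_index(&self) -> usize {
        self.index()
    }

    #[inline]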
Bevy ECS V2 (#1525) # Bevy ECS V2 This is a rewrite of Bevy ECS (basically everything but the new executor/schedule, which are already awesome). The overall goal was to improve the performance and versatility of Bevy ECS. Here is a quick bulleted list of changes before we dive into the details: * Complete World rewrite * Multiple component storage types: * Tables: fast cache friendly iteration, slower add/removes (previously called Archetypes) * Sparse Sets: fast add/remove, slower iteration * Stateful Queries (caches query results for faster iteration. fragmented iteration is _fast_ now) * Stateful System Params (caches expensive operations. inspired by @DJMcNab's work in #1364) * Configurable System Params (users can set configuration when they construct their systems. once again inspired by @DJMcNab's work) * Archetypes are now "just metadata", component storage is separate * Archetype Graph (for faster archetype changes) * Component Metadata * Configure component storage type * Retrieve information about component size/type/name/layout/send-ness/etc * Components are uniquely identified by a densely packed ComponentId * TypeIds are now totally optional (which should make implementing scripting easier) * Super fast "for_each" query iterators * Merged Resources into World. 
Resources are now just a special type of component * EntityRef/EntityMut builder apis (more efficient and more ergonomic) * Fast bitset-backed `Access<T>` replaces old hashmap-based approach everywhere * Query conflicts are determined by component access instead of archetype component access (to avoid random failures at runtime) * With/Without are still taken into account for conflicts, so this should still be comfy to use * Much simpler `IntoSystem` impl * Significantly reduced the amount of hashing throughout the ecs in favor of Sparse Sets (indexed by densely packed ArchetypeId, ComponentId, BundleId, and TableId) * Safety Improvements * Entity reservation uses a normal world reference instead of unsafe transmute * QuerySets no longer transmute lifetimes * Made traits "unsafe" where relevant * More thorough safety docs * WorldCell * Exposes safe mutable access to multiple resources at a time in a World * Replaced "catch all" `System::update_archetypes(world: &World)` with `System::new_archetype(archetype: &Archetype)` * Simpler Bundle implementation * Replaced slow "remove_bundle_one_by_one" used as fallback for Commands::remove_bundle with fast "remove_bundle_intersection" * Removed `Mut<T>` query impl. it is better to only support one way: `&mut T` * Removed with() from `Flags<T>` in favor of `Option<Flags<T>>`, which allows querying for flags to be "filtered" by default * Components now have is_send property (currently only resources support non-send) * More granular module organization * New `RemovedComponents<T>` SystemParam that replaces `query.removed::<T>()` * `world.resource_scope()` for mutable access to resources and world at the same time * WorldQuery and QueryFilter traits unified. FilterFetch trait added to enable "short circuit" filtering. 
Auto impled for cases that don't need it * Significantly slimmed down SystemState in favor of individual SystemParam state * System Commands changed from `commands: &mut Commands` back to `mut commands: Commands` (to allow Commands to have a World reference) Fixes #1320 ## `World` Rewrite This is a from-scratch rewrite of `World` that fills the niche that `hecs` used to. Yes, this means Bevy ECS is no longer a "fork" of hecs. We're going out our own! (the only shared code between the projects is the entity id allocator, which is already basically ideal) A huge shout out to @SanderMertens (author of [flecs](https://github.com/SanderMertens/flecs)) for sharing some great ideas with me (specifically hybrid ecs storage and archetype graphs). He also helped advise on a number of implementation details. ## Component Storage (The Problem) Two ECS storage paradigms have gained a lot of traction over the years: * **Archetypal ECS**: * Stores components in "tables" with static schemas. Each "column" stores components of a given type. Each "row" is an entity. * Each "archetype" has its own table. Adding/removing an entity's component changes the archetype. * Enables super-fast Query iteration due to its cache-friendly data layout * Comes at the cost of more expensive add/remove operations for an Entity's components, because all components need to be copied to the new archetype's "table" * **Sparse Set ECS**: * Stores components of the same type in densely packed arrays, which are sparsely indexed by densely packed unsigned integers (Entity ids) * Query iteration is slower than Archetypal ECS because each entity's component could be at any position in the sparse set. This "random access" pattern isn't cache friendly. Additionally, there is an extra layer of indirection because you must first map the entity id to an index in the component array. 
* Adding/removing components is a cheap, constant time operation Bevy ECS V1, hecs, legion, flec, and Unity DOTS are all "archetypal ecs-es". I personally think "archetypal" storage is a good default for game engines. An entity's archetype doesn't need to change frequently in general, and it creates "fast by default" query iteration (which is a much more common operation). It is also "self optimizing". Users don't need to think about optimizing component layouts for iteration performance. It "just works" without any extra boilerplate. Shipyard and EnTT are "sparse set ecs-es". They employ "packing" as a way to work around the "suboptimal by default" iteration performance for specific sets of components. This helps, but I didn't think this was a good choice for a general purpose engine like Bevy because: 1. "packs" conflict with each other. If bevy decides to internally pack the Transform and GlobalTransform components, users are then blocked if they want to pack some custom component with Transform. 2. users need to take manual action to optimize Developers selecting an ECS framework are stuck with a hard choice. Select an "archetypal" framework with "fast iteration everywhere" but without the ability to cheaply add/remove components, or select a "sparse set" framework to cheaply add/remove components but with slower iteration performance. ## Hybrid Component Storage (The Solution) In Bevy ECS V2, we get to have our cake and eat it too. It now has _both_ of the component storage types above (and more can be added later if needed): * **Tables** (aka "archetypal" storage) * The default storage. If you don't configure anything, this is what you get * Fast iteration by default * Slower add/remove operations * **Sparse Sets** * Opt-in * Slower iteration * Faster add/remove operations These storage types complement each other perfectly. By default Query iteration is fast. 
If developers know that they want to add/remove a component at high frequencies, they can set the storage to "sparse set":

```rust
world.register_component(
    ComponentDescriptor::new::<MyComponent>(StorageType::SparseSet)
).unwrap();
```

## Archetypes

Archetypes are now "just metadata" ... they no longer store components directly. They do store:

* The `ComponentId`s of each of the Archetype's components (and that component's storage type)
    * Archetypes are uniquely defined by their component layouts
    * For example: entities with "table" components `[A, B, C]` _and_ "sparse set" components `[D, E]` will always be in the same archetype.
* The `TableId` associated with the archetype
    * For now each archetype has exactly one table (which can have no components),
    * There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it:
        * Ex: an entity with "table storage" components `[A, B, C]` and "sparse set" components `[D, E]` will share the same `[A, B, C]` table as an entity with `[A, B, C]` table component and `[F]` sparse set components.
    * This 1->Many relationship is how we preserve fast "cache friendly" iteration performance when possible (more on this later)
* A list of entities that are in the archetype and the row id of the table they are in
* ArchetypeComponentIds
    * unique densely packed identifiers for (ArchetypeId, ComponentId) pairs
    * used by the schedule executor for cheap system access control
* "Archetype Graph Edges" (see the next section)

## The "Archetype Graph"

Archetype changes in Bevy (and a number of other archetypal ecs-es) have historically been expensive to compute. First, you need to allocate a new vector of the entity's current component ids, add or remove components based on the operation performed, sort it (to ensure it is order-independent), then hash it to find the archetype (if it exists).
And that's all before we get to the _already_ expensive full copy of all components to the new table storage.

The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If `ComponentId`s are densely packed, you can use sparse sets to cheaply jump between archetypes.

Bevy takes this one step further by using add/remove `Bundle` edges instead of `Component` edges. Bevy encourages the use of `Bundles` to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. `Bundles` now also have densely-packed `BundleId`s. This allows us to use a _single_ edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph.

As a result, an operation that used to be _heavy_ (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations.

## Stateful Queries

World queries are now stateful. This allows us to:

1. Cache archetype (and table) matches
    * This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs).
2. Cache Fetch and Filter state
    * The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed
3. Incrementally build up state
    * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes)

As a result, the direct `World` query api now looks like this:

```rust
let mut query = world.query::<(&A, &mut B)>();
for (a, mut b) in query.iter_mut(&mut world) {
}
```

Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (and the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world).

However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam.

## Stateful SystemParams

Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources).

SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now.

Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params).

(credit goes to @DJMcNab for the initial idea and draft pr here #1364)

## Configurable SystemParams

@DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible).
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters:

```rust
fn foo(value: Local<usize>) {
}

app.add_system(foo.system().config(|c| c.0 = Some(10)));
```

## Uber Fast "for_each" Query Iterators

Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration.

```rust
fn system(query: Query<(&A, &mut B)>) {
    // you now have the option to do this for a speed boost
    query.for_each_mut(|(a, mut b)| {
    });

    // however normal iterators are still available
    for (a, mut b) in query.iter_mut() {
    }
}
```

I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomf" is needed, it makes sense to use `for_each`.

We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr).

## Component Metadata

`World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type.

## Significantly Cheaper `Access<T>`

We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>` sources every time archetypes changed.

This pr removes TypeAccess in favor of faster bitset access everywhere.
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s.

## Merged Resources into World

Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity).

Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state.

I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally).

This pr merges Resources into World:

```rust
world.insert_resource(1);
world.insert_resource(2.0);
let a = world.get_resource::<i32>().unwrap();
let mut b = world.get_resource_mut::<f64>().unwrap();
*b = 3.0;
```

Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict with components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing data structures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This also allows the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier.

_But_ this merge did create problems for people directly interacting with `World`.
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably!

## WorldCell

WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access:

```rust
let world_cell = world.cell();
let a = world_cell.get_resource_mut::<i32>().unwrap();
let b = world_cell.get_resource_mut::<f64>().unwrap();
```

This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped.

World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation.

WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer.

The api is currently limited to resource access, but it can and should be extended to queries / entity component access.

## Resource Scopes

WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal!

Instead developers can use a "resource scope"

```rust
world.resource_scope(|world: &mut World, a: &mut A| {
})
```

This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation.

If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty.
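
The remove-then-re-add idea behind a resource scope can be sketched with plain std types. This is only an illustrative sketch under stated assumptions: `MiniWorld`, its `HashMap<TypeId, Box<dyn Any>>` storage, and `demo` are hypothetical names made up for this example, not the `World` implementation in this pr (which is keyed by ComponentIds and backed by sparse sets).

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Hypothetical stand-in for `World` that only stores resources, keyed by type.
#[derive(Default)]
pub struct MiniWorld {
    resources: HashMap<TypeId, Box<dyn Any>>,
}

impl MiniWorld {
    pub fn insert_resource<T: Any>(&mut self, value: T) {
        self.resources.insert(TypeId::of::<T>(), Box::new(value));
    }

    pub fn get_resource<T: Any>(&self) -> Option<&T> {
        self.resources.get(&TypeId::of::<T>())?.downcast_ref::<T>()
    }

    pub fn get_resource_mut<T: Any>(&mut self) -> Option<&mut T> {
        self.resources.get_mut(&TypeId::of::<T>())?.downcast_mut::<T>()
    }

    /// Temporarily removes the `T` resource, passes `&mut MiniWorld` and `&mut T`
    /// to `f` at the same time, then puts `T` back. Panics if `T` is missing.
    pub fn resource_scope<T: Any, R>(
        &mut self,
        f: impl FnOnce(&mut MiniWorld, &mut T) -> R,
    ) -> R {
        let boxed = self
            .resources
            .remove(&TypeId::of::<T>())
            .expect("resource does not exist");
        let mut value: Box<T> = boxed
            .downcast::<T>()
            .ok()
            .expect("stored value always matches its TypeId key");
        // `value` is no longer owned by the map, so the two mutable borrows are disjoint.
        let result = f(self, &mut *value);
        self.resources.insert(TypeId::of::<T>(), value);
        result
    }
}

/// Small usage example: bump a counter resource while appending to a log resource.
pub fn demo() -> (u32, Vec<String>) {
    let mut world = MiniWorld::default();
    world.insert_resource(0u32);
    world.insert_resource(vec![String::from("start")]);

    world.resource_scope(|world: &mut MiniWorld, log: &mut Vec<String>| {
        // While the scope runs, the log is not stored in the world.
        assert!(world.get_resource::<Vec<String>>().is_none());
        let counter = world.get_resource_mut::<u32>().unwrap();
        *counter += 1;
        log.push(format!("counter is now {}", counter));
    });

    let count = *world.get_resource::<u32>().unwrap();
    let log = world.get_resource::<Vec<String>>().unwrap().clone();
    (count, log)
}
```

One limitation of this sketch: if the closure panics, the resource is never re-inserted. Nesting works the same way as described above, since the inner scope just removes a second entry from the same map.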
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId

For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters:

```rust
// these queries will never conflict due to their filters
fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) {
}
```

But it also has a significant downside:

```rust
// these queries will not conflict _until_ an entity with A, B, and C is spawned
fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) {
}
```

The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing.

In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace.

To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict.

## EntityRef / EntityMut

World entity operations on `main` require that the user passes in an `entity` id to each operation:

```rust
let entity = world.spawn((A, )); // create a new entity with A
world.get::<A>(entity);
world.insert(entity, (B, C));
world.insert_one(entity, D);
```

This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required).
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity:

```rust
// spawn now takes no inputs and returns an EntityMut
let entity = world.spawn()
    .insert(A) // insert a single component into the entity
    .insert_bundle((B, C)) // insert a bundle of components into the entity
    .id(); // id returns the Entity id

// Returns EntityMut (or panics if the entity does not exist)
world.entity_mut(entity)
    .insert(D)
    .insert_bundle(SomeBundle::default());
{
    // returns EntityRef (or panics if the entity does not exist)
    let d = world.entity(entity)
        .get::<D>() // gets the D component
        .unwrap();
    // world.get still exists for ergonomics
    let d = world.get::<D>(entity).unwrap();
}

// These variants return Options if you want to check existence instead of panicking
world.get_entity_mut(entity)
    .unwrap()
    .insert(E);

if let Some(entity_ref) = world.get_entity(entity) {
    let d = entity_ref.get::<D>().unwrap();
}
```

This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change.

## Safety Improvements

* Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute
* QuerySets no longer transmute lifetimes
* Made traits "unsafe" when implementing a trait incorrectly could cause unsafety
* More thorough safety docs

## RemovedComponents SystemParam

The old approach to querying removed components: `query.removed::<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState:

```rust
fn system(removed: RemovedComponents<T>) {
    for entity in removed.iter() {
    }
}
```

## Simpler Bundle implementation

Bundles are no longer responsible for sorting (or deduping) TypeInfo.
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used.

## Unified WorldQuery and QueryFilter types

(don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change)

WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful).

QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool.

This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit.

## More Granular Modules

World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here).

## Remaining Draft Work (to be done in this pr)

* ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~
    * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~
* ~~batch_iter / par_iter (currently stubbed out)~~
* ~~ChangedRes~~
    * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~.
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~
* ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~
* ~~Nested Bundles (if i find time)~~

## Potential Future Work

* Expand WorldCell to support queries.
* Consider not allocating in the empty archetype on `world.spawn()`
    * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op
    * this actually regressed performance last time i tried it, but in theory it should be faster
* Optimize SparseSet::insert (see `PERF` comment on insert)
* Replace SparseArray `Option<T>` with T::MAX to cut down on branching
    * would enable cheaper get_unchecked() operations
* upstream fixedbitset optimizations
    * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec)
    * fixedbitset could have a const constructor
* Consider implementing Tags (archetype-specific by-value data that affects archetype identity)
    * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different.
    * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage.
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation
    * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints)
    * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code)
* Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell
    * this is basically just "systems" so maybe it's not worth it
* Add more world ops
    * `world.clear()`
    * `world.reserve<T: Bundle>(count: usize)`
* Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But that's just a guess. I'm not an allocation perf pro :)
* Adapt Commands apis for consistency with new World apis

## Benchmarks

key:

* `bevy_old`: bevy `main` branch
* `bevy`: this branch
* `_foreach`: uses an optimized for_each iterator
* `_sparse`: uses sparse set storage (if unspecified assume table storage)
* `_system`: runs inside a system (if unspecified assume test happens via direct world ops)

### Simple Insert (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png)

### Simpler Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png)

### Fragment Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png)

### Sparse Fragmented Iter

Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes

![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png)

### Schedule (from ecs_bench_suite)
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png)

### Add Remove Component (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png)

### Add Remove Component Big

Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed

![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png)

### Get Component

Looks up a single component value a large number of times

![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
    fn get_sparse_set_index(value: usize) -> Self {
        Self(value)
    }
}
/// Stores metadata associated with a specific type of [`Bundle`] for a given [`World`].
///
/// [`World`]: crate::world::World
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict wit components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows us the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
pub struct BundleInfo {
    id: BundleId,
    // SAFETY: Every ID in this list must be valid within the World that owns the BundleInfo,
    // must have its storage initialized (i.e. columns created in tables, sparse set created),
    // and must be in the same order as the source bundle type writes its components in.
    component_ids: Vec<ComponentId>,
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
}
impl BundleInfo {
    /// Create a new [`BundleInfo`].
    ///
    /// # Safety
    ///
    /// Every ID in `component_ids` must be valid within the World that owns the `BundleInfo`,
    /// must have its storage initialized (i.e. columns created in tables, sparse set created),
    /// and must be in the same order as the source bundle type writes its components in.
    unsafe fn new(
        bundle_type_name: &'static str,
        components: &Components,
        component_ids: Vec<ComponentId>,
        id: BundleId,
    ) -> BundleInfo {
        let mut deduped = component_ids.clone();
        deduped.sort();
        deduped.dedup();
        if deduped.len() != component_ids.len() {
            // TODO: Replace with `Vec::partition_dedup` once https://github.com/rust-lang/rust/issues/54279 is stabilized
            let mut seen = HashSet::new();
            let mut dups = Vec::new();
            for id in component_ids {
                if !seen.insert(id) {
                    dups.push(id);
                }
            }
            let names = dups
                .into_iter()
                .map(|id| {
                    // SAFETY: the caller ensures component_id is valid.
                    unsafe { components.get_info_unchecked(id).name() }
                })
                .collect::<Vec<_>>()
                .join(", ");
            panic!("Bundle {bundle_type_name} has duplicate components: {names}");
        }
        // SAFETY: The caller ensures that component_ids:
        // - is valid for the associated world
        // - has had its storage initialized
        // - is in the same order as the source bundle type
        BundleInfo { id, component_ids }
    }
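The duplicate check in `BundleInfo::new` runs in two phases: a cheap sort-and-dedup length comparison, then a `HashSet` pass that collects the offending IDs only when the lengths differ. The sketch below isolates that pattern outside Bevy. It is an illustration under assumptions, not the crate's code: `find_duplicates` is a hypothetical helper, and plain `u32` values stand in for `ComponentId`.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of bevy_ecs): returns the IDs that repeat in `ids`,
/// in the order the repeats are encountered. An ID that appears three times is
/// reported twice, matching the loop in `BundleInfo::new`.
fn find_duplicates(ids: &[u32]) -> Vec<u32> {
    // Fast path: sort and dedup a copy, then compare lengths.
    let mut deduped = ids.to_vec();
    deduped.sort_unstable();
    deduped.dedup();
    if deduped.len() == ids.len() {
        return Vec::new();
    }

    // Slow path, taken only when a duplicate exists: `HashSet::insert`
    // returns `false` when the value was already present.
    let mut seen = HashSet::new();
    let mut dups = Vec::new();
    for &id in ids {
        if !seen.insert(id) {
            dups.push(id);
        }
    }
    dups
}
```

Calling `find_duplicates(&[3, 1, 3, 2, 1])` yields `[3, 1]`, while a slice of distinct IDs yields an empty vector. Keeping the `HashSet` off the common path means a well-formed bundle pays only for one sort of a small vector.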
    /// Returns a value identifying the associated [`Bundle`] type.
Spawn specific entities: spawn or insert operations, refactor spawn internals, world clearing (#2673) This upstreams the code changes used by the new renderer to enable cross-app Entity reuse: * Spawning at specific entities * get_or_spawn: spawns an entity if it doesn't already exist and returns an EntityMut * insert_or_spawn_batch: the batched equivalent to `world.get_or_spawn(entity).insert_bundle(bundle)` * Clearing entities and storages * Allocating Entities with "invalid" archetypes. These entities cannot be queried / are treated as "non existent". They serve as "reserved" entities that won't show up when calling `spawn()`. They must be "specifically spawned at" using apis like `get_or_spawn(entity)`. In combination, these changes enable the "render world" to clear entities / storages each frame and reserve all "app world entities". These can then be spawned during the "render extract step". This refactors "spawn" and "insert" code in a way that I think is a massive improvement to legibility and re-usability. It also yields marginal performance wins by reducing some duplicate lookups (less than a percentage point improvement on insertion benchmarks). There is also some potential for future unsafe reduction (by making BatchSpawner and BatchInserter generic). But for now I want to cut down generic usage to a minimum to encourage smaller binaries and faster compiles. This is currently a draft because it needs more tests (although this code has already had some real-world testing on my custom-shaders branch). I also fixed the benchmarks (which currently don't compile!) / added new ones to illustrate batching wins. After these changes, Bevy ECS is basically ready to accommodate the new renderer. I think the biggest missing piece at this point is "sub apps".
2021-08-25 23:34:02 +00:00
    #[inline]
    pub const fn id(&self) -> BundleId {
        self.id
    }
    /// Returns the [ID](ComponentId) of each component stored in this bundle.
    #[inline]
    pub fn components(&self) -> &[ComponentId] {
        &self.component_ids
    }
    pub(crate) fn get_bundle_inserter<'a, 'b>(
        &'b self,
        entities: &'a mut Entities,
        archetypes: &'a mut Archetypes,
        components: &Components,
        storages: &'a mut Storages,
        archetype_id: ArchetypeId,
        change_tick: Tick,
    ) -> BundleInserter<'a, 'b> {
        let new_archetype_id =
            self.add_bundle_to_archetype(archetypes, storages, components, archetype_id);
        let archetypes_ptr = archetypes.archetypes.as_mut_ptr();
        if new_archetype_id == archetype_id {
            let archetype = &mut archetypes[archetype_id];
            let table_id = archetype.table_id();
            BundleInserter {
                bundle_info: self,
                archetype,
                entities,
                sparse_sets: &mut storages.sparse_sets,
                table: &mut storages.tables[table_id],
                archetypes_ptr,
                change_tick,
                result: InsertBundleResult::SameArchetype,
            }
        } else {
            let (archetype, new_archetype) = archetypes.get_2_mut(archetype_id, new_archetype_id);
            let table_id = archetype.table_id();
            if table_id == new_archetype.table_id() {
                BundleInserter {
                    bundle_info: self,
                    archetype,
                    archetypes_ptr,
                    entities,
                    sparse_sets: &mut storages.sparse_sets,
                    table: &mut storages.tables[table_id],
                    change_tick,
                    result: InsertBundleResult::NewArchetypeSameTable { new_archetype },
                }
            } else {
                let (table, new_table) = storages
                    .tables
                    .get_2_mut(table_id, new_archetype.table_id());
                BundleInserter {
                    bundle_info: self,
                    archetype,
                    sparse_sets: &mut storages.sparse_sets,
                    entities,
                    archetypes_ptr,
                    table,
                    change_tick,
                    result: InsertBundleResult::NewArchetypeNewTable {
                        new_archetype,
                        new_table,
                    },
                }
            }
        }
    }
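`get_bundle_inserter` picks one of three `InsertBundleResult` variants by comparing archetype IDs first and table IDs second. The sketch below reduces that branching to a pure function so the decision order is easy to see. It is an assumption-laden illustration, not Bevy's code: `InsertKind` and `classify_insert` are hypothetical names, and `u32` stands in for `ArchetypeId` and `TableId`.

```rust
/// Hypothetical mirror of the three outcomes chosen in `get_bundle_inserter`.
#[derive(Debug, PartialEq, Eq)]
enum InsertKind {
    /// The entity stays in its current archetype.
    SameArchetype,
    /// The entity moves to a new archetype that shares the old table.
    NewArchetypeSameTable,
    /// The entity moves to a new archetype backed by a different table.
    NewArchetypeNewTable,
}

/// Hypothetical helper: classifies an insert from the IDs before and after it.
fn classify_insert(
    old_archetype: u32,
    new_archetype: u32,
    old_table: u32,
    new_table: u32,
) -> InsertKind {
    if new_archetype == old_archetype {
        // Same archetype implies same table, so table IDs are not consulted.
        InsertKind::SameArchetype
    } else if old_table == new_table {
        InsertKind::NewArchetypeSameTable
    } else {
        InsertKind::NewArchetypeNewTable
    }
}
```

For example, `classify_insert(1, 2, 0, 0)` gives `NewArchetypeSameTable`. In the real method only the last case needs `get_2_mut` on the tables, because it is the only one that holds mutable references to two distinct tables at once.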
    pub(crate) fn get_bundle_spawner<'a, 'b>(
        &'b self,
        entities: &'a mut Entities,
        archetypes: &'a mut Archetypes,
        components: &Components,
        storages: &'a mut Storages,
        change_tick: Tick,
    ) -> BundleSpawner<'a, 'b> {
        let new_archetype_id =
            self.add_bundle_to_archetype(archetypes, storages, components, ArchetypeId::EMPTY);
        let archetype = &mut archetypes[new_archetype_id];
        let table = &mut storages.tables[archetype.table_id()];
        BundleSpawner {
            archetype,
            bundle_info: self,
            table,
            entities,
            sparse_sets: &mut storages.sparse_sets,
            change_tick,
        }
    }

    /// This writes components from a given [`Bundle`] to the given entity.
    ///
    /// # Safety
    ///
    /// `bundle_component_status` must return the "correct" [`ComponentStatus`] for each component
    /// in the [`Bundle`], with respect to the entity's original archetype (prior to the bundle being added).
    /// For example, if the original archetype already has `ComponentA` and `T` also has `ComponentA`, the status
    /// should be `Mutated`. If the original archetype does not have `ComponentA`, the status should be `Added`.
    /// When "inserting" a bundle into an existing entity, [`AddBundle`](crate::archetype::AddBundle)
    /// should be used, which will report `Added` vs `Mutated` status based on the current archetype's structure.
    /// When spawning a bundle, [`SpawnBundleStatus`] can be used instead, which removes the need
    /// to look up the [`AddBundle`](crate::archetype::AddBundle) in the archetype graph, which requires
    /// ownership of the entity's current archetype.
    ///
    /// `table` must be the "new" table for `entity`. `table_row` must have space allocated for the
    /// `entity`, and `bundle` must match this [`BundleInfo`]'s type.
If developers know that they want to add/remove a component at high frequencies, they can set the storage to "sparse set": ```rust world.register_component( ComponentDescriptor::new::<MyComponent>(StorageType::SparseSet) ).unwrap(); ``` ## Archetypes Archetypes are now "just metadata" ... they no longer store components directly. They do store: * The `ComponentId`s of each of the Archetype's components (and that component's storage type) * Archetypes are uniquely defined by their component layouts * For example: entities with "table" components `[A, B, C]` _and_ "sparse set" components `[D, E]` will always be in the same archetype. * The `TableId` associated with the archetype * For now each archetype has exactly one table (which can have no components), * There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it: * Ex: an entity with "table storage" components `[A, B, C]` and "sparse set" components `[D, E]` will share the same `[A, B, C]` table as an entity with `[A, B, C]` table component and `[F]` sparse set components. * This 1->Many relationship is how we preserve fast "cache friendly" iteration performance when possible (more on this later) * A list of entities that are in the archetype and the row id of the table they are in * ArchetypeComponentIds * unique densely packed identifiers for (ArchetypeId, ComponentId) pairs * used by the schedule executor for cheap system access control * "Archetype Graph Edges" (see the next section) ## The "Archetype Graph" Archetype changes in Bevy (and a number of other archetypal ecs-es) have historically been expensive to compute. First, you need to allocate a new vector of the entity's current component ids, add or remove components based on the operation performed, sort it (to ensure it is order-independent), then hash it to find the archetype (if it exists). 
And thats all before we get to the _already_ expensive full copy of all components to the new table storage. The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If `ComponentId`s are densely packed, you can use sparse sets to cheaply jump between archetypes. Bevy takes this one step further by using add/remove `Bundle` edges instead of `Component` edges. Bevy encourages the use of `Bundles` to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. `Bundles` now also have densely-packed `BundleId`s. This allows us to use a _single_ edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph. As a result, an operation that used to be _heavy_ (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations. ## Stateful Queries World queries are now stateful. This allows us to: 1. Cache archetype (and table) matches * This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs). 2. Cache Fetch and Filter state * The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed 3. 
Incrementally build up state * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes) As a result, the direct `World` query api now looks like this: ```rust let mut query = world.query::<(&A, &mut B)>(); for (a, mut b) in query.iter_mut(&mut world) { } ``` Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (and the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world). However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam. ## Stateful SystemParams Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources). SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now. Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params). (credit goes to @DJMcNab for the initial idea and draft pr here #1364) ## Configurable SystemParams @DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible). 
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters: ```rust fn foo(value: Local<usize>) { } app.add_system(foo.system().config(|c| c.0 = Some(10))); ``` ## Uber Fast "for_each" Query Iterators Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration. ```rust fn system(query: Query<(&A, &mut B)>) { // you now have the option to do this for a speed boost query.for_each_mut(|(a, mut b)| { }); // however normal iterators are still available for (a, mut b) in query.iter_mut() { } } ``` I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomf" is needed, it makes sense to use `for_each`. We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr). ## Component Metadata `World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type. ## Significantly Cheaper `Access<T>` We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>`sources every time archetypes changed. This pr removes TypeAccess in favor of faster bitset access everywhere. 
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict wit components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows us the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
#[inline]
#[allow(clippy::too_many_arguments)]
unsafe fn write_components<T: DynamicBundle, S: BundleComponentStatus>(
And thats all before we get to the _already_ expensive full copy of all components to the new table storage. The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If `ComponentId`s are densely packed, you can use sparse sets to cheaply jump between archetypes. Bevy takes this one step further by using add/remove `Bundle` edges instead of `Component` edges. Bevy encourages the use of `Bundles` to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. `Bundles` now also have densely-packed `BundleId`s. This allows us to use a _single_ edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph. As a result, an operation that used to be _heavy_ (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations. ## Stateful Queries World queries are now stateful. This allows us to: 1. Cache archetype (and table) matches * This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs). 2. Cache Fetch and Filter state * The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed 3. 
Incrementally build up state * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes) As a result, the direct `World` query api now looks like this: ```rust let mut query = world.query::<(&A, &mut B)>(); for (a, mut b) in query.iter_mut(&mut world) { } ``` Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (and the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world). However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam. ## Stateful SystemParams Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources). SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now. Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params). (credit goes to @DJMcNab for the initial idea and draft pr here #1364) ## Configurable SystemParams @DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible). 
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters: ```rust fn foo(value: Local<usize>) { } app.add_system(foo.system().config(|c| c.0 = Some(10))); ``` ## Uber Fast "for_each" Query Iterators Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration. ```rust fn system(query: Query<(&A, &mut B)>) { // you now have the option to do this for a speed boost query.for_each_mut(|(a, mut b)| { }); // however normal iterators are still available for (a, mut b) in query.iter_mut() { } } ``` I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomf" is needed, it makes sense to use `for_each`. We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr). ## Component Metadata `World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type. ## Significantly Cheaper `Access<T>` We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>`sources every time archetypes changed. This pr removes TypeAccess in favor of faster bitset access everywhere. 
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict wit components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows us the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
&self,
Spawn specific entities: spawn or insert operations, refactor spawn internals, world clearing (#2673) This upstreams the code changes used by the new renderer to enable cross-app Entity reuse: * Spawning at specific entities * get_or_spawn: spawns an entity if it doesn't already exist and returns an EntityMut * insert_or_spawn_batch: the batched equivalent to `world.get_or_spawn(entity).insert_bundle(bundle)` * Clearing entities and storages * Allocating Entities with "invalid" archetypes. These entities cannot be queried / are treated as "non existent". They serve as "reserved" entities that won't show up when calling `spawn()`. They must be "specifically spawned at" using apis like `get_or_spawn(entity)`. In combination, these changes enable the "render world" to clear entities / storages each frame and reserve all "app world entities". These can then be spawned during the "render extract step". This refactors "spawn" and "insert" code in a way that I think is a massive improvement to legibility and re-usability. It also yields marginal performance wins by reducing some duplicate lookups (less than a percentage point improvement on insertion benchmarks). There is also some potential for future unsafe reduction (by making BatchSpawner and BatchInserter generic). But for now I want to cut down generic usage to a minimum to encourage smaller binaries and faster compiles. This is currently a draft because it needs more tests (although this code has already had some real-world testing on my custom-shaders branch). I also fixed the benchmarks (which currently don't compile!) / added new ones to illustrate batching wins. After these changes, Bevy ECS is basically ready to accommodate the new renderer. I think the biggest missing piece at this point is "sub apps".
2021-08-25 23:34:02 +00:00
table: &mut Table,
Bevy ECS V2 (#1525) # Bevy ECS V2 This is a rewrite of Bevy ECS (basically everything but the new executor/schedule, which are already awesome). The overall goal was to improve the performance and versatility of Bevy ECS. Here is a quick bulleted list of changes before we dive into the details: * Complete World rewrite * Multiple component storage types: * Tables: fast cache friendly iteration, slower add/removes (previously called Archetypes) * Sparse Sets: fast add/remove, slower iteration * Stateful Queries (caches query results for faster iteration. fragmented iteration is _fast_ now) * Stateful System Params (caches expensive operations. inspired by @DJMcNab's work in #1364) * Configurable System Params (users can set configuration when they construct their systems. once again inspired by @DJMcNab's work) * Archetypes are now "just metadata", component storage is separate * Archetype Graph (for faster archetype changes) * Component Metadata * Configure component storage type * Retrieve information about component size/type/name/layout/send-ness/etc * Components are uniquely identified by a densely packed ComponentId * TypeIds are now totally optional (which should make implementing scripting easier) * Super fast "for_each" query iterators * Merged Resources into World. 
Resources are now just a special type of component * EntityRef/EntityMut builder apis (more efficient and more ergonomic) * Fast bitset-backed `Access<T>` replaces old hashmap-based approach everywhere * Query conflicts are determined by component access instead of archetype component access (to avoid random failures at runtime) * With/Without are still taken into account for conflicts, so this should still be comfy to use * Much simpler `IntoSystem` impl * Significantly reduced the amount of hashing throughout the ecs in favor of Sparse Sets (indexed by densely packed ArchetypeId, ComponentId, BundleId, and TableId) * Safety Improvements * Entity reservation uses a normal world reference instead of unsafe transmute * QuerySets no longer transmute lifetimes * Made traits "unsafe" where relevant * More thorough safety docs * WorldCell * Exposes safe mutable access to multiple resources at a time in a World * Replaced "catch all" `System::update_archetypes(world: &World)` with `System::new_archetype(archetype: &Archetype)` * Simpler Bundle implementation * Replaced slow "remove_bundle_one_by_one" used as fallback for Commands::remove_bundle with fast "remove_bundle_intersection" * Removed `Mut<T>` query impl. it is better to only support one way: `&mut T` * Removed with() from `Flags<T>` in favor of `Option<Flags<T>>`, which allows querying for flags to be "filtered" by default * Components now have is_send property (currently only resources support non-send) * More granular module organization * New `RemovedComponents<T>` SystemParam that replaces `query.removed::<T>()` * `world.resource_scope()` for mutable access to resources and world at the same time * WorldQuery and QueryFilter traits unified. FilterFetch trait added to enable "short circuit" filtering. 
Auto impled for cases that don't need it * Significantly slimmed down SystemState in favor of individual SystemParam state * System Commands changed from `commands: &mut Commands` back to `mut commands: Commands` (to allow Commands to have a World reference) Fixes #1320 ## `World` Rewrite This is a from-scratch rewrite of `World` that fills the niche that `hecs` used to. Yes, this means Bevy ECS is no longer a "fork" of hecs. We're going out our own! (the only shared code between the projects is the entity id allocator, which is already basically ideal) A huge shout out to @SanderMertens (author of [flecs](https://github.com/SanderMertens/flecs)) for sharing some great ideas with me (specifically hybrid ecs storage and archetype graphs). He also helped advise on a number of implementation details. ## Component Storage (The Problem) Two ECS storage paradigms have gained a lot of traction over the years: * **Archetypal ECS**: * Stores components in "tables" with static schemas. Each "column" stores components of a given type. Each "row" is an entity. * Each "archetype" has its own table. Adding/removing an entity's component changes the archetype. * Enables super-fast Query iteration due to its cache-friendly data layout * Comes at the cost of more expensive add/remove operations for an Entity's components, because all components need to be copied to the new archetype's "table" * **Sparse Set ECS**: * Stores components of the same type in densely packed arrays, which are sparsely indexed by densely packed unsigned integers (Entity ids) * Query iteration is slower than Archetypal ECS because each entity's component could be at any position in the sparse set. This "random access" pattern isn't cache friendly. Additionally, there is an extra layer of indirection because you must first map the entity id to an index in the component array. 
* Adding/removing components is a cheap, constant time operation Bevy ECS V1, hecs, legion, flec, and Unity DOTS are all "archetypal ecs-es". I personally think "archetypal" storage is a good default for game engines. An entity's archetype doesn't need to change frequently in general, and it creates "fast by default" query iteration (which is a much more common operation). It is also "self optimizing". Users don't need to think about optimizing component layouts for iteration performance. It "just works" without any extra boilerplate. Shipyard and EnTT are "sparse set ecs-es". They employ "packing" as a way to work around the "suboptimal by default" iteration performance for specific sets of components. This helps, but I didn't think this was a good choice for a general purpose engine like Bevy because: 1. "packs" conflict with each other. If bevy decides to internally pack the Transform and GlobalTransform components, users are then blocked if they want to pack some custom component with Transform. 2. users need to take manual action to optimize Developers selecting an ECS framework are stuck with a hard choice. Select an "archetypal" framework with "fast iteration everywhere" but without the ability to cheaply add/remove components, or select a "sparse set" framework to cheaply add/remove components but with slower iteration performance. ## Hybrid Component Storage (The Solution) In Bevy ECS V2, we get to have our cake and eat it too. It now has _both_ of the component storage types above (and more can be added later if needed): * **Tables** (aka "archetypal" storage) * The default storage. If you don't configure anything, this is what you get * Fast iteration by default * Slower add/remove operations * **Sparse Sets** * Opt-in * Slower iteration * Faster add/remove operations These storage types complement each other perfectly. By default Query iteration is fast. 
If developers know that they want to add/remove a component at high frequencies, they can set the storage to "sparse set":

```rust
world.register_component(
    ComponentDescriptor::new::<MyComponent>(StorageType::SparseSet)
).unwrap();
```

## Archetypes

Archetypes are now "just metadata" ... they no longer store components directly. They do store:

* The `ComponentId`s of each of the Archetype's components (and that component's storage type)
    * Archetypes are uniquely defined by their component layouts
    * For example: entities with "table" components `[A, B, C]` _and_ "sparse set" components `[D, E]` will always be in the same archetype.
* The `TableId` associated with the archetype
    * For now each archetype has exactly one table (which can have no components).
    * There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it:
        * Ex: an entity with "table storage" components `[A, B, C]` and "sparse set" components `[D, E]` will share the same `[A, B, C]` table as an entity with `[A, B, C]` table components and `[F]` sparse set components.
        * This 1->Many relationship is how we preserve fast "cache friendly" iteration performance when possible (more on this later)
* A list of entities that are in the archetype and the row id of the table they are in
* ArchetypeComponentIds
    * unique densely packed identifiers for (ArchetypeId, ComponentId) pairs
    * used by the schedule executor for cheap system access control
* "Archetype Graph Edges" (see the next section)

## The "Archetype Graph"

Archetype changes in Bevy (and a number of other archetypal ecs-es) have historically been expensive to compute. First, you need to allocate a new vector of the entity's current component ids, add or remove components based on the operation performed, sort it (to ensure it is order-independent), then hash it to find the archetype (if it exists).
And that's all before we get to the _already_ expensive full copy of all components to the new table storage.

The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If `ComponentId`s are densely packed, you can use sparse sets to cheaply jump between archetypes.

Bevy takes this one step further by using add/remove `Bundle` edges instead of `Component` edges. Bevy encourages the use of `Bundles` to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. `Bundles` now also have densely-packed `BundleId`s. This allows us to use a _single_ edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph.

As a result, an operation that used to be _heavy_ (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations.

## Stateful Queries

World queries are now stateful. This allows us to:

1. Cache archetype (and table) matches
    * This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs).
2. Cache Fetch and Filter state
    * The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed
3. Incrementally build up state
    * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes)

As a result, the direct `World` query api now looks like this:

```rust
let mut query = world.query::<(&A, &mut B)>();
for (a, mut b) in query.iter_mut(&mut world) {
}
```

Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (along with the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world).

However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam.

## Stateful SystemParams

Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources).

SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now.

Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params).

(credit goes to @DJMcNab for the initial idea and draft pr here #1364)

## Configurable SystemParams

@DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible).
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters:

```rust
fn foo(value: Local<usize>) {
}

app.add_system(foo.system().config(|c| c.0 = Some(10)));
```

## Uber Fast "for_each" Query Iterators

Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration.

```rust
fn system(mut query: Query<(&A, &mut B)>) {
    // you now have the option to do this for a speed boost
    query.for_each_mut(|(a, mut b)| {
    });

    // however normal iterators are still available
    for (a, mut b) in query.iter_mut() {
    }
}
```

I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomph" is needed, it makes sense to use `for_each`.

We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr).

## Component Metadata

`World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type.

## Significantly Cheaper `Access<T>`

We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>` sources every time archetypes changed.

This pr removes TypeAccess in favor of faster bitset access everywhere.
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s.

## Merged Resources into World

Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity).

Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state.

I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally).

This pr merges Resources into World:

```rust
world.insert_resource(1);
world.insert_resource(2.0);
let a = world.get_resource::<i32>().unwrap();
let mut b = world.get_resource_mut::<f64>().unwrap();
*b = 3.0;
```

Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict with components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier.

_But_ this merge did create problems for people directly interacting with `World`.
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably!

## WorldCell

WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access:

```rust
let world_cell = world.cell();
let a = world_cell.get_resource_mut::<i32>().unwrap();
let b = world_cell.get_resource_mut::<f64>().unwrap();
```

This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped.

World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation.

WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer.

The api is currently limited to resource access, but it can and should be extended to queries / entity component access.

## Resource Scopes

WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal!

Instead developers can use a "resource scope"

```rust
world.resource_scope(|world: &mut World, a: &mut A| {
})
```

This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation.

If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty.
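
The "remove, lend out, re-add" behavior of a resource scope can be sketched with only the standard library. This is a rough illustration and not the code in this pr: `MiniWorld` and its `HashMap<TypeId, Box<dyn Any>>` storage are hypothetical stand-ins (the real `World` is keyed by ComponentIds and backed by sparse sets), and letting the closure return a value is an assumption made to keep the example easy to check.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Hypothetical stand-in for `World` that only stores resources, keyed by type.
#[derive(Default)]
pub struct MiniWorld {
    resources: HashMap<TypeId, Box<dyn Any>>,
}

impl MiniWorld {
    pub fn insert_resource<T: Any>(&mut self, value: T) {
        self.resources.insert(TypeId::of::<T>(), Box::new(value));
    }

    pub fn get_resource<T: Any>(&self) -> Option<&T> {
        self.resources.get(&TypeId::of::<T>())?.downcast_ref::<T>()
    }

    pub fn get_resource_mut<T: Any>(&mut self) -> Option<&mut T> {
        self.resources.get_mut(&TypeId::of::<T>())?.downcast_mut::<T>()
    }

    /// Temporarily removes the `T` resource, passes `&mut MiniWorld` and `&mut T`
    /// to the closure together, then re-inserts `T`. Panics if `T` is missing.
    pub fn resource_scope<T: Any, R>(
        &mut self,
        f: impl FnOnce(&mut MiniWorld, &mut T) -> R,
    ) -> R {
        let boxed = self
            .resources
            .remove(&TypeId::of::<T>())
            .expect("resource does not exist");
        let mut value: Box<T> = boxed.downcast::<T>().expect("resource type mismatch");
        // `T` is not in the map while the closure runs, so the two mutable
        // references handed out here cannot alias.
        let result = f(self, &mut *value);
        self.resources.insert(TypeId::of::<T>(), value);
        result
    }
}
```

Inside the closure, looking up the scoped resource through the world returns `None`, because it has been taken out for the duration of the call. Nesting two `resource_scope` calls gives mutable access to two resources and the world at once, which is the nesting pattern mentioned above.
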
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId

For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters:

```rust
// these queries will never conflict due to their filters
fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut A, Without<B>>) {
}
```

But it also has a significant downside:

```rust
// these queries will not conflict _until_ an entity with A, B, and C is spawned
fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) {
}
```

The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing.

In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict.

## EntityRef / EntityMut

World entity operations on `main` require that the user passes in an `entity` id to each operation:

```rust
let entity = world.spawn((A, )); // create a new entity with A
world.get::<A>(entity);
world.insert(entity, (B, C));
world.insert_one(entity, D);
```

This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required).
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity:

```rust
// spawn now takes no inputs and returns an EntityMut
let entity = world.spawn()
    .insert(A) // insert a single component into the entity
    .insert_bundle((B, C)) // insert a bundle of components into the entity
    .id(); // id returns the Entity id

// Returns EntityMut (or panics if the entity does not exist)
world.entity_mut(entity)
    .insert(D)
    .insert_bundle(SomeBundle::default());

{
    // returns EntityRef (or panics if the entity does not exist)
    let d = world.entity(entity)
        .get::<D>() // gets the D component
        .unwrap();
    // world.get still exists for ergonomics
    let d = world.get::<D>(entity).unwrap();
}

// These variants return Options if you want to check existence instead of panicking
world.get_entity_mut(entity)
    .unwrap()
    .insert(E);

if let Some(entity_ref) = world.get_entity(entity) {
    let d = entity_ref.get::<D>().unwrap();
}
```

This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change.

## Safety Improvements

* Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute
* QuerySets no longer transmute lifetimes
* Made traits "unsafe" when implementing a trait incorrectly could cause unsafety
* More thorough safety docs

## RemovedComponents SystemParam

The old approach to querying removed components: `query.removed::<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState:

```rust
fn system(removed: RemovedComponents<T>) {
    for entity in removed.iter() {
    }
}
```

## Simpler Bundle implementation

Bundles are no longer responsible for sorting (or deduping) TypeInfo.
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which I might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used.

## Unified WorldQuery and QueryFilter types

(don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change)

WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful).

QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool.

This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit.

## More Granular Modules

World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here).

## Remaining Draft Work (to be done in this pr)

* ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~
    * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~
* ~~batch_iter / par_iter (currently stubbed out)~~
* ~~ChangedRes~~
    * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~.
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~
* ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~
* ~~Nested Bundles (if I find time)~~

## Potential Future Work

* Expand WorldCell to support queries.
* Consider not allocating in the empty archetype on `world.spawn()`
    * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op
    * this actually regressed performance last time I tried it, but in theory it should be faster
* Optimize SparseSet::insert (see `PERF` comment on insert)
* Replace SparseArray `Option<T>` with T::MAX to cut down on branching
    * would enable cheaper get_unchecked() operations
* upstream fixedbitset optimizations
    * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec)
    * fixedbitset could have a const constructor
* Consider implementing Tags (archetype-specific by-value data that affects archetype identity)
    * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different.
    * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage.
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation
    * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints)
    * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code)
* Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell
    * this is basically just "systems" so maybe it's not worth it
* Add more world ops
    * `world.clear()`
    * `world.reserve<T: Bundle>(count: usize)`
* Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But that's just a guess. I'm not an allocation perf pro :)
* Adapt Commands apis for consistency with new World apis

## Benchmarks

key:

* `bevy_old`: bevy `main` branch
* `bevy`: this branch
* `_foreach`: uses an optimized for_each iterator
* `_sparse`: uses sparse set storage (if unspecified assume table storage)
* `_system`: runs inside a system (if unspecified assume test happens via direct world ops)

### Simple Insert (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png)

### Simple Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png)

### Fragment Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png)

### Sparse Fragmented Iter

Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes

![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png)

### Schedule (from ecs_bench_suite)
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png)

### Add Remove Component (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png)

### Add Remove Component Big

Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed

![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png)

### Get Component

Looks up a single component value a large number of times

![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
sparse_sets: &mut SparseSets,
bundle_component_status: &S,
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict wit components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows us the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
entity: Entity,
table_row: TableRow,
change_tick: Tick,
bundle: T,
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
) {
// NOTE: get_components calls this closure on each component in "bundle order".
// bundle_info.component_ids are also in "bundle order"
Bevy ECS V2 (#1525) # Bevy ECS V2 This is a rewrite of Bevy ECS (basically everything but the new executor/schedule, which are already awesome). The overall goal was to improve the performance and versatility of Bevy ECS. Here is a quick bulleted list of changes before we dive into the details: * Complete World rewrite * Multiple component storage types: * Tables: fast cache friendly iteration, slower add/removes (previously called Archetypes) * Sparse Sets: fast add/remove, slower iteration * Stateful Queries (caches query results for faster iteration. fragmented iteration is _fast_ now) * Stateful System Params (caches expensive operations. inspired by @DJMcNab's work in #1364) * Configurable System Params (users can set configuration when they construct their systems. once again inspired by @DJMcNab's work) * Archetypes are now "just metadata", component storage is separate * Archetype Graph (for faster archetype changes) * Component Metadata * Configure component storage type * Retrieve information about component size/type/name/layout/send-ness/etc * Components are uniquely identified by a densely packed ComponentId * TypeIds are now totally optional (which should make implementing scripting easier) * Super fast "for_each" query iterators * Merged Resources into World. 
Resources are now just a special type of component * EntityRef/EntityMut builder apis (more efficient and more ergonomic) * Fast bitset-backed `Access<T>` replaces old hashmap-based approach everywhere * Query conflicts are determined by component access instead of archetype component access (to avoid random failures at runtime) * With/Without are still taken into account for conflicts, so this should still be comfy to use * Much simpler `IntoSystem` impl * Significantly reduced the amount of hashing throughout the ecs in favor of Sparse Sets (indexed by densely packed ArchetypeId, ComponentId, BundleId, and TableId) * Safety Improvements * Entity reservation uses a normal world reference instead of unsafe transmute * QuerySets no longer transmute lifetimes * Made traits "unsafe" where relevant * More thorough safety docs * WorldCell * Exposes safe mutable access to multiple resources at a time in a World * Replaced "catch all" `System::update_archetypes(world: &World)` with `System::new_archetype(archetype: &Archetype)` * Simpler Bundle implementation * Replaced slow "remove_bundle_one_by_one" used as fallback for Commands::remove_bundle with fast "remove_bundle_intersection" * Removed `Mut<T>` query impl. it is better to only support one way: `&mut T` * Removed with() from `Flags<T>` in favor of `Option<Flags<T>>`, which allows querying for flags to be "filtered" by default * Components now have is_send property (currently only resources support non-send) * More granular module organization * New `RemovedComponents<T>` SystemParam that replaces `query.removed::<T>()` * `world.resource_scope()` for mutable access to resources and world at the same time * WorldQuery and QueryFilter traits unified. FilterFetch trait added to enable "short circuit" filtering. 
Auto impled for cases that don't need it * Significantly slimmed down SystemState in favor of individual SystemParam state * System Commands changed from `commands: &mut Commands` back to `mut commands: Commands` (to allow Commands to have a World reference) Fixes #1320 ## `World` Rewrite This is a from-scratch rewrite of `World` that fills the niche that `hecs` used to. Yes, this means Bevy ECS is no longer a "fork" of hecs. We're going out our own! (the only shared code between the projects is the entity id allocator, which is already basically ideal) A huge shout out to @SanderMertens (author of [flecs](https://github.com/SanderMertens/flecs)) for sharing some great ideas with me (specifically hybrid ecs storage and archetype graphs). He also helped advise on a number of implementation details. ## Component Storage (The Problem) Two ECS storage paradigms have gained a lot of traction over the years: * **Archetypal ECS**: * Stores components in "tables" with static schemas. Each "column" stores components of a given type. Each "row" is an entity. * Each "archetype" has its own table. Adding/removing an entity's component changes the archetype. * Enables super-fast Query iteration due to its cache-friendly data layout * Comes at the cost of more expensive add/remove operations for an Entity's components, because all components need to be copied to the new archetype's "table" * **Sparse Set ECS**: * Stores components of the same type in densely packed arrays, which are sparsely indexed by densely packed unsigned integers (Entity ids) * Query iteration is slower than Archetypal ECS because each entity's component could be at any position in the sparse set. This "random access" pattern isn't cache friendly. Additionally, there is an extra layer of indirection because you must first map the entity id to an index in the component array. 
* Adding/removing components is a cheap, constant time operation Bevy ECS V1, hecs, legion, flec, and Unity DOTS are all "archetypal ecs-es". I personally think "archetypal" storage is a good default for game engines. An entity's archetype doesn't need to change frequently in general, and it creates "fast by default" query iteration (which is a much more common operation). It is also "self optimizing". Users don't need to think about optimizing component layouts for iteration performance. It "just works" without any extra boilerplate. Shipyard and EnTT are "sparse set ecs-es". They employ "packing" as a way to work around the "suboptimal by default" iteration performance for specific sets of components. This helps, but I didn't think this was a good choice for a general purpose engine like Bevy because: 1. "packs" conflict with each other. If bevy decides to internally pack the Transform and GlobalTransform components, users are then blocked if they want to pack some custom component with Transform. 2. users need to take manual action to optimize Developers selecting an ECS framework are stuck with a hard choice. Select an "archetypal" framework with "fast iteration everywhere" but without the ability to cheaply add/remove components, or select a "sparse set" framework to cheaply add/remove components but with slower iteration performance. ## Hybrid Component Storage (The Solution) In Bevy ECS V2, we get to have our cake and eat it too. It now has _both_ of the component storage types above (and more can be added later if needed): * **Tables** (aka "archetypal" storage) * The default storage. If you don't configure anything, this is what you get * Fast iteration by default * Slower add/remove operations * **Sparse Sets** * Opt-in * Slower iteration * Faster add/remove operations These storage types complement each other perfectly. By default Query iteration is fast. 
If developers know that they want to add/remove a component at high frequencies, they can set the storage to "sparse set": ```rust world.register_component( ComponentDescriptor::new::<MyComponent>(StorageType::SparseSet) ).unwrap(); ``` ## Archetypes Archetypes are now "just metadata" ... they no longer store components directly. They do store: * The `ComponentId`s of each of the Archetype's components (and that component's storage type) * Archetypes are uniquely defined by their component layouts * For example: entities with "table" components `[A, B, C]` _and_ "sparse set" components `[D, E]` will always be in the same archetype. * The `TableId` associated with the archetype * For now each archetype has exactly one table (which can have no components), * There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it: * Ex: an entity with "table storage" components `[A, B, C]` and "sparse set" components `[D, E]` will share the same `[A, B, C]` table as an entity with `[A, B, C]` table component and `[F]` sparse set components. * This 1->Many relationship is how we preserve fast "cache friendly" iteration performance when possible (more on this later) * A list of entities that are in the archetype and the row id of the table they are in * ArchetypeComponentIds * unique densely packed identifiers for (ArchetypeId, ComponentId) pairs * used by the schedule executor for cheap system access control * "Archetype Graph Edges" (see the next section) ## The "Archetype Graph" Archetype changes in Bevy (and a number of other archetypal ecs-es) have historically been expensive to compute. First, you need to allocate a new vector of the entity's current component ids, add or remove components based on the operation performed, sort it (to ensure it is order-independent), then hash it to find the archetype (if it exists). 
And thats all before we get to the _already_ expensive full copy of all components to the new table storage. The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If `ComponentId`s are densely packed, you can use sparse sets to cheaply jump between archetypes. Bevy takes this one step further by using add/remove `Bundle` edges instead of `Component` edges. Bevy encourages the use of `Bundles` to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. `Bundles` now also have densely-packed `BundleId`s. This allows us to use a _single_ edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph. As a result, an operation that used to be _heavy_ (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations. ## Stateful Queries World queries are now stateful. This allows us to: 1. Cache archetype (and table) matches * This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs). 2. Cache Fetch and Filter state * The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed 3. 
Incrementally build up state * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes) As a result, the direct `World` query api now looks like this: ```rust let mut query = world.query::<(&A, &mut B)>(); for (a, mut b) in query.iter_mut(&mut world) { } ``` Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (and the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world). However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam. ## Stateful SystemParams Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources). SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now. Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params). (credit goes to @DJMcNab for the initial idea and draft pr here #1364) ## Configurable SystemParams @DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible). 
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters:

```rust
fn foo(value: Local<usize>) {
}

app.add_system(foo.system().config(|c| c.0 = Some(10)));
```

## Uber Fast "for_each" Query Iterators

Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration.

```rust
fn system(mut query: Query<(&A, &mut B)>) {
    // you now have the option to do this for a speed boost
    query.for_each_mut(|(a, mut b)| {
    });

    // however normal iterators are still available
    for (a, mut b) in query.iter_mut() {
    }
}
```

I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomf" is needed, it makes sense to use `for_each`.

We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr).

## Component Metadata

`World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type.

## Significantly Cheaper `Access<T>`

We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>` sources every time archetypes changed.

This pr removes TypeAccess in favor of faster bitset access everywhere. We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s.

## Merged Resources into World

Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity).

Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state.

I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally).

This pr merges Resources into World:

```rust
world.insert_resource(1);
world.insert_resource(2.0);
let a = world.get_resource::<i32>().unwrap();
let mut b = world.get_resource_mut::<f64>().unwrap();
*b = 3.0;
```

Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict with components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing data structures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier.

_But_ this merge did create problems for people directly interacting with `World`. What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably!

## WorldCell

WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access:

```rust
let world_cell = world.cell();
let a = world_cell.get_resource_mut::<i32>().unwrap();
let b = world_cell.get_resource_mut::<f64>().unwrap();
```

This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped.

World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation.

WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer.

The api is currently limited to resource access, but it can and should be extended to queries / entity component access.

## Resource Scopes

WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal!

Instead developers can use a "resource scope":

```rust
world.resource_scope(|world: &mut World, a: &mut A| {
})
```

This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation.

If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty.
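To make the "remove, lend out both, re-add" idea behind resource scopes concrete, here is a minimal standalone sketch. It is _not_ the Bevy implementation: `ToyWorld`, `Score`, `Multiplier`, and `resource_scope_demo` are hypothetical names, and the storage is a plain `HashMap` keyed by `TypeId` rather than ComponentIds/sparse sets. It only illustrates why the closure can hold `&mut` to the world and to the resource at the same time.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Hypothetical stand-in for `World` that only stores resources, keyed by type.
#[derive(Default)]
struct ToyWorld {
    resources: HashMap<TypeId, Box<dyn Any>>,
}

impl ToyWorld {
    fn insert_resource<T: Any>(&mut self, value: T) {
        self.resources.insert(TypeId::of::<T>(), Box::new(value));
    }

    fn get_resource<T: Any>(&self) -> Option<&T> {
        self.resources
            .get(&TypeId::of::<T>())
            .and_then(|boxed| boxed.downcast_ref::<T>())
    }

    fn get_resource_mut<T: Any>(&mut self) -> Option<&mut T> {
        self.resources
            .get_mut(&TypeId::of::<T>())
            .and_then(|boxed| boxed.downcast_mut::<T>())
    }

    /// Temporarily removes the `T` resource, hands the closure mutable access
    /// to both the world and the resource, then re-inserts the resource.
    /// Sketch-only limitation: if the closure panics, the resource is not re-added.
    fn resource_scope<T: Any, R>(&mut self, f: impl FnOnce(&mut ToyWorld, &mut T) -> R) -> R {
        let boxed = self
            .resources
            .remove(&TypeId::of::<T>())
            .expect("resource does not exist");
        let mut value: Box<T> = boxed
            .downcast::<T>()
            .unwrap_or_else(|_| panic!("resource stored under the wrong TypeId"));
        // `value` is no longer owned by the world, so both borrows are disjoint.
        let result = f(self, &mut *value);
        self.resources.insert(TypeId::of::<T>(), value);
        result
    }
}

struct Score(u32);
struct Multiplier(u32);

fn resource_scope_demo() -> u32 {
    let mut world = ToyWorld::default();
    world.insert_resource(Score(10));
    world.insert_resource(Multiplier(3));

    world.resource_scope(|world: &mut ToyWorld, score: &mut Score| {
        // The world is still mutably usable while `score` is borrowed.
        let multiplier = world.get_resource_mut::<Multiplier>().unwrap();
        multiplier.0 += 1;
        score.0 *= multiplier.0;
    });

    // The resource is back in the world once the scope ends.
    world.get_resource::<Score>().unwrap().0
}
```

One consequence of this design, at least in the sketch: while the scope is running, looking up the scoped resource through the world returns `None`, because it has been moved out for the duration of the closure.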
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId

For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters:

```rust
// these queries will never conflict due to their filters
fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut A, Without<B>>) {
}
```

But it also has a significant downside:

```rust
// these queries will not conflict _until_ an entity with A, B, and C is spawned
fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) {
}
```

The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing.

In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace.

To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict.

## EntityRef / EntityMut

World entity operations on `main` require that the user passes in an `entity` id to each operation:

```rust
let entity = world.spawn((A, )); // create a new entity with A
world.get::<A>(entity);
world.insert(entity, (B, C));
world.insert_one(entity, D);
```

This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required).

These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity:

```rust
// spawn now takes no inputs and returns an EntityMut
let entity = world.spawn()
    .insert(A) // insert a single component into the entity
    .insert_bundle((B, C)) // insert a bundle of components into the entity
    .id(); // id returns the Entity id

// Returns EntityMut (or panics if the entity does not exist)
world.entity_mut(entity)
    .insert(D)
    .insert_bundle(SomeBundle::default());

{
    // returns EntityRef (or panics if the entity does not exist)
    let d = world.entity(entity)
        .get::<D>() // gets the D component
        .unwrap();
    // world.get still exists for ergonomics
    let d = world.get::<D>(entity).unwrap();
}

// These variants return Options if you want to check existence instead of panicking
world.get_entity_mut(entity)
    .unwrap()
    .insert(E);

if let Some(entity_ref) = world.get_entity(entity) {
    let d = entity_ref.get::<D>().unwrap();
}
```

This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change.

## Safety Improvements

* Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute
* QuerySets no longer transmute lifetimes
* Made traits "unsafe" when implementing a trait incorrectly could cause unsafety
* More thorough safety docs

## RemovedComponents SystemParam

The old approach to querying removed components: `query.removed::<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState:

```rust
fn system(removed: RemovedComponents<T>) {
    for entity in removed.iter() {
    }
}
```

## Simpler Bundle implementation

Bundles are no longer responsible for sorting (or deduping) TypeInfo. They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which I might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used.

## Unified WorldQuery and QueryFilter types

(don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change)

WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful).

QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool.

This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit.

## More Granular Modules

World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here).

## Remaining Draft Work (to be done in this pr)

* ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~
    * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~
* ~~batch_iter / par_iter (currently stubbed out)~~
* ~~ChangedRes~~
    * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~.
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~
* ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~
* ~~Nested Bundles (if i find time)~~

## Potential Future Work

* Expand WorldCell to support queries.
* Consider not allocating in the empty archetype on `world.spawn()`
    * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op
    * this actually regressed performance last time I tried it, but in theory it should be faster
* Optimize SparseSet::insert (see `PERF` comment on insert)
* Replace SparseArray `Option<T>` with T::MAX to cut down on branching
    * would enable cheaper get_unchecked() operations
* upstream fixedbitset optimizations
    * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec)
    * fixedbitset could have a const constructor
* Consider implementing Tags (archetype-specific by-value data that affects archetype identity)
    * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different.
    * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage.
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation
    * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints)
    * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code)
* Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell
    * this is basically just "systems" so maybe it's not worth it
* Add more world ops
    * `world.clear()`
    * `world.reserve<T: Bundle>(count: usize)`
* Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But that's just a guess. I'm not an allocation perf pro :)
* Adapt Commands apis for consistency with new World apis

## Benchmarks

key:

* `bevy_old`: bevy `main` branch
* `bevy`: this branch
* `_foreach`: uses an optimized for_each iterator
* `_sparse`: uses sparse set storage (if unspecified assume table storage)
* `_system`: runs inside a system (if unspecified assume test happens via direct world ops)

### Simple Insert (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png)

### Simpler Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png)

### Fragment Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png)

### Sparse Fragmented Iter

Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes

![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png)

### Schedule (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png)

### Add Remove Component (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png)

### Add Remove Component Big

Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed

![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png)

### Get Component

Looks up a single component value a large number of times

![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
let mut bundle_component = 0;
Remove unnecessary branching from bundle insertion (#6902)

# Objective

Speed up bundle insertion and spawning from a bundle.

## Solution

Use the same technique used in #6800 to remove the branch on storage type when writing components from a `Bundle` into storage.

- Add a `StorageType` argument to the closure on `Bundle::get_components`.
- Pass `C::Storage::STORAGE_TYPE` into that argument.
- Match on that argument instead of reading from a `Vec<StorageType>` in `BundleInfo`.
- Marked all implementations of `Bundle::get_components` as inline to encourage dead code elimination.

The `Vec<StorageType>` in `BundleInfo` was also removed as it's no longer needed. If users were reliant on this, they can either use the compile time constants or fetch the information from `Components`. Should save a rather negligible amount of memory.

## Performance

Microbenchmarks show a slight improvement to inserting components into existing entities, as well as spawning from a bundle. Ranging about 8-16% faster depending on the benchmark.

```
group                           main                                  soft-constant-write-components
-----                           ----                                  ------------------------------
add_remove/sparse_set           1.08  1019.0±80.10µs   ? ?/sec        1.00   944.6±66.86µs   ? ?/sec
add_remove/table                1.07  1343.3±20.37µs   ? ?/sec        1.00  1257.3±18.13µs   ? ?/sec
add_remove_big/sparse_set       1.08  1132.4±263.10µs  ? ?/sec        1.00  1050.8±240.74µs  ? ?/sec
add_remove_big/table            1.02     2.6±0.05ms    ? ?/sec        1.00     2.5±0.08ms    ? ?/sec
get_or_spawn/batched            1.15   401.4±17.76µs   ? ?/sec        1.00   349.3±11.26µs   ? ?/sec
get_or_spawn/individual         1.13   732.1±43.35µs   ? ?/sec        1.00   645.6±41.44µs   ? ?/sec
insert_commands/insert          1.12   623.9±37.48µs   ? ?/sec        1.00   557.4±34.99µs   ? ?/sec
insert_commands/insert_batch    1.16   401.4±17.00µs   ? ?/sec        1.00   347.4±12.87µs   ? ?/sec
insert_simple/base              1.08   416.9±5.60µs    ? ?/sec        1.00   385.2±4.14µs    ? ?/sec
insert_simple/unbatched         1.06   934.5±44.58µs   ? ?/sec        1.00   881.3±47.86µs   ? ?/sec
spawn_commands/2000_entities    1.09   190.7±11.41µs   ? ?/sec        1.00   174.7±9.15µs    ? ?/sec
spawn_commands/4000_entities    1.10   386.5±25.33µs   ? ?/sec        1.00   352.3±18.81µs   ? ?/sec
spawn_commands/6000_entities    1.10   586.2±34.42µs   ? ?/sec        1.00   535.3±27.25µs   ? ?/sec
spawn_commands/8000_entities    1.08   778.5±45.15µs   ? ?/sec        1.00   718.0±33.66µs   ? ?/sec
spawn_world/10000_entities      1.04  1026.4±195.46µs  ? ?/sec        1.00   985.8±253.37µs  ? ?/sec
spawn_world/1000_entities       1.06   103.8±20.23µs   ? ?/sec        1.00    97.6±18.22µs   ? ?/sec
spawn_world/100_entities        1.15    11.4±4.25µs    ? ?/sec        1.00     9.9±1.87µs    ? ?/sec
spawn_world/10_entities         1.05  1030.8±229.78ns  ? ?/sec        1.00   986.2±231.12ns  ? ?/sec
spawn_world/1_entities          1.01   105.1±23.33ns   ? ?/sec        1.00   104.6±31.84ns   ? ?/sec
```

---

## Changelog

Changed: `Bundle::get_components` now takes a `FnMut(StorageType, OwningPtr)`. The provided storage type must be correct for the component being fetched.
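As a rough illustration of the technique in the Solution section, the standalone sketch below passes a per-component associated constant into the closure, so that after inlining the `match` on storage type sees a compile-time constant at each call site. All names here (`ToyComponent`, `ToyBundle`, `ToyStorages`, `write_bundle`, `write_bundle_demo`) are hypothetical, the constant lives directly on the toy component trait rather than on a `C::Storage` type, and a plain `u32` payload stands in for `OwningPtr`. Whether the branch is actually eliminated depends on the optimizer.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StorageType {
    Table,
    SparseSet,
}

/// Hypothetical component trait: the storage type is a compile-time constant.
trait ToyComponent: 'static {
    const STORAGE_TYPE: StorageType;
    /// Stand-in for the raw component data that would be written to storage.
    fn payload(self) -> u32;
}

struct Position(u32);
struct Marker(u32);

impl ToyComponent for Position {
    const STORAGE_TYPE: StorageType = StorageType::Table;
    fn payload(self) -> u32 {
        self.0
    }
}

impl ToyComponent for Marker {
    const STORAGE_TYPE: StorageType = StorageType::SparseSet;
    fn payload(self) -> u32 {
        self.0
    }
}

/// Hypothetical bundle trait mirroring the `FnMut(StorageType, ..)` shape.
trait ToyBundle {
    fn get_components(self, func: &mut impl FnMut(StorageType, u32));
}

impl<A: ToyComponent, B: ToyComponent> ToyBundle for (A, B) {
    // Inlining lets the constant storage type reach the caller's `match`.
    #[inline]
    fn get_components(self, func: &mut impl FnMut(StorageType, u32)) {
        func(A::STORAGE_TYPE, self.0.payload());
        func(B::STORAGE_TYPE, self.1.payload());
    }
}

#[derive(Default)]
struct ToyStorages {
    table: Vec<u32>,
    sparse_set: Vec<u32>,
}

fn write_bundle<B: ToyBundle>(storages: &mut ToyStorages, bundle: B) {
    // The storage type arrives as an argument instead of being read from a
    // per-bundle `Vec<StorageType>` at runtime.
    bundle.get_components(&mut |storage_type: StorageType, value: u32| match storage_type {
        StorageType::Table => storages.table.push(value),
        StorageType::SparseSet => storages.sparse_set.push(value),
    });
}

fn write_bundle_demo() -> (Vec<u32>, Vec<u32>) {
    let mut storages = ToyStorages::default();
    write_bundle(&mut storages, (Position(1), Marker(2)));
    write_bundle(&mut storages, (Marker(3), Position(4)));
    (storages.table, storages.sparse_set)
}
```

In this sketch the caller can never pass a wrong storage type because it comes from the trait constant; the changelog's requirement that "the provided storage type must be correct" matters for hand-written `get_components` implementations.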
2022-12-11 18:46:43 +00:00
bundle.get_components(&mut |storage_type, component_ptr| {
Bevy ECS V2 (#1525) # Bevy ECS V2 This is a rewrite of Bevy ECS (basically everything but the new executor/schedule, which are already awesome). The overall goal was to improve the performance and versatility of Bevy ECS. Here is a quick bulleted list of changes before we dive into the details: * Complete World rewrite * Multiple component storage types: * Tables: fast cache friendly iteration, slower add/removes (previously called Archetypes) * Sparse Sets: fast add/remove, slower iteration * Stateful Queries (caches query results for faster iteration. fragmented iteration is _fast_ now) * Stateful System Params (caches expensive operations. inspired by @DJMcNab's work in #1364) * Configurable System Params (users can set configuration when they construct their systems. once again inspired by @DJMcNab's work) * Archetypes are now "just metadata", component storage is separate * Archetype Graph (for faster archetype changes) * Component Metadata * Configure component storage type * Retrieve information about component size/type/name/layout/send-ness/etc * Components are uniquely identified by a densely packed ComponentId * TypeIds are now totally optional (which should make implementing scripting easier) * Super fast "for_each" query iterators * Merged Resources into World. 
Resources are now just a special type of component * EntityRef/EntityMut builder apis (more efficient and more ergonomic) * Fast bitset-backed `Access<T>` replaces old hashmap-based approach everywhere * Query conflicts are determined by component access instead of archetype component access (to avoid random failures at runtime) * With/Without are still taken into account for conflicts, so this should still be comfy to use * Much simpler `IntoSystem` impl * Significantly reduced the amount of hashing throughout the ecs in favor of Sparse Sets (indexed by densely packed ArchetypeId, ComponentId, BundleId, and TableId) * Safety Improvements * Entity reservation uses a normal world reference instead of unsafe transmute * QuerySets no longer transmute lifetimes * Made traits "unsafe" where relevant * More thorough safety docs * WorldCell * Exposes safe mutable access to multiple resources at a time in a World * Replaced "catch all" `System::update_archetypes(world: &World)` with `System::new_archetype(archetype: &Archetype)` * Simpler Bundle implementation * Replaced slow "remove_bundle_one_by_one" used as fallback for Commands::remove_bundle with fast "remove_bundle_intersection" * Removed `Mut<T>` query impl. it is better to only support one way: `&mut T` * Removed with() from `Flags<T>` in favor of `Option<Flags<T>>`, which allows querying for flags to be "filtered" by default * Components now have is_send property (currently only resources support non-send) * More granular module organization * New `RemovedComponents<T>` SystemParam that replaces `query.removed::<T>()` * `world.resource_scope()` for mutable access to resources and world at the same time * WorldQuery and QueryFilter traits unified. FilterFetch trait added to enable "short circuit" filtering. 
Auto impled for cases that don't need it * Significantly slimmed down SystemState in favor of individual SystemParam state * System Commands changed from `commands: &mut Commands` back to `mut commands: Commands` (to allow Commands to have a World reference) Fixes #1320 ## `World` Rewrite This is a from-scratch rewrite of `World` that fills the niche that `hecs` used to. Yes, this means Bevy ECS is no longer a "fork" of hecs. We're going out our own! (the only shared code between the projects is the entity id allocator, which is already basically ideal) A huge shout out to @SanderMertens (author of [flecs](https://github.com/SanderMertens/flecs)) for sharing some great ideas with me (specifically hybrid ecs storage and archetype graphs). He also helped advise on a number of implementation details. ## Component Storage (The Problem) Two ECS storage paradigms have gained a lot of traction over the years: * **Archetypal ECS**: * Stores components in "tables" with static schemas. Each "column" stores components of a given type. Each "row" is an entity. * Each "archetype" has its own table. Adding/removing an entity's component changes the archetype. * Enables super-fast Query iteration due to its cache-friendly data layout * Comes at the cost of more expensive add/remove operations for an Entity's components, because all components need to be copied to the new archetype's "table" * **Sparse Set ECS**: * Stores components of the same type in densely packed arrays, which are sparsely indexed by densely packed unsigned integers (Entity ids) * Query iteration is slower than Archetypal ECS because each entity's component could be at any position in the sparse set. This "random access" pattern isn't cache friendly. Additionally, there is an extra layer of indirection because you must first map the entity id to an index in the component array. 
* Adding/removing components is a cheap, constant time operation Bevy ECS V1, hecs, legion, flec, and Unity DOTS are all "archetypal ecs-es". I personally think "archetypal" storage is a good default for game engines. An entity's archetype doesn't need to change frequently in general, and it creates "fast by default" query iteration (which is a much more common operation). It is also "self optimizing". Users don't need to think about optimizing component layouts for iteration performance. It "just works" without any extra boilerplate. Shipyard and EnTT are "sparse set ecs-es". They employ "packing" as a way to work around the "suboptimal by default" iteration performance for specific sets of components. This helps, but I didn't think this was a good choice for a general purpose engine like Bevy because: 1. "packs" conflict with each other. If bevy decides to internally pack the Transform and GlobalTransform components, users are then blocked if they want to pack some custom component with Transform. 2. users need to take manual action to optimize Developers selecting an ECS framework are stuck with a hard choice. Select an "archetypal" framework with "fast iteration everywhere" but without the ability to cheaply add/remove components, or select a "sparse set" framework to cheaply add/remove components but with slower iteration performance. ## Hybrid Component Storage (The Solution) In Bevy ECS V2, we get to have our cake and eat it too. It now has _both_ of the component storage types above (and more can be added later if needed): * **Tables** (aka "archetypal" storage) * The default storage. If you don't configure anything, this is what you get * Fast iteration by default * Slower add/remove operations * **Sparse Sets** * Opt-in * Slower iteration * Faster add/remove operations These storage types complement each other perfectly. By default Query iteration is fast. 
If developers know that they want to add/remove a component at high frequencies, they can set the storage to "sparse set": ```rust world.register_component( ComponentDescriptor::new::<MyComponent>(StorageType::SparseSet) ).unwrap(); ``` ## Archetypes Archetypes are now "just metadata" ... they no longer store components directly. They do store: * The `ComponentId`s of each of the Archetype's components (and that component's storage type) * Archetypes are uniquely defined by their component layouts * For example: entities with "table" components `[A, B, C]` _and_ "sparse set" components `[D, E]` will always be in the same archetype. * The `TableId` associated with the archetype * For now each archetype has exactly one table (which can have no components), * There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it: * Ex: an entity with "table storage" components `[A, B, C]` and "sparse set" components `[D, E]` will share the same `[A, B, C]` table as an entity with `[A, B, C]` table component and `[F]` sparse set components. * This 1->Many relationship is how we preserve fast "cache friendly" iteration performance when possible (more on this later) * A list of entities that are in the archetype and the row id of the table they are in * ArchetypeComponentIds * unique densely packed identifiers for (ArchetypeId, ComponentId) pairs * used by the schedule executor for cheap system access control * "Archetype Graph Edges" (see the next section) ## The "Archetype Graph" Archetype changes in Bevy (and a number of other archetypal ecs-es) have historically been expensive to compute. First, you need to allocate a new vector of the entity's current component ids, add or remove components based on the operation performed, sort it (to ensure it is order-independent), then hash it to find the archetype (if it exists). 
And thats all before we get to the _already_ expensive full copy of all components to the new table storage. The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If `ComponentId`s are densely packed, you can use sparse sets to cheaply jump between archetypes. Bevy takes this one step further by using add/remove `Bundle` edges instead of `Component` edges. Bevy encourages the use of `Bundles` to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. `Bundles` now also have densely-packed `BundleId`s. This allows us to use a _single_ edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph. As a result, an operation that used to be _heavy_ (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations. ## Stateful Queries World queries are now stateful. This allows us to: 1. Cache archetype (and table) matches * This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs). 2. Cache Fetch and Filter state * The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed 3. 
Incrementally build up state * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes) As a result, the direct `World` query api now looks like this: ```rust let mut query = world.query::<(&A, &mut B)>(); for (a, mut b) in query.iter_mut(&mut world) { } ``` Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (and the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world). However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam. ## Stateful SystemParams Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources). SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now. Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params). (credit goes to @DJMcNab for the initial idea and draft pr here #1364) ## Configurable SystemParams @DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible). 
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters: ```rust fn foo(value: Local<usize>) { } app.add_system(foo.system().config(|c| c.0 = Some(10))); ``` ## Uber Fast "for_each" Query Iterators Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration. ```rust fn system(query: Query<(&A, &mut B)>) { // you now have the option to do this for a speed boost query.for_each_mut(|(a, mut b)| { }); // however normal iterators are still available for (a, mut b) in query.iter_mut() { } } ``` I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomf" is needed, it makes sense to use `for_each`. We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr). ## Component Metadata `World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type. ## Significantly Cheaper `Access<T>` We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>`sources every time archetypes changed. This pr removes TypeAccess in favor of faster bitset access everywhere. 
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict wit components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows us the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId

For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters:

```rust
// these queries will never conflict due to their filters
fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut A, Without<B>>) {
}
```

But it also has a significant downside:

```rust
// these queries will not conflict _until_ an entity with A, B, and C is spawned
fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) {
}
```

The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing.

In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace.

To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict.

## EntityRef / EntityMut

World entity operations on `main` require that the user passes in an `entity` id to each operation:

```rust
let entity = world.spawn((A, )); // create a new entity with A
world.get::<A>(entity);
world.insert(entity, (B, C));
world.insert_one(entity, D);
```

This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required).
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity:

```rust
// spawn now takes no inputs and returns an EntityMut
let entity = world.spawn()
    .insert(A) // insert a single component into the entity
    .insert_bundle((B, C)) // insert a bundle of components into the entity
    .id(); // id returns the Entity id

// Returns EntityMut (or panics if the entity does not exist)
world.entity_mut(entity)
    .insert(D)
    .insert_bundle(SomeBundle::default());
{
    // returns EntityRef (or panics if the entity does not exist)
    let d = world.entity(entity)
        .get::<D>() // gets the D component
        .unwrap();
    // world.get still exists for ergonomics
    let d = world.get::<D>(entity).unwrap();
}

// These variants return Options if you want to check existence instead of panicking
world.get_entity_mut(entity)
    .unwrap()
    .insert(E);

if let Some(entity_ref) = world.get_entity(entity) {
    let d = entity_ref.get::<D>().unwrap();
}
```

This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change.

## Safety Improvements

* Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute
* QuerySets no longer transmute lifetimes
* Made traits "unsafe" when implementing a trait incorrectly could cause unsafety
* More thorough safety docs

## RemovedComponents SystemParam

The old approach to querying removed components, `query.removed::<T>()`, was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState:

```rust
fn system(removed: RemovedComponents<T>) {
    for entity in removed.iter() {
    }
}
```

## Simpler Bundle implementation

Bundles are no longer responsible for sorting (or deduping) TypeInfo.
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which I might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used.

## Unified WorldQuery and QueryFilter types

(don't worry, they are still separate type _parameters_ in Queries ... this is a non-breaking change)

WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful).

QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool.

This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit.

## More Granular Modules

World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here).

## Remaining Draft Work (to be done in this pr)

* ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~
  * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~
* ~~batch_iter / par_iter (currently stubbed out)~~
* ~~ChangedRes~~
  * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~.
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~
* ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~
* ~~Nested Bundles (if i find time)~~

## Potential Future Work

* Expand WorldCell to support queries.
* Consider not allocating in the empty archetype on `world.spawn()`
  * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op
  * this actually regressed performance last time i tried it, but in theory it should be faster
* Optimize SparseSet::insert (see `PERF` comment on insert)
* Replace SparseArray `Option<T>` with T::MAX to cut down on branching
  * would enable cheaper get_unchecked() operations
* upstream fixedbitset optimizations
  * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec)
  * fixedbitset could have a const constructor
* Consider implementing Tags (archetype-specific by-value data that affects archetype identity)
  * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different.
  * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage.
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation
  * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints)
  * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code)
* Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell
  * this is basically just "systems" so maybe it's not worth it
* Add more world ops
  * `world.clear()`
  * `world.reserve<T: Bundle>(count: usize)`
* Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But that's just a guess. I'm not an allocation perf pro :)
* Adapt Commands apis for consistency with new World apis

## Benchmarks

key:

* `bevy_old`: bevy `main` branch
* `bevy`: this branch
* `_foreach`: uses an optimized for_each iterator
* `_sparse`: uses sparse set storage (if unspecified assume table storage)
* `_system`: runs inside a system (if unspecified assume test happens via direct world ops)

### Simple Insert (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png)

### Simpler Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png)

### Fragment Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png)

### Sparse Fragmented Iter

Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes

![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png)

### Schedule (from ecs_bench_suite)
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png)

### Add Remove Component (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png)

### Add Remove Component Big

Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed

![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png)

### Get Component

Looks up a single component value a large number of times

![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
let component_id = *self.component_ids.get_unchecked(bundle_component);
Remove unnecessary branching from bundle insertion (#6902)

# Objective

Speed up bundle insertion and spawning from a bundle.

## Solution

Use the same technique used in #6800 to remove the branch on storage type when writing components from a `Bundle` into storage.

- Add a `StorageType` argument to the closure on `Bundle::get_components`.
- Pass `C::Storage::STORAGE_TYPE` into that argument.
- Match on that argument instead of reading from a `Vec<StorageType>` in `BundleInfo`.
- Marked all implementations of `Bundle::get_components` as inline to encourage dead code elimination.

The `Vec<StorageType>` in `BundleInfo` was also removed as it's no longer needed. If users were reliant on this, they can either use the compile time constants or fetch the information from `Components`. Should save a rather negligible amount of memory.

## Performance

Microbenchmarks show a slight improvement to inserting components into existing entities, as well as spawning from a bundle. Ranging about 8-16% faster depending on the benchmark.

```
group                           main                                 soft-constant-write-components
-----                           ----                                 ------------------------------
add_remove/sparse_set           1.08  1019.0±80.10µs   ? ?/sec       1.00   944.6±66.86µs   ? ?/sec
add_remove/table                1.07  1343.3±20.37µs   ? ?/sec       1.00  1257.3±18.13µs   ? ?/sec
add_remove_big/sparse_set       1.08  1132.4±263.10µs  ? ?/sec       1.00  1050.8±240.74µs  ? ?/sec
add_remove_big/table            1.02     2.6±0.05ms    ? ?/sec       1.00     2.5±0.08ms    ? ?/sec
get_or_spawn/batched            1.15   401.4±17.76µs   ? ?/sec       1.00   349.3±11.26µs   ? ?/sec
get_or_spawn/individual         1.13   732.1±43.35µs   ? ?/sec       1.00   645.6±41.44µs   ? ?/sec
insert_commands/insert          1.12   623.9±37.48µs   ? ?/sec       1.00   557.4±34.99µs   ? ?/sec
insert_commands/insert_batch    1.16   401.4±17.00µs   ? ?/sec       1.00   347.4±12.87µs   ? ?/sec
insert_simple/base              1.08   416.9±5.60µs    ? ?/sec       1.00   385.2±4.14µs    ? ?/sec
insert_simple/unbatched         1.06   934.5±44.58µs   ? ?/sec       1.00   881.3±47.86µs   ? ?/sec
spawn_commands/2000_entities    1.09   190.7±11.41µs   ? ?/sec       1.00   174.7±9.15µs    ? ?/sec
spawn_commands/4000_entities    1.10   386.5±25.33µs   ? ?/sec       1.00   352.3±18.81µs   ? ?/sec
spawn_commands/6000_entities    1.10   586.2±34.42µs   ? ?/sec       1.00   535.3±27.25µs   ? ?/sec
spawn_commands/8000_entities    1.08   778.5±45.15µs   ? ?/sec       1.00   718.0±33.66µs   ? ?/sec
spawn_world/10000_entities      1.04  1026.4±195.46µs  ? ?/sec       1.00   985.8±253.37µs  ? ?/sec
spawn_world/1000_entities       1.06   103.8±20.23µs   ? ?/sec       1.00    97.6±18.22µs   ? ?/sec
spawn_world/100_entities        1.15    11.4±4.25µs    ? ?/sec       1.00     9.9±1.87µs    ? ?/sec
spawn_world/10_entities         1.05  1030.8±229.78ns  ? ?/sec       1.00   986.2±231.12ns  ? ?/sec
spawn_world/1_entities          1.01   105.1±23.33ns   ? ?/sec       1.00   104.6±31.84ns   ? ?/sec
```

---

## Changelog

Changed: `Bundle::get_components` now takes a `FnMut(StorageType, OwningPtr)`. The provided storage type must be correct for the component being fetched.
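
The technique described in this commit message can be sketched in isolation. The types below are simplified, hypothetical stand-ins and not Bevy's real definitions: the storage type is an associated constant directly on `Component` (Bevy goes through `C::Storage::STORAGE_TYPE`), and the closure receives a type name instead of an `OwningPtr`. The point is only that the storage type reaches the closure as a compile-time constant, so the `match` can in principle be folded away once `get_components` is inlined.

```rust
/// Simplified stand-in for Bevy's `StorageType`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StorageType {
    Table,
    SparseSet,
}

/// Hypothetical component trait: the storage type is a compile-time constant.
trait Component {
    const STORAGE_TYPE: StorageType;
}

struct Position;
struct Frozen;

impl Component for Position {
    const STORAGE_TYPE: StorageType = StorageType::Table;
}

impl Component for Frozen {
    const STORAGE_TYPE: StorageType = StorageType::SparseSet;
}

/// Hypothetical bundle trait: each component is reported together with its
/// storage type, so callers never need a runtime `Vec<StorageType>` lookup.
trait Bundle {
    fn get_components(self, func: &mut impl FnMut(StorageType, &'static str));
}

impl<A: Component, B: Component> Bundle for (A, B) {
    #[inline]
    fn get_components(self, func: &mut impl FnMut(StorageType, &'static str)) {
        func(A::STORAGE_TYPE, std::any::type_name::<A>());
        func(B::STORAGE_TYPE, std::any::type_name::<B>());
    }
}

/// Counts how many components would be written to each kind of storage.
fn write_components<T: Bundle>(bundle: T) -> (usize, usize) {
    let (mut table_writes, mut sparse_writes) = (0, 0);
    bundle.get_components(
        &mut |storage_type: StorageType, _name: &'static str| match storage_type {
            StorageType::Table => table_writes += 1,
            StorageType::SparseSet => sparse_writes += 1,
        },
    );
    (table_writes, sparse_writes)
}

fn main() {
    let (table_writes, sparse_writes) = write_components((Position, Frozen));
    println!("table writes: {table_writes}, sparse set writes: {sparse_writes}");
}
```

Whether the optimizer actually removes the branch depends on inlining, which is why the commit marks the `get_components` implementations as inline.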
2022-12-11 18:46:43 +00:00
match storage_type {
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
StorageType::Table => {
    let column =
        // SAFETY: If component_id is in self.component_ids, BundleInfo::new requires that
        // the target table contains the component.
        unsafe { table.get_column_mut(component_id).debug_checked_unwrap() };
    // SAFETY: bundle_component is a valid index for this bundle
    match bundle_component_status.get_status(bundle_component) {
Reliable change detection (#1471)

# Problem Definition

The current change tracking (via flags for both components and resources) fails to detect changes when the detecting system is scheduled to run earlier in the frame than the system that makes the changes.

This issue is discussed at length in [#68](https://github.com/bevyengine/bevy/issues/68) and [#54](https://github.com/bevyengine/bevy/issues/54).

This is very much a draft PR, and contributions are welcome and needed.

# Criteria

1. Each change is detected at least once, no matter the ordering.
2. Each change is detected at most once, no matter the ordering.
3. Changes should be detected the same frame that they are made.
4. Competitive ergonomics. Ideally does not require opting-in.
5. Low CPU overhead of computation.
6. Memory efficient. This must not increase over time, except where the number of entities / resources does.
7. Changes should not be lost for systems that don't run.
8. A frame needs to act as a pure function. Given the same set of entities / components it needs to produce the same end state without side-effects.

**Exact** change-tracking proposals satisfy criteria 1 and 2.
**Conservative** change-tracking proposals satisfy criteria 1 but not 2.
**Flaky** change-tracking proposals satisfy criteria 2 but not 1.

# Code Base Navigation

There are three types of flags:

- `Added`: A piece of data was added to an entity / `Resources`.
- `Mutated`: A piece of data was able to be modified, because its `DerefMut` was accessed
- `Changed`: The bitwise OR of `Added` and `Mutated`

The special behavior of `ChangedRes` with respect to the scheduler is being removed in [#1313](https://github.com/bevyengine/bevy/pull/1313) and does not need to be reproduced.

`ChangedRes` and friends can be found in "bevy_ecs/core/resources/resource_query.rs".

The `Flags` trait for Components can be found in "bevy_ecs/core/query.rs".

`ComponentFlags` are stored in "bevy_ecs/core/archetypes.rs", defined on line 446.

# Proposals

**Proposal 5 was selected for implementation.**

## Proposal 0: No Change Detection

The baseline, where computations are performed on everything regardless of whether it changed.

**Type:** Conservative

**Pros:**
- already implemented
- will never miss events
- no overhead

**Cons:**
- tons of repeated work
- doesn't allow users to avoid repeating work (or monitoring for other changes)

## Proposal 1: Earlier-This-Tick Change Detection

The current approach as of Bevy 0.4. Flags are set, and then flushed at the end of each frame.

**Type:** Flaky

**Pros:**
- already implemented
- simple to understand
- low memory overhead (2 bits per component)
- low time overhead (clear every flag once per frame)

**Cons:**
- misses systems based on ordering
- systems that don't run every frame miss changes
- duplicates detection when looping
- can lead to unresolvable circular dependencies

## Proposal 2: Two-Tick Change Detection

Flags persist for two frames, using a double-buffer system identical to that used in events.

A change is observed if it is found in either the current frame's list of changes or the previous frame's.

**Type:** Conservative

**Pros:**
- easy to understand
- easy to implement
- low memory overhead (4 bits per component)
- low time overhead (bit mask and shift every flag once per frame)

**Cons:**
- can result in a great deal of duplicated work
- systems that don't run every frame miss changes
- duplicates detection when looping

## Proposal 3: Last-Tick Change Detection

Flags persist for two frames, using a double-buffer system identical to that used in events.

A change is observed if it is found in the previous frame's list of changes.

**Type:** Exact

**Pros:**
- exact
- easy to understand
- easy to implement
- low memory overhead (4 bits per component)
- low time overhead (bit mask and shift every flag once per frame)

**Cons:**
- change detection is always delayed, possibly causing painful chained delays
- systems that don't run every frame miss changes
- duplicates detection when looping

## Proposal 4: Flag-Doubling Change Detection

Combine Proposal 2 and Proposal 3. Differentiate between `JustChanged` (current behavior) and `Changed` (Proposal 3).

Pack this data into the flags according to [this implementation proposal](https://github.com/bevyengine/bevy/issues/68#issuecomment-769174804).

**Type:** Flaky + Exact

**Pros:**
- allows users to acc
- easy to implement
- low memory overhead (4 bits per component)
- low time overhead (bit mask and shift every flag once per frame)

**Cons:**
- users must specify the type of change detection required
- still quite fragile to system ordering effects when using the flaky `JustChanged` form
- cannot get immediate + exact results
- systems that don't run every frame miss changes
- duplicates detection when looping

## [SELECTED] Proposal 5: Generation-Counter Change Detection

A global counter is increased after each system is run. Each component saves the time of last mutation, and each system saves the time of last execution. Mutation is detected when the component's counter is greater than the system's counter. Discussed [here](https://github.com/bevyengine/bevy/issues/68#issuecomment-769174804). How to handle addition detection is unsolved; the current proposal is to use the highest bit of the counter as in proposal 1.

**Type:** Exact (for mutations), flaky (for additions)

**Pros:**
- low time overhead (set component counter on access, set system counter after execution)
- robust to systems that don't run every frame
- robust to systems that loop

**Cons:**
- moderately complex implementation
- must be modified as systems are inserted dynamically
- medium memory overhead (4 bytes per component + system)
- unsolved addition detection

## Proposal 6: System-Data Change Detection

For each system, track which system's changes it has seen. This approach is only worth fully designing and implementing if Proposal 5 fails in some way.

**Type:** Exact

**Pros:**
- exact
- conceptually simple

**Cons:**
- requires storing data on each system
- implementation is complex
- must be modified as systems are inserted dynamically

## Proposal 7: Total-Order Change Detection

Discussed [here](https://github.com/bevyengine/bevy/issues/68#issuecomment-754326523). This proposal is somewhat complicated by the new scheduler, but I believe it should still be conceptually feasible. This approach is only worth fully designing and implementing if Proposal 5 fails in some way.

**Type:** Exact

**Pros:**
- exact
- efficient data storage relative to other exact proposals

**Cons:**
- requires access to the scheduler
- complex implementation and difficulty grokking
- must be modified as systems are inserted dynamically

# Tests

- We will need to verify properties 1, 2, 3, 7 and 8. Priority: 1 > 2 = 3 > 8 > 7
- Ideally we can use identical user-facing syntax for all proposals, allowing us to re-use the same syntax for each.
- When writing tests, we need to carefully specify order using explicit dependencies.
- These tests will need to be duplicated for both components and resources.
- We need to be sure to handle cases where ambiguous system orders exist.

`changing_system` is always the system that makes the changes, and `detecting_system` always detects the changes.

The component / resource changed will be simple boolean wrapper structs.

## Basic Added / Mutated / Changed

2 x 3 design:
- Resources vs. Components
- Added vs. Changed vs. Mutated
- `changing_system` runs before `detecting_system`
- verify at the end of tick 2

## At Least Once

2 x 3 design:
- Resources vs. Components
- Added vs. Changed vs. Mutated
- `changing_system` runs after `detecting_system`
- verify at the end of tick 2

## At Most Once

2 x 3 design:
- Resources vs. Components
- Added vs. Changed vs. Mutated
- `changing_system` runs once before `detecting_system`
- increment a counter based on the number of changes detected
- verify at the end of tick 2

## Fast Detection

2 x 3 design:
- Resources vs. Components
- Added vs. Changed vs. Mutated
- `changing_system` runs before `detecting_system`
- verify at the end of tick 1

## Ambiguous System Ordering Robustness

2 x 3 x 2 design:
- Resources vs. Components
- Added vs. Changed vs. Mutated
- `changing_system` runs [before/after] `detecting_system` in tick 1
- `changing_system` runs [after/before] `detecting_system` in tick 2

## System Pausing

2 x 3 design:
- Resources vs. Components
- Added vs. Changed vs. Mutated
- `changing_system` runs in tick 1, then is disabled by run criteria
- `detecting_system` is disabled by run criteria until it is run once during tick 3
- verify at the end of tick 3

## Addition Causes Mutation

2 design:
- Resources vs. Components
- `adding_system_1` adds a component / resource
- `adding_system_2` adds the same component / resource
- verify the `Mutated` flag at the end of the tick
- verify the `Added` flag at the end of the tick

First check tests for: https://github.com/bevyengine/bevy/issues/333
Second check tests for: https://github.com/bevyengine/bevy/issues/1443

## Changes Made By Commands

- `adding_system` runs in Update in tick 1, and sends a command to add a component
- `detecting_system` runs in Update in tick 1 and 2, after `adding_system`
- We can't detect the changes in tick 1, since they haven't been processed yet
- If we were to track these changes as being emitted by `adding_system`, we can't detect the changes in tick 2 either, since `detecting_system` has already run once after `adding_system` :(

# Benchmarks

See: [general advice](https://github.com/bevyengine/bevy/blob/master/docs/profiling.md), [Criterion crate](https://github.com/bheisler/criterion.rs)

There are several critical parameters to vary:

1. entity count (1 to 10^9)
2. fraction of entities that are changed (0% to 100%)
3. cost to perform work on changed entities, i.e. workload (1 ns to 1 s)

1 and 2 should be varied between benchmark runs. 3 can be added on computationally.

We want to measure:
- memory cost
- run time

We should collect these measurements across several frames (100?) to reduce bootup effects and accurately measure the mean, variance and drift.

Entity-component change detection is much more important to benchmark than resource change detection, due to the orders of magnitude higher number of pieces of data.

No change detection at all should be included in benchmarks as a second control for cases where missing changes is unacceptable.

## Graphs

1. y: performance, x: log_10(entity count), color: proposal, facet: performance metric. Set cost to perform work to 0.
2. y: run time, x: cost to perform work, color: proposal, facet: fraction changed. Set number of entities to 10^6.
3. y: memory, x: frames, color: proposal

# Conclusions

1. Is the theoretical categorization of the proposals correct according to our tests?
2. How does the performance of the proposals compare without any load?
3. How does the performance of the proposals compare with realistic loads?
4. At what workload does more exact change tracking become worth the (presumably) higher overhead?
5. When does adding change-detection to save on work become worthwhile?
6. Is there enough divergence in performance between the best solutions in each class to ship more than one change-tracking solution?

# Implementation Plan

1. Write a test suite.
2. Verify that tests fail for existing approach.
3. Write a benchmark suite.
4. Get performance numbers for existing approach.
5. Implement, test and benchmark various solutions using a Git branch per proposal.
6. Create a draft PR with all solutions and present results to team.
7. Select a solution and replace existing change detection.

Co-authored-by: Brice DAVIER <bricedavier@gmail.com>
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
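
The generation-counter idea in Proposal 5 can be illustrated with a small standalone sketch. This is a simplified model under stated assumptions, not the code that landed in Bevy: the names `Tracked`, `SystemTicks`, `Counter`, `is_changed` and `detection_log` are hypothetical, addition detection is left out, and counter wraparound is ignored. It replays the "At Least Once" and "At Most Once" scenarios with `detecting_system` scheduled before `changing_system`.

```rust
/// Hypothetical model of Proposal 5 (generation counters).
/// None of these types are Bevy APIs; they only illustrate the comparison rule.
pub struct Tracked<T> {
    pub value: T,
    /// Value of the global counter when `value` was last mutated.
    pub changed_tick: u32,
}

/// Per-system bookkeeping: the counter value when the system last ran.
pub struct SystemTicks {
    pub last_run: u32,
}

/// The global counter. A real implementation would also have to deal with
/// `u32` wraparound, which this sketch ignores.
pub struct Counter {
    pub current: u32,
}

impl Counter {
    /// Mutate a component and stamp it with the current counter value.
    pub fn mutate<T>(&self, component: &mut Tracked<T>, f: impl FnOnce(&mut T)) {
        f(&mut component.value);
        component.changed_tick = self.current;
    }

    /// Called after a system finishes: save its run time, then advance.
    pub fn finish_system(&mut self, system: &mut SystemTicks) {
        system.last_run = self.current;
        self.current += 1;
    }
}

/// A mutation is detected when the component's counter is greater than the
/// counter the system saved on its previous run.
pub fn is_changed<T>(component: &Tracked<T>, system: &SystemTicks) -> bool {
    component.changed_tick > system.last_run
}

/// Runs three frames in which the detecting system is scheduled *before* the
/// changing system, and records what the detector saw in each frame.
pub fn detection_log() -> Vec<bool> {
    let mut counter = Counter { current: 1 };
    let mut component = Tracked { value: false, changed_tick: 0 };
    let mut detecting = SystemTicks { last_run: 0 };
    let mut changing = SystemTicks { last_run: 0 };
    let mut log = Vec::new();

    for frame in 0..3 {
        // detecting_system runs first in every frame.
        log.push(is_changed(&component, &detecting));
        counter.finish_system(&mut detecting);

        // changing_system runs second and only mutates during the first frame.
        if frame == 0 {
            counter.mutate(&mut component, |v| *v = true);
        }
        counter.finish_system(&mut changing);
    }
    log
}
```

In this model the change made late in the first frame is reported exactly once, in the second frame, which is the behavior criteria 1 and 2 ask for. With per-frame flags (Proposal 1) the same schedule would miss the change, because the flag is cleared before the detector runs again.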
2021-03-19 17:53:26 +00:00
        ComponentStatus::Added => {
            column.initialize(table_row, component_ptr, change_tick);
}
ComponentStatus::Mutated => {
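// The component already exists on this entity: overwrite the stored value
// and record `change_tick` as the tick of its most recent change.
column.replace(table_row, component_ptr, change_tick);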
}
}
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
}
StorageType::SparseSet => {
let sparse_set =
// SAFETY: If component_id is in self.component_ids, BundleInfo::new requires that
// a sparse set exists for the component.
unsafe { sparse_sets.get_mut(component_id).debug_checked_unwrap() };
sparse_set.insert(entity, component_ptr, change_tick);
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
                }
            }
            bundle_component += 1;
        });
    }

    /// Adds a bundle to the given archetype and returns the resulting archetype. This could be the
    /// same [`ArchetypeId`], in the event that adding the given bundle does not result in an
    /// [`Archetype`] change. Results are cached in the [`Archetype`] graph to avoid redundant work.
    pub(crate) fn add_bundle_to_archetype(
        &self,
        archetypes: &mut Archetypes,
        storages: &mut Storages,
        components: &Components,
        archetype_id: ArchetypeId,
    ) -> ArchetypeId {
        if let Some(add_bundle_id) = archetypes[archetype_id].edges().get_add_bundle(self.id) {
            return add_bundle_id;
        }
        let mut new_table_components = Vec::new();
        let mut new_sparse_set_components = Vec::new();
        let mut bundle_status = Vec::with_capacity(self.component_ids.len());

        let current_archetype = &mut archetypes[archetype_id];
        for component_id in self.component_ids.iter().cloned() {
            if current_archetype.contains(component_id) {
                bundle_status.push(ComponentStatus::Mutated);
            } else {
                bundle_status.push(ComponentStatus::Added);
                // SAFETY: component_id exists
                let component_info = unsafe { components.get_info_unchecked(component_id) };
                match component_info.storage_type() {
                    StorageType::Table => new_table_components.push(component_id),
                    StorageType::SparseSet => new_sparse_set_components.push(component_id),
                }
            }
        }

        if new_table_components.is_empty() && new_sparse_set_components.is_empty() {
            let edges = current_archetype.edges_mut();
            // the archetype does not change when we add this bundle
            edges.insert_add_bundle(self.id, archetype_id, bundle_status);
            archetype_id
        } else {
            let table_id;
            let table_components;
            let sparse_set_components;
            // the archetype changes when we add this bundle. prepare the new archetype and storages
            {
                let current_archetype = &archetypes[archetype_id];
                table_components = if new_table_components.is_empty() {
                    // if there are no new table components, we can keep using this table
                    table_id = current_archetype.table_id();
                    current_archetype.table_components().collect()
                } else {
                    new_table_components.extend(current_archetype.table_components());
                    // sort to ignore order while hashing
                    new_table_components.sort();
                    // SAFETY: all component ids in `new_table_components` exist
                    table_id = unsafe {
                        storages
                            .tables
                            .get_id_or_insert(&new_table_components, components)
                    };

                    new_table_components
                };

                sparse_set_components = if new_sparse_set_components.is_empty() {
                    current_archetype.sparse_set_components().collect()
                } else {
                    new_sparse_set_components.extend(current_archetype.sparse_set_components());
                    // sort to ignore order while hashing
                    new_sparse_set_components.sort();
                    new_sparse_set_components
                };
            };
            let new_archetype_id =
                archetypes.get_id_or_insert(table_id, table_components, sparse_set_components);
            // add an edge from the old archetype to the new archetype
            archetypes[archetype_id].edges_mut().insert_add_bundle(
                self.id,
                new_archetype_id,
                bundle_status,
            );
            new_archetype_id
        }
    }
}

pub(crate) struct BundleInserter<'a, 'b> {
    pub(crate) archetype: &'a mut Archetype,
    pub(crate) entities: &'a mut Entities,
    bundle_info: &'b BundleInfo,
    table: &'a mut Table,
    sparse_sets: &'a mut SparseSets,
    result: InsertBundleResult<'a>,
    archetypes_ptr: *mut Archetype,
    change_tick: Tick,
}
pub(crate) enum InsertBundleResult<'a> {
    SameArchetype,
    NewArchetypeSameTable {
        new_archetype: &'a mut Archetype,
    },
    NewArchetypeNewTable {
        new_archetype: &'a mut Archetype,
        new_table: &'a mut Table,
    },
}
impl<'a, 'b> BundleInserter<'a, 'b> {
    /// # Safety
    /// `entity` must currently exist in the source archetype for this inserter. `location`
    /// must be `entity`'s location in the archetype. `T` must match this [`BundleInfo`]'s type.
    #[inline]
    pub unsafe fn insert<T: DynamicBundle>(
        &mut self,
        entity: Entity,
location: EntityLocation,
bundle: T,
) -> EntityLocation {
match &mut self.result {
InsertBundleResult::SameArchetype => {
// PERF: this could be looked up during Inserter construction and stored (but borrowing makes this nasty)
// SAFETY: The edge is assured to be initialized when creating the BundleInserter
let add_bundle = unsafe {
self.archetype
.edges()
.get_add_bundle_internal(self.bundle_info.id)
.debug_checked_unwrap()
};
self.bundle_info.write_components(
self.table,
self.sparse_sets,
add_bundle,
entity,
location.table_row,
self.change_tick,
bundle,
);
location
}
InsertBundleResult::NewArchetypeSameTable { new_archetype } => {
let result = self.archetype.swap_remove(location.archetype_row);
if let Some(swapped_entity) = result.swapped_entity {
let swapped_location =
// SAFETY: If the swap was successful, swapped_entity must be valid.
unsafe { self.entities.get(swapped_entity).debug_checked_unwrap() };
self.entities.set(
swapped_entity.index(),
EntityLocation {
archetype_id: swapped_location.archetype_id,
archetype_row: location.archetype_row,
table_id: swapped_location.table_id,
table_row: swapped_location.table_row,
},
);
}
let new_location = new_archetype.allocate(entity, result.table_row);
Lock down access to Entities (#6740) # Objective The soundness of the ECS `World` partially relies on the correctness of the state of `Entities` stored within it. We're currently allowing users to (unsafely) mutate it, as well as readily construct it without using a `World`. While this is not strictly unsound so long as users (including `bevy_render`) safely use the APIs, it's a fairly easy path to unsoundness without much of a guard rail. Addresses #3362 for `bevy_ecs::entity`. Incorporates the changes from #3985. ## Solution Remove `Entities`'s `Default` implementation and force access to the type to only be through a properly constructed `World`. Additional cleanup for other parts of `bevy_ecs::entity`: - `Entity::index` and `Entity::generation` are no longer `pub(crate)`, opting to force the rest of bevy_ecs to use the public interface to access these values. - `EntityMeta` is no longer `pub` and also not `pub(crate)` to attempt to cut down on updating `generation` without going through an `Entities` API. It's currently inaccessible except via the `pub(crate)` Vec on `Entities`, there was no way for an outside user to use it. - Added `Entities::set`, an unsafe `pub(crate)` API for setting the location of an Entity (parallel to `Entities::get`) that replaces the internal case where we need to set the location of an entity when it's been spawned, moved, or despawned. - `Entities::alloc_at_without_replacement` is only used in `World::get_or_spawn` within the first party crates, and I cannot find a public use of this API in any ecosystem crate that I've checked (via GitHub search). - Attempted to document the few remaining undocumented public APIs in the module. --- ## Changelog Removed: `Entities`'s `Default` implementation. Removed: `EntityMeta` Removed: `Entities::alloc_at_without_replacement` and `AllocAtWithoutReplacement`. Co-authored-by: james7132 <contact@jamessliu.com> Co-authored-by: James Liu <contact@jamessliu.com>
2022-11-28 20:39:02 +00:00
self.entities.set(entity.index(), new_location);
// PERF: this could be looked up during Inserter construction and stored (but borrowing makes this nasty)
// SAFETY: The edge is assured to be initialized when creating the BundleInserter
let add_bundle = unsafe {
self.archetype
.edges()
.get_add_bundle_internal(self.bundle_info.id)
.debug_checked_unwrap()
};
self.bundle_info.write_components(
self.table,
self.sparse_sets,
add_bundle,
entity,
result.table_row,
self.change_tick,
bundle,
);
new_location
}
InsertBundleResult::NewArchetypeNewTable {
new_archetype,
new_table,
} => {
let result = self.archetype.swap_remove(location.archetype_row);
if let Some(swapped_entity) = result.swapped_entity {
let swapped_location =
// SAFETY: If the swap was successful, swapped_entity must be valid.
unsafe { self.entities.get(swapped_entity).debug_checked_unwrap() };
self.entities.set(
swapped_entity.index(),
EntityLocation {
archetype_id: swapped_location.archetype_id,
archetype_row: location.archetype_row,
table_id: swapped_location.table_id,
table_row: swapped_location.table_row,
},
);
}
// PERF: store "non bundle" components in edge, then just move those to avoid
// redundant copies
let move_result = self
.table
.move_to_superset_unchecked(result.table_row, new_table);
let new_location = new_archetype.allocate(entity, move_result.new_row);
self.entities.set(entity.index(), new_location);
// if an entity was moved into this entity's table spot, update its table row
if let Some(swapped_entity) = move_result.swapped_entity {
let swapped_location =
// SAFETY: If the swap was successful, swapped_entity must be valid.
unsafe { self.entities.get(swapped_entity).debug_checked_unwrap() };
let swapped_archetype = if self.archetype.id() == swapped_location.archetype_id
{
&mut *self.archetype
} else if new_archetype.id() == swapped_location.archetype_id {
new_archetype
                    } else {
add more `SAFETY` comments and lint for missing ones in `bevy_ecs` (#4835) # Objective `SAFETY` comments are meant to be placed before `unsafe` blocks and should contain the reasoning of why in this case the usage of unsafe is okay. This is useful when reading the code because it makes it clear which assumptions are required for safety, and makes it easier to spot possible unsoundness holes. It also forces the code writer to think of something to write and maybe look at the safety contracts of any called unsafe methods again to double-check their correct usage. There's a clippy lint called `undocumented_unsafe_blocks` which warns when using an `unsafe` block without such a comment. ## Solution - since clippy expects `SAFETY` instead of `SAFE`, rename those - add `SAFETY` comments in more places - for the last remaining 3 places, add an `#[allow()]` and `// TODO` since I wasn't comfortable enough with the code to justify their safety - add `#![warn(clippy::undocumented_unsafe_blocks)]` to `bevy_ecs` ### Note for reviewers The first commit only renames `SAFE` to `SAFETY` so it doesn't need a thorough review. https://github.com/bevyengine/bevy/pull/4835/files/cb042a416ecbe5e7d74797449969e064d8a5f13c..55cef2d6fa3aa634667a60f6d5abc16f43f16298 is the diff for all other changes.
### Safety comments where I'm not too familiar with the code https://github.com/bevyengine/bevy/blob/774012ece50e4add4fcc8324ec48bbecf5546c3c/crates/bevy_ecs/src/entity/mod.rs#L540-L546 https://github.com/bevyengine/bevy/blob/774012ece50e4add4fcc8324ec48bbecf5546c3c/crates/bevy_ecs/src/world/entity_ref.rs#L249-L252 ### Locations left undocumented with a `TODO` comment https://github.com/bevyengine/bevy/blob/5dde944a3051426ac69fdedc5699f7da97a7e147/crates/bevy_ecs/src/schedule/executor_parallel.rs#L196-L199 https://github.com/bevyengine/bevy/blob/5dde944a3051426ac69fdedc5699f7da97a7e147/crates/bevy_ecs/src/world/entity_ref.rs#L287-L289 https://github.com/bevyengine/bevy/blob/5dde944a3051426ac69fdedc5699f7da97a7e147/crates/bevy_ecs/src/world/entity_ref.rs#L413-L415 Co-authored-by: Jakob Hellermann <hellermann@sipgate.de>
2022-07-04 14:44:24 +00:00
                        // SAFETY: the only two borrowed archetypes are above and we just did collision checks
                        &mut *self
                            .archetypes_ptr
                            .add(swapped_location.archetype_id.index())
                    };
                    self.entities.set(
                        swapped_entity.index(),
                        EntityLocation {
                            archetype_id: swapped_location.archetype_id,
                            archetype_row: swapped_location.archetype_row,
                            table_id: swapped_location.table_id,
                            table_row: result.table_row,
                        },
                    );
                    swapped_archetype
                        .set_entity_table_row(swapped_location.archetype_row, result.table_row);
                }
                // PERF: this could be looked up during Inserter construction and stored (but borrowing makes this nasty)
                // SAFETY: The edge is assured to be initialized when creating the BundleInserter
                let add_bundle = unsafe {
                    self.archetype
                        .edges()
                        .get_add_bundle_internal(self.bundle_info.id)
                        .debug_checked_unwrap()
                };
                self.bundle_info.write_components(
                    new_table,
                    self.sparse_sets,
                    add_bundle,
                    entity,
                    move_result.new_row,
                    self.change_tick,
                    bundle,
                );
                new_location
            }
        }
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
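The filter-aware conflict rule from the "Query Conflicts" section can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ToyFilteredAccess`, its fields, and the helper functions are hypothetical names made up for illustration, plain `HashSet`s stand in for the bitset-backed `Access<T>`, and none of this is the actual `FilteredAccess<T>` code from the pr.

```rust
use std::collections::HashSet;

/// Toy component id; a stand-in for a densely packed `ComponentId` (assumption).
type Id = usize;

/// Hypothetical, simplified model of a filtered access set: what a query
/// reads and writes, plus its `With` / `Without` filters.
#[derive(Default)]
struct ToyFilteredAccess {
    reads: HashSet<Id>,
    writes: HashSet<Id>,
    with: HashSet<Id>,
    without: HashSet<Id>,
}

impl ToyFilteredAccess {
    fn new(reads: &[Id], writes: &[Id], with: &[Id], without: &[Id]) -> Self {
        Self {
            reads: reads.iter().copied().collect(),
            writes: writes.iter().copied().collect(),
            with: with.iter().copied().collect(),
            without: without.iter().copied().collect(),
        }
    }

    /// True when neither side writes a component the other reads or writes.
    fn access_compatible(&self, other: &Self) -> bool {
        self.writes.is_disjoint(&other.writes)
            && self.writes.is_disjoint(&other.reads)
            && self.reads.is_disjoint(&other.writes)
    }

    /// True when one side requires a component that the other side excludes,
    /// so no entity can ever match both queries.
    fn provably_disjoint(&self, other: &Self) -> bool {
        !self.with.is_disjoint(&other.without) || !self.without.is_disjoint(&other.with)
    }

    /// A conflict needs incompatible access AND no filter-based disjointness proof.
    fn conflicts_with(&self, other: &Self) -> bool {
        !self.access_compatible(other) && !self.provably_disjoint(other)
    }
}

const A: Id = 0;
const B: Id = 1;
const C: Id = 2;

/// Models two queries that both write `A`, one `With<B>` and one `Without<B>`.
fn filter_system_conflicts() -> bool {
    let a = ToyFilteredAccess::new(&[], &[A], &[B], &[]);
    let b = ToyFilteredAccess::new(&[], &[A], &[], &[B]);
    a.conflicts_with(&b)
}

/// Models `Query<(&mut A, &C)>` next to `Query<(&mut A, &B)>` (no filters).
fn maybe_conflicts_system_conflicts() -> bool {
    let a = ToyFilteredAccess::new(&[C], &[A], &[], &[]);
    let b = ToyFilteredAccess::new(&[B], &[A], &[], &[]);
    a.conflicts_with(&b)
}
```

In this model the two filtered queries are accepted because `With<B>` and `Without<B>` can never match the same entity, while the unfiltered pair is rejected up front regardless of which entities exist. That mirrors the "fail consistently at startup" behavior described above.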
2021-03-05 07:54:35 +00:00
}
Spawn specific entities: spawn or insert operations, refactor spawn internals, world clearing (#2673)

This upstreams the code changes used by the new renderer to enable cross-app Entity reuse:

* Spawning at specific entities
* get_or_spawn: spawns an entity if it doesn't already exist and returns an EntityMut
* insert_or_spawn_batch: the batched equivalent to `world.get_or_spawn(entity).insert_bundle(bundle)`
* Clearing entities and storages
* Allocating Entities with "invalid" archetypes. These entities cannot be queried / are treated as "non existent". They serve as "reserved" entities that won't show up when calling `spawn()`. They must be "specifically spawned at" using apis like `get_or_spawn(entity)`.

In combination, these changes enable the "render world" to clear entities / storages each frame and reserve all "app world entities". These can then be spawned during the "render extract step".

This refactors "spawn" and "insert" code in a way that I think is a massive improvement to legibility and re-usability. It also yields marginal performance wins by reducing some duplicate lookups (less than a percentage point improvement on insertion benchmarks). There is also some potential for future unsafe reduction (by making BatchSpawner and BatchInserter generic). But for now I want to cut down generic usage to a minimum to encourage smaller binaries and faster compiles.

This is currently a draft because it needs more tests (although this code has already had some real-world testing on my custom-shaders branch).

I also fixed the benchmarks (which currently don't compile!) / added new ones to illustrate batching wins.

After these changes, Bevy ECS is basically ready to accommodate the new renderer. I think the biggest missing piece at this point is "sub apps".
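To make the "spawn at a specific entity" semantics above concrete, here is a minimal toy model. It is a sketch only: `ToyWorld`, its `HashMap` storage, string component names, and `extract_frame` are all hypothetical and exist purely for illustration. The method names echo the apis listed above, but the bodies say nothing about how Bevy actually implements them.

```rust
use std::collections::HashMap;

/// Toy entity id (assumption: a plain integer, no generation counter).
type Entity = u32;

/// Hypothetical miniature world: each live entity maps to a list of
/// component names. Real storage (tables, sparse sets) is not modeled.
#[derive(Default)]
struct ToyWorld {
    entities: HashMap<Entity, Vec<&'static str>>,
}

impl ToyWorld {
    /// Returns the components at `id`, first spawning an empty entity there
    /// if none exists yet.
    fn get_or_spawn(&mut self, id: Entity) -> &mut Vec<&'static str> {
        self.entities.entry(id).or_default()
    }

    /// Batched form of "get_or_spawn, then insert" for many entities.
    fn insert_or_spawn_batch<I>(&mut self, batch: I)
    where
        I: IntoIterator<Item = (Entity, &'static str)>,
    {
        for (id, component) in batch {
            self.get_or_spawn(id).push(component);
        }
    }

    /// Drops every entity, as a per-frame "render world" reset would.
    fn clear_entities(&mut self) {
        self.entities.clear();
    }
}

/// One simulated extract step: clear the toy render world, then re-create
/// entities at the same ids the app world uses. Returns the entity count.
fn extract_frame(render_world: &mut ToyWorld, app_entities: &[Entity]) -> usize {
    render_world.clear_entities();
    render_world.insert_or_spawn_batch(app_entities.iter().map(|&e| (e, "Extracted")));
    render_world.entities.len()
}
```

The point of the model is that ids are chosen by the caller instead of the allocator, which is what lets a second world reuse the first world's entity ids after being cleared.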
2021-08-25 23:34:02 +00:00
}
// Borrowed world state needed to spawn entities that share one bundle type.
pub(crate) struct BundleSpawner<'a, 'b> {
    pub(crate) archetype: &'a mut Archetype,
    pub(crate) entities: &'a mut Entities,
    bundle_info: &'b BundleInfo,
    table: &'a mut Table,
    sparse_sets: &'a mut SparseSets,
    change_tick: Tick,
}
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
impl<'a, 'b> BundleSpawner<'a, 'b> {
    /// Reserves storage for `additional` more entities in the target archetype and its table.
    pub fn reserve_storage(&mut self, additional: usize) {
        self.archetype.reserve(additional);
        self.table.reserve(additional);
    }

    /// # Safety
    /// `entity` must be allocated (but non-existent), `T` must match this [`BundleInfo`]'s type
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
#[inline]
pub unsafe fn spawn_non_existent<T: DynamicBundle>(
&mut self,
entity: Entity,
bundle: T,
) -> EntityLocation {
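// Reserve a row for the entity in the destination table, then register the
// entity in the archetype to obtain its final location.
let table_row = self.table.allocate(entity);
let location = self.archetype.allocate(entity, table_row);
// Write the bundle's components into table and sparse-set storage.
self.bundle_info.write_components(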
self.table,
self.sparse_sets,
&SpawnBundleStatus,
entity,
table_row,
self.change_tick,
bundle,
);
// Record the entity's new location so later lookups resolve to it.
self.entities.set(entity.index(), location);
location
Bevy ECS V2 (#1525) # Bevy ECS V2 This is a rewrite of Bevy ECS (basically everything but the new executor/schedule, which are already awesome). The overall goal was to improve the performance and versatility of Bevy ECS. Here is a quick bulleted list of changes before we dive into the details: * Complete World rewrite * Multiple component storage types: * Tables: fast cache friendly iteration, slower add/removes (previously called Archetypes) * Sparse Sets: fast add/remove, slower iteration * Stateful Queries (caches query results for faster iteration. fragmented iteration is _fast_ now) * Stateful System Params (caches expensive operations. inspired by @DJMcNab's work in #1364) * Configurable System Params (users can set configuration when they construct their systems. once again inspired by @DJMcNab's work) * Archetypes are now "just metadata", component storage is separate * Archetype Graph (for faster archetype changes) * Component Metadata * Configure component storage type * Retrieve information about component size/type/name/layout/send-ness/etc * Components are uniquely identified by a densely packed ComponentId * TypeIds are now totally optional (which should make implementing scripting easier) * Super fast "for_each" query iterators * Merged Resources into World. 
Resources are now just a special type of component * EntityRef/EntityMut builder apis (more efficient and more ergonomic) * Fast bitset-backed `Access<T>` replaces old hashmap-based approach everywhere * Query conflicts are determined by component access instead of archetype component access (to avoid random failures at runtime) * With/Without are still taken into account for conflicts, so this should still be comfy to use * Much simpler `IntoSystem` impl * Significantly reduced the amount of hashing throughout the ecs in favor of Sparse Sets (indexed by densely packed ArchetypeId, ComponentId, BundleId, and TableId) * Safety Improvements * Entity reservation uses a normal world reference instead of unsafe transmute * QuerySets no longer transmute lifetimes * Made traits "unsafe" where relevant * More thorough safety docs * WorldCell * Exposes safe mutable access to multiple resources at a time in a World * Replaced "catch all" `System::update_archetypes(world: &World)` with `System::new_archetype(archetype: &Archetype)` * Simpler Bundle implementation * Replaced slow "remove_bundle_one_by_one" used as fallback for Commands::remove_bundle with fast "remove_bundle_intersection" * Removed `Mut<T>` query impl. it is better to only support one way: `&mut T` * Removed with() from `Flags<T>` in favor of `Option<Flags<T>>`, which allows querying for flags to be "filtered" by default * Components now have is_send property (currently only resources support non-send) * More granular module organization * New `RemovedComponents<T>` SystemParam that replaces `query.removed::<T>()` * `world.resource_scope()` for mutable access to resources and world at the same time * WorldQuery and QueryFilter traits unified. FilterFetch trait added to enable "short circuit" filtering. 
Auto impled for cases that don't need it * Significantly slimmed down SystemState in favor of individual SystemParam state * System Commands changed from `commands: &mut Commands` back to `mut commands: Commands` (to allow Commands to have a World reference) Fixes #1320 ## `World` Rewrite This is a from-scratch rewrite of `World` that fills the niche that `hecs` used to. Yes, this means Bevy ECS is no longer a "fork" of hecs. We're going out our own! (the only shared code between the projects is the entity id allocator, which is already basically ideal) A huge shout out to @SanderMertens (author of [flecs](https://github.com/SanderMertens/flecs)) for sharing some great ideas with me (specifically hybrid ecs storage and archetype graphs). He also helped advise on a number of implementation details. ## Component Storage (The Problem) Two ECS storage paradigms have gained a lot of traction over the years: * **Archetypal ECS**: * Stores components in "tables" with static schemas. Each "column" stores components of a given type. Each "row" is an entity. * Each "archetype" has its own table. Adding/removing an entity's component changes the archetype. * Enables super-fast Query iteration due to its cache-friendly data layout * Comes at the cost of more expensive add/remove operations for an Entity's components, because all components need to be copied to the new archetype's "table" * **Sparse Set ECS**: * Stores components of the same type in densely packed arrays, which are sparsely indexed by densely packed unsigned integers (Entity ids) * Query iteration is slower than Archetypal ECS because each entity's component could be at any position in the sparse set. This "random access" pattern isn't cache friendly. Additionally, there is an extra layer of indirection because you must first map the entity id to an index in the component array. 
* Adding/removing components is a cheap, constant time operation Bevy ECS V1, hecs, legion, flec, and Unity DOTS are all "archetypal ecs-es". I personally think "archetypal" storage is a good default for game engines. An entity's archetype doesn't need to change frequently in general, and it creates "fast by default" query iteration (which is a much more common operation). It is also "self optimizing". Users don't need to think about optimizing component layouts for iteration performance. It "just works" without any extra boilerplate. Shipyard and EnTT are "sparse set ecs-es". They employ "packing" as a way to work around the "suboptimal by default" iteration performance for specific sets of components. This helps, but I didn't think this was a good choice for a general purpose engine like Bevy because: 1. "packs" conflict with each other. If bevy decides to internally pack the Transform and GlobalTransform components, users are then blocked if they want to pack some custom component with Transform. 2. users need to take manual action to optimize Developers selecting an ECS framework are stuck with a hard choice. Select an "archetypal" framework with "fast iteration everywhere" but without the ability to cheaply add/remove components, or select a "sparse set" framework to cheaply add/remove components but with slower iteration performance. ## Hybrid Component Storage (The Solution) In Bevy ECS V2, we get to have our cake and eat it too. It now has _both_ of the component storage types above (and more can be added later if needed): * **Tables** (aka "archetypal" storage) * The default storage. If you don't configure anything, this is what you get * Fast iteration by default * Slower add/remove operations * **Sparse Sets** * Opt-in * Slower iteration * Faster add/remove operations These storage types complement each other perfectly. By default Query iteration is fast. 
If developers know that they want to add/remove a component at high frequencies, they can set the storage to "sparse set": ```rust world.register_component( ComponentDescriptor::new::<MyComponent>(StorageType::SparseSet) ).unwrap(); ``` ## Archetypes Archetypes are now "just metadata" ... they no longer store components directly. They do store: * The `ComponentId`s of each of the Archetype's components (and that component's storage type) * Archetypes are uniquely defined by their component layouts * For example: entities with "table" components `[A, B, C]` _and_ "sparse set" components `[D, E]` will always be in the same archetype. * The `TableId` associated with the archetype * For now each archetype has exactly one table (which can have no components), * There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it: * Ex: an entity with "table storage" components `[A, B, C]` and "sparse set" components `[D, E]` will share the same `[A, B, C]` table as an entity with `[A, B, C]` table component and `[F]` sparse set components. * This 1->Many relationship is how we preserve fast "cache friendly" iteration performance when possible (more on this later) * A list of entities that are in the archetype and the row id of the table they are in * ArchetypeComponentIds * unique densely packed identifiers for (ArchetypeId, ComponentId) pairs * used by the schedule executor for cheap system access control * "Archetype Graph Edges" (see the next section) ## The "Archetype Graph" Archetype changes in Bevy (and a number of other archetypal ecs-es) have historically been expensive to compute. First, you need to allocate a new vector of the entity's current component ids, add or remove components based on the operation performed, sort it (to ensure it is order-independent), then hash it to find the archetype (if it exists). 
And thats all before we get to the _already_ expensive full copy of all components to the new table storage. The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If `ComponentId`s are densely packed, you can use sparse sets to cheaply jump between archetypes. Bevy takes this one step further by using add/remove `Bundle` edges instead of `Component` edges. Bevy encourages the use of `Bundles` to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. `Bundles` now also have densely-packed `BundleId`s. This allows us to use a _single_ edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph. As a result, an operation that used to be _heavy_ (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations. ## Stateful Queries World queries are now stateful. This allows us to: 1. Cache archetype (and table) matches * This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs). 2. Cache Fetch and Filter state * The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed 3. 
Incrementally build up state * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes) As a result, the direct `World` query api now looks like this: ```rust let mut query = world.query::<(&A, &mut B)>(); for (a, mut b) in query.iter_mut(&mut world) { } ``` Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (and the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world). However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam. ## Stateful SystemParams Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources). SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now. Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params). (credit goes to @DJMcNab for the initial idea and draft pr here #1364) ## Configurable SystemParams @DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible). 
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters: ```rust fn foo(value: Local<usize>) { } app.add_system(foo.system().config(|c| c.0 = Some(10))); ``` ## Uber Fast "for_each" Query Iterators Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration. ```rust fn system(query: Query<(&A, &mut B)>) { // you now have the option to do this for a speed boost query.for_each_mut(|(a, mut b)| { }); // however normal iterators are still available for (a, mut b) in query.iter_mut() { } } ``` I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomf" is needed, it makes sense to use `for_each`. We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr). ## Component Metadata `World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type. ## Significantly Cheaper `Access<T>` We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>`sources every time archetypes changed. This pr removes TypeAccess in favor of faster bitset access everywhere. 
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict wit components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows us the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity:

```rust
// spawn now takes no inputs and returns an EntityMut
let entity = world.spawn()
    .insert(A) // insert a single component into the entity
    .insert_bundle((B, C)) // insert a bundle of components into the entity
    .id(); // id returns the Entity id

// Returns EntityMut (or panics if the entity does not exist)
world.entity_mut(entity)
    .insert(D)
    .insert_bundle(SomeBundle::default());
{
    // returns EntityRef (or panics if the entity does not exist)
    let d = world.entity(entity)
        .get::<D>() // gets the D component
        .unwrap();
    // world.get still exists for ergonomics
    let d = world.get::<D>(entity).unwrap();
}

// These variants return Options if you want to check existence instead of panicking
world.get_entity_mut(entity)
    .unwrap()
    .insert(E);

if let Some(entity_ref) = world.get_entity(entity) {
    let d = entity_ref.get::<D>().unwrap();
}
```

This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change.

## Safety Improvements

* Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute
* QuerySets no longer transmute lifetimes
* Made traits "unsafe" when implementing a trait incorrectly could cause unsafety
* More thorough safety docs

## RemovedComponents SystemParam

The old approach to querying removed components, `query.removed::<T>()`, was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState:

```rust
fn system(removed: RemovedComponents<T>) {
    for entity in removed.iter() {
    }
}
```

## Simpler Bundle implementation

Bundles are no longer responsible for sorting (or deduping) TypeInfo.
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which I might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used.

## Unified WorldQuery and QueryFilter types

(don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change)

WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful).

QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool.

This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit.

## More Granular Modules

World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here).

## Remaining Draft Work (to be done in this pr)

* ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~
    * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~
* ~~batch_iter / par_iter (currently stubbed out)~~
* ~~ChangedRes~~
    * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~.
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~
* ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~
* ~~Nested Bundles (if I find time)~~

## Potential Future Work

* Expand WorldCell to support queries.
* Consider not allocating in the empty archetype on `world.spawn()`
    * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op
    * this actually regressed performance last time I tried it, but in theory it should be faster
* Optimize SparseSet::insert (see `PERF` comment on insert)
* Replace SparseArray `Option<T>` with T::MAX to cut down on branching
    * would enable cheaper get_unchecked() operations
* upstream fixedbitset optimizations
    * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec)
    * fixedbitset could have a const constructor
* Consider implementing Tags (archetype-specific by-value data that affects archetype identity)
    * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different.
    * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage.
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation
    * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints)
    * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code)
* Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell
    * this is basically just "systems" so maybe it's not worth it
* Add more world ops
    * `world.clear()`
    * `world.reserve<T: Bundle>(count: usize)`
* Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But that's just a guess. I'm not an allocation perf pro :)
* Adapt Commands apis for consistency with new World apis

## Benchmarks

key:

* `bevy_old`: bevy `main` branch
* `bevy`: this branch
* `_foreach`: uses an optimized for_each iterator
* `_sparse`: uses sparse set storage (if unspecified assume table storage)
* `_system`: runs inside a system (if unspecified assume test happens via direct world ops)

### Simple Insert (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png)

### Simple Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png)

### Fragment Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png)

### Sparse Fragmented Iter

Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes

![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png)

### Schedule (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png)

### Add Remove Component (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png)

### Add Remove Component Big

Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed

![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png)

### Get Component

Looks up a single component value a large number of times

![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
}
Spawn specific entities: spawn or insert operations, refactor spawn internals, world clearing (#2673)

This upstreams the code changes used by the new renderer to enable cross-app Entity reuse:

* Spawning at specific entities
    * get_or_spawn: spawns an entity if it doesn't already exist and returns an EntityMut
    * insert_or_spawn_batch: the batched equivalent to `world.get_or_spawn(entity).insert_bundle(bundle)`
* Clearing entities and storages
* Allocating Entities with "invalid" archetypes. These entities cannot be queried / are treated as "non existent". They serve as "reserved" entities that won't show up when calling `spawn()`. They must be "specifically spawned at" using apis like `get_or_spawn(entity)`.

In combination, these changes enable the "render world" to clear entities / storages each frame and reserve all "app world entities". These can then be spawned during the "render extract step".

This refactors "spawn" and "insert" code in a way that I think is a massive improvement to legibility and re-usability. It also yields marginal performance wins by reducing some duplicate lookups (less than a percentage point improvement on insertion benchmarks). There is also some potential for future unsafe reduction (by making BatchSpawner and BatchInserter generic). But for now I want to cut down generic usage to a minimum to encourage smaller binaries and faster compiles.

This is currently a draft because it needs more tests (although this code has already had some real-world testing on my custom-shaders branch).

I also fixed the benchmarks (which currently don't compile!) / added new ones to illustrate batching wins.

After these changes, Bevy ECS is basically ready to accommodate the new renderer. I think the biggest missing piece at this point is "sub apps".
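
The interplay between reserved entities, `spawn()`, and `get_or_spawn(entity)` described above can be modeled with a small std-only sketch. Everything here (`ToyWorld`, `Slot`, plain `usize` ids, and the method bodies) is a hypothetical simplification for illustration; Bevy's real entity allocator uses generations and archetype locations and is not reproduced here.

```rust
/// State of one entity id in the toy world (hypothetical model).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Slot {
    /// Never handed out; `spawn()` may use it.
    Free,
    /// Id is set aside but "non existent": not visible, and skipped by `spawn()`.
    Reserved,
    /// Entity exists and is visible.
    Live,
}

#[derive(Default)]
struct ToyWorld {
    slots: Vec<Slot>,
}

impl ToyWorld {
    /// Grows the slot list so that `id` is a valid index.
    fn ensure_len(&mut self, id: usize) {
        if id >= self.slots.len() {
            self.slots.resize(id + 1, Slot::Free);
        }
    }

    /// Marks `id` as reserved so that `spawn()` will never hand it out.
    fn reserve(&mut self, id: usize) {
        self.ensure_len(id);
        if self.slots[id] == Slot::Free {
            self.slots[id] = Slot::Reserved;
        }
    }

    /// Spawns at the first free id, appending a new one if none is free.
    fn spawn(&mut self) -> usize {
        let id = match self.slots.iter().position(|s| *s == Slot::Free) {
            Some(id) => id,
            None => {
                self.slots.push(Slot::Free);
                self.slots.len() - 1
            }
        };
        self.slots[id] = Slot::Live;
        id
    }

    /// Spawns at exactly `id` if it is not live yet; otherwise just returns it.
    fn get_or_spawn(&mut self, id: usize) -> usize {
        self.ensure_len(id);
        self.slots[id] = Slot::Live;
        id
    }

    /// Only live entities count as existing.
    fn contains(&self, id: usize) -> bool {
        self.slots.get(id) == Some(&Slot::Live)
    }

    /// Drops all entities (assumption: reservations are dropped as well).
    fn clear_entities(&mut self) {
        self.slots.clear();
    }
}

fn main() {
    let mut render_world = ToyWorld::default();

    // Reserve the ids used by the (imaginary) app world.
    for id in 0..3 {
        render_world.reserve(id);
    }
    assert!(!render_world.contains(1)); // reserved ids are "non existent"

    // A plain spawn never lands on a reserved id.
    assert_eq!(render_world.spawn(), 3);

    // Reserved ids must be spawned at explicitly.
    render_world.get_or_spawn(1);
    assert!(render_world.contains(1));
    assert_eq!(render_world.get_or_spawn(1), 1); // already live: no change

    // Clearing resets the world for the next frame.
    render_world.clear_entities();
    assert!(!render_world.contains(1));
}
```
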
2021-08-25 23:34:02 +00:00
/// # Safety
/// `T` must match this [`BundleInfo`]'s type
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
#[inline]
Spawn specific entities: spawn or insert operations, refactor spawn internals, world clearing (#2673) This upstreams the code changes used by the new renderer to enable cross-app Entity reuse: * Spawning at specific entities * get_or_spawn: spawns an entity if it doesn't already exist and returns an EntityMut * insert_or_spawn_batch: the batched equivalent to `world.get_or_spawn(entity).insert_bundle(bundle)` * Clearing entities and storages * Allocating Entities with "invalid" archetypes. These entities cannot be queried / are treated as "non existent". They serve as "reserved" entities that won't show up when calling `spawn()`. They must be "specifically spawned at" using apis like `get_or_spawn(entity)`. In combination, these changes enable the "render world" to clear entities / storages each frame and reserve all "app world entities". These can then be spawned during the "render extract step". This refactors "spawn" and "insert" code in a way that I think is a massive improvement to legibility and re-usability. It also yields marginal performance wins by reducing some duplicate lookups (less than a percentage point improvement on insertion benchmarks). There is also some potential for future unsafe reduction (by making BatchSpawner and BatchInserter generic). But for now I want to cut down generic usage to a minimum to encourage smaller binaries and faster compiles. This is currently a draft because it needs more tests (although this code has already had some real-world testing on my custom-shaders branch). I also fixed the benchmarks (which currently don't compile!) / added new ones to illustrate batching wins. After these changes, Bevy ECS is basically ready to accommodate the new renderer. I think the biggest missing piece at this point is "sub apps".
2021-08-25 23:34:02 +00:00
pub unsafe fn spawn<T: Bundle>(&mut self, bundle: T) -> Entity {
let entity = self.entities.alloc();
add more `SAFETY` comments and lint for missing ones in `bevy_ecs` (#4835) # Objective `SAFETY` comments are meant to be placed before `unsafe` blocks and should explain why the use of unsafe is okay in that particular case. This is useful when reading the code because it makes it clear which assumptions are required for safety, and makes it easier to spot possible unsoundness holes. It also forces the code writer to think of something to write and maybe look at the safety contracts of any called unsafe methods again to double-check their correct usage. There's a clippy lint called `undocumented_unsafe_blocks` which warns when an `unsafe` block has no such comment. ## Solution - since clippy expects `SAFETY` instead of `SAFE`, rename those - add `SAFETY` comments in more places - for the last remaining 3 places, add an `#[allow()]` and `// TODO` since I wasn't comfortable enough with the code to justify their safety - add `#![warn(clippy::undocumented_unsafe_blocks)]` to `bevy_ecs` ### Note for reviewers The first commit only renames `SAFE` to `SAFETY` so it doesn't need a thorough review. https://github.com/bevyengine/bevy/pull/4835/files/cb042a416ecbe5e7d74797449969e064d8a5f13c..55cef2d6fa3aa634667a60f6d5abc16f43f16298 is the diff for all other changes. 
### Safety comments where I'm not too familiar with the code https://github.com/bevyengine/bevy/blob/774012ece50e4add4fcc8324ec48bbecf5546c3c/crates/bevy_ecs/src/entity/mod.rs#L540-L546 https://github.com/bevyengine/bevy/blob/774012ece50e4add4fcc8324ec48bbecf5546c3c/crates/bevy_ecs/src/world/entity_ref.rs#L249-L252 ### Locations left undocumented with a `TODO` comment https://github.com/bevyengine/bevy/blob/5dde944a3051426ac69fdedc5699f7da97a7e147/crates/bevy_ecs/src/schedule/executor_parallel.rs#L196-L199 https://github.com/bevyengine/bevy/blob/5dde944a3051426ac69fdedc5699f7da97a7e147/crates/bevy_ecs/src/world/entity_ref.rs#L287-L289 https://github.com/bevyengine/bevy/blob/5dde944a3051426ac69fdedc5699f7da97a7e147/crates/bevy_ecs/src/world/entity_ref.rs#L413-L415 Co-authored-by: Jakob Hellermann <hellermann@sipgate.de>
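### Illustration of the `SAFETY` comment convention

As a minimal sketch of the convention this commit describes (not code from `bevy_ecs`), the hypothetical helpers `first_unchecked` and `first_or_zero` below show a `SAFETY` comment placed directly before each `unsafe` block, stating the assumption that makes the block sound. The lint itself is only mentioned in a comment here, on the assumption that it would be enabled at the crate root as the commit message states.

```rust
// At the crate root, the lint from the commit message would be enabled with:
//     #![warn(clippy::undocumented_unsafe_blocks)]

/// Returns the first element without a bounds check.
///
/// # Safety
/// `values` must not be empty.
unsafe fn first_unchecked(values: &[u32]) -> u32 {
    // SAFETY: the caller guarantees `values` is non-empty, so index 0 is in bounds.
    unsafe { *values.get_unchecked(0) }
}

/// Safe wrapper: establishes the precondition before calling the unsafe helper.
fn first_or_zero(values: &[u32]) -> u32 {
    if values.is_empty() {
        return 0;
    }
    // SAFETY: `values` was checked to be non-empty just above.
    unsafe { first_unchecked(values) }
}
```

The `spawn` function in this file follows the same shape: its `SAFETY` comment records that the entity was just allocated and that `T` matches the `BundleInfo`, which are the preconditions of `spawn_non_existent`.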
2022-07-04 14:44:24 +00:00
// SAFETY: entity is allocated (but non-existent), `T` matches this BundleInfo's type
self.spawn_non_existent(entity, bundle);
entity
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
}
}
/// Metadata for bundles. Stores a [`BundleInfo`] for each type of [`Bundle`] in a given world.
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
#[derive(Default)]
pub struct Bundles {
    bundle_infos: Vec<BundleInfo>,
    /// Cache of [`BundleId`]s for static bundle types, keyed by the bundle's `TypeId`
    bundle_ids: TypeIdMap<BundleId>,
    /// Cache of dynamic [`BundleId`]s for bundles with multiple components, keyed by their [`ComponentId`]s
    dynamic_bundle_ids: HashMap<Vec<ComponentId>, (BundleId, Vec<StorageType>)>,
    /// Cache of dynamic [`BundleId`]s for single-component bundles (an optimized special case), keyed by [`ComponentId`]
    dynamic_component_bundle_ids: HashMap<ComponentId, (BundleId, StorageType)>,
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
}
impl Bundles {
/// Gets the metadata associated with a specific type of bundle.
/// Returns `None` if the bundle is not registered with the world.
#[inline]
pub fn get(&self, bundle_id: BundleId) -> Option<&BundleInfo> {
    self.bundle_infos.get(bundle_id.index())
}

/// Gets the value identifying a specific type of bundle.
/// Returns `None` if the bundle does not exist in the world,
/// or if `type_id` does not correspond to a type of bundle.
#[inline]
pub fn get_id(&self, type_id: TypeId) -> Option<BundleId> {
    self.bundle_ids.get(&type_id).cloned()
}

/// Initializes a new [`BundleInfo`] for a statically known type.
If developers know that they want to add/remove a component at high frequencies, they can set the storage to "sparse set": ```rust world.register_component( ComponentDescriptor::new::<MyComponent>(StorageType::SparseSet) ).unwrap(); ``` ## Archetypes Archetypes are now "just metadata" ... they no longer store components directly. They do store: * The `ComponentId`s of each of the Archetype's components (and that component's storage type) * Archetypes are uniquely defined by their component layouts * For example: entities with "table" components `[A, B, C]` _and_ "sparse set" components `[D, E]` will always be in the same archetype. * The `TableId` associated with the archetype * For now each archetype has exactly one table (which can have no components), * There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it: * Ex: an entity with "table storage" components `[A, B, C]` and "sparse set" components `[D, E]` will share the same `[A, B, C]` table as an entity with `[A, B, C]` table component and `[F]` sparse set components. * This 1->Many relationship is how we preserve fast "cache friendly" iteration performance when possible (more on this later) * A list of entities that are in the archetype and the row id of the table they are in * ArchetypeComponentIds * unique densely packed identifiers for (ArchetypeId, ComponentId) pairs * used by the schedule executor for cheap system access control * "Archetype Graph Edges" (see the next section) ## The "Archetype Graph" Archetype changes in Bevy (and a number of other archetypal ecs-es) have historically been expensive to compute. First, you need to allocate a new vector of the entity's current component ids, add or remove components based on the operation performed, sort it (to ensure it is order-independent), then hash it to find the archetype (if it exists). 
And thats all before we get to the _already_ expensive full copy of all components to the new table storage. The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If `ComponentId`s are densely packed, you can use sparse sets to cheaply jump between archetypes. Bevy takes this one step further by using add/remove `Bundle` edges instead of `Component` edges. Bevy encourages the use of `Bundles` to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. `Bundles` now also have densely-packed `BundleId`s. This allows us to use a _single_ edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph. As a result, an operation that used to be _heavy_ (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations. ## Stateful Queries World queries are now stateful. This allows us to: 1. Cache archetype (and table) matches * This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs). 2. Cache Fetch and Filter state * The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed 3. 
Incrementally build up state * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes) As a result, the direct `World` query api now looks like this: ```rust let mut query = world.query::<(&A, &mut B)>(); for (a, mut b) in query.iter_mut(&mut world) { } ``` Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (and the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world). However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam. ## Stateful SystemParams Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources). SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now. Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params). (credit goes to @DJMcNab for the initial idea and draft pr here #1364) ## Configurable SystemParams @DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible). 
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters: ```rust fn foo(value: Local<usize>) { } app.add_system(foo.system().config(|c| c.0 = Some(10))); ``` ## Uber Fast "for_each" Query Iterators Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration. ```rust fn system(query: Query<(&A, &mut B)>) { // you now have the option to do this for a speed boost query.for_each_mut(|(a, mut b)| { }); // however normal iterators are still available for (a, mut b) in query.iter_mut() { } } ``` I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomf" is needed, it makes sense to use `for_each`. We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr). ## Component Metadata `World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type. ## Significantly Cheaper `Access<T>` We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>`sources every time archetypes changed. This pr removes TypeAccess in favor of faster bitset access everywhere. 
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict wit components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows us the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
    pub(crate) fn init_info<'a, T: Bundle>(
        &'a mut self,
        components: &mut Components,
        storages: &mut Storages,
Incrementally build up state * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes) As a result, the direct `World` query api now looks like this: ```rust let mut query = world.query::<(&A, &mut B)>(); for (a, mut b) in query.iter_mut(&mut world) { } ``` Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (and the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world). However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam. ## Stateful SystemParams Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources). SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now. Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params). (credit goes to @DJMcNab for the initial idea and draft pr here #1364) ## Configurable SystemParams @DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible). 
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters: ```rust fn foo(value: Local<usize>) { } app.add_system(foo.system().config(|c| c.0 = Some(10))); ``` ## Uber Fast "for_each" Query Iterators Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration. ```rust fn system(query: Query<(&A, &mut B)>) { // you now have the option to do this for a speed boost query.for_each_mut(|(a, mut b)| { }); // however normal iterators are still available for (a, mut b) in query.iter_mut() { } } ``` I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomf" is needed, it makes sense to use `for_each`. We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr). ## Component Metadata `World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type. ## Significantly Cheaper `Access<T>` We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>`sources every time archetypes changed. This pr removes TypeAccess in favor of faster bitset access everywhere. 
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict wit components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows us the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
    ) -> &'a BundleInfo {
        let bundle_infos = &mut self.bundle_infos;
        let id = self.bundle_ids.entry(TypeId::of::<T>()).or_insert_with(|| {
            let mut component_ids = Vec::new();
            T::component_ids(components, storages, &mut |id| component_ids.push(id));
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict wit components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows us the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
let id = BundleId(bundle_infos.len());
Remove unnecessary branching from bundle insertion (#6902)

# Objective

Speed up bundle insertion and spawning from a bundle.

## Solution

Use the same technique used in #6800 to remove the branch on storage type when writing components from a `Bundle` into storage.

- Add a `StorageType` argument to the closure on `Bundle::get_components`.
- Pass `C::Storage::STORAGE_TYPE` into that argument.
- Match on that argument instead of reading from a `Vec<StorageType>` in `BundleInfo`.
- Marked all implementations of `Bundle::get_components` as inline to encourage dead code elimination.

The `Vec<StorageType>` in `BundleInfo` was also removed as it's no longer needed. If users were reliant on this, they can either use the compile time constants or fetch the information from `Components`. Should save a rather negligible amount of memory.

## Performance

Microbenchmarks show a slight improvement to inserting components into existing entities, as well as spawning from a bundle. Ranging about 8-16% faster depending on the benchmark.

```
group                           main                                   soft-constant-write-components
-----                           ----                                   ------------------------------
add_remove/sparse_set           1.08  1019.0±80.10µs   ? ?/sec         1.00   944.6±66.86µs   ? ?/sec
add_remove/table                1.07  1343.3±20.37µs   ? ?/sec         1.00  1257.3±18.13µs   ? ?/sec
add_remove_big/sparse_set       1.08  1132.4±263.10µs  ? ?/sec         1.00  1050.8±240.74µs  ? ?/sec
add_remove_big/table            1.02     2.6±0.05ms    ? ?/sec         1.00     2.5±0.08ms    ? ?/sec
get_or_spawn/batched            1.15   401.4±17.76µs   ? ?/sec         1.00   349.3±11.26µs   ? ?/sec
get_or_spawn/individual         1.13   732.1±43.35µs   ? ?/sec         1.00   645.6±41.44µs   ? ?/sec
insert_commands/insert          1.12   623.9±37.48µs   ? ?/sec         1.00   557.4±34.99µs   ? ?/sec
insert_commands/insert_batch    1.16   401.4±17.00µs   ? ?/sec         1.00   347.4±12.87µs   ? ?/sec
insert_simple/base              1.08   416.9±5.60µs    ? ?/sec         1.00   385.2±4.14µs    ? ?/sec
insert_simple/unbatched         1.06   934.5±44.58µs   ? ?/sec         1.00   881.3±47.86µs   ? ?/sec
spawn_commands/2000_entities    1.09   190.7±11.41µs   ? ?/sec         1.00   174.7±9.15µs    ? ?/sec
spawn_commands/4000_entities    1.10   386.5±25.33µs   ? ?/sec         1.00   352.3±18.81µs   ? ?/sec
spawn_commands/6000_entities    1.10   586.2±34.42µs   ? ?/sec         1.00   535.3±27.25µs   ? ?/sec
spawn_commands/8000_entities    1.08   778.5±45.15µs   ? ?/sec         1.00   718.0±33.66µs   ? ?/sec
spawn_world/10000_entities      1.04  1026.4±195.46µs  ? ?/sec         1.00   985.8±253.37µs  ? ?/sec
spawn_world/1000_entities       1.06   103.8±20.23µs   ? ?/sec         1.00    97.6±18.22µs   ? ?/sec
spawn_world/100_entities        1.15    11.4±4.25µs    ? ?/sec         1.00     9.9±1.87µs    ? ?/sec
spawn_world/10_entities         1.05  1030.8±229.78ns  ? ?/sec         1.00   986.2±231.12ns  ? ?/sec
spawn_world/1_entities          1.01   105.1±23.33ns   ? ?/sec         1.00   104.6±31.84ns   ? ?/sec
```

---

## Changelog

Changed: `Bundle::get_components` now takes a `FnMut(StorageType, OwningPtr)`. The provided storage type must be correct for the component being fetched.
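The idea described in the Solution section can be sketched roughly as follows. This is a simplified illustration, not the actual Bevy implementation: the `Component` trait with `STORAGE_TYPE` and `NAME` constants, the `Pair` bundle, and `write_bundle` are all hypothetical stand-ins, and a `&'static str` name takes the place of the real `OwningPtr`. The point it tries to show is that the storage type reaches the closure as a compile-time constant, so the `match` needs no lookup in a per-bundle `Vec<StorageType>`.

```rust
/// Simplified stand-in for Bevy's `StorageType` (assumption: two variants only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StorageType {
    Table,
    SparseSet,
}

/// Hypothetical, simplified component trait: the storage type is an
/// associated constant, so it is known at compile time.
trait Component {
    const STORAGE_TYPE: StorageType;
    const NAME: &'static str;
}

struct Position;
struct Marker;

impl Component for Position {
    const STORAGE_TYPE: StorageType = StorageType::Table;
    const NAME: &'static str = "Position";
}

impl Component for Marker {
    const STORAGE_TYPE: StorageType = StorageType::SparseSet;
    const NAME: &'static str = "Marker";
}

/// Hypothetical two-component bundle used only for this sketch.
struct Pair<A, B>(A, B);

impl<A: Component, B: Component> Pair<A, B> {
    /// Calls `func` once per component, passing the component's storage type
    /// as a constant (the name stands in for the component data pointer).
    #[inline]
    fn get_components(self, func: &mut impl FnMut(StorageType, &'static str)) {
        func(A::STORAGE_TYPE, A::NAME);
        func(B::STORAGE_TYPE, B::NAME);
    }
}

/// Routes each component to a "table" or "sparse set" list by matching on the
/// storage type argument instead of reading it from stored bundle metadata.
fn write_bundle<A: Component, B: Component>(
    bundle: Pair<A, B>,
) -> (Vec<&'static str>, Vec<&'static str>) {
    let mut table = Vec::new();
    let mut sparse = Vec::new();
    bundle.get_components(&mut |storage, name| match storage {
        StorageType::Table => table.push(name),
        StorageType::SparseSet => sparse.push(name),
    });
    (table, sparse)
}
```

Calling `write_bundle(Pair(Position, Marker))` would place `"Position"` in the table list and `"Marker"` in the sparse set list. Because each call site passes a constant and `get_components` is marked `#[inline]`, the optimizer has the chance to drop the untaken `match` arm, which is the dead code elimination the commit message refers to.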
2022-12-11 18:46:43 +00:00
let bundle_info =
// SAFETY: T::component_id ensures that, for each component:
// - its info was created
// - appropriate storage for it has been initialized
// - its id was created in the same order as the components in T
unsafe { BundleInfo::new(std::any::type_name::<T>(), components, component_ids, id) };
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
bundle_infos.push(bundle_info);
id
});
// SAFETY: index either exists, or was initialized
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters: ```rust // these queries will never conflict due to their filters fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut B, Without<B>>) { } ``` But it also has a significant downside: ```rust // these queries will not conflict _until_ an entity with A, B, and C is spawned fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) { } ``` The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing. In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace. To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict. ## EntityRef / EntityMut World entity operations on `main` require that the user passes in an `entity` id to each operation: ```rust let entity = world.spawn((A, )); // create a new entity with A world.get::<A>(entity); world.insert(entity, (B, C)); world.insert_one(entity, D); ``` This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required). 
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
        unsafe { self.bundle_infos.get_unchecked(id.0) }
    }

    /// Initializes a new [`BundleInfo`] for a dynamic [`Bundle`].
    ///
    /// # Panics
    ///
    /// Panics if any of the provided [`ComponentId`]s do not exist in the
    /// provided [`Components`].
    pub(crate) fn init_dynamic_info(
        &mut self,
        components: &Components,
        component_ids: &[ComponentId],
    ) -> (&BundleInfo, &Vec<StorageType>) {
        let bundle_infos = &mut self.bundle_infos;

        // Use `raw_entry_mut` to avoid cloning `component_ids` to access `Entry`
        let (_, (bundle_id, storage_types)) = self
            .dynamic_bundle_ids
            .raw_entry_mut()
            .from_key(component_ids)
            .or_insert_with(|| {
                (
                    Vec::from(component_ids),
                    initialize_dynamic_bundle(bundle_infos, components, Vec::from(component_ids)),
                )
            });

        // SAFETY: index either exists, or was initialized
        let bundle_info = unsafe { bundle_infos.get_unchecked(bundle_id.0) };

        (bundle_info, storage_types)
    }

    /// Initializes a new [`BundleInfo`] for a dynamic [`Bundle`] with a single component.
    ///
    /// # Panics
    ///
    /// Panics if the provided [`ComponentId`] does not exist in the provided [`Components`].
    pub(crate) fn init_component_info(
        &mut self,
        components: &Components,
        component_id: ComponentId,
    ) -> (&BundleInfo, StorageType) {
        let bundle_infos = &mut self.bundle_infos;
        let (bundle_id, storage_type) = self
            .dynamic_component_bundle_ids
            .entry(component_id)
            .or_insert_with(|| {
                let (id, storage_types) =
                    initialize_dynamic_bundle(bundle_infos, components, vec![component_id]);
                // `storage_types` is guaranteed to have length 1 (one component id was passed)
                (id, storage_types[0])
            });

        // SAFETY: index either exists, or was initialized
        let bundle_info = unsafe { bundle_infos.get_unchecked(bundle_id.0) };

        (bundle_info, *storage_type)
    }
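
The comment in `init_dynamic_info` says `raw_entry_mut` is used to avoid cloning `component_ids` just to reach the map entry. The sketch below illustrates the same "look up by borrowed slice, allocate the owned key only on a miss" idea using only the stable standard library. It is not Bevy's code: `DynamicIds`, `get_or_init`, and the use of `u32` in place of `ComponentId` are hypothetical names chosen for illustration, and unlike `raw_entry_mut` this version hashes the key a second time on the miss path.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a dynamic bundle registry (not Bevy's `Bundles`).
#[derive(Default)]
struct DynamicIds {
    /// Maps a list of component ids to the index of its registered entry.
    ids: HashMap<Vec<u32>, usize>,
    /// One entry per registered list, indexed by the value stored in `ids`.
    infos: Vec<Vec<u32>>,
}

impl DynamicIds {
    /// Returns the index for `component_ids`, registering the list on first use.
    fn get_or_init(&mut self, component_ids: &[u32]) -> usize {
        // `Vec<u32>: Borrow<[u32]>`, so the map can be queried with a slice.
        // The hit path therefore performs no allocation.
        if let Some(&index) = self.ids.get(component_ids) {
            return index;
        }
        // Miss path: only now is the slice copied into owned `Vec`s.
        let index = self.infos.len();
        self.infos.push(component_ids.to_vec());
        self.ids.insert(component_ids.to_vec(), index);
        index
    }
}
```

Calling `get_or_init(&[1, 2, 3])` twice on the same registry returns the same index both times, and the second call takes the allocation-free hit path. The raw entry API used in the real method gets the same effect with a single hash lookup.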
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity: ```rust // spawn now takes no inputs and returns an EntityMut let entity = world.spawn() .insert(A) // insert a single component into the entity .insert_bundle((B, C)) // insert a bundle of components into the entity .id() // id returns the Entity id // Returns EntityMut (or panics if the entity does not exist) world.entity_mut(entity) .insert(D) .insert_bundle(SomeBundle::default()); { // returns EntityRef (or panics if the entity does not exist) let d = world.entity(entity) .get::<D>() // gets the D component .unwrap(); // world.get still exists for ergonomics let d = world.get::<D>(entity).unwrap(); } // These variants return Options if you want to check existence instead of panicing world.get_entity_mut(entity) .unwrap() .insert(E); if let Some(entity_ref) = world.get_entity(entity) { let d = entity_ref.get::<D>().unwrap(); } ``` This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change. ## Safety Improvements * Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute * QuerySets no longer transmutes lifetimes * Made traits "unsafe" when implementing a trait incorrectly could cause unsafety * More thorough safety docs ## RemovedComponents SystemParam The old approach to querying removed components: `query.removed:<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState: ```rust fn system(removed: RemovedComponents<T>) { for entity in removed.iter() { } } ``` ## Simpler Bundle implementation Bundles are no longer responsible for sorting (or deduping) TypeInfo. 
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used. ## Unified WorldQuery and QueryFilter types (don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change) WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful). QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool. This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit. ## More Granular Modules World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here). ## Remaining Draft Work (to be done in this pr) * ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~ * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~ * ~~batch_iter / par_iter (currently stubbed out)~~ * ~~ChangedRes~~ * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~. 
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~ * ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~ * ~~Nested Bundles (if i find time)~~ ## Potential Future Work * Expand WorldCell to support queries. * Consider not allocating in the empty archetype on `world.spawn()` * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op * this actually regressed performance last time i tried it, but in theory it should be faster * Optimize SparseSet::insert (see `PERF` comment on insert) * Replace SparseArray `Option<T>` with T::MAX to cut down on branching * would enable cheaper get_unchecked() operations * upstream fixedbitset optimizations * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec) * fixedbitset could have a const constructor * Consider implementing Tags (archetype-specific by-value data that affects archetype identity) * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different. * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage. 
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints) * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code) * Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell * this is basically just "systems" so maybe it's not worth it * Add more world ops * `world.clear()` * `world.reserve<T: Bundle>(count: usize)` * Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But thats just a guess. I'm not an allocation perf pro :) * Adapt Commands apis for consistency with new World apis ## Benchmarks key: * `bevy_old`: bevy `main` branch * `bevy`: this branch * `_foreach`: uses an optimized for_each iterator * ` _sparse`: uses sparse set storage (if unspecified assume table storage) * `_system`: runs inside a system (if unspecified assume test happens via direct world ops) ### Simple Insert (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png) ### Simpler Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png) ### Fragment Iter (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png) ### Sparse Fragmented Iter Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes ![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png) ### Schedule (from ecs_bench_suite) 
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png) ### Add Remove Component (from ecs_bench_suite) ![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png) ### Add Remove Component Big Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed ![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png) ### Get Component Looks up a single component value a large number of times ![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)
2021-03-05 07:54:35 +00:00
}
/// Asserts that all components are part of [`Components`]
/// and initializes a [`BundleInfo`].
fn initialize_dynamic_bundle(
    bundle_infos: &mut Vec<BundleInfo>,
    components: &Components,
    component_ids: Vec<ComponentId>,
) -> (BundleId, Vec<StorageType>) {
    // Assert component existence
    let storage_types = component_ids.iter().map(|&id| {
        components.get_info(id).unwrap_or_else(|| {
            panic!(
                "init_dynamic_info called with component id {id:?} which doesn't exist in this world"
            )
        }).storage_type()
    }).collect();
    let id = BundleId(bundle_infos.len());
    let bundle_info =
        // SAFETY: `component_ids` are valid as they were just checked
        unsafe { BundleInfo::new("<dynamic bundle>", components, component_ids, id) };
    bundle_infos.push(bundle_info);
    (id, storage_types)
}