Commit graph

119 commits

Author SHA1 Message Date
Patrick Walton
3188e5af61
Batch skinned meshes on platforms where storage buffers are available. (#16599)
This commit makes skinned meshes batchable on platforms other than WebGL
2. On supported platforms, it replaces the two uniform buffers used for
joint matrices with a pair of storage buffers containing all matrices
for all skinned meshes packed together. The indices into the buffer are
stored in the mesh uniform and mesh input uniform. The GPU mesh
preprocessing step copies the indices in if that step is enabled.

On the `many_foxes` demo, I observed a frame time decrease from 15.470ms
to 11.935ms. This is the result of reducing the `submit_graph_commands`
time from an average of 5.45ms to 0.489ms, an 11x speedup in that
portion of rendering.

![Screenshot 2024-12-01
192838](https://github.com/user-attachments/assets/7d2db997-8939-466e-8b9e-050d4a6a78ee)

This is what the profile looks like for `many_foxes` after these
changes.

![Screenshot 2024-12-01
193026](https://github.com/user-attachments/assets/68983fc3-01b8-41fd-835e-3d93cb65d0fa)

---------

Co-authored-by: François Mockers <mockersf@gmail.com>
2024-12-10 17:50:03 +00:00
Patrick Walton
7ed1f327d9
Make StandardMaterial bindless. (#16644)
This commit makes `StandardMaterial` use bindless textures, as
implemented in PR #16368. Non-bindless mode, as used for example in
Metal and WebGL 2, remains fully supported via a plethora of `#ifdef
BINDLESS` preprocessor definitions.

Unfortunately, this PR introduces quite a bit of unsightliness into the
PBR shaders. This is a result of the fact that WGSL supports neither
passing binding arrays to functions nor passing individual *elements* of
binding arrays to functions, except directly to texture sample
functions. Thus we're unable to use the `sample_texture` abstraction
that helped abstract over the meshlet and non-meshlet paths. I don't
think there's anything we can do to help this other than to suggest
improvements to upstream Naga.
2024-12-10 17:48:56 +00:00
Patrick Walton
56c70f8463
Make visibility range (HLOD) dithering work when prepasses are enabled. (#16286)
Currently, the prepass has no support for visibility ranges, so
artifacts appear when using dithering visibility ranges in conjunction
with a prepass. This patch fixes that problem.

Note that this patch changes the prepass to use sparse bind group
indices instead of sequential ones. I figured this is cleaner, because
it allows for greater sharing of WGSL code between the forward pipeline
and the prepass pipeline.

The `visibility_range` example has been updated to allow the prepass to
be toggled on and off.
2024-12-04 17:34:36 +00:00
Patrick Walton
5adf831b42
Add a bindless mode to AsBindGroup. (#16368)
This patch adds the infrastructure necessary for Bevy to support
*bindless resources*, by adding a new `#[bindless]` attribute to
`AsBindGroup`.

Classically, only a single texture (or sampler, or buffer) can be
attached to each shader binding. This means that switching materials
requires breaking a batch and issuing a new drawcall, even if the mesh
is otherwise identical. This adds significant overhead not only in the
driver but also in `wgpu`, as switching bind groups increases the amount
of validation work that `wgpu` must do.

*Bindless resources* are the typical solution to this problem. Instead
of switching bindings between each texture, the renderer instead
supplies a large *array* of all textures in the scene up front, and the
material contains an index into that array. This pattern is repeated for
buffers and samplers as well. The renderer now no longer needs to switch
binding descriptor sets while drawing the scene.

Unfortunately, as things currently stand, this approach won't quite work
for Bevy. Two aspects of `wgpu` conspire to make this ideal approach
unacceptably slow:

1. In the DX12 backend, all binding arrays (bindless resources) must
have a constant size declared in the shader, and all textures in an
array must be bound to actual textures. Changing the size requires a
recompile.

2. Changing even one texture incurs revalidation of all textures, a
process that takes time that's linear in the total size of the binding
array.

This means that declaring a large array of textures big enough to
encompass the entire scene is presently unacceptably slow. For example,
if you declare 4096 textures, then `wgpu` will have to revalidate all
4096 textures if even a single one changes. This process can take
multiple frames.

To work around this problem, this PR groups bindless resources into
small *slabs* and maintains a free list for each. The size of each slab
for the bindless arrays associated with a material is specified via the
`#[bindless(N)]` attribute. For instance, consider the following
declaration:

```rust
#[derive(AsBindGroup)]
#[bindless(16)]
struct MyMaterial {
    #[buffer(0)]
    color: Vec4,
    #[texture(1)]
    #[sampler(2)]
    diffuse: Handle<Image>,
}
```

The `#[bindless(N)]` attribute specifies that, if bindless arrays are
supported on the current platform, each resource becomes a binding array
of N instances of that resource. So, for `MyMaterial` above, the `color`
attribute is exposed to the shader as `binding_array<vec4<f32>, 16>`,
the `diffuse` texture is exposed to the shader as
`binding_array<texture_2d<f32>, 16>`, and the `diffuse` sampler is
exposed to the shader as `binding_array<sampler, 16>`. Inside the
material's vertex and fragment shaders, the applicable index is
available via the `material_bind_group_slot` field of the `Mesh`
structure. So, for instance, you can access the current color like so:

```wgsl
// `uniform` binding arrays are a non-sequitur, so `uniform` is automatically promoted
// to `storage` in bindless mode.
@group(2) @binding(0) var<storage> material_color: binding_array<Color, 4>;
...
@fragment
fn fragment(in: VertexOutput) -> @location(0) vec4<f32> {
    let color = material_color[mesh[in.instance_index].material_bind_group_slot];
    ...
}
```

Note that portable shader code can't guarantee that the current platform
supports bindless textures. Indeed, bindless mode is only available in
Vulkan and DX12. The `BINDLESS` shader definition is available for your
use to determine whether you're on a bindless platform or not. Thus a
portable version of the shader above would look like:

```wgsl
#ifdef BINDLESS
@group(2) @binding(0) var<storage> material_color: binding_array<Color, 4>;
#else // BINDLESS
@group(2) @binding(0) var<uniform> material_color: Color;
#endif // BINDLESS
...
@fragment
fn fragment(in: VertexOutput) -> @location(0) vec4<f32> {
#ifdef BINDLESS
    let color = material_color[mesh[in.instance_index].material_bind_group_slot];
#else // BINDLESS
    let color = material_color;
#endif // BINDLESS
    ...
}
```

Importantly, this PR *doesn't* update `StandardMaterial` to be bindless.
So, for example, `scene_viewer` will currently not run any faster. I
intend to update `StandardMaterial` to use bindless mode in a follow-up
patch.

A new example, `shaders/shader_material_bindless`, has been added to
demonstrate how to use this new feature.

Here's a Tracy profile of `submit_graph_commands` of this patch and an
additional patch (not submitted yet) that makes `StandardMaterial` use
bindless. Red is those patches; yellow is `main`. The scene was Bistro
Exterior with a hack that forces all textures to opaque. You can see a
1.47x mean speedup.
![Screenshot 2024-11-12
161713](https://github.com/user-attachments/assets/4334b362-42c8-4d64-9cfb-6835f019b95c)

## Migration Guide

* `RenderAssets::prepare_asset` now takes an `AssetId` parameter.
* Bin keys now have Bevy-specific material bind group indices instead of
`wgpu` material bind group IDs, as part of the bindless change. Use the
new `MaterialBindGroupAllocator` to map from bind group index to bind
group ID.
2024-12-03 18:00:34 +00:00
JMS55
d221665386
Native unclipped depth on supported platforms (#16095)
# Objective
- Fixes #16078

## Solution

- Rename things to clarify that we _want_ unclipped depth for
directional light shadow views, and need some way of disabling the GPU's
builtin depth clipping
- Use DEPTH_CLIP_CONTROL instead of the fragment shader emulation on
supported platforms
- Pass only the clip position depth instead of the whole clip position
between vertex->fragment shader (no idea if this helps performance or
not, compiler might optimize it anyways)
- Meshlets
- HW raster always uses DEPTH_CLIP_CONTROL since it targets a more
limited set of platforms
- SW raster was not handling DEPTH_CLAMP_ORTHO correctly, it ended up
pretty much doing nothing.
- This PR made me realize that SW raster technically should have depth
clipping for all views that are not directional light shadows, but I
decided not to bother writing it. I'm not sure that it ever matters in
practice. If proven otherwise, I can add it.

## Testing

- Did you test these changes? If so, how?
- Lighting example. Both opaque (no fragment shader) and alpha masked
geometry (fragment shader emulation) are working with
depth_clip_control, and both work when it's turned off. Also tested
meshlet example.
- Are there any parts that need more testing?
  - Performance. I can't figure out a good test scene.
- How can other people (reviewers) test your changes? Is there anything
specific they need to know?
- Toggle depth_clip_control_supported in prepass/mod.rs line 323 to turn
this PR on or off.
- If relevant, what platforms did you test these changes on, and are
there any important ones you can't test?
  - Native

---

## Migration Guide
- `MeshPipelineKey::DEPTH_CLAMP_ORTHO` is now
`MeshPipelineKey::UNCLIPPED_DEPTH_ORTHO`
- The `DEPTH_CLAMP_ORTHO` shaderdef has been renamed to
`UNCLIPPED_DEPTH_ORTHO_EMULATION`
- `clip_position_unclamped: vec4<f32>` is now `unclipped_depth: f32`
2024-12-03 17:30:14 +00:00
atlv
c29e67153b
Expose Pipeline Compilation Zero Initialize Workgroup Memory Option (#16301)
# Objective

- wgpu 0.20 made workgroup vars stop being zero-init by default. this
broke some applications (cough foresight cough) and now we workaround
it. wgpu exposes a compilation option that zero initializes workgroup
memory by default, but bevy does not expose it.

## Solution

- expose the compilation option wgpu gives us

## Testing

- ran examples: 3d_scene, compute_shader_game_of_life, gpu_readback,
lines, specialized_mesh_pipeline. they all work
- confirmed fix for our own problems

---

</details>

## Migration Guide

- add `zero_initialize_workgroup_memory: false,` to
`ComputePipelineDescriptor` or `RenderPipelineDescriptor` structs to
preserve 0.14 functionality, add `zero_initialize_workgroup_memory:
true,` to restore bevy 0.13 functionality.
2024-11-08 21:42:37 +00:00
JMS55
3fb6cefb2f
Meshlet fill cluster buffers rewritten (#15955)
# Objective
- Make the meshlet fill cluster buffers pass slightly faster
- Address https://github.com/bevyengine/bevy/issues/15920 for meshlets
- Added PreviousGlobalTransform as a required meshlet component to avoid
extra archetype moves, slightly alleviating
https://github.com/bevyengine/bevy/issues/14681 for meshlets
- Enforce that MeshletPlugin::cluster_buffer_slots is not greater than
2^25 (glitches will occur otherwise). Technically this field controls
post-lod/culling cluster count, and the issue is on pre-lod/culling
cluster count, but it's still valid now, and in the future this will be
more true.

Needs to be merged after https://github.com/bevyengine/bevy/pull/15846
and https://github.com/bevyengine/bevy/pull/15886

## Solution

- Old pass dispatched a thread per cluster, and did a binary search over
the instances to find which instance the cluster belongs to, and what
meshlet index within the instance it is.
- New pass dispatches a workgroup per instance, and has the workgroup
loop over all meshlets in the instance in order to write out the cluster
data.
- Use a push constant instead of arrayLength to fix the linked bug
- Remap 1d->2d dispatch for software raster only if actually needed to
save on spawning excess workgroups

## Testing

- Did you test these changes? If so, how?
- Ran the meshlet example, and an example with 1041 instances of 32217
meshlets per instance. Profiled the second scene with nsight, went from
0.55ms -> 0.40ms. Small savings. We're pretty much VRAM bandwidth bound
at this point.
- How can other people (reviewers) test your changes? Is there anything
specific they need to know?
  - Run the meshlet example

## Changelog (non-meshlets)
- PreviousGlobalTransform now implements the Default trait
2024-10-23 19:18:49 +00:00
akimakinai
61350cd36f
Remove components if not extracted (#15948)
# Objective

- Fixes https://github.com/bevyengine/bevy/issues/15871
(Camera is done in #15946)

## Solution

- Do the same as #15904 for other extraction systems
- Added missing `SyncComponentPlugin` for DOF, TAA, and SSAO
(According to the
[documentation](https://dev-docs.bevyengine.org/bevy/render/sync_component/struct.SyncComponentPlugin.html),
this plugin "needs to be added for manual extraction implementations."
We may need to check this is done.)

## Testing

Modified example locally to add toggles if not exist.
- [x] DOF - toggling DOF component and perspective in `depth_of_field`
example
- [x] TAA - toggling `Camera.is_active` and TAA component
- [x] clusters - not entirely sure, toggling `Camera.is_active` in
`many_lights` example (no crash/glitch even without this PR)
- [x] previous_view - toggling `Camera.is_active` in `skybox` (no
crash/glitch even without this PR)
- [x] lights - toggling `Visibility` of `DirectionalLight` in `lighting`
example
- [x] SSAO - toggling `Camera.is_active` and SSAO component in `ssao`
example
- [x] default UI camera view - toggling `Camera.is_active` (nop without
#15946 because UI defaults to some camera even if `DefaultCameraView` is
not there)
- [x] volumetric fog - toggling existence of volumetric light. Looks
like optimization, no change in behavior/visuals
2024-10-19 15:13:39 +00:00
Sou1gh0st
3a478ad5c1
Fix lightmaps break when deferred rendering is enabled (#14599)
# Objective
- Fixes https://github.com/bevyengine/bevy/issues/13552

## Solution
- Thanks for the guidance from @DGriffin91, the current solution is to
transmit the light_map through the emissive channel to avoid increasing
the bandwidth of deferred shading.
- <del>Store lightmap sample result into G-Buffer and pass them into the
`Deferred Lighting Pipeline`, therefore we can get the correct indirect
lighting via the `apply_pbr_lighting` function.</del>
- <del>The original G-Buffer lacks storage for lightmap data, therefore
a new buffer is added. We can only use Rgba16Uint here due to the
32-byte limit on the render targets.</del>

## Testing
- Need to test all the examples that contains a prepass, with both the
forward and deferred rendering mode.
- I have tested the ones below.
- `lightmaps` (adjust the code based on the issue and check the
rendering result)
    - `transmission` (it contains a prepass)
    - `ssr` (it also uses the G-Bufffer)
    - `meshlet` (forward and deferred)
    - `pbr`

## Showcase
By updating the `lightmaps` example to use deferred rendering, this pull
request enables correct rendering result of the Cornell Box.

```
diff --git a/examples/3d/lightmaps.rs b/examples/3d/lightmaps.rs
index 564a3162b..11a748fba 100644
--- a/examples/3d/lightmaps.rs
+++ b/examples/3d/lightmaps.rs
@@ -1,12 +1,14 @@
 //! Rendering a scene with baked lightmaps.
 
-use bevy::pbr::Lightmap;
+use bevy::core_pipeline::prepass::DeferredPrepass;
+use bevy::pbr::{DefaultOpaqueRendererMethod, Lightmap};
 use bevy::prelude::*;
 
 fn main() {
     App::new()
         .add_plugins(DefaultPlugins)
         .insert_resource(AmbientLight::NONE)
+        .insert_resource(DefaultOpaqueRendererMethod::deferred())
         .add_systems(Startup, setup)
         .add_systems(Update, add_lightmaps_to_meshes)
         .run();
@@ -19,10 +21,12 @@ fn setup(mut commands: Commands, asset_server: Res<AssetServer>) {
         ..default()
     });
 
-    commands.spawn(Camera3dBundle {
-        transform: Transform::from_xyz(-278.0, 273.0, 800.0),
-        ..default()
-    });
+    commands
+        .spawn(Camera3dBundle {
+            transform: Transform::from_xyz(-278.0, 273.0, 800.0),
+            ..default()
+        })
+        .insert(DeferredPrepass);
 }
 
 fn add_lightmaps_to_meshes(
```

<img width="1280" alt="image"
src="https://github.com/user-attachments/assets/17fd3367-61cc-4c23-b956-e7cfc751af3c">

## Emissive Issue
**The emissive light object appears incorrectly rendered because the
alpha channel of emission is set to 1 in deferred rendering and 0 in
forward rendering, leading to different emissive light result. Could
this be a bug?**

```wgsl
// pbr_deferred_functions.wgsl - pbr_input_from_deferred_gbuffer
let emissive = rgb9e5::rgb9e5_to_vec3_(gbuffer.g);
if ((pbr.material.flags & STANDARD_MATERIAL_FLAGS_UNLIT_BIT) != 0u) {
    pbr.material.base_color = vec4(emissive, 1.0);
    pbr.material.emissive = vec4(vec3(0.0), 1.0);
} else {
    pbr.material.base_color = vec4(pow(base_rough.rgb, vec3(2.2)), 1.0);
    pbr.material.emissive = vec4(emissive, 1.0);
}

// pbr_functions.wgsl - apply_pbr_lighting
emissive_light = emissive_light * mix(1.0, view_bindings::view.exposure, emissive.a);
```

---------

Co-authored-by: JMS55 <47158642+JMS55@users.noreply.github.com>
2024-10-18 23:18:11 +00:00
Alice Cecile
a7e9330af9
Implement WorldQuery for MainWorld and RenderWorld components (#15745)
# Objective

#15320 is a particularly painful breaking change, and the new
`RenderEntity` in particular is very noisy, with a lot of `let entity =
entity.id()` spam.

## Solution

Implement `WorldQuery`, `QueryData` and `ReadOnlyQueryData` for
`RenderEntity` and `WorldEntity`.

These work the same as the `Entity` impls from a user-facing
perspective: they simply return an owned (copied) `Entity` identifier.
This dramatically reduces noise and eases migration.

Under the hood, these impls defer to the implementations for `&T` for
everything other than the "call .id() for the user" bit, as they involve
read-only access to component data. Doing it this way (as opposed to
implementing a custom fetch, as tried in the first commit) dramatically
reduces the maintenance risk of complex unsafe code outside of
`bevy_ecs`.

To make this easier (and encourage users to do this themselves!), I've
made `ReadFetch` and `WriteFetch` slightly more public: they're no
longer `doc(hidden)`. This is a good change, since trying to vendor the
logic is much worse than just deferring to the existing tested impls.

## Testing

I've run a handful of rendering examples (breakout, alien_cake_addict,
auto_exposure, fog_volumes, box_shadow) and nothing broke.

## Follow-up

We should lint for the uses of `&RenderEntity` and `&MainEntity` in
queries: this is just less nice for no reason.

---------

Co-authored-by: Trashtalk217 <trashtalk217@gmail.com>
2024-10-13 20:58:46 +00:00
charlotte
dd812b3e49
Type safe retained render world (#15756)
# Objective

In the Render World, there are a number of collections that are derived
from Main World entities and are used to drive rendering. The most
notable are:
- `VisibleEntities`, which is generated in the `check_visibility` system
and contains visible entities for a view.
- `ExtractedInstances`, which maps entity ids to asset ids.

In the old model, these collections were trivially kept in sync -- any
extracted phase item could look itself up because the render entity id
was guaranteed to always match the corresponding main world id.

After #15320, this became much more complicated, and was leading to a
number of subtle bugs in the Render World. The main rendering systems,
i.e. `queue_material_meshes` and `queue_material2d_meshes`, follow a
similar pattern:

```rust
for visible_entity in visible_entities.iter::<With<Mesh2d>>() {
    let Some(mesh_instance) = render_mesh_instances.get_mut(visible_entity) else {
        continue;
    };
            
    // Look some more stuff up and specialize the pipeline...
            
    let bin_key = Opaque2dBinKey {
        pipeline: pipeline_id,
        draw_function: draw_opaque_2d,
        asset_id: mesh_instance.mesh_asset_id.into(),
        material_bind_group_id: material_2d.get_bind_group_id().0,
    };
    opaque_phase.add(
        bin_key,
        *visible_entity,
        BinnedRenderPhaseType::mesh(mesh_instance.automatic_batching),
    );
}
```

In this case, `visible_entities` and `render_mesh_instances` are both
collections that are created and keyed by Main World entity ids, and so
this lookup happens to work by coincidence. However, there is a major
unintentional bug here: namely, because `visible_entities` is a
collection of Main World ids, the phase item being queued is created
with a Main World id rather than its correct Render World id.

This happens to not break mesh rendering because the render commands
used for drawing meshes do not access the `ItemQuery` parameter, but
demonstrates the confusion that is now possible: our UI phase items are
correctly being queued with Render World ids while our meshes aren't.

Additionally, this makes it very easy and error prone to use the wrong
entity id to look up things like assets. For example, if instead we
ignored visibility checks and queued our meshes via a query, we'd have
to be extra careful to use `&MainEntity` instead of the natural
`Entity`.

## Solution

Make all collections that are derived from Main World data use
`MainEntity` as their key, to ensure type safety and avoid accidentally
looking up data with the wrong entity id:

```rust
pub type MainEntityHashMap<V> = hashbrown::HashMap<MainEntity, V, EntityHash>;
```

Additionally, we make all `PhaseItem` be able to provide both their Main
and Render World ids, to allow render phase implementors maximum
flexibility as to what id should be used to look up data.

You can think of this like tracking at the type level whether something
in the Render World should use it's "primary key", i.e. entity id, or
needs to use a foreign key, i.e. `MainEntity`.

## Testing

##### TODO:

This will require extensive testing to make sure things didn't break!
Additionally, some extraction logic has become more complicated and
needs to be checked for regressions.

## Migration Guide

With the advent of the retained render world, collections that contain
references to `Entity` that are extracted into the render world have
been changed to contain `MainEntity` in order to prevent errors where a
render world entity id is used to look up an item by accident. Custom
rendering code may need to be changed to query for `&MainEntity` in
order to look up the correct item from such a collection. Additionally,
users who implement their own extraction logic for collections of main
world entity should strongly consider extracting into a different
collection that uses `MainEntity` as a key.

Additionally, render phases now require specifying both the `Entity` and
`MainEntity` for a given `PhaseItem`. Custom render phases should ensure
`MainEntity` is available when queuing a phase item.
2024-10-10 18:47:04 +00:00
Tim
3da0ef048e
Remove the Component trait implementation from Handle (#15796)
# Objective

- Closes #15716
- Closes #15718

## Solution

- Replace `Handle<MeshletMesh>` with a new `MeshletMesh3d` component
- As expected there were some random things that needed fixing:
- A couple tests were storing handles just to prevent them from being
dropped I believe, which seems to have been unnecessary in some.
- The `SpriteBundle` still had a `Handle<Image>` field. I've removed
this.
- Tests in `bevy_sprite` incorrectly added a `Handle<Image>` field
outside of the `Sprite` component.
- A few examples were still inserting `Handle`s, switched those to their
corresponding wrappers.
- 2 examples that were still querying for `Handle<Image>` were changed
to query `Sprite`

## Testing

- I've verified that the changed example work now

## Migration Guide

`Handle` can no longer be used as a `Component`. All existing Bevy types
using this pattern have been wrapped in their own semantically
meaningful type. You should do the same for any custom `Handle`
components your project needs.

The `Handle<MeshletMesh>` component is now `MeshletMesh3d`.

The `WithMeshletMesh` type alias has been removed. Use
`With<MeshletMesh3d>` instead.
2024-10-09 21:10:01 +00:00
Kristoffer Søholm
2d1b4939d2
Synchronize removed components with the render world (#15582)
# Objective

Fixes #15560
Fixes (most of) #15570

Currently a lot of examples (and presumably some user code) depend on
toggling certain render features by adding/removing a single component
to an entity, e.g. `SpotLight` to toggle a light. Because of the
retained render world this no longer works: Extract will add any new
components, but when it is removed the entity persists unchanged in the
render world.

## Solution

Add `SyncComponentPlugin<C: Component>` that registers
`SyncToRenderWorld` as a required component for `C`, and adds a
component hook that will clear all components from the render world
entity when `C` is removed. We add this plugin to
`ExtractComponentPlugin` which fixes most instances of the problem. For
custom extraction logic we can manually add `SyncComponentPlugin` for
that component.

We also rename `WorldSyncPlugin` to `SyncWorldPlugin` so we start a
naming convention like all the `Extract` plugins.

In this PR I also fixed a bunch of breakage related to the retained
render world, stemming from old code that assumed that `Entity` would be
the same in both worlds.

I found that using the `RenderEntity` wrapper instead of `Entity` in
data structures when referring to render world entities makes intent
much clearer, so I propose we make this an official pattern.

## Testing

Run examples like

```
cargo run --features pbr_multi_layer_material_textures --example clearcoat
cargo run --example volumetric_fog
```

and see that they work, and that toggles work correctly. But really we
should test every single example, as we might not even have caught all
the breakage yet.

---

## Migration Guide

The retained render world notes should be updated to explain this edge
case and `SyncComponentPlugin`

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: Trashtalk217 <trashtalk217@gmail.com>
2024-10-08 22:23:17 +00:00
Trashtalk217
d1bd46d45e
Deprecate get_or_spawn (#15652)
# Objective

After merging retained rendering world #15320, we now have a good way of
creating a link between worlds (*HIYAA intensifies*). This means that
`get_or_spawn` is no longer necessary for that function. Entity should
be opaque as the warning above `get_or_spawn` says. This is also part of
#15459.

I'm deprecating `get_or_spawn_batch` in a different PR in order to keep
the PR small in size.

## Solution

Deprecate `get_or_spawn` and replace it with `get_entity` in most
contexts. If it's possible to query `&RenderEntity`, then the entity is
synced and `render_entity.id()` is initialized in the render world.

## Migration Guide

If you are given an `Entity` and you want to do something with it, use
`Commands.entity(...)` or `World.entity(...)`. If instead you want to
spawn something use `Commands.spawn(...)` or `World.spawn(...)`. If you
are not sure if an entity exists, you can always use `get_entity` and
match on the `Option<...>` that is returned.

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2024-10-07 16:08:22 +00:00
Tim
461305b3d7
Revert "Have EntityCommands methods consume self for easier chaining" (#15523)
As discussed in #15521

- Partial revert of #14897, reverting the change to the methods to
consume `self`
- The `insert_if` method is kept

The migration guide of #14897 should be removed
Closes #15521

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2024-10-02 12:47:26 +00:00
Joona Aalto
54006b107b
Migrate meshes and materials to required components (#15524)
# Objective

A big step in the migration to required components: meshes and
materials!

## Solution

As per the [selected
proposal](https://hackmd.io/@bevy/required_components/%2Fj9-PnF-2QKK0on1KQ29UWQ):

- Deprecate `MaterialMesh2dBundle`, `MaterialMeshBundle`, and
`PbrBundle`.
- Add `Mesh2d` and `Mesh3d` components, which wrap a `Handle<Mesh>`.
- Add `MeshMaterial2d<M: Material2d>` and `MeshMaterial3d<M: Material>`,
which wrap a `Handle<M>`.
- Meshes *without* a mesh material should be rendered with a default
material. The existence of a material is determined by
`HasMaterial2d`/`HasMaterial3d`, which is required by
`MeshMaterial2d`/`MeshMaterial3d`. This gets around problems with the
generics.

Previously:

```rust
commands.spawn(MaterialMesh2dBundle {
    mesh: meshes.add(Circle::new(100.0)).into(),
    material: materials.add(Color::srgb(7.5, 0.0, 7.5)),
    transform: Transform::from_translation(Vec3::new(-200., 0., 0.)),
    ..default()
});
```

Now:

```rust
commands.spawn((
    Mesh2d(meshes.add(Circle::new(100.0))),
    MeshMaterial2d(materials.add(Color::srgb(7.5, 0.0, 7.5))),
    Transform::from_translation(Vec3::new(-200., 0., 0.)),
));
```

If the mesh material is missing, previously nothing was rendered. Now,
it renders a white default `ColorMaterial` in 2D and a
`StandardMaterial` in 3D (this can be overridden). Below, only every
other entity has a material:

![Näyttökuva 2024-09-29
181746](https://github.com/user-attachments/assets/5c8be029-d2fe-4b8c-ae89-17a72ff82c9a)

![Näyttökuva 2024-09-29
181918](https://github.com/user-attachments/assets/58adbc55-5a1e-4c7d-a2c7-ed456227b909)

Why white? This is still open for discussion, but I think white makes
sense for a *default* material, while *invalid* asset handles pointing
to nothing should have something like a pink material to indicate that
something is broken (I don't handle that in this PR yet). This is kind
of a mix of Godot and Unity: Godot just renders a white material for
non-existent materials, while Unity renders nothing when no materials
exist, but renders pink for invalid materials. I can also change the
default material to pink if that is preferable though.

## Testing

I ran some 2D and 3D examples to test if anything changed visually. I
have not tested all examples or features yet however. If anyone wants to
test more extensively, it would be appreciated!

## Implementation Notes

- The relationship between `bevy_render` and `bevy_pbr` is weird here.
`bevy_render` needs `Mesh3d` for its own systems, but `bevy_pbr` has all
of the material logic, and `bevy_render` doesn't depend on it. I feel
like the two crates should be refactored in some way, but I think that's
out of scope for this PR.
- I didn't migrate meshlets to required components yet. That can
probably be done in a follow-up, as this is already a huge PR.
- It is becoming increasingly clear to me that we really, *really* want
to disallow raw asset handles as components. They caused me a *ton* of
headache here already, and it took me a long time to find every place
that queried for them or inserted them directly on entities, since there
were no compiler errors for it. If we don't remove the `Component`
derive, I expect raw asset handles to be a *huge* footgun for users as
we transition to wrapper components, especially as handles as components
have been the norm so far. I personally consider this to be a blocker
for 0.15: we need to migrate to wrapper components for asset handles
everywhere, and remove the `Component` derive. Also see
https://github.com/bevyengine/bevy/issues/14124.

---

## Migration Guide

Asset handles for meshes and mesh materials must now be wrapped in the
`Mesh2d` and `MeshMaterial2d` or `Mesh3d` and `MeshMaterial3d`
components for 2D and 3D respectively. Raw handles as components no
longer render meshes.

Additionally, `MaterialMesh2dBundle`, `MaterialMeshBundle`, and
`PbrBundle` have been deprecated. Instead, use the mesh and material
components directly.

Previously:

```rust
commands.spawn(MaterialMesh2dBundle {
    mesh: meshes.add(Circle::new(100.0)).into(),
    material: materials.add(Color::srgb(7.5, 0.0, 7.5)),
    transform: Transform::from_translation(Vec3::new(-200., 0., 0.)),
    ..default()
});
```

Now:

```rust
commands.spawn((
    Mesh2d(meshes.add(Circle::new(100.0))),
    MeshMaterial2d(materials.add(Color::srgb(7.5, 0.0, 7.5))),
    Transform::from_translation(Vec3::new(-200., 0., 0.)),
));
```

If the mesh material is missing, a white default material is now used.
Previously, nothing was rendered if the material was missing.

The `WithMesh2d` and `WithMesh3d` query filter type aliases have also
been removed. Simply use `With<Mesh2d>` or `With<Mesh3d>`.

---------

Co-authored-by: Tim Blackbird <justthecooldude@gmail.com>
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2024-10-01 21:33:17 +00:00
Trashtalk217
56f8e526dd
The Cooler 'Retain Rendering World' (#15320)
- Adopted from #14449
- Still fixes #12144.

## Migration Guide

The retained render world is a complex change: migrating might take one
of a few different forms depending on the patterns you're using.

For every example, we specify in which world the code is run. Most of
the changes affect render world code, so for the average Bevy user who's
using Bevy's high-level rendering APIs, these changes are unlikely to
affect your code.

### Spawning entities in the render world

Previously, if you spawned an entity with `world.spawn(...)`,
`commands.spawn(...)` or some other method in the rendering world, it
would be despawned at the end of each frame. In 0.15, this is no longer
the case and so your old code could leak entities. This can be mitigated
by either re-architecting your code to no longer continuously spawn
entities (like you're used to in the main world), or by adding the
`bevy_render::world_sync::TemporaryRenderEntity` component to the entity
you're spawning. Entities tagged with `TemporaryRenderEntity` will be
removed at the end of each frame (like before).

### Extract components with `ExtractComponentPlugin`

```
// main world
app.add_plugins(ExtractComponentPlugin::<ComponentToExtract>::default());
```

`ExtractComponentPlugin` has been changed to only work with synced
entities. Entities are automatically synced if `ComponentToExtract` is
added to them. However, entities are not "unsynced" if any given
`ComponentToExtract` is removed, because an entity may have multiple
components to extract. This would cause the other components to no
longer get extracted because the entity is not synced.

So be careful when only removing extracted components from entities in
the render world, because it might leave an entity behind in the render
world. The solution here is to avoid only removing extracted components
and instead despawn the entire entity.

### Manual extraction using `Extract<Query<(Entity, ...)>>`

```rust
// in render world, inspired by bevy_pbr/src/cluster/mod.rs
pub fn extract_clusters(
    mut commands: Commands,
    views: Extract<Query<(Entity, &Clusters, &Camera)>>,
) {
    for (entity, clusters, camera) in &views {
        // some code
        commands.get_or_spawn(entity).insert(...);
    }
}
```
One of the primary consequences of the retained rendering world is that
there's no longer a one-to-one mapping from entity IDs in the main world
to entity IDs in the render world. Unlike in Bevy 0.14, Entity 42 in the
main world doesn't necessarily map to entity 42 in the render world.

Previous code which called `get_or_spawn(main_world_entity)` in the
render world (`Extract<Query<(Entity, ...)>>` returns main world
entities). Instead, you should use `&RenderEntity` and
`render_entity.id()` to get the correct entity in the render world. Note
that this entity does need to be synced first in order to have a
`RenderEntity`.

When performing manual abstraction, this won't happen automatically
(like with `ExtractComponentPlugin`) so add a `SyncToRenderWorld` marker
component to the entities you want to extract.

This results in the following code:
```rust
// in render world, inspired by bevy_pbr/src/cluster/mod.rs
pub fn extract_clusters(
    mut commands: Commands,
    views: Extract<Query<(&RenderEntity, &Clusters, &Camera)>>,
) {
    for (render_entity, clusters, camera) in &views {
        // some code
        commands.get_or_spawn(render_entity.id()).insert(...);
    }
}

// in main world, when spawning
world.spawn(Clusters::default(), Camera::default(), SyncToRenderWorld)
```

### Looking up `Entity` ids in the render world

As previously stated, there's now no correspondence between main world
and render world `Entity` identifiers.

Querying for `Entity` in the render world will return the `Entity` id in
the render world: query for `MainEntity` (and use its `id()` method) to
get the corresponding entity in the main world.

This is also a good way to tell the difference between synced and
unsynced entities in the render world, because unsynced entities won't
have a `MainEntity` component.

---------

Co-authored-by: re0312 <re0312@outlook.com>
Co-authored-by: re0312 <45868716+re0312@users.noreply.github.com>
Co-authored-by: Periwink <charlesbour@gmail.com>
Co-authored-by: Anselmo Sampietro <ans.samp@gmail.com>
Co-authored-by: Emerson Coskey <56370779+ecoskey@users.noreply.github.com>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: Christian Hughes <9044780+ItsDoot@users.noreply.github.com>
2024-09-30 18:51:43 +00:00
Zachary Harrold
d70595b667
Add core and alloc over std Lints (#15281)
# Objective

- Fixes #6370
- Closes #6581

## Solution

- Added the following lints to the workspace:
  - `std_instead_of_core`
  - `std_instead_of_alloc`
  - `alloc_instead_of_core`
- Used `cargo +nightly fmt` with [item level use
formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A)
to split all `use` statements into single items.
- Used `cargo clippy --workspace --all-targets --all-features --fix
--allow-dirty` to _attempt_ to resolve the new linting issues, and
intervened where the lint was unable to resolve the issue automatically
(usually due to needing an `extern crate alloc;` statement in a crate
root).
- Manually removed certain uses of `std` where negative feature gating
prevented `--all-features` from finding the offending uses.
- Used `cargo +nightly fmt` with [crate level use
formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A)
to re-merge all `use` statements matching Bevy's previous styling.
- Manually fixed cases where the `fmt` tool could not re-merge `use`
statements due to conditional compilation attributes.

## Testing

- Ran CI locally

## Migration Guide

The MSRV is now 1.81. Please update to this version or higher.

## Notes

- This is a _massive_ change to try and push through, which is why I've
outlined the semi-automatic steps I used to create this PR, in case this
fails and someone else tries again in the future.
- Making this change has no impact on user code, but does mean Bevy
contributors will be warned to use `core` and `alloc` instead of `std`
where possible.
- This lint is a critical first step towards investigating `no_std`
options for Bevy.

---------

Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-09-27 00:59:59 +00:00
Clar Fon
efda7f3f9c
Simpler lint fixes: makes ci lints work but disables a lint for now (#15376)
Takes the first two commits from #15375 and adds suggestions from this
comment:
https://github.com/bevyengine/bevy/pull/15375#issuecomment-2366968300

See #15375 for more reasoning/motivation.

## Rebasing (rerunning)

```rust
git switch simpler-lint-fixes
git reset --hard main
cargo fmt --all -- --unstable-features --config normalize_comments=true,imports_granularity=Crate
cargo fmt --all
git add --update
git commit --message "rustfmt"
cargo clippy --workspace --all-targets --all-features --fix
cargo fmt --all -- --unstable-features --config normalize_comments=true,imports_granularity=Crate
cargo fmt --all
git add --update
git commit --message "clippy"
git cherry-pick e6c0b94f6795222310fb812fa5c4512661fc7887
```
2024-09-24 11:42:59 +00:00
Allen Pocket
d93b78a66e
Remove unnecessary muts in RenderSet::QueueMeshes (#14953)
# Objective

Fixes #14952
2024-08-28 11:38:38 +00:00
Shane
484721be80
Have EntityCommands methods consume self for easier chaining (#14897)
# Objective

Fixes #14883

## Solution

Pretty simple update to `EntityCommands` methods to consume `self` and
return it rather than taking `&mut self`. The things probably worth
noting:

* I added `#[allow(clippy::should_implement_trait)]` to the `add` method
because it causes a linting conflict with `std::ops::Add`.
* `despawn` and `log_components` now return `Self`. I'm not sure if
that's exactly the desired behavior so I'm happy to adjust if that seems
wrong.

## Testing

Tested with `cargo run -p ci`. I think that should be sufficient to call
things good.

## Migration Guide

The most likely migration needed is changing code from this:

```
        let mut entity = commands.get_or_spawn(entity);

        if depth_prepass {
            entity.insert(DepthPrepass);
        }
        if normal_prepass {
            entity.insert(NormalPrepass);
        }
        if motion_vector_prepass {
            entity.insert(MotionVectorPrepass);
        }
        if deferred_prepass {
            entity.insert(DeferredPrepass);
        }
```

to this:

```
        let mut entity = commands.get_or_spawn(entity);

        if depth_prepass {
            entity = entity.insert(DepthPrepass);
        }
        if normal_prepass {
            entity = entity.insert(NormalPrepass);
        }
        if motion_vector_prepass {
            entity = entity.insert(MotionVectorPrepass);
        }
        if deferred_prepass {
            entity.insert(DeferredPrepass);
        }
```

as can be seen in several of the example code updates here. There will
probably also be instances where mutable `EntityCommands` vars no longer
need to be mutable.
2024-08-26 18:24:59 +00:00
JMS55
6cc96f4c1f
Meshlet software raster + start of cleanup (#14623)
# Objective
- Faster meshlet rasterization path for small triangles
- Avoid having to allocate and write out a triangle buffer
- Refactor gpu_scene.rs

## Solution
- Replace the 32bit visbuffer texture with a 64bit visbuffer buffer,
where the left 32 bits encode depth, and the right 32 bits encode the
existing cluster + triangle IDs. Can't use 64bit textures, wgpu/naga
doesn't support atomic ops on textures yet.
- Instead of writing out a buffer of packed cluster + triangle IDs (per
triangle) to raster, the culling pass now writes out a buffer of just
cluster IDs (per cluster, so less memory allocated, cheaper to write
out).
  - Clusters for software raster are allocated from the left side
- Clusters for hardware raster are allocated in the same buffer, from
the right side
- The buffer size is fixed at MeshletPlugin build time, and should be
set to a reasonable value for your scene (no warning on overflow, and no
good way to determine what value you need outside of renderdoc - I plan
to fix this in a future PR adding a meshlet stats overlay)
- Currently I don't have a heuristic for software vs hardware raster
selection for each cluster. The existing code is just a placeholder. I
need to profile on a release scene and come up with a heuristic,
probably in a future PR.
- The culling shader is getting pretty hard to follow at this point, but
I don't want to spend time improving it as the entire shader/pass is
getting rewritten/replaced in the near future.
- Software raster is a compute workgroup per-cluster. Each workgroup
loads and transforms the <=64 vertices of the cluster, and then
rasterizes the <=64 triangles of the cluster.
- Two variants are implemented: Scanline for clusters with any larger
triangles (still smaller than hardware is good at), and brute-force for
very very tiny triangles
- Once the shader determines that a pixel should be filled in, it does
an atomicMax() on the visbuffer to store the results, copying how Nanite
works
- On devices with a low max workgroups per dispatch limit, an extra
compute pass is inserted before software raster to convert from a 1d to
2d dispatch (I don't think 3d would ever be necessary).
- I haven't implemented the top-left rule or subpixel precision yet, I'm
leaving that for a future PR since I get usable results without it for
now
- Resources used:
https://kristoffer-dyrkorn.github.io/triangle-rasterizer and chapters
6-8 of
https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index
- Hardware raster now spawns 64*3 vertex invocations per meshlet,
instead of the actual meshlet vertex count. Extra invocations just
early-exit.
- While this is slower than the existing system, hardware draws should
be rare now that software raster is usable, and it saves a ton of memory
using the unified cluster ID buffer. This would be fixed if wgpu had
support for mesh shaders.
- Instead of writing to a color+depth attachment, the hardware raster
pass also does the same atomic visbuffer writes that software raster
uses.
- We have to bind a dummy render target anyways, as wgpu doesn't
currently support render passes without any attachments
- Material IDs are no longer written out during the main rasterization
passes.
- If we had async compute queues, we could overlap the software and
hardware raster passes.
- New material and depth resolve passes run at the end of the visbuffer
node, and write out view depth and material ID depth textures

### Misc changes
- Fixed cluster culling importing, but never actually using the previous
view uniforms when doing occlusion culling
- Fixed incorrectly adding the LOD error twice when building the meshlet
mesh
- Splitup gpu_scene module into meshlet_mesh_manager, instance_manager,
and resource_manager
- resource_manager is still too complex and inefficient (extract and
prepare are way too expensive). I plan on improving this in a future PR,
but for now ResourceManager is mostly a 1:1 port of the leftover
MeshletGpuScene bits.
- Material draw passes have been renamed to the more accurate material
shade pass, as well as some other misc renaming (in the future, these
will be compute shaders even, and not actual draw calls)

---

## Migration Guide
- TBD (ask me at the end of the release for meshlet changes as a whole)

---------

Co-authored-by: vero <email@atlasdostal.com>
2024-08-26 17:54:34 +00:00
Patrick Walton
d235d41af1
Fix the example regressions from packed growable buffers. (#14375)
The "uberbuffers" PR #14257 caused some examples to fail intermittently
for different reasons:

1. `morph_targets` could fail because vertex displacements for morph
targets are keyed off the vertex index. With buffer packing, the vertex
index can vary based on the position in the buffer, which caused the
morph targets to be potentially incorrect. The solution is to include
the first vertex index with the `MeshUniform` (and `MeshInputUniform` if
GPU preprocessing is in use), so that the shader can calculate the true
vertex index before performing the morph operation. This results in
wasted space in `MeshUniform`, which is unfortunate, but we'll soon be
filling in the padding with the ID of the material when bindless
textures land, so this had to happen sooner or later anyhow.

Including the vertex index in the `MeshInputUniform` caused an ordering
problem. The `MeshInputUniform` was created during the extraction phase,
before the allocations occurred, so the extraction logic didn't know
where the mesh vertex data was going to end up. The solution is to move
the `MeshInputUniform` creation (the `collect_meshes_for_gpu_building`
system) to after the allocations phase. This should be better for
parallelism anyhow, because it allows the extraction phase to finish
quicker. It's also something we'll have to do for bindless in any event.

2. The `lines` and `fog_volumes` examples could fail because their
custom drawing nodes weren't updated to supply the vertex and index
offsets in their `draw_indexed` and `draw` calls. This commit fixes this
oversight.

Fixes #14366.
2024-07-22 18:55:51 +00:00
charlotte
03fd1b46ef
Move Msaa to component (#14273)
Switches `Msaa` from being a globally configured resource to a per
camera view component.

Closes #7194

# Objective

Allow individual views to describe their own MSAA settings. For example,
when rendering to different windows or to different parts of the same
view.

## Solution

Make `Msaa` a component that is required on all camera bundles.

## Testing

Ran a variety of examples to ensure that nothing broke.

TODO:
- [ ] Make sure android still works per previous comment in
`extract_windows`.

---

## Migration Guide

`Msaa` is no longer configured as a global resource, and should be
specified on each spawned camera if a non-default setting is desired.

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-07-22 18:28:23 +00:00
Patrick Walton
bc34216929
Pack multiple vertex and index arrays together into growable buffers. (#14257)
This commit uses the [`offset-allocator`] crate to combine vertex and
index arrays from different meshes into single buffers. Since the
primary source of `wgpu` overhead is from validation and synchronization
when switching buffers, this significantly improves Bevy's rendering
performance on many scenes.

This patch is a more flexible version of #13218, which also used slabs.
Unlike #13218, which used slabs of a fixed size, this commit implements
slabs that start small and can grow. In addition to reducing memory
usage, supporting slab growth reduces the number of vertex and index
buffer switches that need to happen during rendering, leading to
improved performance. To prevent pathological fragmentation behavior,
slabs are capped to a maximum size, and mesh arrays that are too large
get their own dedicated slabs.

As an additional improvement over #13218, this commit allows the
application to customize all allocator heuristics. The
`MeshAllocatorSettings` resource contains values that adjust the minimum
and maximum slab sizes, the cutoff point at which meshes get their own
dedicated slabs, and the rate at which slabs grow. Hopefully-sensible
defaults have been chosen for each value.

Unfortunately, WebGL 2 doesn't support the *base vertex* feature, which
is necessary to pack vertex arrays from different meshes into the same
buffer. `wgpu` represents this restriction as the downlevel flag
`BASE_VERTEX`. This patch detects that bit and ensures that all vertex
buffers get dedicated slabs on that platform. Even on WebGL 2, though,
we can combine all *index* arrays into single buffers to reduce buffer
changes, and we do so.

The following measurements are on Bistro:

Overall frame time improves from 8.74 ms to 5.53 ms (1.58x speedup):
![Screenshot 2024-07-09
163521](https://github.com/bevyengine/bevy/assets/157897/5d83c824-c0ee-434c-bbaf-218ff7212c48)

Render system time improves from 6.57 ms to 3.54 ms (1.86x speedup):
![Screenshot 2024-07-09
163559](https://github.com/bevyengine/bevy/assets/157897/d94e2273-c3a0-496a-9f88-20d394129610)

Opaque pass time improves from 4.64 ms to 2.33 ms (1.99x speedup):
![Screenshot 2024-07-09
163536](https://github.com/bevyengine/bevy/assets/157897/e4ef6e48-d60e-44ae-9a71-b9a731c99d9a)

## Migration Guide

### Changed

* Vertex and index buffers for meshes may now be packed alongside other
buffers, for performance.
* `GpuMesh` has been renamed to `RenderMesh`, to reflect the fact that
it no longer directly stores handles to GPU objects.
* Because meshes no longer have their own vertex and index buffers, the
responsibility for the buffers has moved from `GpuMesh` (now called
`RenderMesh`) to the `MeshAllocator` resource. To access the vertex data
for a mesh, use `MeshAllocator::mesh_vertex_slice`. To access the index
data for a mesh, use `MeshAllocator::mesh_index_slice`.

[`offset-allocator`]: https://github.com/pcwalton/offset-allocator
2024-07-16 20:33:15 +00:00
re0312
3b23aa0864
Fix prepass batch (#13943)
# Objective

- After #11804 , The queue_prepass_material_meshes function is now
executed in parallel with other queue_* systems. This optimization
introduced a potential issue where mesh_instance.should_batch() could
return false in queue_prepass_material_meshes due to an unset
material_bind_group_id.
2024-07-14 19:35:36 +00:00
Patrick Walton
44db8b7fac
Allow phase items not associated with meshes to be binned. (#14029)
As reported in #14004, many third-party plugins, such as Hanabi, enqueue
entities that don't have meshes into render phases. However, the
introduction of indirect mode added a dependency on mesh-specific data,
breaking this workflow. This is because GPU preprocessing requires that
the render phases manage indirect draw parameters, which don't apply to
objects that aren't meshes. The existing code skips over binned entities
that don't have indirect draw parameters, which causes the rendering to
be skipped for such objects.

To support this workflow, this commit adds a new field,
`non_mesh_items`, to `BinnedRenderPhase`. This field contains a simple
list of (bin key, entity) pairs. After drawing batchable and unbatchable
objects, the non-mesh items are drawn one after another. Bevy itself
doesn't enqueue any items into this list; it exists solely for the
application and/or plugins to use.

Additionally, this commit switches the asset ID in the standard bin keys
to be an untyped asset ID rather than that of a mesh. This allows more
flexibility, allowing bins to be keyed off any type of asset.

This patch adds a new example, `custom_phase_item`, which simultaneously
serves to demonstrate how to use this new feature and to act as a
regression test so this doesn't break again.

Fixes #14004.

## Changelog

### Added

* `BinnedRenderPhase` now contains a `non_mesh_items` field for plugins
to add custom items to.
2024-06-27 16:13:03 +00:00
François Mockers
19d078c609
don't crash without features bevy_pbr, ktx2, zstd (#14020)
# Objective

- Fixes #13728 

## Solution

- add a new feature `smaa_luts`. if enables, it also enables `ktx2` and
`zstd`. if not, it doesn't load the files but use placeholders instead
- adds all the resources needed in the same places that system that uses
them are added.
2024-06-26 03:08:23 +00:00
JMS55
d8b45ca136
Fix MeshletMesh material system ordering (#14016)
# Objective
- Fixes #13811 (probably, I lost my test code...)

## Solution
- Turns out that Queue and PrepareAssets are _not_ ordered. We should
probably either rethink our system sets (again), or improve the
documentation here. For reference, I've included the current ordering
below.
- The `prepare_meshlet_meshes_X` systems need to run after
`prepare_assets::<PreparedMaterial<M>>`, and have also been moved to
QueueMeshes.

```rust
schedule.configure_sets(
    (
        ExtractCommands,
        ManageViews,
        Queue,
        PhaseSort,
        Prepare,
        Render,
        Cleanup,
    )
        .chain(),
);

schedule.configure_sets((ExtractCommands, PrepareAssets, Prepare).chain());
schedule.configure_sets(QueueMeshes.in_set(Queue).after(prepare_assets::<GpuMesh>));
schedule.configure_sets(
    (PrepareResources, PrepareResourcesFlush, PrepareBindGroups)
        .chain()
        .in_set(Prepare),
);
```

## Testing
- Ambiguity checker to make sure I don't have ambiguous system ordering
2024-06-25 18:17:52 +00:00
Ricky Taylor
9b9d3d81cb
Normalise matrix naming (#13489)
# Objective
- Fixes #10909
- Fixes #8492

## Solution
- Name all matrices `x_from_y`, for example `world_from_view`.

## Testing
- I've tested most of the 3D examples. The `lighting` example
particularly should hit a lot of the changes and appears to run fine.

---

## Changelog
- Renamed matrices across the engine to follow a `y_from_x` naming,
making the space conversion more obvious.

## Migration Guide
- `Frustum`'s `from_view_projection`, `from_view_projection_custom_far`
and `from_view_projection_no_far` were renamed to
`from_clip_from_world`, `from_clip_from_world_custom_far` and
`from_clip_from_world_no_far`.
- `ComputedCameraValues::projection_matrix` was renamed to
`clip_from_view`.
- `CameraProjection::get_projection_matrix` was renamed to
`get_clip_from_view` (this affects implementations on `Projection`,
`PerspectiveProjection` and `OrthographicProjection`).
- `ViewRangefinder3d::from_view_matrix` was renamed to
`from_world_from_view`.
- `PreviousViewData`'s members were renamed to `view_from_world` and
`clip_from_world`.
- `ExtractedView`'s `projection`, `transform` and `view_projection` were
renamed to `clip_from_view`, `world_from_view` and `clip_from_world`.
- `ViewUniform`'s `view_proj`, `unjittered_view_proj`,
`inverse_view_proj`, `view`, `inverse_view`, `projection` and
`inverse_projection` were renamed to `clip_from_world`,
`unjittered_clip_from_world`, `world_from_clip`, `world_from_view`,
`view_from_world`, `clip_from_view` and `view_from_clip`.
- `GpuDirectionalCascade::view_projection` was renamed to
`clip_from_world`.
- `MeshTransforms`' `transform` and `previous_transform` were renamed to
`world_from_local` and `previous_world_from_local`.
- `MeshUniform`'s `transform`, `previous_transform`,
`inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed
to `world_from_local`, `previous_world_from_local`,
`local_from_world_transpose_a` and `local_from_world_transpose_b` (the
`Mesh` type in WGSL mirrors this, however `transform` and
`previous_transform` were named `model` and `previous_model`).
- `Mesh2dTransforms::transform` was renamed to `world_from_local`.
- `Mesh2dUniform`'s `transform`, `inverse_transpose_model_a` and
`inverse_transpose_model_b` were renamed to `world_from_local`,
`local_from_world_transpose_a` and `local_from_world_transpose_b` (the
`Mesh2d` type in WGSL mirrors this).
- In WGSL, in `bevy_pbr::mesh_functions`, `get_model_matrix` and
`get_previous_model_matrix` were renamed to `get_world_from_local` and
`get_previous_world_from_local`.
- In WGSL, `bevy_sprite::mesh2d_functions::get_model_matrix` was renamed
to `get_world_from_local`.
2024-06-03 16:56:53 +00:00
Aevyrie
b45786df41
Add Skybox Motion Vectors (#13617)
# Objective

- Add motion vector support to the skybox
- This fixes the last remaining "gap" to complete the motion blur
feature

## Solution

- Add a pipeline for the skybox to write motion vectors to the prepass

## Testing

- Used examples to test motion vectors using motion blur


https://github.com/bevyengine/bevy/assets/2632925/74c0778a-7e77-4e68-8111-05791e4bfdd2

---------

Co-authored-by: Patrick Walton <pcwalton@mimiga.net>
2024-06-02 16:09:28 +00:00
Patrick Walton
be053b1d7c
Implement motion vectors and TAA for skinned meshes and meshes with morph targets. (#13572)
This is a revamped equivalent to #9902, though it shares none of the
code. It handles all special cases that I've tested correctly.

The overall technique consists of double-buffering the joint matrix and
morph weights buffers, as most of the previous attempts to solve this
problem did. The process is generally straightforward. Note that, to
avoid regressing the ability of mesh extraction, skin extraction, and
morph target extraction to run in parallel, I had to add a new system to
rendering, `set_mesh_motion_vector_flags`. The comment there explains
the details; it generally runs very quickly.

I've tested this with modified versions of the `animated_fox`,
`morph_targets`, and `many_foxes` examples that add TAA, and the patch
works. To avoid bloating those examples, I didn't add switches for TAA
to them.

Addresses points (1) and (2) of #8423.

## Changelog

### Fixed

* Motion vectors, and therefore TAA, are now supported for meshes with
skins and/or morph targets.
2024-05-31 17:02:28 +00:00
Patrick Walton
9da0b2a0ec
Make render phases render world resources instead of components. (#13277)
This commit makes us stop using the render world ECS for
`BinnedRenderPhase` and `SortedRenderPhase` and instead use resources
with `EntityHashMap`s inside. There are three reasons to do this:

1. We can use `clear()` to clear out the render phase collections
instead of recreating the components from scratch, allowing us to reuse
allocations.

2. This is a prerequisite for retained bins, because components can't be
retained from frame to frame in the render world, but resources can.

3. We want to move away from storing anything in components in the
render world ECS, and this is a step in that direction.

This patch results in a small performance benefit, due to point (1)
above.

## Changelog

### Changed

* The `BinnedRenderPhase` and `SortedRenderPhase` render world
components have been replaced with `ViewBinnedRenderPhases` and
`ViewSortedRenderPhases` resources.

## Migration Guide

* The `BinnedRenderPhase` and `SortedRenderPhase` render world
components have been replaced with `ViewBinnedRenderPhases` and
`ViewSortedRenderPhases` resources. Instead of querying for the
components, look the camera entity up in the
`ViewBinnedRenderPhases`/`ViewSortedRenderPhases` tables.
2024-05-21 18:23:04 +00:00
Csányi István
f91fd322b7
Skip redundant mesh_position_local_to_world call in vertex prepass shader (#13158)
# Objective

Optimize vertex prepass shader maybe?
Make it consistent with the base vertex shader

## Solution

`mesh_position_local_to_clip` just calls `mesh_position_local_to_world`
and then `position_world_to_clip`
since `out.world_position` is getting calculated anyway a few lines
below, just move it up and use it's output to calculate `out.position`.

It is the same as in the base vertex shader (`mesh.wgsl`).

Note: I have no idea if there is a reason that it was this way. I'm not
an expert, just noticed this inconsistency while messing with custom
shaders.
2024-05-14 16:31:58 +00:00
Johannes Hackel
3f5a090b1b
Add UV channel selection to StandardMaterial (#13200)
# Objective

- The StandardMaterial always uses ATTRIBUTE_UV_0 for each texture
except lightmap. This is not flexible enough for a lot of gltf Files.
- Fixes #12496
- Fixes #13086
- Fixes #13122
- Closes #13153

## Solution

- The StandardMaterial gets extended for each texture by an UvChannel
enum. It defaults to Uv0 but can also be set to Uv1.
- The gltf loader now handles the texcoord information. If the texcoord
is not supported it creates a warning.
- It uses StandardMaterial shader defs to define which attribute to use.

## Testing

This fixes #12496 for example:

![wall_fixed](https://github.com/bevyengine/bevy/assets/688816/bc37c9e1-72ba-4e59-b092-5ee10dade603)

For testing of all kind of textures I used the TextureTransformMultiTest
from
https://github.com/KhronosGroup/glTF-Sample-Assets/tree/main/Models/TextureTransformMultiTest
Its purpose is to test multiple texture transfroms but it is also a good
test for different texcoords.
It also shows the issue with emission #13133.

Before:

![TextureTransformMultiTest_main](https://github.com/bevyengine/bevy/assets/688816/aa701d04-5a3f-4df1-a65f-fc770ab6f4ab)

After:

![TextureTransformMultiTest_texcoord](https://github.com/bevyengine/bevy/assets/688816/c3f91943-b830-4068-990f-e4f2c97771ee)
2024-05-13 18:23:09 +00:00
JMS55
e1a0da0fa6
Meshlet LOD-compatible two-pass occlusion culling (#12898)
Keeping track of explicit visibility per cluster between frames does not
work with LODs, and leads to worse culling (using the final depth buffer
from the previous frame is more accurate).

Instead, we need to generate a second depth pyramid after the second
raster pass, and then use that in the first culling pass in the next
frame to test if a cluster would have been visible last frame or not.

As part of these changes, the write_index_buffer pass has been folded
into the culling pass for a large performance gain, and to avoid
tracking a lot of extra state that would be needed between passes.

Prepass previous model/view stuff was adapted to work with meshlets as
well.

Also fixed a bug with materials, and other misc improvements.

---------

Co-authored-by: François <mockersf@gmail.com>
Co-authored-by: atlas dostal <rodol@rivalrebels.com>
Co-authored-by: vero <email@atlasdostal.com>
Co-authored-by: Patrick Walton <pcwalton@mimiga.net>
Co-authored-by: Robert Swain <robert.swain@gmail.com>
2024-04-28 05:30:20 +00:00
JMS55
17633c1f75
Remove unused push constants (#13076)
The shader code was removed in #11280, but we never cleaned up the rust
code.
2024-04-23 21:43:46 +00:00
Patrick Walton
1141e731ff
Implement alpha to coverage (A2C) support. (#12970)
[Alpha to coverage] (A2C) replaces alpha blending with a
hardware-specific multisample coverage mask when multisample
antialiasing is in use. It's a simple form of [order-independent
transparency] that relies on MSAA. ["Anti-aliased Alpha Test: The
Esoteric Alpha To Coverage"] is a good summary of the motivation for and
best practices relating to A2C.

This commit implements alpha to coverage support as a new variant for
`AlphaMode`. You can supply `AlphaMode::AlphaToCoverage` as the
`alpha_mode` field in `StandardMaterial` to use it. When in use, the
standard material shader automatically applies the texture filtering
method from ["Anti-aliased Alpha Test: The Esoteric Alpha To Coverage"].
Objects with alpha-to-coverage materials are binned in the opaque pass,
as they're fully order-independent.

The `transparency_3d` example has been updated to feature an object with
alpha to coverage. Happily, the example was already using MSAA.

This is part of #2223, as far as I can tell.

[Alpha to coverage]: https://en.wikipedia.org/wiki/Alpha_to_coverage

[order-independent transparency]:
https://en.wikipedia.org/wiki/Order-independent_transparency

["Anti-aliased Alpha Test: The Esoteric Alpha To Coverage"]:
https://bgolus.medium.com/anti-aliased-alpha-test-the-esoteric-alpha-to-coverage-8b177335ae4f

---

## Changelog

### Added

* The `AlphaMode` enum now supports `AlphaToCoverage`, to provide
limited order-independent transparency when multisample antialiasing is
in use.
2024-04-15 20:37:52 +00:00
Patrick Walton
5caf085dac
Divide the single VisibleEntities list into separate lists for 2D meshes, 3D meshes, lights, and UI elements, for performance. (#12582)
This commit splits `VisibleEntities::entities` into four separate lists:
one for lights, one for 2D meshes, one for 3D meshes, and one for UI
elements. This allows `queue_material_meshes` and similar methods to
avoid examining entities that are obviously irrelevant. In particular,
this separation helps scenes with many skinned meshes, as the individual
bones are considered visible entities but have no rendered appearance.

Internally, `VisibleEntities::entities` is a `HashMap` from the `TypeId`
representing a `QueryFilter` to the appropriate `Entity` list. I had to
do this because `VisibleEntities` is located within an upstream crate
from the crates that provide lights (`bevy_pbr`) and 2D meshes
(`bevy_sprite`). As an added benefit, this setup allows apps to provide
their own types of renderable components, by simply adding a specialized
`check_visibility` to the schedule.

This provides a 16.23% end-to-end speedup on `many_foxes` with 10,000
foxes (24.06 ms/frame to 20.70 ms/frame).

## Migration guide

* `check_visibility` and `VisibleEntities` now store the four types of
renderable entities--2D meshes, 3D meshes, lights, and UI
elements--separately. If your custom rendering code examines
`VisibleEntities`, it will now need to specify which type of entity it's
interested in using the `WithMesh2d`, `WithMesh`, `WithLight`, and
`WithNode` types respectively. If your app introduces a new type of
renderable entity, you'll need to add an explicit call to
`check_visibility` to the schedule to accommodate your new component or
components.

## Analysis

`many_foxes`, 10,000 foxes: `main`:
![Screenshot 2024-03-31
114444](https://github.com/bevyengine/bevy/assets/157897/16ecb2ff-6e04-46c0-a4b0-b2fde2084bad)

`many_foxes`, 10,000 foxes, this branch:
![Screenshot 2024-03-31
114256](https://github.com/bevyengine/bevy/assets/157897/94dedae4-bd00-45b2-9aaf-dfc237004ddb)

`queue_material_meshes` (yellow = this branch, red = `main`):
![Screenshot 2024-03-31
114637](https://github.com/bevyengine/bevy/assets/157897/f90912bd-45bd-42c4-bd74-57d98a0f036e)

`queue_shadows` (yellow = this branch, red = `main`):
![Screenshot 2024-03-31
114607](https://github.com/bevyengine/bevy/assets/157897/6ce693e3-20c0-4234-8ec9-a6f191299e2d)
2024-04-11 20:33:20 +00:00
Patrick Walton
11817f4ba4
Generate MeshUniforms on the GPU via compute shader where available. (#12773)
Currently, `MeshUniform`s are rather large: 160 bytes. They're also
somewhat expensive to compute, because they involve taking the inverse
of a 3x4 matrix. Finally, if a mesh is present in multiple views, that
mesh will have a separate `MeshUniform` for each and every view, which
is wasteful.

This commit fixes these issues by introducing the concept of a *mesh
input uniform* and adding a *mesh uniform building* compute shader pass.
The `MeshInputUniform` is simply the minimum amount of data needed for
the GPU to compute the full `MeshUniform`. Most of this data is just the
transform and is therefore only 64 bytes. `MeshInputUniform`s are
computed during the *extraction* phase, much like skins are today, in
order to avoid needlessly copying transforms around on CPU. (In fact,
the render app has been changed to only store the translation of each
mesh; it no longer cares about any other part of the transform, which is
stored only on the GPU and the main world.) Before rendering, the
`build_mesh_uniforms` pass runs to expand the `MeshInputUniform`s to the
full `MeshUniform`.

The mesh uniform building pass does the following, all on GPU:

1. Copy the appropriate fields of the `MeshInputUniform` to the
`MeshUniform` slot. If a single mesh is present in multiple views, this
effectively duplicates it into each view.

2. Compute the inverse transpose of the model transform, used for
transforming normals.

3. If applicable, copy the mesh's transform from the previous frame for
TAA. To support this, we double-buffer the `MeshInputUniform`s over two
frames and swap the buffers each frame. The `MeshInputUniform`s for the
current frame contain the index of that mesh's `MeshInputUniform` for
the previous frame.

This commit produces wins in virtually every CPU part of the pipeline:
`extract_meshes`, `queue_material_meshes`,
`batch_and_prepare_render_phase`, and especially
`write_batched_instance_buffer` are all faster. Shrinking the amount of
CPU data that has to be shuffled around speeds up the entire rendering
process.

| Benchmark              | This branch | `main`  | Speedup |
|------------------------|-------------|---------|---------|
| `many_cubes -nfc`      |      17.259 |  24.529 |  42.12% |
| `many_cubes -nfc -vpi` |     302.116 | 312.123 |   3.31% |
| `many_foxes`           |       3.227 |   3.515 |   8.92% |

Because mesh uniform building requires compute shader, and WebGL 2 has
no compute shader, the existing CPU mesh uniform building code has been
left as-is. Many types now have both CPU mesh uniform building and GPU
mesh uniform building modes. Developers can opt into the old CPU mesh
uniform building by setting the `use_gpu_uniform_builder` option on
`PbrPlugin` to `false`.

Below are graphs of the CPU portions of `many-cubes
--no-frustum-culling`. Yellow is this branch, red is `main`.

`extract_meshes`:
![Screenshot 2024-04-02
124842](https://github.com/bevyengine/bevy/assets/157897/a6748ea4-dd05-47b6-9254-45d07d33cb10)
It's notable that we get a small win even though we're now writing to a
GPU buffer.

`queue_material_meshes`:
![Screenshot 2024-04-02
124911](https://github.com/bevyengine/bevy/assets/157897/ecb44d78-65dc-448d-ba85-2de91aa2ad94)
There's a bit of a regression here; not sure what's causing it. In any
case it's very outweighed by the other gains.

`batch_and_prepare_render_phase`:
![Screenshot 2024-04-02
125123](https://github.com/bevyengine/bevy/assets/157897/4e20fc86-f9dd-4e5c-8623-837e4258f435)
There's a huge win here, enough to make batching basically drop off the
profile.

`write_batched_instance_buffer`:
![Screenshot 2024-04-02
125237](https://github.com/bevyengine/bevy/assets/157897/401a5c32-9dc1-4991-996d-eb1cac6014b2)
There's a massive improvement here, as expected. Note that a lot of it
simply comes from the fact that `MeshInputUniform` is `Pod`. (This isn't
a maintainability problem in my view because `MeshInputUniform` is so
simple: just 16 tightly-packed words.)

## Changelog

### Added

* Per-mesh instance data is now generated on GPU with a compute shader
instead of CPU, resulting in rendering performance improvements on
platforms where compute shaders are supported.

## Migration guide

* Custom render phases now need multiple systems beyond just
`batch_and_prepare_render_phase`. Code that was previously creating
custom render phases should now add a `BinnedRenderPhasePlugin` or
`SortedRenderPhasePlugin` as appropriate instead of directly adding
`batch_and_prepare_render_phase`.
2024-04-10 05:33:32 +00:00
Robert Swain
ab7cbfa8fc
Consolidate Render(Ui)Materials(2d) into RenderAssets (#12827)
# Objective

- Replace `RenderMaterials` / `RenderMaterials2d` / `RenderUiMaterials`
with `RenderAssets` to enable implementing changes to one thing,
`RenderAssets`, that applies to all use cases rather than duplicating
changes everywhere for multiple things that should be one thing.
- Adopts #8149 

## Solution

- Make RenderAsset generic over the destination type rather than the
source type as in #8149
- Use `RenderAssets<PreparedMaterial<M>>` etc for render materials

---

## Changelog

- Changed:
- The `RenderAsset` trait is now implemented on the destination type.
Its `SourceAsset` associated type refers to the type of the source
asset.
- `RenderMaterials`, `RenderMaterials2d`, and `RenderUiMaterials` have
been replaced by `RenderAssets<PreparedMaterial<M>>` and similar.

## Migration Guide

- `RenderAsset` is now implemented for the destination type rather that
the source asset type. The source asset type is now the `RenderAsset`
trait's `SourceAsset` associated type.
2024-04-09 13:26:34 +00:00
JMS55
31b5943ad4
Add previous_view_uniforms.inverse_view (#12902)
# Objective
- Upload previous frame's inverse_view matrix to the GPU for use with
https://github.com/bevyengine/bevy/pull/12898.

---

## Changelog
- Added `prepass_bindings::previous_view_uniforms.inverse_view`.
- Renamed `prepass_bindings::previous_view_proj` to
`prepass_bindings::previous_view_uniforms.view_proj`.
- Renamed `PreviousViewProjectionUniformOffset` to
`PreviousViewUniformOffset`.
- Renamed `PreviousViewProjection` to `PreviousViewData`.

## Migration Guide
- Renamed `prepass_bindings::previous_view_proj` to
`prepass_bindings::previous_view_uniforms.view_proj`.
- Renamed `PreviousViewProjectionUniformOffset` to
`PreviousViewUniformOffset`.
- Renamed `PreviousViewProjection` to `PreviousViewData`.
2024-04-07 18:59:16 +00:00
Patrick Walton
37522fd0ae
Micro-optimize queue_material_meshes, primarily to remove bit manipulation. (#12791)
This commit makes the following optimizations:

## `MeshPipelineKey`/`BaseMeshPipelineKey` split

`MeshPipelineKey` has been split into `BaseMeshPipelineKey`, which lives
in `bevy_render` and `MeshPipelineKey`, which lives in `bevy_pbr`.
Conceptually, `BaseMeshPipelineKey` is a superclass of
`MeshPipelineKey`. For `BaseMeshPipelineKey`, the bits start at the
highest (most significant) bit and grow downward toward the lowest bit;
for `MeshPipelineKey`, the bits start at the lowest bit and grow upward
toward the highest bit. This prevents them from colliding.

The goal of this is to avoid having to reassemble bits of the pipeline
key for every mesh every frame. Instead, we can just use a bitwise or
operation to combine the pieces that make up a `MeshPipelineKey`.

## `specialize_slow`

Previously, all of `specialize()` was marked as `#[inline]`. This
bloated `queue_material_meshes` unnecessarily, as a large chunk of it
ended up being a slow path that was rarely hit. This commit refactors
the function to move the slow path to `specialize_slow()`.

Together, these two changes shave about 5% off `queue_material_meshes`:

![Screenshot 2024-03-29
130002](https://github.com/bevyengine/bevy/assets/157897/a7e5a994-a807-4328-b314-9003429dcdd2)

## Migration Guide

- The `primitive_topology` field on `GpuMesh` is now an accessor method:
`GpuMesh::primitive_topology()`.
- For performance reasons, `MeshPipelineKey` has been split into
`BaseMeshPipelineKey`, which lives in `bevy_render`, and
`MeshPipelineKey`, which lives in `bevy_pbr`. These two should be
combined with bitwise-or to produce the final `MeshPipelineKey`.
2024-04-01 21:58:53 +00:00
Cameron
01649f13e2
Refactor App and SubApp internals for better separation (#9202)
# Objective

This is a necessary precursor to #9122 (this was split from that PR to
reduce the amount of code to review all at once).

Moving `!Send` resource ownership to `App` will make it unambiguously
`!Send`. `SubApp` must be `Send`, so it can't wrap `App`.

## Solution

Refactor `App` and `SubApp` to not have a recursive relationship. Since
`SubApp` no longer wraps `App`, once `!Send` resources are moved out of
`World` and into `App`, `SubApp` will become unambiguously `Send`.

There could be less code duplication between `App` and `SubApp`, but
that would break `App` method chaining.

## Changelog

- `SubApp` no longer wraps `App`.
- `App` fields are no longer publicly accessible.
- `App` can no longer be converted into a `SubApp`.
- Various methods now return references to a `SubApp` instead of an
`App`.
## Migration Guide

- To construct a sub-app, use `SubApp::new()`. `App` can no longer
convert into `SubApp`.
- If you implemented a trait for `App`, you may want to implement it for
`SubApp` as well.
- If you're accessing `app.world` directly, you now have to use
`app.world()` and `app.world_mut()`.
- `App::sub_app` now returns `&SubApp`.
- `App::sub_app_mut`  now returns `&mut SubApp`.
- `App::get_sub_app` now returns `Option<&SubApp>.`
- `App::get_sub_app_mut` now returns `Option<&mut SubApp>.`
2024-03-31 03:16:10 +00:00
Patrick Walton
4dadebd9c4
Improve performance by binning together opaque items instead of sorting them. (#12453)
Today, we sort all entities added to all phases, even the phases that
don't strictly need sorting, such as the opaque and shadow phases. This
results in a performance loss because our `PhaseItem`s are rather large
in memory, so sorting is slow. Additionally, determining the boundaries
of batches is an O(n) process.

This commit makes Bevy instead applicable place phase items into *bins*
keyed by *bin keys*, which have the invariant that everything in the
same bin is potentially batchable. This makes determining batch
boundaries O(1), because everything in the same bin can be batched.
Instead of sorting each entity, we now sort only the bin keys. This
drops the sorting time to near-zero on workloads with few bins like
`many_cubes --no-frustum-culling`. Memory usage is improved too, with
batch boundaries and dynamic indices now implicit instead of explicit.
The improved memory usage results in a significant win even on
unbatchable workloads like `many_cubes --no-frustum-culling
--vary-material-data-per-instance`, presumably due to cache effects.

Not all phases can be binned; some, such as transparent and transmissive
phases, must still be sorted. To handle this, this commit splits
`PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the
logic that today deals with `PhaseItem`s has been moved to
`SortedPhaseItem`. `BinnedPhaseItem` has the new logic.

Frame time results (in ms/frame) are as follows:

| Benchmark                | `binning` | `main`  | Speedup |
| ------------------------ | --------- | ------- | ------- |
| `many_cubes -nfc -vpi` | 232.179     | 312.123   | 34.43%  |
| `many_cubes -nfc`        | 25.874 | 30.117 | 16.40%  |
| `many_foxes`             | 3.276 | 3.515 | 7.30%   |

(`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for
`--vary-per-instance`.)

---

## Changelog

### Changed

* Render phases have been split into binned and sorted phases. Binned
phases, such as the common opaque phase, achieve improved CPU
performance by avoiding the sorting step.

## Migration Guide

- `PhaseItem` has been split into `BinnedPhaseItem` and
`SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need
to migrate them to one of these two types. `SortedPhaseItem` requires
the fewest code changes, but you may want to pick `BinnedPhaseItem` if
your phase doesn't require sorting, as that enables higher performance.

## Tracy graphs

`many-cubes --no-frustum-culling`, `main` branch:
<img width="1064" alt="Screenshot 2024-03-12 180037"
src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55">

`many-cubes --no-frustum-culling`, this branch:
<img width="1064" alt="Screenshot 2024-03-12 180011"
src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c">

You can see that `batch_and_prepare_binned_render_phase` is a much
smaller fraction of the time. Zooming in on that function, with yellow
being this branch and red being `main`, we see:

<img width="1064" alt="Screenshot 2024-03-12 175832"
src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0">

The binning happens in `queue_material_meshes`. Again with yellow being
this branch and red being `main`:
<img width="1064" alt="Screenshot 2024-03-12 175755"
src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96">

We can see that there is a small regression in `queue_material_meshes`
performance, but it's not nearly enough to outweigh the large gains in
`batch_and_prepare_binned_render_phase`.

---------

Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
JMS55
4f20faaa43
Meshlet rendering (initial feature) (#10164)
# Objective
- Implements a more efficient, GPU-driven
(https://github.com/bevyengine/bevy/issues/1342) rendering pipeline
based on meshlets.
- Meshes are split into small clusters of triangles called meshlets,
each of which acts as a mini index buffer into the larger mesh data.
Meshlets can be compressed, streamed, culled, and batched much more
efficiently than monolithic meshes.


![image](https://github.com/bevyengine/bevy/assets/47158642/cb2aaad0-7a9a-4e14-93b0-15d4e895b26a)

![image](https://github.com/bevyengine/bevy/assets/47158642/7534035b-1eb7-4278-9b99-5322e4401715)

# Misc
* Future work: https://github.com/bevyengine/bevy/issues/11518
* Nanite reference:
https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf
Two pass occlusion culling explained very well:
https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501

---------

Co-authored-by: Ricky Taylor <rickytaylor26@gmail.com>
Co-authored-by: vero <email@atlasdostal.com>
Co-authored-by: François <mockersf@gmail.com>
Co-authored-by: atlas dostal <rodol@rivalrebels.com>
2024-03-25 19:08:27 +00:00
Gino Valente
4c47e31be6
bevy_reflect: Remove U32Visitor (#12433)
# Objective

The `U32Visitor` struct has been unused since its introduction in #6140.
It's made itself known now by causing a recent [CI
failure](https://github.com/bevyengine/bevy/actions/runs/8243333274/job/22543736624).

## Solution

Remove the unused `U32Visitor` struct.

Also removed `PrepassLightsViewFlush` as it was causing a [similar CI
failure](https://github.com/bevyengine/bevy/actions/runs/8243838066/job/22545103746?pr=12433#step:6:269)
on this PR.
2024-03-12 06:19:29 +00:00
Patrick Walton
f9cc91d5a1
Intern mesh vertex buffer layouts so that we don't have to compare them over and over. (#12216)
Although we cached hashes of `MeshVertexBufferLayout`, we were paying
the cost of `PartialEq` on `InnerMeshVertexBufferLayout` for every
entity, every frame. This patch changes that logic to place
`MeshVertexBufferLayout`s in `Arc`s so that they can be compared and
hashed by pointer. This results in a 28% speedup in the
`queue_material_meshes` phase of `many_cubes`, with frustum culling
disabled.

Additionally, this patch contains two minor changes:

1. This commit flattens the specialized mesh pipeline cache to one level
of hash tables instead of two. This saves a hash lookup.

2. The example `many_cubes` has been given a `--no-frustum-culling`
flag, to aid in benchmarking.

See the Tracy profile:

<img width="1064" alt="Screenshot 2024-02-29 144406"
src="https://github.com/bevyengine/bevy/assets/157897/18632f1d-1fdd-4ac7-90ed-2d10306b2a1e">

## Migration guide

* Duplicate `MeshVertexBufferLayout`s are now combined into a single
object, `MeshVertexBufferLayoutRef`, which contains an
atomically-reference-counted pointer to the layout. Code that was using
`MeshVertexBufferLayout` may need to be updated to use
`MeshVertexBufferLayoutRef` instead.
2024-03-01 20:56:21 +00:00
Elabajaba
78b6fa1f1b
sort alpha masked pipelines by pipeline & mesh instead of by distance (#12117)
# Objective

- followup to https://github.com/bevyengine/bevy/pull/11671
- I forgot to change the alpha masked phases.

## Solution

- Change the sorting for alpha mask phases to sort by pipeline+mesh
instead of distance, for much better batching for alpha masked
materials.

I also fixed some docs that I missed in the previous PR.

---

## Changelog
- Alpha masked materials are now sorted by pipeline and mesh.
2024-02-26 11:14:59 +00:00
eri
5f8f3b532c
Check cfg during CI and fix feature typos (#12103)
# Objective

- Add the new `-Zcheck-cfg` checks to catch more warnings
- Fixes #12091

## Solution

- Create a new `cfg-check` to the CI that runs `cargo check -Zcheck-cfg
--workspace` using cargo nightly (and fails if there are warnings)
- Fix all warnings generated by the new check

---

## Changelog

- Remove all redundant imports
- Fix cfg wasm32 targets
- Add 3 dead code exceptions (should StandardColor be unused?)
- Convert ios_simulator to a feature (I'm not sure if this is the right
way to do it, but the check complained before)

## Migration Guide

No breaking changes

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2024-02-25 15:19:27 +00:00