bevy/crates
Patrick Walton 11817f4ba4
Generate MeshUniforms on the GPU via compute shader where available. (#12773)
Currently, `MeshUniform`s are rather large: 160 bytes. They're also
somewhat expensive to compute, because they involve taking the inverse
of a 3x4 matrix. Finally, if a mesh is present in multiple views, that
mesh will have a separate `MeshUniform` for each and every view, which
is wasteful.

This commit fixes these issues by introducing the concept of a *mesh
input uniform* and adding a *mesh uniform building* compute shader pass.
The `MeshInputUniform` is simply the minimum amount of data needed for
the GPU to compute the full `MeshUniform`. Most of this data is just the
transform and is therefore only 64 bytes. `MeshInputUniform`s are
computed during the *extraction* phase, much like skins are today, in
order to avoid needlessly copying transforms around on CPU. (In fact,
the render app has been changed to only store the translation of each
mesh; it no longer cares about any other part of the transform, which is
stored only on the GPU and the main world.) Before rendering, the
`build_mesh_uniforms` pass runs to expand the `MeshInputUniform`s to the
full `MeshUniform`.
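
As a rough illustration of the size claim, the sketch below lays out a 64-byte input record in plain Rust. Only the 64-byte total, the dominance of the transform, and the existence of a previous-frame index come from this description. The type name `MeshInputUniformSketch`, the field names, and the split of the four non-transform words are assumptions made for illustration, not the actual definition in `bevy_pbr`.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for `MeshInputUniform`; field names are assumptions.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct MeshInputUniformSketch {
    /// Affine model transform stored as three 4-float vectors: 12 words.
    pub transform: [[f32; 4]; 3],
    /// Assumed lightmap UV rectangle: 2 words.
    pub lightmap_uv_rect: [u32; 2],
    /// Assumed mesh flags: 1 word.
    pub flags: u32,
    /// Index of this mesh's input in the previous frame's buffer: 1 word.
    pub previous_input_index: u32,
}

/// Size in bytes of one input record.
pub const INPUT_SIZE: usize = size_of::<MeshInputUniformSketch>();
```

Every field is 4-byte aligned, so `#[repr(C)]` adds no padding and `INPUT_SIZE` comes out to 16 words, or 64 bytes.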

The mesh uniform building pass does the following, all on GPU:

1. Copy the appropriate fields of the `MeshInputUniform` to the
`MeshUniform` slot. If a single mesh is present in multiple views, this
effectively duplicates it into each view.

2. Compute the inverse transpose of the model transform, used for
transforming normals.

3. If applicable, copy the mesh's transform from the previous frame for
TAA. To support this, we double-buffer the `MeshInputUniform`s over two
frames and swap the buffers each frame. The `MeshInputUniform`s for the
current frame contain the index of that mesh's `MeshInputUniform` for
the previous frame.
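
The three steps can be sketched on the CPU in plain Rust. This is an illustrative mirror of the logic, not the compute shader itself: the type names, the row-major 3x4 layout, and the `NO_PREVIOUS` sentinel are all assumptions, and the per-view duplication from step 1 is not modeled.

```rust
/// A 3x4 affine transform as three rows of four floats (assumed layout).
pub type Affine = [[f32; 4]; 3];

/// Assumed sentinel meaning "this mesh had no input last frame".
pub const NO_PREVIOUS: u32 = u32::MAX;

/// Hypothetical per-mesh input written during extraction.
#[derive(Clone, Copy, Debug)]
pub struct InputSketch {
    pub transform: Affine,
    /// Index of this mesh's input in the previous frame's buffer.
    pub previous_input_index: u32,
}

/// Hypothetical expanded per-instance output.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct OutputSketch {
    pub transform: Affine,
    pub previous_transform: Affine,
    pub inverse_transpose: [[f32; 3]; 3],
}

/// Two input buffers that trade places every frame.
#[derive(Default)]
pub struct DoubleBuffered {
    pub current: Vec<InputSketch>,
    pub previous: Vec<InputSketch>,
}

impl DoubleBuffered {
    /// Last frame's inputs become `previous`; `current` is emptied for refilling.
    pub fn begin_frame(&mut self) {
        std::mem::swap(&mut self.current, &mut self.previous);
        self.current.clear();
    }
}

/// Inverse transpose of the upper-left 3x3 block.
/// Since inverse(M) = cofactor(M)^T / det(M), the inverse transpose is
/// simply cofactor(M) / det(M).
pub fn inverse_transpose(t: &Affine) -> [[f32; 3]; 3] {
    let m = |r: usize, c: usize| t[r][c];
    let cof = [
        [
            m(1, 1) * m(2, 2) - m(1, 2) * m(2, 1),
            m(1, 2) * m(2, 0) - m(1, 0) * m(2, 2),
            m(1, 0) * m(2, 1) - m(1, 1) * m(2, 0),
        ],
        [
            m(0, 2) * m(2, 1) - m(0, 1) * m(2, 2),
            m(0, 0) * m(2, 2) - m(0, 2) * m(2, 0),
            m(0, 1) * m(2, 0) - m(0, 0) * m(2, 1),
        ],
        [
            m(0, 1) * m(1, 2) - m(0, 2) * m(1, 1),
            m(0, 2) * m(1, 0) - m(0, 0) * m(1, 2),
            m(0, 0) * m(1, 1) - m(0, 1) * m(1, 0),
        ],
    ];
    // Determinant by expansion along the first row.
    let det = m(0, 0) * cof[0][0] + m(0, 1) * cof[0][1] + m(0, 2) * cof[0][2];
    let inv_det = 1.0 / det;
    cof.map(|row| row.map(|x| x * inv_det))
}

/// Expands one input into one output, following steps 1 to 3.
pub fn build_mesh_uniform(buffers: &DoubleBuffered, index: usize) -> OutputSketch {
    let input = buffers.current[index];
    // Step 3: look up last frame's transform through the stored index,
    // falling back to the current transform when there is none.
    let previous_transform = if input.previous_input_index == NO_PREVIOUS {
        input.transform
    } else {
        buffers.previous[input.previous_input_index as usize].transform
    };
    OutputSketch {
        // Step 1: copy the fields that pass through unchanged.
        transform: input.transform,
        previous_transform,
        // Step 2: derive the matrix used to transform normals.
        inverse_transpose: inverse_transpose(&input.transform),
    }
}
```

In this sketch, a frame would call `begin_frame`, refill `current`, and then call `build_mesh_uniform` once per instance. On the GPU each call would correspond to one compute shader invocation.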

This commit produces wins in virtually every CPU part of the pipeline:
`extract_meshes`, `queue_material_meshes`,
`batch_and_prepare_render_phase`, and especially
`write_batched_instance_buffer` are all faster. Shrinking the amount of
CPU data that has to be shuffled around speeds up the entire rendering
process.

| Benchmark              | This branch | `main`  | Speedup |
|------------------------|-------------|---------|---------|
| `many_cubes -nfc`      |      17.259 |  24.529 |  42.12% |
| `many_cubes -nfc -vpi` |     302.116 | 312.123 |   3.31% |
| `many_foxes`           |       3.227 |   3.515 |   8.92% |
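
The table does not state its units or how the speedup column is derived. The figures are consistent with `main / branch - 1`, assuming both measurements are times where lower is better. A minimal sketch of that arithmetic follows; the function name `speedup_percent` is hypothetical.

```rust
/// Relative speedup in percent, assuming both inputs are times
/// (lower is better): how much longer `main` takes than `branch`.
pub fn speedup_percent(branch: f64, main: f64) -> f64 {
    (main / branch - 1.0) * 100.0
}
```

For example, `speedup_percent(17.259, 24.529)` agrees with the 42.12% reported in the first row to within rounding.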

Because mesh uniform building requires compute shaders, which WebGL 2
does not support, the existing CPU mesh uniform building code has been
left as-is. Many types now have both CPU and GPU mesh uniform building
modes. Developers can opt into the old CPU mesh uniform building by
setting the `use_gpu_uniform_builder` option on `PbrPlugin` to `false`.

Below are graphs of the CPU portions of
`many_cubes --no-frustum-culling`. Yellow is this branch, red is `main`.

`extract_meshes`:
![Screenshot 2024-04-02
124842](https://github.com/bevyengine/bevy/assets/157897/a6748ea4-dd05-47b6-9254-45d07d33cb10)
It's notable that we get a small win even though we're now writing to a
GPU buffer.

`queue_material_meshes`:
![Screenshot 2024-04-02
124911](https://github.com/bevyengine/bevy/assets/157897/ecb44d78-65dc-448d-ba85-2de91aa2ad94)
There's a bit of a regression here; not sure what's causing it. In any
case it's far outweighed by the other gains.

`batch_and_prepare_render_phase`:
![Screenshot 2024-04-02
125123](https://github.com/bevyengine/bevy/assets/157897/4e20fc86-f9dd-4e5c-8623-837e4258f435)
There's a huge win here, enough to make batching basically drop off the
profile.

`write_batched_instance_buffer`:
![Screenshot 2024-04-02
125237](https://github.com/bevyengine/bevy/assets/157897/401a5c32-9dc1-4991-996d-eb1cac6014b2)
There's a massive improvement here, as expected. Note that a lot of it
simply comes from the fact that `MeshInputUniform` is `Pod`. (This isn't
a maintainability problem in my view because `MeshInputUniform` is so
simple: just 16 tightly-packed words.)

## Changelog

### Added

* Per-mesh instance data is now generated on the GPU with a compute
shader instead of on the CPU, resulting in rendering performance
improvements on platforms where compute shaders are supported.

## Migration guide

* Custom render phases now need multiple systems beyond just
`batch_and_prepare_render_phase`. Code that was previously creating
custom render phases should now add a `BinnedRenderPhasePlugin` or
`SortedRenderPhasePlugin` as appropriate instead of directly adding
`batch_and_prepare_render_phase`.
2024-04-10 05:33:32 +00:00
bevy_a11y Set the logo and favicon for all of Bevy's published crates (#12696) 2024-03-25 18:52:50 +00:00
bevy_animation Clean up some low level dependencies (#12858) 2024-04-08 19:45:42 +00:00
bevy_app Clean up some low level dependencies (#12858) 2024-04-08 19:45:42 +00:00
bevy_asset Error info has been added to LoadState::Failed (#12709) 2024-04-04 14:04:27 +00:00
bevy_audio updated audio_source.rs documentation (#12765) 2024-03-28 19:10:09 +00:00
bevy_color Clean up some low level dependencies (#12858) 2024-04-08 19:45:42 +00:00
bevy_core Refactor App and SubApp internals for better separation (#9202) 2024-03-31 03:16:10 +00:00
bevy_core_pipeline Consolidate Render(Ui)Materials(2d) into RenderAssets (#12827) 2024-04-09 13:26:34 +00:00
bevy_derive Forbid unsafe in most crates in the engine (#12684) 2024-03-27 03:30:08 +00:00
bevy_dev_tools remove close_on_esc (#12859) 2024-04-03 18:02:50 +00:00
bevy_diagnostic Refactor App and SubApp internals for better separation (#9202) 2024-03-31 03:16:10 +00:00
bevy_dylib Set the logo and favicon for all of Bevy's published crates (#12696) 2024-03-25 18:52:50 +00:00
bevy_dynamic_plugin Forbid unsafe in most crates in the engine (#12684) 2024-03-27 03:30:08 +00:00
bevy_ecs Clean up some low level dependencies (#12858) 2024-04-08 19:45:42 +00:00
bevy_ecs_compile_fail_tests Fix Ci failing over dead code in tests (#12623) 2024-03-21 18:08:47 +00:00
bevy_encase_derive Forbid unsafe in most crates in the engine (#12684) 2024-03-27 03:30:08 +00:00
bevy_gilrs Forbid unsafe in most crates in the engine (#12684) 2024-03-27 03:30:08 +00:00
bevy_gizmos Use impl Into<Color> for gizmos.primitive_3d(...) (#12915) 2024-04-09 17:33:34 +00:00
bevy_gltf Refactor App and SubApp internals for better separation (#9202) 2024-03-31 03:16:10 +00:00
bevy_hierarchy Forbid unsafe in most crates in the engine (#12684) 2024-03-27 03:30:08 +00:00
bevy_input fix previous_position / previous_force being discarded too early (#12556) 2024-04-01 21:45:47 +00:00
bevy_internal Fix ambiguities causing a crash (#12780) 2024-03-29 16:00:13 +00:00
bevy_log Refactor App and SubApp internals for better separation (#9202) 2024-03-31 03:16:10 +00:00
bevy_macro_utils Clean up some low level dependencies (#12858) 2024-04-08 19:45:42 +00:00
bevy_macros_compile_fail_tests Fix Ci failing over dead code in tests (#12623) 2024-03-21 18:08:47 +00:00
bevy_math Random sampling of directions and quaternions (#12857) 2024-04-04 23:13:00 +00:00
bevy_mikktspace Forbid unsafe in most crates in the engine (#12684) 2024-03-27 03:30:08 +00:00
bevy_pbr Generate MeshUniforms on the GPU via compute shader where available. (#12773) 2024-04-10 05:33:32 +00:00
bevy_ptr Document the lifetime requirement of byte_offset and byte_add (#12893) 2024-04-08 17:13:35 +00:00
bevy_reflect Meshing for Annulus primitive (#12734) 2024-04-01 21:55:49 +00:00
bevy_reflect_compile_fail_tests Fix Ci failing over dead code in tests (#12623) 2024-03-21 18:08:47 +00:00
bevy_render Generate MeshUniforms on the GPU via compute shader where available. (#12773) 2024-04-10 05:33:32 +00:00
bevy_scene Clean up some low level dependencies (#12858) 2024-04-08 19:45:42 +00:00
bevy_sprite Generate MeshUniforms on the GPU via compute shader where available. (#12773) 2024-04-10 05:33:32 +00:00
bevy_tasks Fix beta CI (#12913) 2024-04-09 17:33:59 +00:00
bevy_text Refactor App and SubApp internals for better separation (#9202) 2024-03-31 03:16:10 +00:00
bevy_time Remove redundant imports (#12817) 2024-04-01 19:59:08 +00:00
bevy_transform Remove redundant imports (#12817) 2024-04-01 19:59:08 +00:00
bevy_ui Consolidate Render(Ui)Materials(2d) into RenderAssets (#12827) 2024-04-09 13:26:34 +00:00
bevy_utils Moves intern and label modules into bevy_ecs (#12772) 2024-04-08 15:34:11 +00:00
bevy_window Move close_on_esc to bevy_dev_tools (#12855) 2024-04-03 01:29:06 +00:00
bevy_winit Fix beta CI (#12913) 2024-04-09 17:33:59 +00:00