# Objective
- Make the meshlet fill cluster buffers pass slightly faster
- Address https://github.com/bevyengine/bevy/issues/15920 for meshlets
- Added PreviousGlobalTransform as a required meshlet component to avoid
extra archetype moves, slightly alleviating
https://github.com/bevyengine/bevy/issues/14681 for meshlets
- Enforce that MeshletPlugin::cluster_buffer_slots is not greater than
2^25 (glitches will occur otherwise). Technically this field controls
post-lod/culling cluster count, and the issue is on pre-lod/culling
cluster count, but it's still valid now, and in the future this will be
more true.
Needs to be merged after https://github.com/bevyengine/bevy/pull/15846
and https://github.com/bevyengine/bevy/pull/15886
## Solution
- Old pass dispatched a thread per cluster, and did a binary search over
the instances to find which instance the cluster belongs to, and what
meshlet index within the instance it is.
- New pass dispatches a workgroup per instance, and has the workgroup
loop over all meshlets in the instance in order to write out the cluster
data.
- Use a push constant instead of arrayLength to fix the linked bug
- Remap 1d->2d dispatch for software raster only if actually needed to
save on spawning excess workgroups
## Testing
- Did you test these changes? If so, how?
- Ran the meshlet example, and an example with 1041 instances of 32217
meshlets per instance. Profiled the second scene with nsight, went from
0.55ms -> 0.40ms. Small savings. We're pretty much VRAM bandwidth bound
at this point.
- How can other people (reviewers) test your changes? Is there anything
specific they need to know?
- Run the meshlet example
## Changelog (non-meshlets)
- PreviousGlobalTransform now implements the Default trait
Take a bunch more improvements from @zeux's nanite.cpp code.
* Use position-only vertices (discard other attributes) to determine
meshlet connectivity for grouping
* Rather than using the lock borders flag when simplifying meshlet
groups, provide the locked vertices ourselves. The lock borders flag
locks the entire border of the meshlet group, but really we only want to
lock the edges between meshlet groups - outwards facing edges are fine
to unlock. This gives a really significant increase to the DAG quality.
* Add back stuck meshlets (group has only a single meshlet,
simplification failed) to the simplification queue to allow them to get
used later on and have another attempt at simplifying
* Target 8 meshlets per group instead of 4 (second biggest improvement
after manual locks)
* Provide a seed to metis for deterministic meshlet building
* Misc other improvements
We can remove the usage of unsafe after the next upstream meshopt
release, but for now we need to use the ffi function directly. I'll do
another round of improvements later, mainly attribute-aware
simplification and using spatial weights for meshlet grouping.
Need to merge https://github.com/bevyengine/bevy/pull/15846 first.
# Objective
1. Prevent weird glitches with stray pixels scattered around the scene
![image](https://github.com/user-attachments/assets/f12adb38-5996-4dc7-bea6-bd326b7317e1)
2. Prevent weird glitchy full-screen triangles that pop-up and destroy
perf (SW rasterizing huge triangles is slow)
![image](https://github.com/user-attachments/assets/d3705427-13a5-47bc-a54b-756f0409da0b)
## Solution
1. Use floating point math in the SW rasterizer bounding box calculation
to handle negative verticss, and add backface culling
2. Force hardware raster for clusters that clip the near plane, and let
the hardware rasterizer handle the clipping
I also adjusted the SW rasterizer threshold to < 64 pixels (little bit
better perf in my test scene, but still need to do a more comprehensive
test), and enabled backface culling for the hardware raster pipeline.
## Testing
- Did you test these changes? If so, how?
- Yes, on an example scene. Issues no longer occur.
- Are there any parts that need more testing?
- No.
- How can other people (reviewers) test your changes? Is there anything
specific they need to know?
- Run the meshlet example.
# Objective
In the Render World, there are a number of collections that are derived
from Main World entities and are used to drive rendering. The most
notable are:
- `VisibleEntities`, which is generated in the `check_visibility` system
and contains visible entities for a view.
- `ExtractedInstances`, which maps entity ids to asset ids.
In the old model, these collections were trivially kept in sync -- any
extracted phase item could look itself up because the render entity id
was guaranteed to always match the corresponding main world id.
After #15320, this became much more complicated, and was leading to a
number of subtle bugs in the Render World. The main rendering systems,
i.e. `queue_material_meshes` and `queue_material2d_meshes`, follow a
similar pattern:
```rust
for visible_entity in visible_entities.iter::<With<Mesh2d>>() {
let Some(mesh_instance) = render_mesh_instances.get_mut(visible_entity) else {
continue;
};
// Look some more stuff up and specialize the pipeline...
let bin_key = Opaque2dBinKey {
pipeline: pipeline_id,
draw_function: draw_opaque_2d,
asset_id: mesh_instance.mesh_asset_id.into(),
material_bind_group_id: material_2d.get_bind_group_id().0,
};
opaque_phase.add(
bin_key,
*visible_entity,
BinnedRenderPhaseType::mesh(mesh_instance.automatic_batching),
);
}
```
In this case, `visible_entities` and `render_mesh_instances` are both
collections that are created and keyed by Main World entity ids, and so
this lookup happens to work by coincidence. However, there is a major
unintentional bug here: namely, because `visible_entities` is a
collection of Main World ids, the phase item being queued is created
with a Main World id rather than its correct Render World id.
This happens to not break mesh rendering because the render commands
used for drawing meshes do not access the `ItemQuery` parameter, but
demonstrates the confusion that is now possible: our UI phase items are
correctly being queued with Render World ids while our meshes aren't.
Additionally, this makes it very easy and error prone to use the wrong
entity id to look up things like assets. For example, if instead we
ignored visibility checks and queued our meshes via a query, we'd have
to be extra careful to use `&MainEntity` instead of the natural
`Entity`.
## Solution
Make all collections that are derived from Main World data use
`MainEntity` as their key, to ensure type safety and avoid accidentally
looking up data with the wrong entity id:
```rust
pub type MainEntityHashMap<V> = hashbrown::HashMap<MainEntity, V, EntityHash>;
```
Additionally, we make all `PhaseItem` be able to provide both their Main
and Render World ids, to allow render phase implementors maximum
flexibility as to what id should be used to look up data.
You can think of this like tracking at the type level whether something
in the Render World should use it's "primary key", i.e. entity id, or
needs to use a foreign key, i.e. `MainEntity`.
## Testing
##### TODO:
This will require extensive testing to make sure things didn't break!
Additionally, some extraction logic has become more complicated and
needs to be checked for regressions.
## Migration Guide
With the advent of the retained render world, collections that contain
references to `Entity` that are extracted into the render world have
been changed to contain `MainEntity` in order to prevent errors where a
render world entity id is used to look up an item by accident. Custom
rendering code may need to be changed to query for `&MainEntity` in
order to look up the correct item from such a collection. Additionally,
users who implement their own extraction logic for collections of main
world entity should strongly consider extracting into a different
collection that uses `MainEntity` as a key.
Additionally, render phases now require specifying both the `Entity` and
`MainEntity` for a given `PhaseItem`. Custom render phases should ensure
`MainEntity` is available when queuing a phase item.
# Objective
- Closes#15716
- Closes#15718
## Solution
- Replace `Handle<MeshletMesh>` with a new `MeshletMesh3d` component
- As expected there were some random things that needed fixing:
- A couple tests were storing handles just to prevent them from being
dropped I believe, which seems to have been unnecessary in some.
- The `SpriteBundle` still had a `Handle<Image>` field. I've removed
this.
- Tests in `bevy_sprite` incorrectly added a `Handle<Image>` field
outside of the `Sprite` component.
- A few examples were still inserting `Handle`s, switched those to their
corresponding wrappers.
- 2 examples that were still querying for `Handle<Image>` were changed
to query `Sprite`
## Testing
- I've verified that the changed example work now
## Migration Guide
`Handle` can no longer be used as a `Component`. All existing Bevy types
using this pattern have been wrapped in their own semantically
meaningful type. You should do the same for any custom `Handle`
components your project needs.
The `Handle<MeshletMesh>` component is now `MeshletMesh3d`.
The `WithMeshletMesh` type alias has been removed. Use
`With<MeshletMesh3d>` instead.
# Objective
- Prepare for streaming by storing vertex data per-meshlet, rather than
per-mesh (this means duplicating vertices per-meshlet)
- Compress vertex data to reduce the cost of this
## Solution
The important parts are in from_mesh.rs, the changes to the Meshlet type
in asset.rs, and the changes in meshlet_bindings.wgsl. Everything else
is pretty secondary/boilerplate/straightforward changes.
- Positions are quantized in centimeters with a user-provided power of 2
factor (ideally auto-determined, but that's a TODO for the future),
encoded as an offset relative to the minimum value within the meshlet,
and then stored as a packed list of bits using the minimum number of
bits needed for each vertex position channel for that meshlet
- E.g. quantize positions (lossly, throws away precision that's not
needed leading to using less bits in the bitstream encoding)
- Get the min/max quantized value of each X/Y/Z channel of the quantized
positions within a meshlet
- Encode values relative to the min value of the meshlet. E.g. convert
from [min, max] to [0, max - min]
- The new max value in the meshlet is (max - min), which only takes N
bits, so we only need N bits to store each channel within the meshlet
(lossless)
- We can store the min value and that it takes N bits per channel in the
meshlet metadata, and reconstruct the position from the bitstream
- Normals are octahedral encoded and than snorm2x16 packed and stored as
a single u32.
- Would be better to implement the precise variant of octhedral encoding
for extra precision (no extra decode cost), but decided to keep it
simple for now and leave that as a followup
- Tried doing a quantizing and bitstream encoding scheme like I did for
positions, but struggled to get it smaller. Decided to go with this for
simplicity for now
- UVs are uncompressed and take a full 64bits per vertex which is
expensive
- In the future this should be improved
- Tangents, as of the previous PR, are not explicitly stored and are
instead derived from screen space gradients
- While I'm here, split up MeshletMeshSaverLoader into two separate
types
Other future changes include implementing a smaller encoding of triangle
data (3 u8 indices = 24 bits per triangle currently), and more
disk-oriented compression schemes.
References:
* "A Deep Dive into UE5's Nanite Virtualized Geometry"
https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf#page=128
(also available on youtube)
* "Towards Practical Meshlet Compression"
https://arxiv.org/pdf/2404.06359
* "Vertex quantization in Omniforce Game Engine"
https://daniilvinn.github.io/2024/05/04/omniforce-vertex-quantization.html
## Testing
- Did you test these changes? If so, how?
- Converted the stanford bunny, and rendered it with a debug material
showing normals, and confirmed that it's identical to what's on main.
EDIT: See additional testing in the comments below.
- Are there any parts that need more testing?
- Could use some more size comparisons on various meshes, and testing
different quantization factors. Not sure if 4 is a good default. EDIT:
See additional testing in the comments below.
- Also did not test runtime performance of the shaders. EDIT: See
additional testing in the comments below.
- How can other people (reviewers) test your changes? Is there anything
specific they need to know?
- Use my unholy script, replacing the meshlet example
https://paste.rs/7xQHk.rs (must make MeshletMesh fields pub instead of
pub crate, must add lz4_flex as a dev-dependency) (must compile with
meshlet and meshlet_processor features, mesh must have only positions,
normals, and UVs, no vertex colors or tangents)
---
## Migration Guide
- TBD by JMS55 at the end of the release
# Objective
- First step towards #15558
## Solution
- Rename `get_vertex_buffer_data` to `create_packed_vertex_buffer_data`
to make it clear that it is not "free" and actually allocates
- Compute length analytically for preallocation instead of creating the
buffer to get its length and immediately discard it
- Use existing vertex attribute size calculation method to reduce code
duplication
- Fix a bug where mesh index data was being replaced by unnecessarily
newly created mesh vertex data in some cases
- Overall reduces mesh copies by two. We still have plenty to go, but
these were the easy ones.
## Testing
- I ran 3d_scene, lighting, and many_cubes, they look fine.
- Benchmarks would be nice, but this is very obviously a win in perf and
correctness.
---
## Migration Guide
- `Mesh::create_packed_vertex_buffer_data` has been renamed
`Mesh::create_packed_vertex_buffer_data` to reflect the fact that it
copies data and allocates.
## Showcase
- look mom, less copies
# Objective
Fixes#15541
A bunch of lifetimes were added during the Assets V2 rework, but after
moving to async traits in #12550 they can be elided. That PR mentions
that this might be the case, but apparently it wasn't followed up on at
the time.
~~I ended up grepping for `<'a` and finding a similar case in
`bevy_reflect` which I also fixed.~~ (edit: that one was needed
apparently)
Note that elided lifetimes are unstable in `impl Trait`. If that gets
stabilized then we can elide even more.
## Solution
Remove the extra lifetimes.
## Testing
Everything still compiles. If I have messed something up there is a
small risk that some user code stops compiling, but all the examples
still work at least.
---
## Migration Guide
The traits `AssetLoader`, `AssetSaver` and `Process` traits from
`bevy_asset` now use elided lifetimes. If you implement these then
remove the named lifetime.
- Adopted from #14449
- Still fixes#12144.
## Migration Guide
The retained render world is a complex change: migrating might take one
of a few different forms depending on the patterns you're using.
For every example, we specify in which world the code is run. Most of
the changes affect render world code, so for the average Bevy user who's
using Bevy's high-level rendering APIs, these changes are unlikely to
affect your code.
### Spawning entities in the render world
Previously, if you spawned an entity with `world.spawn(...)`,
`commands.spawn(...)` or some other method in the rendering world, it
would be despawned at the end of each frame. In 0.15, this is no longer
the case and so your old code could leak entities. This can be mitigated
by either re-architecting your code to no longer continuously spawn
entities (like you're used to in the main world), or by adding the
`bevy_render::world_sync::TemporaryRenderEntity` component to the entity
you're spawning. Entities tagged with `TemporaryRenderEntity` will be
removed at the end of each frame (like before).
### Extract components with `ExtractComponentPlugin`
```
// main world
app.add_plugins(ExtractComponentPlugin::<ComponentToExtract>::default());
```
`ExtractComponentPlugin` has been changed to only work with synced
entities. Entities are automatically synced if `ComponentToExtract` is
added to them. However, entities are not "unsynced" if any given
`ComponentToExtract` is removed, because an entity may have multiple
components to extract. This would cause the other components to no
longer get extracted because the entity is not synced.
So be careful when only removing extracted components from entities in
the render world, because it might leave an entity behind in the render
world. The solution here is to avoid only removing extracted components
and instead despawn the entire entity.
### Manual extraction using `Extract<Query<(Entity, ...)>>`
```rust
// in render world, inspired by bevy_pbr/src/cluster/mod.rs
pub fn extract_clusters(
mut commands: Commands,
views: Extract<Query<(Entity, &Clusters, &Camera)>>,
) {
for (entity, clusters, camera) in &views {
// some code
commands.get_or_spawn(entity).insert(...);
}
}
```
One of the primary consequences of the retained rendering world is that
there's no longer a one-to-one mapping from entity IDs in the main world
to entity IDs in the render world. Unlike in Bevy 0.14, Entity 42 in the
main world doesn't necessarily map to entity 42 in the render world.
Previous code which called `get_or_spawn(main_world_entity)` in the
render world (`Extract<Query<(Entity, ...)>>` returns main world
entities). Instead, you should use `&RenderEntity` and
`render_entity.id()` to get the correct entity in the render world. Note
that this entity does need to be synced first in order to have a
`RenderEntity`.
When performing manual abstraction, this won't happen automatically
(like with `ExtractComponentPlugin`) so add a `SyncToRenderWorld` marker
component to the entities you want to extract.
This results in the following code:
```rust
// in render world, inspired by bevy_pbr/src/cluster/mod.rs
pub fn extract_clusters(
mut commands: Commands,
views: Extract<Query<(&RenderEntity, &Clusters, &Camera)>>,
) {
for (render_entity, clusters, camera) in &views {
// some code
commands.get_or_spawn(render_entity.id()).insert(...);
}
}
// in main world, when spawning
world.spawn(Clusters::default(), Camera::default(), SyncToRenderWorld)
```
### Looking up `Entity` ids in the render world
As previously stated, there's now no correspondence between main world
and render world `Entity` identifiers.
Querying for `Entity` in the render world will return the `Entity` id in
the render world: query for `MainEntity` (and use its `id()` method) to
get the corresponding entity in the main world.
This is also a good way to tell the difference between synced and
unsynced entities in the render world, because unsynced entities won't
have a `MainEntity` component.
---------
Co-authored-by: re0312 <re0312@outlook.com>
Co-authored-by: re0312 <45868716+re0312@users.noreply.github.com>
Co-authored-by: Periwink <charlesbour@gmail.com>
Co-authored-by: Anselmo Sampietro <ans.samp@gmail.com>
Co-authored-by: Emerson Coskey <56370779+ecoskey@users.noreply.github.com>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: Christian Hughes <9044780+ItsDoot@users.noreply.github.com>
* Save 16 bytes per vertex by calculating tangents in the shader at
runtime, rather than storing them in the vertex data.
* Based on https://jcgt.org/published/0009/03/04,
https://www.jeremyong.com/graphics/2023/12/16/surface-gradient-bump-mapping.
* Fixed visbuffer resolve to use the updated algorithm that flips ddy
correctly
* Added some more docs about meshlet material limitations, and some
TODOs about transforming UV coordinates for the future.
![image](https://github.com/user-attachments/assets/222d8192-8c82-4d77-945d-53670a503761)
For testing add a normal map to the bunnies with StandardMaterial like
below, and then test that on both main and this PR (make sure to
download the correct bunny for each). Results should be mostly
identical.
```rust
normal_map_texture: Some(asset_server.load_with_settings(
"textures/BlueNoise-Normal.png",
|settings: &mut ImageLoaderSettings| settings.is_srgb = false,
)),
```
# Objective
- Fixes#6370
- Closes#6581
## Solution
- Added the following lints to the workspace:
- `std_instead_of_core`
- `std_instead_of_alloc`
- `alloc_instead_of_core`
- Used `cargo +nightly fmt` with [item level use
formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A)
to split all `use` statements into single items.
- Used `cargo clippy --workspace --all-targets --all-features --fix
--allow-dirty` to _attempt_ to resolve the new linting issues, and
intervened where the lint was unable to resolve the issue automatically
(usually due to needing an `extern crate alloc;` statement in a crate
root).
- Manually removed certain uses of `std` where negative feature gating
prevented `--all-features` from finding the offending uses.
- Used `cargo +nightly fmt` with [crate level use
formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A)
to re-merge all `use` statements matching Bevy's previous styling.
- Manually fixed cases where the `fmt` tool could not re-merge `use`
statements due to conditional compilation attributes.
## Testing
- Ran CI locally
## Migration Guide
The MSRV is now 1.81. Please update to this version or higher.
## Notes
- This is a _massive_ change to try and push through, which is why I've
outlined the semi-automatic steps I used to create this PR, in case this
fails and someone else tries again in the future.
- Making this change has no impact on user code, but does mean Bevy
contributors will be warned to use `core` and `alloc` instead of `std`
where possible.
- This lint is a critical first step towards investigating `no_std`
options for Bevy.
---------
Co-authored-by: François Mockers <francois.mockers@vleue.com>
# Objective
- Fixes#15236
## Solution
- Use bevy_math::ops instead of std floating point operations.
## Testing
- Did you test these changes? If so, how?
Unit tests and `cargo run -p ci -- test`
- How can other people (reviewers) test your changes? Is there anything
specific they need to know?
Execute `cargo run -p ci -- test` on Windows.
- If relevant, what platforms did you test these changes on, and are
there any important ones you can't test?
Windows
## Migration Guide
- Not a breaking change
- Projects should use bevy math where applicable
---------
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: IQuick 143 <IQuick143cz@gmail.com>
Co-authored-by: Joona Aalto <jondolf.dev@gmail.com>
# Objective
The names of numerous rendering components in Bevy are inconsistent and
a bit confusing. Relevant names include:
- `AutoExposureSettings`
- `AutoExposureSettingsUniform`
- `BloomSettings`
- `BloomUniform` (no `Settings`)
- `BloomPrefilterSettings`
- `ChromaticAberration` (no `Settings`)
- `ContrastAdaptiveSharpeningSettings`
- `DepthOfFieldSettings`
- `DepthOfFieldUniform` (no `Settings`)
- `FogSettings`
- `SmaaSettings`, `Fxaa`, `TemporalAntiAliasSettings` (really
inconsistent??)
- `ScreenSpaceAmbientOcclusionSettings`
- `ScreenSpaceReflectionsSettings`
- `VolumetricFogSettings`
Firstly, there's a lot of inconsistency between `Foo`/`FooSettings` and
`FooUniform`/`FooSettingsUniform` and whether names are abbreviated or
not.
Secondly, the `Settings` post-fix seems unnecessary and a bit confusing
semantically, since it makes it seem like the component is mostly just
auxiliary configuration instead of the core *thing* that actually
enables the feature. This will be an even bigger problem once bundles
like `TemporalAntiAliasBundle` are deprecated in favor of required
components, as users will expect a component named `TemporalAntiAlias`
(or similar), not `TemporalAntiAliasSettings`.
## Solution
Drop the `Settings` post-fix from the component names, and change some
names to be more consistent.
- `AutoExposure`
- `AutoExposureUniform`
- `Bloom`
- `BloomUniform`
- `BloomPrefilter`
- `ChromaticAberration`
- `ContrastAdaptiveSharpening`
- `DepthOfField`
- `DepthOfFieldUniform`
- `DistanceFog`
- `Smaa`, `Fxaa`, `TemporalAntiAliasing` (note: we might want to change
to `Taa`, see "Discussion")
- `ScreenSpaceAmbientOcclusion`
- `ScreenSpaceReflections`
- `VolumetricFog`
I kept the old names as deprecated type aliases to make migration a bit
less painful for users. We should remove them after the next release.
(And let me know if I should just... not add them at all)
I also added some very basic docs for a few types where they were
missing, like on `Fxaa` and `DepthOfField`.
## Discussion
- `TemporalAntiAliasing` is still inconsistent with `Smaa` and `Fxaa`.
Consensus [on
Discord](https://discord.com/channels/691052431525675048/743663924229963868/1280601167209955431)
seemed to be that renaming to `Taa` would probably be fine, but I think
it's a bit more controversial, and it would've required renaming a lot
of related types like `TemporalAntiAliasNode`,
`TemporalAntiAliasBundle`, and `TemporalAntiAliasPlugin`, so I think
it's better to leave to a follow-up.
- I think `Fog` should probably have a more specific name like
`DistanceFog` considering it seems to be distinct from `VolumetricFog`.
~~This should probably be done in a follow-up though, so I just removed
the `Settings` post-fix for now.~~ (done)
---
## Migration Guide
Many rendering components have been renamed for improved consistency and
clarity.
- `AutoExposureSettings` → `AutoExposure`
- `BloomSettings` → `Bloom`
- `BloomPrefilterSettings` → `BloomPrefilter`
- `ContrastAdaptiveSharpeningSettings` → `ContrastAdaptiveSharpening`
- `DepthOfFieldSettings` → `DepthOfField`
- `FogSettings` → `DistanceFog`
- `SmaaSettings` → `Smaa`
- `TemporalAntiAliasSettings` → `TemporalAntiAliasing`
- `ScreenSpaceAmbientOcclusionSettings` → `ScreenSpaceAmbientOcclusion`
- `ScreenSpaceReflectionsSettings` → `ScreenSpaceReflections`
- `VolumetricFogSettings` → `VolumetricFog`
---------
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
### Builder changes
- Increased meshlet max vertices/triangles from 64v/64t to 255v/128t
(meshoptimizer won't allow 256v sadly). This gives us a much greater
percentage of meshlets with max triangle count (128). Still not perfect,
we still end up with some tiny <=10 triangle meshlets that never really
get simplified, but it's progress.
- Removed the error target limit. Now we allow meshoptimizer to simplify
as much as possible. No reason to cap this out, as the cluster culling
code will choose a good LOD level anyways. Again leads to higher quality
LOD trees.
- After some discussion and consulting the Nanite slides again, changed
meshlet group error from _adding_ the max child's error to the group
error, to doing `group_error = max(group_error, max_child_error)`. Error
is already cumulative between LODs as the edges we're collapsing during
simplification get longer each time.
- Bumped the 65% simplification threshold to allow up to 95% of the
original geometry (e.g. accept simplification as valid even if we only
simplified 5% of the triangles). This gives us closer to
log2(initial_meshlet_count) LOD levels, and fewer meshlet roots in the
DAG.
Still more work to be done in the future here. Maybe trying METIS for
meshlet building instead of meshoptimizer.
Using ~8 clusters per group instead of ~4 might also make a big
difference. The Nanite slides say that they have 8-32 meshlets per
group, suggesting some kind of heuristic. Unfortunately meshopt's
compute_cluster_bounds won't work with large groups atm
(https://github.com/zeux/meshoptimizer/discussions/750#discussioncomment-10562641)
so hard to test.
Based on discussion from
https://github.com/bevyengine/bevy/discussions/14998,
https://github.com/zeux/meshoptimizer/discussions/750, and discord.
### Runtime changes
- cluster:triangle packed IDs are now stored 25:7 instead of 26:6 bits,
as max triangles per cluster are now 128 instead of 64
- Hardware raster now spawns 128 * 3 vertices instead of 64 * 3 vertices
to account for the new max triangles limit
- Hardware raster now outputs NaN triangles (0 / 0) instead of
zero-positioned triangles for extra vertex invocations over the cluster
triangle count. Shouldn't really be a difference idt, but I did it
anyways.
- Software raster now does 128 threads per workgroup instead of 64
threads. Each thread now loads, projects, and caches a vertex (vertices
0-127), and then if needed does so again (vertices 128-254). Each thread
then rasterizes one of 128 triangles.
- Fixed a bug with `needs_dispatch_remap`. I had the condition backwards
in my last PR, I probably committed it by accident after testing the
non-default code path on my GPU.
# Objective
- Fixes#14974
## Solution
- Replace all* instances of `NonZero*` with `NonZero<*>`
## Testing
- CI passed locally.
---
## Notes
Within the `bevy_reflect` implementations for `std` types,
`impl_reflect_value!()` will continue to use the type aliases instead,
as it inappropriately parses the concrete type parameter as a generic
argument. If the `ZeroablePrimitive` trait was stable, or the macro
could be modified to accept a finite list of types, then we could fully
migrate.
# Objective
- Faster meshlet rasterization path for small triangles
- Avoid having to allocate and write out a triangle buffer
- Refactor gpu_scene.rs
## Solution
- Replace the 32bit visbuffer texture with a 64bit visbuffer buffer,
where the left 32 bits encode depth, and the right 32 bits encode the
existing cluster + triangle IDs. Can't use 64bit textures, wgpu/naga
doesn't support atomic ops on textures yet.
- Instead of writing out a buffer of packed cluster + triangle IDs (per
triangle) to raster, the culling pass now writes out a buffer of just
cluster IDs (per cluster, so less memory allocated, cheaper to write
out).
- Clusters for software raster are allocated from the left side
- Clusters for hardware raster are allocated in the same buffer, from
the right side
- The buffer size is fixed at MeshletPlugin build time, and should be
set to a reasonable value for your scene (no warning on overflow, and no
good way to determine what value you need outside of renderdoc - I plan
to fix this in a future PR adding a meshlet stats overlay)
- Currently I don't have a heuristic for software vs hardware raster
selection for each cluster. The existing code is just a placeholder. I
need to profile on a release scene and come up with a heuristic,
probably in a future PR.
- The culling shader is getting pretty hard to follow at this point, but
I don't want to spend time improving it as the entire shader/pass is
getting rewritten/replaced in the near future.
- Software raster is a compute workgroup per-cluster. Each workgroup
loads and transforms the <=64 vertices of the cluster, and then
rasterizes the <=64 triangles of the cluster.
- Two variants are implemented: Scanline for clusters with any larger
triangles (still smaller than hardware is good at), and brute-force for
very very tiny triangles
- Once the shader determines that a pixel should be filled in, it does
an atomicMax() on the visbuffer to store the results, copying how Nanite
works
- On devices with a low max workgroups per dispatch limit, an extra
compute pass is inserted before software raster to convert from a 1d to
2d dispatch (I don't think 3d would ever be necessary).
- I haven't implemented the top-left rule or subpixel precision yet, I'm
leaving that for a future PR since I get usable results without it for
now
- Resources used:
https://kristoffer-dyrkorn.github.io/triangle-rasterizer and chapters
6-8 of
https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index
- Hardware raster now spawns 64*3 vertex invocations per meshlet,
instead of the actual meshlet vertex count. Extra invocations just
early-exit.
- While this is slower than the existing system, hardware draws should
be rare now that software raster is usable, and it saves a ton of memory
using the unified cluster ID buffer. This would be fixed if wgpu had
support for mesh shaders.
- Instead of writing to a color+depth attachment, the hardware raster
pass also does the same atomic visbuffer writes that software raster
uses.
- We have to bind a dummy render target anyways, as wgpu doesn't
currently support render passes without any attachments
- Material IDs are no longer written out during the main rasterization
passes.
- If we had async compute queues, we could overlap the software and
hardware raster passes.
- New material and depth resolve passes run at the end of the visbuffer
node, and write out view depth and material ID depth textures
### Misc changes
- Fixed cluster culling importing, but never actually using the previous
view uniforms when doing occlusion culling
- Fixed incorrectly adding the LOD error twice when building the meshlet
mesh
- Splitup gpu_scene module into meshlet_mesh_manager, instance_manager,
and resource_manager
- resource_manager is still too complex and inefficient (extract and
prepare are way too expensive). I plan on improving this in a future PR,
but for now ResourceManager is mostly a 1:1 port of the leftover
MeshletGpuScene bits.
- Material draw passes have been renamed to the more accurate material
shade pass, as well as some other misc renaming (in the future, these
will be compute shaders even, and not actual draw calls)
---
## Migration Guide
- TBD (ask me at the end of the release for meshlet changes as a whole)
---------
Co-authored-by: vero <email@atlasdostal.com>
# Objective
Fixes#14365
## Migration Guide
- When using the iterator returned by `Mesh::attributes` or
`Mesh::attributes_mut` the first value of the tuple is not the
`MeshVertexAttribute` instead of `MeshVertexAttributeId`. To access the
`MeshVertexAttributeId` use the `MeshVertexAttribute.id` field.
Signed-off-by: Sarthak Singh <sarthak.singh99@gmail.com>
The "uberbuffers" PR #14257 caused some examples to fail intermittently
for different reasons:
1. `morph_targets` could fail because vertex displacements for morph
targets are keyed off the vertex index. With buffer packing, the vertex
index can vary based on the position in the buffer, which caused the
morph targets to be potentially incorrect. The solution is to include
the first vertex index with the `MeshUniform` (and `MeshInputUniform` if
GPU preprocessing is in use), so that the shader can calculate the true
vertex index before performing the morph operation. This results in
wasted space in `MeshUniform`, which is unfortunate, but we'll soon be
filling in the padding with the ID of the material when bindless
textures land, so this had to happen sooner or later anyhow.
Including the vertex index in the `MeshInputUniform` caused an ordering
problem. The `MeshInputUniform` was created during the extraction phase,
before the allocations occurred, so the extraction logic didn't know
where the mesh vertex data was going to end up. The solution is to move
the `MeshInputUniform` creation (the `collect_meshes_for_gpu_building`
system) to after the allocations phase. This should be better for
parallelism anyhow, because it allows the extraction phase to finish
quicker. It's also something we'll have to do for bindless in any event.
2. The `lines` and `fog_volumes` examples could fail because their
custom drawing nodes weren't updated to supply the vertex and index
offsets in their `draw_indexed` and `draw` calls. This commit fixes this
oversight.
Fixes#14366.
Switches `Msaa` from being a globally configured resource to a per
camera view component.
Closes#7194
# Objective
Allow individual views to describe their own MSAA settings. For example,
when rendering to different windows or to different parts of the same
view.
## Solution
Make `Msaa` a component that is required on all camera bundles.
## Testing
Ran a variety of examples to ensure that nothing broke.
TODO:
- [ ] Make sure android still works per previous comment in
`extract_windows`.
---
## Migration Guide
`Msaa` is no longer configured as a global resource, and should be
specified on each spawned camera if a non-default setting is desired.
---------
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: François Mockers <francois.mockers@vleue.com>
# Objective
- Fixes: https://github.com/bevyengine/bevy/issues/14036
## Solution
- Add a world space transformation for the environment sample direction.
## Testing
- I have tested the newly added `transform` field using the newly added
`rotate_environment_map` example.
https://github.com/user-attachments/assets/2de77c65-14bc-48ee-b76a-fb4e9782dbdb
## Migration Guide
- Since we have added a new filed to the `EnvironmentMapLight` struct,
users will need to include `..default()` or some rotation value in their
initialization code.
# Objective
- Using bincode to deserialize binary into a MeshletMesh is expensive
(~77ms for a 5mb file).
## Solution
- Write a custom deserializer using bytemuck's Pod types and slice
casting.
- Total asset load time has gone from ~102ms to ~12ms.
- Change some types I never meant to be public to private and other misc
cleanup.
## Testing
- Ran the meshlet example and added timing spans to the asset loader.
---
## Changelog
- Improved `MeshletMesh` loading speed
- The `MeshletMesh` disk format has changed, and
`MESHLET_MESH_ASSET_VERSION` has been bumped
- `MeshletMesh` fields are now private
- Renamed `MeshletMeshSaverLoad` to `MeshletMeshSaverLoader`
- The `Meshlet`, `MeshletBoundingSpheres`, and `MeshletBoundingSphere`
types are now private
- Removed `MeshletMeshSaveOrLoadError::SerializationOrDeserialization`
- Added `MeshletMeshSaveOrLoadError::WrongFileType`
## Migration Guide
- Regenerate your `MeshletMesh` assets, as the disk format has changed,
and `MESHLET_MESH_ASSET_VERSION` has been bumped
- `MeshletMesh` fields are now private
- `MeshletMeshSaverLoad` is now named `MeshletMeshSaverLoader`
- The `Meshlet`, `MeshletBoundingSpheres`, and `MeshletBoundingSphere`
types are now private
- `MeshletMeshSaveOrLoadError::SerializationOrDeserialization` has been
removed
- Added `MeshletMeshSaveOrLoadError::WrongFileType`, match on this
variant if you match on `MeshletMeshSaveOrLoadError`
# Objective
The `AssetReader` trait allows customizing the behavior of fetching
bytes for an `AssetPath`, and expects implementors to return `dyn
AsyncRead + AsyncSeek`. This gives implementors of `AssetLoader` great
flexibility to tightly integrate their asset loading behavior with the
asynchronous task system.
However, almost all implementors of `AssetLoader` don't use the async
functionality at all, and just call `AsyncReadExt::read_to_end(&mut
Vec<u8>)`. This is incredibly inefficient, as this method repeatedly
calls `poll_read` on the trait object, filling the vector 32 bytes at a
time. At my work we have assets that are hundreds of megabytes which
makes this a meaningful overhead.
## Solution
Turn the `Reader` type alias into an actual trait, with a provided
method `read_to_end`. This provided method should be more efficient than
the existing extension method, as the compiler will know the underlying
type of `Reader` when generating this function, which removes the
repeated dynamic dispatches and allows the compiler to make further
optimizations after inlining. Individual implementors are able to
override the provided implementation -- for simple asset readers that
just copy bytes from one buffer to another, this allows removing a large
amount of overhead from the provided implementation.
Now that `Reader` is an actual trait, I also improved the ergonomics for
implementing `AssetReader`. Currently, implementors are expected to box
their reader and return it as a trait object, which adds unnecessary
boilerplate to implementations. This PR changes that trait method to
return a pseudo trait alias, which allows implementors to return `impl
Reader` instead of `Box<dyn Reader>`. Now, the boilerplate for boxing
occurs in `ErasedAssetReader`.
## Testing
I made identical changes to my company's fork of bevy. Our app, which
makes heavy use of `read_to_end` for asset loading, still worked
properly after this. I am not aware if we have a more systematic way of
testing asset loading for correctness.
---
## Migration Guide
The trait method `bevy_asset::io::AssetReader::read` (and `read_meta`)
now return an opaque type instead of a boxed trait object. Implementors
of these methods should change the type signatures appropriately
```rust
impl AssetReader for MyReader {
// Before
async fn read<'a>(&'a self, path: &'a Path) -> Result<Box<Reader<'a>>, AssetReaderError> {
let reader = // construct a reader
Box::new(reader) as Box<Reader<'a>>
}
// After
async fn read<'a>(&'a self, path: &'a Path) -> Result<impl Reader + 'a, AssetReaderError> {
// create a reader
}
}
```
`bevy::asset::io::Reader` is now a trait, rather than a type alias for a
trait object. Implementors of `AssetLoader::load` will need to adjust
the method signature accordingly
```rust
impl AssetLoader for MyLoader {
async fn load<'a>(
&'a self,
// Before:
reader: &'a mut bevy::asset::io::Reader,
// After:
reader: &'a mut dyn bevy::asset::io::Reader,
_: &'a Self::Settings,
load_context: &'a mut LoadContext<'_>,
) -> Result<Self::Asset, Self::Error> {
}
```
Additionally, implementors of `AssetReader` that return a type
implementing `futures_io::AsyncRead` and `AsyncSeek` might need to
explicitly implement `bevy::asset::io::Reader` for that type.
```rust
impl bevy::asset::io::Reader for MyAsyncReadAndSeek {}
```
The comment was incorrect - we are already looking at the pyramid
texture so we do not need to transform the size in any way. Doing that
resulted in a mip that was too fine to be selected in certain cases,
which resulted in a 2x2 pixel footprint not actually fully covering the
cluster sphere - sometimes this could lead to a non-conservative depth
value being computed which resulted in the cluster being marked as
invisible incorrectly.
This change updates meshopt-rs to 0.3 to take advantage of the newly
added sparse simplification mode: by default, simplifier assumes that
the entire mesh is simplified and runs a set of calculations that are
O(vertex count), but in our case we simplify many small mesh subsets
which is inefficient.
Sparse mode instead assumes that the simplified subset is only using a
portion of the vertex buffer, and optimizes accordingly. This changes
the meaning of the error (as it becomes relative to the subset, in our
case a meshlet group); to ensure consistent error selection, we also use
the ErrorAbsolute mode which allows us to operate in mesh coordinate
space.
Additionally, meshopt 0.3 runs optimizeMeshlet automatically as part of
`build_meshlets` so we no longer need to call it ourselves.
This reduces the time to build meshlet representation for Stanford Bunny
mesh from ~1.65s to ~0.45s (3.7x) in optimized builds.
* Fixes https://github.com/bevyengine/bevy/issues/13813
* Fixes https://github.com/bevyengine/bevy/issues/13810
Tested a combined scene with both regular meshes and meshlet meshes
with:
* Regular forward setup
* Forward + normal/motion vector prepasses
* Deferred (with depth prepass since that's required)
* Deferred + depth/normal/motion vector prepasses
Still broken:
* Using meshlet meshes rendering in deferred and regular meshes
rendering in forward + depth/normal prepass. I don't know how to fix
this at the moment, so for now I've just add instructions to not mix
them.
This is a followup to https://github.com/bevyengine/bevy/pull/13904
based on the discussion there, and switches two HashMaps that used
meshlet ids as keys to Vec.
In addition to a small further performance boost for `from_mesh` (1.66s
=> 1.60s), this makes processing deterministic modulo threading issues
wrt CRT rand described in the linked PR. This is valuable for debugging,
as you can visually or programmatically inspect the meshlet distribution
before/after making changes that should not change the output, whereas
previously every asset rebuild would change the meshlet structure.
Tested with https://github.com/bevyengine/bevy/pull/13431; after this
change, the visual output of meshlets is consistent between asset
rebuilds, and the MD5 of the output GLB file does not change either,
which was not the case before.
This change reworks `find_connected_meshlets` to scale more linearly
with the mesh size, which significantly reduces the cost of building
meshlet representations. As a small extra complexity reduction, it moves
`simplify_scale` call out of the loop so that it's called once (it only
depends on the vertex data => is safe to cache).
The new implementation of connectivity analysis builds edge=>meshlet
list data structure, which allows us to only iterate through
`tuple_combinations` of a (usually) small list. There is still some
redundancy as if two meshlets share two edges, they will be represented
in the meshlet lists twice, but it's overall much faster.
Since the hash traversal is non-deterministic, to keep this part of the
algorithm deterministic for reproducible results we sort the output
adjacency lists.
Overall this reduces the time to process bunny mesh from ~4.2s to ~1.7s
when using release; in unoptimized builds the delta is even more
significant.
This was tested by using https://github.com/bevyengine/bevy/pull/13431
and:
a) comparing the result of `find_connected_meshlets` using old and new
code; they are equal in all steps of the clustering process
b) comparing the rendered result of the old code vs new code *after*
making the rest of the algorithm deterministic: right now the loop that
iterates through the result of `group_meshlets()` call executes in
different order between program runs. This is orthogonal to this change
and can be fixed separately.
Note: a future change can shrink the processing time further from ~1.7s
to ~0.4s with a small diff but that requires an update to meshopt crate
which is pending in https://github.com/gwihlidal/meshopt-rs/pull/42.
This change is independent.
# Objective
- Mikktspace requires that we normalize world normals/tangents _before_
interpolation across vertices, and then do _not_ normalize after. I had
it backwards.
- We do not (am not supposed to?) need a second set of barycentrics for
motion vectors. If you think about the typical raster pipeline, in the
vertex shader we calculate previous_world_position, and then it gets
interpolated using the current triangle's barycentrics.
## Solution
- Fix normal/tangent processing
- Reuse barycentrics for motion vector calculations
- Not implementing this for 0.14, but long term I aim to remove explicit
vertex tangents and calculate them in the shader on the fly.
## Testing
- I tested out some of the normal maps we have in repo. Didn't seem to
make a difference, but mikktspace is all about correctness across
various baking tools. I probably just didn't have any of the ones that
would cause it to break.
- Didn't test motion vectors as there's a known bug with the depth
buffer and meshlets that I'm waiting on the render graph rewrite to fix.
* Rename cull_meshlets -> cull_clusters
* Rename meshlet_visible -> cluster_visible
* Add an if statement around meshlet_second_pass_candidates writes,
maybe a small bit of performance.
# Objective
- Fixes#10909
- Fixes#8492
## Solution
- Name all matrices `x_from_y`, for example `world_from_view`.
## Testing
- I've tested most of the 3D examples. The `lighting` example
particularly should hit a lot of the changes and appears to run fine.
---
## Changelog
- Renamed matrices across the engine to follow a `y_from_x` naming,
making the space conversion more obvious.
## Migration Guide
- `Frustum`'s `from_view_projection`, `from_view_projection_custom_far`
and `from_view_projection_no_far` were renamed to
`from_clip_from_world`, `from_clip_from_world_custom_far` and
`from_clip_from_world_no_far`.
- `ComputedCameraValues::projection_matrix` was renamed to
`clip_from_view`.
- `CameraProjection::get_projection_matrix` was renamed to
`get_clip_from_view` (this affects implementations on `Projection`,
`PerspectiveProjection` and `OrthographicProjection`).
- `ViewRangefinder3d::from_view_matrix` was renamed to
`from_world_from_view`.
- `PreviousViewData`'s members were renamed to `view_from_world` and
`clip_from_world`.
- `ExtractedView`'s `projection`, `transform` and `view_projection` were
renamed to `clip_from_view`, `world_from_view` and `clip_from_world`.
- `ViewUniform`'s `view_proj`, `unjittered_view_proj`,
`inverse_view_proj`, `view`, `inverse_view`, `projection` and
`inverse_projection` were renamed to `clip_from_world`,
`unjittered_clip_from_world`, `world_from_clip`, `world_from_view`,
`view_from_world`, `clip_from_view` and `view_from_clip`.
- `GpuDirectionalCascade::view_projection` was renamed to
`clip_from_world`.
- `MeshTransforms`' `transform` and `previous_transform` were renamed to
`world_from_local` and `previous_world_from_local`.
- `MeshUniform`'s `transform`, `previous_transform`,
`inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed
to `world_from_local`, `previous_world_from_local`,
`local_from_world_transpose_a` and `local_from_world_transpose_b` (the
`Mesh` type in WGSL mirrors this, however `transform` and
`previous_transform` were named `model` and `previous_model`).
- `Mesh2dTransforms::transform` was renamed to `world_from_local`.
- `Mesh2dUniform`'s `transform`, `inverse_transpose_model_a` and
`inverse_transpose_model_b` were renamed to `world_from_local`,
`local_from_world_transpose_a` and `local_from_world_transpose_b` (the
`Mesh2d` type in WGSL mirrors this).
- In WGSL, in `bevy_pbr::mesh_functions`, `get_model_matrix` and
`get_previous_model_matrix` were renamed to `get_world_from_local` and
`get_previous_world_from_local`.
- In WGSL, `bevy_sprite::mesh2d_functions::get_model_matrix` was renamed
to `get_world_from_local`.
# Objective
- Using multiple raster passes to generate the depth pyramid is
extremely slow
- Pulling data from the source image is the largest bottleneck, it's
important to sample in a cache-aware pattern
- Barriers and pipeline drain between the raster passes is the second
largest bottleneck
- Each separate RenderPass on the CPU is _really_ expensive
## Solution
- Port [FidelityFX SPD](https://gpuopen.com/fidelityfx-spd) to WGSL,
replacing meshlet's existing multiple raster passes with a ~~single~~
two compute dispatches. Lack of coherent buffers means we have to do the
the last 64x64 tile from mip 7+ in a separate dispatch to ensure the mip
6 writes were flushed :(
- Workgroup shared memory version only at the moment, as the subgroup
operation is blocked by our upgrade to wgpu 0.20 #13186
- Don't enforce a power-of-2 depth pyramid texture size, simply scaling
by 0.5 is fine
# Objective
- Add motion vector support to the skybox
- This fixes the last remaining "gap" to complete the motion blur
feature
## Solution
- Add a pipeline for the skybox to write motion vectors to the prepass
## Testing
- Used examples to test motion vectors using motion blur
https://github.com/bevyengine/bevy/assets/2632925/74c0778a-7e77-4e68-8111-05791e4bfdd2
---------
Co-authored-by: Patrick Walton <pcwalton@mimiga.net>
This commit, a revamp of #12959, implements screen-space reflections
(SSR), which approximate real-time reflections based on raymarching
through the depth buffer and copying samples from the final rendered
frame. This patch is a relatively minimal implementation of SSR, so as
to provide a flexible base on which to customize and build in the
future. However, it's based on the production-quality [raymarching code
by Tomasz
Stachowiak](https://gist.github.com/h3r2tic/9c8356bdaefbe80b1a22ae0aaee192db).
For a general basic overview of screen-space reflections, see
[1](https://lettier.github.io/3d-game-shaders-for-beginners/screen-space-reflection.html).
The raymarching shader uses the basic algorithm of tracing forward in
large steps, refining that trace in smaller increments via binary
search, and then using the secant method. No temporal filtering or
roughness blurring, is performed at all; for this reason, SSR currently
only operates on very shiny surfaces. No acceleration via the
hierarchical Z-buffer is implemented (though note that
https://github.com/bevyengine/bevy/pull/12899 will add the
infrastructure for this). Reflections are traced at full resolution,
which is often considered slow. All of these improvements and more can
be follow-ups.
SSR is built on top of the deferred renderer and is currently only
supported in that mode. Forward screen-space reflections are possible
albeit uncommon (though e.g. *Doom Eternal* uses them); however, they
require tracing from the previous frame, which would add complexity.
This patch leaves the door open to implementing SSR in the forward
rendering path but doesn't itself have such an implementation.
Screen-space reflections aren't supported in WebGL 2, because they
require sampling from the depth buffer, which Naga can't do because of a
bug (`sampler2DShadow` is incorrectly generated instead of `sampler2D`;
this is the same reason why depth of field is disabled on that
platform).
To add screen-space reflections to a camera, use the
`ScreenSpaceReflectionsBundle` bundle or the
`ScreenSpaceReflectionsSettings` component. In addition to
`ScreenSpaceReflectionsSettings`, `DepthPrepass` and `DeferredPrepass`
must also be present for the reflections to show up. The
`ScreenSpaceReflectionsSettings` component contains several settings
that artists can tweak, and also comes with sensible defaults.
A new example, `ssr`, has been added. It's loosely based on the
[three.js ocean
sample](https://threejs.org/examples/webgl_shaders_ocean.html), but all
the assets are original. Note that the three.js demo has no screen-space
reflections and instead renders a mirror world. In contrast to #12959,
this demo tests not only a cube but also a more complex model (the
flight helmet).
## Changelog
### Added
* Screen-space reflections can be enabled for very smooth surfaces by
adding the `ScreenSpaceReflections` component to a camera. Deferred
rendering must be enabled for the reflections to appear.
![Screenshot 2024-05-18
143555](https://github.com/bevyengine/bevy/assets/157897/b8675b39-8a89-433e-a34e-1b9ee1233267)
![Screenshot 2024-05-18
143606](https://github.com/bevyengine/bevy/assets/157897/cc9e1cd0-9951-464a-9a08-e589210e5606)
# Objective
- The volumetric fog PR originally needed to be modified to use
`.view_layouts` but that was changed in another PR. The merge with main
still kept those around.
## Solution
- Remove them because they aren't necessary
This commit implements a more physically-accurate, but slower, form of
fog than the `bevy_pbr::fog` module does. Notably, this *volumetric fog*
allows for light beams from directional lights to shine through,
creating what is known as *light shafts* or *god rays*.
To add volumetric fog to a scene, add `VolumetricFogSettings` to the
camera, and add `VolumetricLight` to directional lights that you wish to
be volumetric. `VolumetricFogSettings` has numerous settings that allow
you to define the accuracy of the simulation, as well as the look of the
fog. Currently, only interaction with directional lights that have
shadow maps is supported. Note that the overhead of the effect scales
directly with the number of directional lights in use, so apply
`VolumetricLight` sparingly for the best results.
The overall algorithm, which is implemented as a postprocessing effect,
is a combination of the techniques described in [Scratchapixel] and
[this blog post]. It uses raymarching in screen space, transformed into
shadow map space for sampling and combined with physically-based
modeling of absorption and scattering. Bevy employs the widely-used
[Henyey-Greenstein phase function] to model asymmetry; this essentially
allows light shafts to fade into and out of existence as the user views
them.
Volumetric rendering is a huge subject, and I deliberately kept the
scope of this commit small. Possible follow-ups include:
1. Raymarching at a lower resolution.
2. A post-processing blur (especially useful when combined with (1)).
3. Supporting point lights and spot lights.
4. Supporting lights with no shadow maps.
5. Supporting irradiance volumes and reflection probes.
6. Voxel components that reuse the volumetric fog code to create voxel
shapes.
7. *Horizon: Zero Dawn*-style clouds.
These are all useful, but out of scope of this patch for now, to keep
things tidy and easy to review.
A new example, `volumetric_fog`, has been added to demonstrate the
effect.
## Changelog
### Added
* A new component, `VolumetricFog`, is available, to allow for a more
physically-accurate, but more resource-intensive, form of fog.
* A new component, `VolumetricLight`, can be placed on directional
lights to make them interact with `VolumetricFog`. Notably, this allows
such lights to emit light shafts/god rays.
![Screenshot 2024-04-21
162808](https://github.com/bevyengine/bevy/assets/157897/7a1fc81d-eed5-4735-9419-286c496391a9)
![Screenshot 2024-04-21
132005](https://github.com/bevyengine/bevy/assets/157897/e6d3b5ca-8f59-488d-a3de-15e95aaf4995)
[Scratchapixel]:
https://www.scratchapixel.com/lessons/3d-basic-rendering/volume-rendering-for-developers/intro-volume-rendering.html
[this blog post]: https://www.alexandre-pestana.com/volumetric-lights/
[Henyey-Greenstein phase function]:
https://www.pbr-book.org/4ed/Volume_Scattering/Phase_Functions#TheHenyeyndashGreensteinPhaseFunction
# Objective
- Per-cluster (instance of a meshlet) data upload is ridiculously
expensive in both CPU and GPU time (8 bytes per cluster, millions of
clusters, you very quickly run into PCIE bandwidth maximums, and lots of
CPU-side copies and malloc).
- We need to be uploading only per-instance/entity data. Anything else
needs to be done on the GPU.
## Solution
- Per instance, upload:
- `meshlet_instance_meshlet_counts_prefix_sum` - An exclusive prefix sum
over the count of how many clusters each instance has.
- `meshlet_instance_meshlet_slice_starts` - The starting index of the
meshlets for each instance within the `meshlets` buffer.
- A new `fill_cluster_buffers` pass once at the start of the frame has a
thread per cluster, and finds its instance ID and meshlet ID via a
binary search of `meshlet_instance_meshlet_counts_prefix_sum` to find
what instance it belongs to, and then uses that plus
`meshlet_instance_meshlet_slice_starts` to find what number meshlet
within the instance it is. The shader then writes out the per-cluster
instance/meshlet ID buffers for later passes to quickly read from.
- I've gone from 45 -> 180 FPS in my stress test scene, and saved
~30ms/frame of overall CPU/GPU time.
# Objective
- There is an unfortunate lack of dragons in the meshlet docs.
- Dragons are symbolic of majesty, power, storms, and meshlets.
- A dragon habitat such as our docs requires cultivation to ensure each
winged lizard reaches their fullest, fiery selves.
## Solution
- Fix the link to the dragon image.
- The link originally targeted the `meshlet` branch, but that was later
deleted after it was merged into `main`.
---
## Changelog
- Added a dragon back into the `MeshletPlugin` documentation.
Keeping track of explicit visibility per cluster between frames does not
work with LODs, and leads to worse culling (using the final depth buffer
from the previous frame is more accurate).
Instead, we need to generate a second depth pyramid after the second
raster pass, and then use that in the first culling pass in the next
frame to test if a cluster would have been visible last frame or not.
As part of these changes, the write_index_buffer pass has been folded
into the culling pass for a large performance gain, and to avoid
tracking a lot of extra state that would be needed between passes.
Prepass previous model/view stuff was adapted to work with meshlets as
well.
Also fixed a bug with materials, and other misc improvements.
---------
Co-authored-by: François <mockersf@gmail.com>
Co-authored-by: atlas dostal <rodol@rivalrebels.com>
Co-authored-by: vero <email@atlasdostal.com>
Co-authored-by: Patrick Walton <pcwalton@mimiga.net>
Co-authored-by: Robert Swain <robert.swain@gmail.com>
# Objective
Make visibility system ordering explicit. Fixes#12953.
## Solution
Specify `CheckVisibility` happens after all other `VisibilitySystems`
sets have happened.
---------
Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com>