# Objective
Reduce the catch-all grab-bag of functionality in bevy_core by moving FloatOrd to bevy_utils.
A step in addressing #2931 and splitting bevy_core into more specific locations.
## Solution
Move FloatOrd into bevy_utils. Fix the compile errors.
As a result, bevy_core_pipeline, bevy_pbr, bevy_sprite, bevy_text, and bevy_ui no longer depend on bevy_core (they were only using it for `FloatOrd` previously).
# Objective
- Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene
- Fixes#3605
- Based on top of #4079
This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR.
## Solution
- Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability
- Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time.
- Appropriate shader defs in the shader code to handle the two cases
## Context on some decisions / open questions
- I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately.
- Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants?
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
# Objective
Animation with shadows crashes with:
```
thread 'main' panicked at 'wgpu error: Validation Error
Caused by:
In Device::create_render_pipeline
note: label = `shadow_pipeline`
error matching VERTEX shader requirements against the pipeline
shader global ResourceBinding { group: 1, binding: 1 } is not available in the layout pipeline layout
visibility flags don't include the shader stage
```
Animation with wireframe crashes with:
```
thread 'main' panicked at 'wgpu error: Validation Error
Caused by:
In Device::create_render_pipeline
note: label = `opaque_mesh_pipeline`
error matching VERTEX shader requirements against the pipeline
shader global ResourceBinding { group: 2, binding: 0 } is not available in the layout pipeline layout
binding is missing from the pipeline layout
```
## Solution
- Fix the bindings
# Objective
Load skeletal weights and indices from GLTF files. Animate meshes.
## Solution
- Load skeletal weights and indices from GLTF files.
- Added `SkinnedMesh` component and ` SkinnedMeshInverseBindPose` asset
- Added `extract_skinned_meshes` to extract joint matrices.
- Added queue phase systems for enqueuing the buffer writes.
Some notes:
- This ports part of # #2359 to the current main.
- This generates new `BufferVec`s and bind groups every frame. The expectation here is that the number of `Query::get` calls during extract is probably going to be the stronger bottleneck, with up to 256 calls per skinned mesh. Until that is optimized, caching buffers and bind groups is probably a non-concern.
- Unfortunately, due to the uniform size requirements, this means a 16KB buffer is allocated for every skinned mesh every frame. There's probably a few ways to get around this, but most of them require either compute shaders or storage buffers, which are both incompatible with WebGL2.
Co-authored-by: james7132 <contact@jamessliu.com>
Co-authored-by: François <mockersf@gmail.com>
Co-authored-by: James Liu <contact@jamessliu.com>
* Refactor assign_lights_to_clusters to always clear + update clusters, even if the screen size isn't available yet / is zero. This fixes#4167. We still avoid the "expensive" per-light work when the screen size isn't available yet. I also consolidated some logic to eliminate some redundancies.
* Removed _a ton_ of (potentially very large) per-frame reallocations
* Removed `Res<VisiblePointLights>` (a vec) in favor of `Res<GlobalVisiblePointLights>` (a hashmap). We were allocating a new hashmap every frame, the collecting it into a vec every frame, then in another system _re-generating the hashmap_. It is always used like a hashmap, might as well embrace that. We now reuse the same hashmap every frame and dont use any intermediate collections.
* We were re-allocating Clusters aabb and light vectors every frame by re-constructing Clusters every frame. We now re-use the existing collections.
* Reuse per-camera VisiblePointLight vecs when possible instead of allocating them every frame. We now only insert VisiblePointLights if the component doesn't exist yet.
# Objective
- Fixes#3970
- To support Bevy's shader abstraction(shader defs, shader imports and hot shader reloading) for compute shaders, I have followed carts advice and change the `PipelinenCache` to accommodate both compute and render pipelines.
## Solution
- renamed `RenderPipelineCache` to `PipelineCache`
- Cached Pipelines are now represented by an enum (render, compute)
- split the `SpecializedPipelines` into `SpecializedRenderPipelines` and `SpecializedComputePipelines`
- updated the game of life example
## Open Questions
- should `SpecializedRenderPipelines` and `SpecializedComputePipelines` be merged and how would we do that?
- should the `get_render_pipeline` and `get_compute_pipeline` methods be merged?
- is pipeline specialization for different entry points a good pattern
Co-authored-by: Kurt Kühnert <51823519+Ku95@users.noreply.github.com>
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
# Objective
provide some customisation for default cluster setup
avoid "cluster index lists is full" in all cases (using a strategy outlined by @superdump)
## Solution
Add ClusterConfig enum (which can be inserted into a view at any time) to allow specifying cluster setup with variants:
- None (do not do any light assignment - for views which do not require light info, e.g. minimaps etc)
- Single (one cluster)
- XYZ (explicit cluster counts in each dimension)
- FixedZ (most similar to current - specify Z-slices and total, then x and y counts are dynamically determined to give approximately square clusters based on current aspect ratio)
Defaults to FixedZ { total: 4096, z: 24 } which is similar to the current setup.
Per frame, estimate the number of indices that would be required for the current config and decrease the cluster counts / increase the cluster sizes in the x and y dimensions if the index list would be too small.
notes:
- I didn't put ClusterConfig in the camera bundles to avoid introducing a dependency from bevy_render to bevy_pbr. the ClusterConfig enum comes with a pbr-centric impl block so i didn't want to move that into bevy_render either.
- ~Might want to add None variant to cluster config for views that don't care about lights?~
- Not well tested for orthographic
- ~there's a cluster_muck branch on my repo which includes some diagnostics / a modified lighting example which may be useful for tyre-kicking~ (outdated, i will bring it up to date if required)
anecdotal timings:
FPS on the lighting demo is negligibly better (~5%), maybe due to a small optimisation constraining the light aabb to be in front of the camera
FPS on the lighting demo with 100 extra lights added is ~33% faster, and also renders correctly as the cluster index count is no longer exceeded
# Objective
Add Visibility for lights
## Solution
- add Visibility to PointLightBundle and DirectionLightBundle
- filter lights used by Visibility.is_visible
note: includes changes from #3916 due to overlap, will be cleaner after that is merged
# Objective
fix#3915
## Solution
the issues are caused by
- lights are assigned to clusters before being filtered down to MAX_POINT_LIGHTS, leading to cluster counts potentially being too high
- after fixing the above, packing the count into 8 bits still causes overflow with exactly 256 lights affecting a cluster
to fix:
```assign_lights_to_clusters```
- limit extracted lights to MAX_POINT_LIGHTS, selecting based on shadow-caster & intensity (if required)
- warn if MAX_POINT_LIGHT count is exceeded
```prepare_lights```
- limit the lights assigned to a cluster to CLUSTER_COUNT_MASK (which is 1 less than MAX_POINT_LIGHTS) to avoid overflowing into the offset bits
notes:
- a better solution to the overflow may be to use more than 8 bits for cluster_count (the comment states only 14 of the remaining 24 bits are used for the offset). this would touch more of the code base but i'm happy to try if it has some benefit.
- intensity is only one way to select lights. it may be worth allowing user configuration of the light filtering, but i can't see a clean way to do that
# Objective
- In the large majority of cases, users were calling `.unwrap()` immediately after `.get_resource`.
- Attempting to add more helpful error messages here resulted in endless manual boilerplate (see #3899 and the linked PRs).
## Solution
- Add an infallible variant named `.resource` and so on.
- Use these infallible variants over `.get_resource().unwrap()` across the code base.
## Notes
I did not provide equivalent methods on `WorldCell`, in favor of removing it entirely in #3939.
## Migration Guide
Infallible variants of `.get_resource` have been added that implicitly panic, rather than needing to be unwrapped.
Replace `world.get_resource::<Foo>().unwrap()` with `world.resource::<Foo>()`.
## Impact
- `.unwrap` search results before: 1084
- `.unwrap` search results after: 942
- internal `unwrap_or_else` calls added: 4
- trivial unwrap calls removed from tests and code: 146
- uses of the new `try_get_resource` API: 11
- percentage of the time the unwrapping API was used internally: 93%
This PR makes a number of changes to how meshes and vertex attributes are handled, which the goal of enabling easy and flexible custom vertex attributes:
* Reworks the `Mesh` type to use the newly added `VertexAttribute` internally
* `VertexAttribute` defines the name, a unique `VertexAttributeId`, and a `VertexFormat`
* `VertexAttributeId` is used to produce consistent sort orders for vertex buffer generation, replacing the more expensive and often surprising "name based sorting"
* Meshes can be used to generate a `MeshVertexBufferLayout`, which defines the layout of the gpu buffer produced by the mesh. `MeshVertexBufferLayouts` can then be used to generate actual `VertexBufferLayouts` according to the requirements of a specific pipeline. This decoupling of "mesh layout" vs "pipeline vertex buffer layout" is what enables custom attributes. We don't need to standardize _mesh layouts_ or contort meshes to meet the needs of a specific pipeline. As long as the mesh has what the pipeline needs, it will work transparently.
* Mesh-based pipelines now specialize on `&MeshVertexBufferLayout` via the new `SpecializedMeshPipeline` trait (which behaves like `SpecializedPipeline`, but adds `&MeshVertexBufferLayout`). The integrity of the pipeline cache is maintained because the `MeshVertexBufferLayout` is treated as part of the key (which is fully abstracted from implementers of the trait ... no need to add any additional info to the specialization key).
* Hashing `MeshVertexBufferLayout` is too expensive to do for every entity, every frame. To make this scalable, I added a generalized "pre-hashing" solution to `bevy_utils`: `Hashed<T>` keys and `PreHashMap<K, V>` (which uses `Hashed<T>` internally) . Why didn't I just do the quick and dirty in-place "pre-compute hash and use that u64 as a key in a hashmap" that we've done in the past? Because its wrong! Hashes by themselves aren't enough because two different values can produce the same hash. Re-hashing a hash is even worse! I decided to build a generalized solution because this pattern has come up in the past and we've chosen to do the wrong thing. Now we can do the right thing! This did unfortunately require pulling in `hashbrown` and using that in `bevy_utils`, because avoiding re-hashes requires the `raw_entry_mut` api, which isn't stabilized yet (and may never be ... `entry_ref` has favor now, but also isn't available yet). If std's HashMap ever provides the tools we need, we can move back to that. Note that adding `hashbrown` doesn't increase our dependency count because it was already in our tree. I will probably break these changes out into their own PR.
* Specializing on `MeshVertexBufferLayout` has one non-obvious behavior: it can produce identical pipelines for two different MeshVertexBufferLayouts. To optimize the number of active pipelines / reduce re-binds while drawing, I de-duplicate pipelines post-specialization using the final `VertexBufferLayout` as the key. For example, consider a pipeline that needs the layout `(position, normal)` and is specialized using two meshes: `(position, normal, uv)` and `(position, normal, other_vec2)`. If both of these meshes result in `(position, normal)` specializations, we can use the same pipeline! Now we do. Cool!
To briefly illustrate, this is what the relevant section of `MeshPipeline`'s specialization code looks like now:
```rust
impl SpecializedMeshPipeline for MeshPipeline {
type Key = MeshPipelineKey;
fn specialize(
&self,
key: Self::Key,
layout: &MeshVertexBufferLayout,
) -> RenderPipelineDescriptor {
let mut vertex_attributes = vec![
Mesh::ATTRIBUTE_POSITION.at_shader_location(0),
Mesh::ATTRIBUTE_NORMAL.at_shader_location(1),
Mesh::ATTRIBUTE_UV_0.at_shader_location(2),
];
let mut shader_defs = Vec::new();
if layout.contains(Mesh::ATTRIBUTE_TANGENT) {
shader_defs.push(String::from("VERTEX_TANGENTS"));
vertex_attributes.push(Mesh::ATTRIBUTE_TANGENT.at_shader_location(3));
}
let vertex_buffer_layout = layout
.get_layout(&vertex_attributes)
.expect("Mesh is missing a vertex attribute");
```
Notice that this is _much_ simpler than it was before. And now any mesh with any layout can be used with this pipeline, provided it has vertex postions, normals, and uvs. We even got to remove `HAS_TANGENTS` from MeshPipelineKey and `has_tangents` from `GpuMesh`, because that information is redundant with `MeshVertexBufferLayout`.
This is still a draft because I still need to:
* Add more docs
* Experiment with adding error handling to mesh pipeline specialization (which would print errors at runtime when a mesh is missing a vertex attribute required by a pipeline). If it doesn't tank perf, we'll keep it.
* Consider breaking out the PreHash / hashbrown changes into a separate PR.
* Add an example illustrating this change
* Verify that the "mesh-specialized pipeline de-duplication code" works properly
Please dont yell at me for not doing these things yet :) Just trying to get this in peoples' hands asap.
Alternative to #3120Fixes#3030
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
# Objective
- `WgpuOptions` is mutated to be updated with the actual device limits and features, but this information is readily available to both the main and render worlds through the `RenderDevice` which has .limits() and .features() methods
- Information about the adapter in terms of its name, the backend in use, etc were not being exposed but have clear use cases for being used to take decisions about what rendering code to use. For example, if something works well on AMD GPUs but poorly on Intel GPUs. Or perhaps something works well in Vulkan but poorly in DX12.
## Solution
- Stop mutating `WgpuOptions `and don't insert the updated values into the main and render worlds
- Return `AdapterInfo` from `initialize_renderer` and insert it into the main and render worlds
- Use `RenderDevice` limits in the lighting code that was using `WgpuOptions.limits`.
- Renamed `WgpuOptions` to `WgpuSettings`
What is says on the tin.
This has got more to do with making `clippy` slightly more *quiet* than it does with changing anything that might greatly impact readability or performance.
that said, deriving `Default` for a couple of structs is a nice easy win
# Objective
- Do not panic when mroe than 256 point lights are added the scene
- Fixes https://github.com/bevyengine/bevy/issues/3682
## Solution
- Only iterate the first `MAX_POINT_LIGHTS` lights instead of as many as there are
## Open questions
- Should we warn that there are more than the maximum allowed number of point lights in the scene?
# Objective
- Using plain exponential depth slicing for perspective projection cameras results in unnecessarily many slices very close together close to the camera. If the camera is then moved close to a collection of point lights, they will likely exhaust the available uniform buffer space for the lists of which lights affect which clusters.
## Solution
- A simple solution to this is to use a different near plane value for the depth slicing and set it to where the first slice's far plane should be. The default value is 5 and works well. This results in the configured number of depth slices, maintains the exponential slicing beyond the initial slice, and no slices are too small such that they cause problems that are sensitive to the view position.
# Objective
- Allow the user to specify the priority when configuring wgpu features/limits and by default use the maximum capabilities of the chosen adapter.
## Solution
- Add a `WgpuOptionsPriority` enum with `Compatibility`, `Functionality` and `WebGL2` options.
- Add a `priority: WgpuOptionsPriority` member to `WgpuOptions`.
- When initialising the renderer, if `WgpuOptions::priority == WgpuOptionsPriority::Functionality`, query the adapter for the available features and limits, use them when creating a device, and update `WgpuOptions` with those values. If `Compatibility` use the behaviour as before this PR. If `WebGL2` then use the WebGL2 downlevel limits as used when when building for wasm, for convenience of testing WebGL2 limits without having to build for wasm.
- Add an environment variable `WGPU_OPTIONS_PRIO` that takes `compatibility`, `functionality`, `webgl2`.
- Default to `WgpuOptionsPriority::Functionality`.
- Insert updated `WgpuOptions` into render app world as well. This is useful for applying the limits when rendering, such as limiting the directional light shadow map texture to 2048x2048 when using WebGL2 downlevel limits but not on wasm.
- Reduced `draw_state` logs from `debug` to `trace` and added `debug` level logs for the wgpu features and limits. Use `RUST_LOG=bevy_render=debug` to see the output.
# Objective
- Our crevice is still called "crevice", which we can't use for a release
- Users would need to use our "crevice" directly to be able to use the derive macro
## Solution
- Rename crevice to bevy_crevice, and crevice-derive to bevy-crevice-derive
- Re-export it from bevy_render, and use it from bevy_render everywhere
- Fix derive macro to work either from bevy_render, from bevy_crevice, or from bevy
## Remaining
- It is currently re-exported as `bevy::render::bevy_crevice`, is it the path we want?
- After a brief suggestion to Cart, I changed the version to follow Bevy version instead of crevice, do we want that?
- Crevice README.md need to be updated
- in the `Cargo.toml`, there are a few things to change. How do we want to change them? How do we keep attributions to original Crevice?
```
authors = ["Lucien Greathouse <me@lpghatguy.com>"]
documentation = "https://docs.rs/crevice"
homepage = "https://github.com/LPGhatguy/crevice"
repository = "https://github.com/LPGhatguy/crevice"
```
Co-authored-by: François <8672791+mockersf@users.noreply.github.com>
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
# Objective
- 3d examples fail to run in webgl2 because of unsupported texture formats or texture too large
## Solution
- switch to supported formats if a feature is enabled. I choose a feature instead of a build target to not conflict with a potential webgpu support
Very inspired by 6813b2edc5, and need #3290 to work.
I named the feature `webgl2`, but it's only needed if one want to use PBR in webgl2. Examples using only 2D already work.
Co-authored-by: François <8672791+mockersf@users.noreply.github.com>
# Objective
Fixes#3352Fixes#3208
## Solution
- Update wgpu to 0.12
- Update naga to 0.8
- Resolve compilation errors
- Remove [[block]] from WGSL shaders (because it is depracated and now wgpu cant parse it)
- Replace `elseif` with `else if` in pbr.wgsl
# Objective
PBR lighting was broken in the new renderer when using orthographic projections due to the way the depth slicing works for the clusters. Fix it.
## Solution
- The default orthographic projection near plane is 0.0. The perspective projection depth slicing does a division by the near plane which gives a floating point NaN and the clustering all breaks down.
- Orthographic projections have a linear depth mapping, so it made intuitive sense to me to do depth slicing with a linear mapping too. The alternative I saw was to try to handle the near plane being at 0.0 and using the exponential depth slicing, but that felt like a hack that didn't make sense.
- As such, I have added code that detects whether the projection is orthographic based on `projection[3][3] == 1.0` and then implemented the orthographic mapping case throughout (when computing cluster AABBs, and when mapping a view space position (or light) to a cluster id in both the rust and shader code).
## Screenshots
Before:
![before](https://user-images.githubusercontent.com/302146/145847278-5b1bca74-fbad-4cc5-8b49-384f6a377fdc.png)
After:
<img width="1392" alt="Screenshot 2021-12-13 at 16 36 53" src="https://user-images.githubusercontent.com/302146/145847314-6f3a2035-5d87-4896-8032-0c3e35e15b7d.png">
Old renderer (slightly lighter due to slight difference in configured intensity):
<img width="1392" alt="Screenshot 2021-12-13 at 16 42 23" src="https://user-images.githubusercontent.com/302146/145847391-6a5e6fe0-22da-4fc1-a6c7-440543689a63.png">
This makes the [New Bevy Renderer](#2535) the default (and only) renderer. The new renderer isn't _quite_ ready for the final release yet, but I want as many people as possible to start testing it so we can identify bugs and address feedback prior to release.
The examples are all ported over and operational with a few exceptions:
* I removed a good portion of the examples in the `shader` folder. We still have some work to do in order to make these examples possible / ergonomic / worthwhile: #3120 and "high level shader material plugins" are the big ones. This is a temporary measure.
* Temporarily removed the multiple_windows example: doing this properly in the new renderer will require the upcoming "render targets" changes. Same goes for the render_to_texture example.
* Removed z_sort_debug: entity visibility sort info is no longer available in app logic. we could do this on the "render app" side, but i dont consider it a priority.
2021-12-14 03:58:23 +00:00
Renamed from pipelined/bevy_pbr2/src/render/light.rs (Browse further)