bevy/crates/bevy_pbr/src/render/light.rs

1546 lines
59 KiB
Rust
Raw Normal View History

2024-06-27 16:13:03 +00:00
use bevy_asset::UntypedAssetId;
use bevy_color::ColorToComponents;
use bevy_core_pipeline::core_3d::{Camera3d, CORE_3D_DEPTH_FORMAT};
use bevy_ecs::{
entity::{EntityHashMap, EntityHashSet},
prelude::*,
system::lifetimeless::Read,
};
use bevy_math::{ops, Mat4, UVec4, Vec2, Vec3, Vec3Swizzles, Vec4, Vec4Swizzles};
use bevy_render::{
diagnostic::RecordDiagnostics,
Pack multiple vertex and index arrays together into growable buffers. (#14257) This commit uses the [`offset-allocator`] crate to combine vertex and index arrays from different meshes into single buffers. Since the primary source of `wgpu` overhead is from validation and synchronization when switching buffers, this significantly improves Bevy's rendering performance on many scenes. This patch is a more flexible version of #13218, which also used slabs. Unlike #13218, which used slabs of a fixed size, this commit implements slabs that start small and can grow. In addition to reducing memory usage, supporting slab growth reduces the number of vertex and index buffer switches that need to happen during rendering, leading to improved performance. To prevent pathological fragmentation behavior, slabs are capped to a maximum size, and mesh arrays that are too large get their own dedicated slabs. As an additional improvement over #13218, this commit allows the application to customize all allocator heuristics. The `MeshAllocatorSettings` resource contains values that adjust the minimum and maximum slab sizes, the cutoff point at which meshes get their own dedicated slabs, and the rate at which slabs grow. Hopefully-sensible defaults have been chosen for each value. Unfortunately, WebGL 2 doesn't support the *base vertex* feature, which is necessary to pack vertex arrays from different meshes into the same buffer. `wgpu` represents this restriction as the downlevel flag `BASE_VERTEX`. This patch detects that bit and ensures that all vertex buffers get dedicated slabs on that platform. Even on WebGL 2, though, we can combine all *index* arrays into single buffers to reduce buffer changes, and we do so. The following measurements are on Bistro: Overall frame time improves from 8.74 ms to 5.53 ms (1.58x speedup): ![Screenshot 2024-07-09 163521](https://github.com/bevyengine/bevy/assets/157897/5d83c824-c0ee-434c-bbaf-218ff7212c48) Render system time improves from 6.57 ms to 3.54 ms (1.86x speedup): ![Screenshot 2024-07-09 163559](https://github.com/bevyengine/bevy/assets/157897/d94e2273-c3a0-496a-9f88-20d394129610) Opaque pass time improves from 4.64 ms to 2.33 ms (1.99x speedup): ![Screenshot 2024-07-09 163536](https://github.com/bevyengine/bevy/assets/157897/e4ef6e48-d60e-44ae-9a71-b9a731c99d9a) ## Migration Guide ### Changed * Vertex and index buffers for meshes may now be packed alongside other buffers, for performance. * `GpuMesh` has been renamed to `RenderMesh`, to reflect the fact that it no longer directly stores handles to GPU objects. * Because meshes no longer have their own vertex and index buffers, the responsibility for the buffers has moved from `GpuMesh` (now called `RenderMesh`) to the `MeshAllocator` resource. To access the vertex data for a mesh, use `MeshAllocator::mesh_vertex_slice`. To access the index data for a mesh, use `MeshAllocator::mesh_index_slice`. [`offset-allocator`]: https://github.com/pcwalton/offset-allocator
2024-07-16 20:33:15 +00:00
mesh::RenderMesh,
primitives::{CascadesFrusta, CubemapFrusta, Frustum, HalfSpace},
render_asset::RenderAssets,
Make render graph slots optional for most cases (#8109) # Objective - Currently, the render graph slots are only used to pass the view_entity around. This introduces significant boilerplate for very little value. Instead of using slots for this, make the view_entity part of the `RenderGraphContext`. This also means we won't need to have `IN_VIEW` on every node and and we'll be able to use the default impl of `Node::input()`. ## Solution - Add `view_entity: Option<Entity>` to the `RenderGraphContext` - Update all nodes to use this instead of entity slot input --- ## Changelog - Add optional `view_entity` to `RenderGraphContext` ## Migration Guide You can now get the view_entity directly from the `RenderGraphContext`. When implementing the Node: ```rust // 0.10 struct FooNode; impl FooNode { const IN_VIEW: &'static str = "view"; } impl Node for FooNode { fn input(&self) -> Vec<SlotInfo> { vec![SlotInfo::new(Self::IN_VIEW, SlotType::Entity)] } fn run( &self, graph: &mut RenderGraphContext, // ... ) -> Result<(), NodeRunError> { let view_entity = graph.get_input_entity(Self::IN_VIEW)?; // ... Ok(()) } } // 0.11 struct FooNode; impl Node for FooNode { fn run( &self, graph: &mut RenderGraphContext, // ... ) -> Result<(), NodeRunError> { let view_entity = graph.view_entity(); // ... Ok(()) } } ``` When adding the node to the graph, you don't need to specify a slot_edge for the view_entity. ```rust // 0.10 let mut graph = RenderGraph::default(); graph.add_node(FooNode::NAME, node); let input_node_id = draw_2d_graph.set_input(vec![SlotInfo::new( graph::input::VIEW_ENTITY, SlotType::Entity, )]); graph.add_slot_edge( input_node_id, graph::input::VIEW_ENTITY, FooNode::NAME, FooNode::IN_VIEW, ); // add_node_edge ... // 0.11 let mut graph = RenderGraph::default(); graph.add_node(FooNode::NAME, node); // add_node_edge ... ``` ## Notes This PR paired with #8007 will help reduce a lot of annoying boilerplate with the render nodes. Depending on which one gets merged first. It will require a bit of clean up work to make both compatible. I tagged this as a breaking change, because using the old system to get the view_entity will break things because it's not a node input slot anymore. ## Notes for reviewers A lot of the diffs are just removing the slots in every nodes and graph creation. The important part is mostly in the graph_runner/CameraDriverNode.
2023-03-21 20:11:13 +00:00
render_graph::{Node, NodeRunError, RenderGraphContext},
render_phase::*,
render_resource::*,
Modular Rendering (#2831) This changes how render logic is composed to make it much more modular. Previously, all extraction logic was centralized for a given "type" of rendered thing. For example, we extracted meshes into a vector of ExtractedMesh, which contained the mesh and material asset handles, the transform, etc. We looked up bindings for "drawn things" using their index in the `Vec<ExtractedMesh>`. This worked fine for built in rendering, but made it hard to reuse logic for "custom" rendering. It also prevented us from reusing things like "extracted transforms" across contexts. To make rendering more modular, I made a number of changes: * Entities now drive rendering: * We extract "render components" from "app components" and store them _on_ entities. No more centralized uber lists! We now have true "ECS-driven rendering" * To make this perform well, I implemented #2673 in upstream Bevy for fast batch insertions into specific entities. This was merged into the `pipelined-rendering` branch here: #2815 * Reworked the `Draw` abstraction: * Generic `PhaseItems`: each draw phase can define its own type of "rendered thing", which can define its own "sort key" * Ported the 2d, 3d, and shadow phases to the new PhaseItem impl (currently Transparent2d, Transparent3d, and Shadow PhaseItems) * `Draw` trait and and `DrawFunctions` are now generic on PhaseItem * Modular / Ergonomic `DrawFunctions` via `RenderCommands` * RenderCommand is a trait that runs an ECS query and produces one or more RenderPass calls. Types implementing this trait can be composed to create a final DrawFunction. For example the DrawPbr DrawFunction is created from the following DrawCommand tuple. Const generics are used to set specific bind group locations: ```rust pub type DrawPbr = ( SetPbrPipeline, SetMeshViewBindGroup<0>, SetStandardMaterialBindGroup<1>, SetTransformBindGroup<2>, DrawMesh, ); ``` * The new `custom_shader_pipelined` example illustrates how the commands above can be reused to create a custom draw function: ```rust type DrawCustom = ( SetCustomMaterialPipeline, SetMeshViewBindGroup<0>, SetTransformBindGroup<2>, DrawMesh, ); ``` * ExtractComponentPlugin and UniformComponentPlugin: * Simple, standardized ways to easily extract individual components and write them to GPU buffers * Ported PBR and Sprite rendering to the new primitives above. * Removed staging buffer from UniformVec in favor of direct Queue usage * Makes UniformVec much easier to use and more ergonomic. Completely removes the need for custom render graph nodes in these contexts (see the PbrNode and view Node removals and the much simpler call patterns in the relevant Prepare systems). * Added a many_cubes_pipelined example to benchmark baseline 3d rendering performance and ensure there were no major regressions during this port. Avoiding regressions was challenging given that the old approach of extracting into centralized vectors is basically the "optimal" approach. However thanks to a various ECS optimizations and render logic rephrasing, we pretty much break even on this benchmark! * Lifetimeless SystemParams: this will be a bit divisive, but as we continue to embrace "trait driven systems" (ex: ExtractComponentPlugin, UniformComponentPlugin, DrawCommand), the ergonomics of `(Query<'static, 'static, (&'static A, &'static B, &'static)>, Res<'static, C>)` were getting very hard to bear. As a compromise, I added "static type aliases" for the relevant SystemParams. The previous example can now be expressed like this: `(SQuery<(Read<A>, Read<B>)>, SRes<C>)`. If anyone has better ideas / conflicting opinions, please let me know! * RunSystem trait: a way to define Systems via a trait with a SystemParam associated type. This is used to implement the various plugins mentioned above. I also added SystemParamItem and QueryItem type aliases to make "trait stye" ecs interactions nicer on the eyes (and fingers). * RenderAsset retrying: ensures that render assets are only created when they are "ready" and allows us to create bind groups directly inside render assets (which significantly simplified the StandardMaterial code). I think ultimately we should swap this out on "asset dependency" events to wait for dependencies to load, but this will require significant asset system changes. * Updated some built in shaders to account for missing MeshUniform fields
2021-09-23 06:16:11 +00:00
renderer::{RenderContext, RenderDevice, RenderQueue},
2021-06-02 02:59:17 +00:00
texture::*,
view::{ExtractedView, RenderLayers, ViewVisibility},
Make `RenderStage::Extract` run on the render world (#4402) # Objective - Currently, the `Extract` `RenderStage` is executed on the main world, with the render world available as a resource. - However, when needing access to resources in the render world (e.g. to mutate them), the only way to do so was to get exclusive access to the whole `RenderWorld` resource. - This meant that effectively only one extract which wrote to resources could run at a time. - We didn't previously make `Extract`ing writing to the world a non-happy path, even though we want to discourage that. ## Solution - Move the extract stage to run on the render world. - Add the main world as a `MainWorld` resource. - Add an `Extract` `SystemParam` as a convenience to access a (read only) `SystemParam` in the main world during `Extract`. ## Future work It should be possible to avoid needing to use `get_or_spawn` for the render commands, since now the `Commands`' `Entities` matches up with the world being executed on. We need to determine how this interacts with https://github.com/bevyengine/bevy/pull/3519 It's theoretically possible to remove the need for the `value` method on `Extract`. However, that requires slightly changing the `SystemParam` interface, which would make it more complicated. That would probably mess up the `SystemState` api too. ## Todo I still need to add doc comments to `Extract`. --- ## Changelog ### Changed - The `Extract` `RenderStage` now runs on the render world (instead of the main world as before). You must use the `Extract` `SystemParam` to access the main world during the extract phase. Resources on the render world can now be accessed using `ResMut` during extract. ### Removed - `Commands::spawn_and_forget`. Use `Commands::get_or_spawn(e).insert_bundle(bundle)` instead ## Migration Guide The `Extract` `RenderStage` now runs on the render world (instead of the main world as before). You must use the `Extract` `SystemParam` to access the main world during the extract phase. `Extract` takes a single type parameter, which is any system parameter (such as `Res`, `Query` etc.). It will extract this from the main world, and returns the result of this extraction when `value` is called on it. For example, if previously your extract system looked like: ```rust fn extract_clouds(mut commands: Commands, clouds: Query<Entity, With<Cloud>>) { for cloud in clouds.iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` the new version would be: ```rust fn extract_clouds(mut commands: Commands, mut clouds: Extract<Query<Entity, With<Cloud>>>) { for cloud in clouds.value().iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` The diff is: ```diff --- a/src/clouds.rs +++ b/src/clouds.rs @@ -1,5 +1,5 @@ -fn extract_clouds(mut commands: Commands, clouds: Query<Entity, With<Cloud>>) { - for cloud in clouds.iter() { +fn extract_clouds(mut commands: Commands, mut clouds: Extract<Query<Entity, With<Cloud>>>) { + for cloud in clouds.value().iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` You can now also access resources from the render world using the normal system parameters during `Extract`: ```rust fn extract_assets(mut render_assets: ResMut<MyAssets>, source_assets: Extract<Res<MyAssets>>) { *render_assets = source_assets.clone(); } ``` Please note that all existing extract systems need to be updated to match this new style; even if they currently compile they will not run as expected. A warning will be emitted on a best-effort basis if this is not met. Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-07-08 23:56:33 +00:00
Extract,
2021-06-02 02:59:17 +00:00
};
use bevy_transform::{components::GlobalTransform, prelude::Transform};
Multithreaded render command encoding (#9172) # Objective - Encoding many GPU commands (such as in a renderpass with many draws, such as the main opaque pass) onto a `wgpu::CommandEncoder` is very expensive, and takes a long time. - To improve performance, we want to perform the command encoding for these heavy passes in parallel. ## Solution - `RenderContext` can now queue up "command buffer generation tasks" which are closures that will generate a command buffer when called. - When finalizing the render context to produce the final list of command buffers, these tasks are run in parallel on the `ComputeTaskPool` to produce their corresponding command buffers. - The general idea is that the node graph will run in serial, but in a node, instead of doing rendering work, you can add tasks to do render work in parallel with other node's tasks that get ran at the end of the graph execution. ## Nodes Parallelized - `MainOpaquePass3dNode` - `PrepassNode` - `DeferredGBufferPrepassNode` - `ShadowPassNode` (One task per view) ## Future Work - For large number of draws calls, might be worth further subdividing passes into 2+ tasks. - Extend this to UI, 2d, transparent, and transmissive nodes? - Needs testing - small command buffers are inefficient - it may be worth reverting to the serial command encoder usage for render phases with few items. - All "serial" (traditional) rendering work must finish before parallel rendering tasks (the new stuff) can start to run. - There is still only one submission to the graphics queue at the end of the graph execution. There is still no ability to submit work earlier. ## Performance Improvement Thanks to @Elabajaba for testing on Bistro. ![image](https://github.com/bevyengine/bevy/assets/47158642/be50dafa-85eb-4da5-a5cd-c0a044f1e76f) TLDR: Without shadow mapping, this PR has no impact. _With_ shadow mapping, this PR gives **~40 more fps** than main. --- ## Changelog - `MainOpaquePass3dNode`, `PrepassNode`, `DeferredGBufferPrepassNode`, and each shadow map within `ShadowPassNode` are now encoded in parallel, giving _greatly_ increased CPU performance, mainly when shadow mapping is enabled. - Does not work on WASM or AMD+Windows+Vulkan. - Added `RenderContext::add_command_buffer_generation_task()`. - `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. ## Migration Guide `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. --------- Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com> Co-authored-by: François <mockersf@gmail.com>
2024-02-09 07:35:35 +00:00
#[cfg(feature = "trace")]
use bevy_utils::tracing::info_span;
use bevy_utils::{
Add `core` and `alloc` over `std` Lints (#15281) # Objective - Fixes #6370 - Closes #6581 ## Solution - Added the following lints to the workspace: - `std_instead_of_core` - `std_instead_of_alloc` - `alloc_instead_of_core` - Used `cargo +nightly fmt` with [item level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A) to split all `use` statements into single items. - Used `cargo clippy --workspace --all-targets --all-features --fix --allow-dirty` to _attempt_ to resolve the new linting issues, and intervened where the lint was unable to resolve the issue automatically (usually due to needing an `extern crate alloc;` statement in a crate root). - Manually removed certain uses of `std` where negative feature gating prevented `--all-features` from finding the offending uses. - Used `cargo +nightly fmt` with [crate level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A) to re-merge all `use` statements matching Bevy's previous styling. - Manually fixed cases where the `fmt` tool could not re-merge `use` statements due to conditional compilation attributes. ## Testing - Ran CI locally ## Migration Guide The MSRV is now 1.81. Please update to this version or higher. ## Notes - This is a _massive_ change to try and push through, which is why I've outlined the semi-automatic steps I used to create this PR, in case this fails and someone else tries again in the future. - Making this change has no impact on user code, but does mean Bevy contributors will be warned to use `core` and `alloc` instead of `std` where possible. - This lint is a critical first step towards investigating `no_std` options for Bevy. --------- Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-09-27 00:59:59 +00:00
default,
tracing::{error, warn},
};
Add `core` and `alloc` over `std` Lints (#15281) # Objective - Fixes #6370 - Closes #6581 ## Solution - Added the following lints to the workspace: - `std_instead_of_core` - `std_instead_of_alloc` - `alloc_instead_of_core` - Used `cargo +nightly fmt` with [item level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A) to split all `use` statements into single items. - Used `cargo clippy --workspace --all-targets --all-features --fix --allow-dirty` to _attempt_ to resolve the new linting issues, and intervened where the lint was unable to resolve the issue automatically (usually due to needing an `extern crate alloc;` statement in a crate root). - Manually removed certain uses of `std` where negative feature gating prevented `--all-features` from finding the offending uses. - Used `cargo +nightly fmt` with [crate level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A) to re-merge all `use` statements matching Bevy's previous styling. - Manually fixed cases where the `fmt` tool could not re-merge `use` statements due to conditional compilation attributes. ## Testing - Ran CI locally ## Migration Guide The MSRV is now 1.81. Please update to this version or higher. ## Notes - This is a _massive_ change to try and push through, which is why I've outlined the semi-automatic steps I used to create this PR, in case this fails and someone else tries again in the future. - Making this change has no impact on user code, but does mean Bevy contributors will be warned to use `core` and `alloc` instead of `std` where possible. - This lint is a critical first step towards investigating `no_std` options for Bevy. --------- Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-09-27 00:59:59 +00:00
use core::{hash::Hash, ops::Range};
2021-06-02 02:59:17 +00:00
use crate::*;
2021-11-22 23:16:36 +00:00
#[derive(Component)]
2021-06-02 02:59:17 +00:00
pub struct ExtractedPointLight {
Migrate from `LegacyColor` to `bevy_color::Color` (#12163) # Objective - As part of the migration process we need to a) see the end effect of the migration on user ergonomics b) check for serious perf regressions c) actually migrate the code - To accomplish this, I'm going to attempt to migrate all of the remaining user-facing usages of `LegacyColor` in one PR, being careful to keep a clean commit history. - Fixes #12056. ## Solution I've chosen to use the polymorphic `Color` type as our standard user-facing API. - [x] Migrate `bevy_gizmos`. - [x] Take `impl Into<Color>` in all `bevy_gizmos` APIs - [x] Migrate sprites - [x] Migrate UI - [x] Migrate `ColorMaterial` - [x] Migrate `MaterialMesh2D` - [x] Migrate fog - [x] Migrate lights - [x] Migrate StandardMaterial - [x] Migrate wireframes - [x] Migrate clear color - [x] Migrate text - [x] Migrate gltf loader - [x] Register color types for reflection - [x] Remove `LegacyColor` - [x] Make sure CI passes Incidental improvements to ease migration: - added `Color::srgba_u8`, `Color::srgba_from_array` and friends - added `set_alpha`, `is_fully_transparent` and `is_fully_opaque` to the `Alpha` trait - add and immediately deprecate (lol) `Color::rgb` and friends in favor of more explicit and consistent `Color::srgb` - standardized on white and black for most example text colors - added vector field traits to `LinearRgba`: ~~`Add`, `Sub`, `AddAssign`, `SubAssign`,~~ `Mul<f32>` and `Div<f32>`. Multiplications and divisions do not scale alpha. `Add` and `Sub` have been cut from this PR. - added `LinearRgba` and `Srgba` `RED/GREEN/BLUE` - added `LinearRgba_to_f32_array` and `LinearRgba::to_u32` ## Migration Guide Bevy's color types have changed! Wherever you used a `bevy::render::Color`, a `bevy::color::Color` is used instead. These are quite similar! Both are enums storing a color in a specific color space (or to be more precise, using a specific color model). However, each of the different color models now has its own type. TODO... - `Color::rgba`, `Color::rgb`, `Color::rbga_u8`, `Color::rgb_u8`, `Color::rgb_from_array` are now `Color::srgba`, `Color::srgb`, `Color::srgba_u8`, `Color::srgb_u8` and `Color::srgb_from_array`. - `Color::set_a` and `Color::a` is now `Color::set_alpha` and `Color::alpha`. These are part of the `Alpha` trait in `bevy_color`. - `Color::is_fully_transparent` is now part of the `Alpha` trait in `bevy_color` - `Color::r`, `Color::set_r`, `Color::with_r` and the equivalents for `g`, `b` `h`, `s` and `l` have been removed due to causing silent relatively expensive conversions. Convert your `Color` into the desired color space, perform your operations there, and then convert it back into a polymorphic `Color` enum. - `Color::hex` is now `Srgba::hex`. Call `.into` or construct a `Color::Srgba` variant manually to convert it. - `WireframeMaterial`, `ExtractedUiNode`, `ExtractedDirectionalLight`, `ExtractedPointLight`, `ExtractedSpotLight` and `ExtractedSprite` now store a `LinearRgba`, rather than a polymorphic `Color` - `Color::rgb_linear` and `Color::rgba_linear` are now `Color::linear_rgb` and `Color::linear_rgba` - The various CSS color constants are no longer stored directly on `Color`. Instead, they're defined in the `Srgba` color space, and accessed via `bevy::color::palettes::css`. Call `.into()` on them to convert them into a `Color` for quick debugging use, and consider using the much prettier `tailwind` palette for prototyping. - The `LIME_GREEN` color has been renamed to `LIMEGREEN` to comply with the standard naming. - Vector field arithmetic operations on `Color` (add, subtract, multiply and divide by a f32) have been removed. Instead, convert your colors into `LinearRgba` space, and perform your operations explicitly there. This is particularly relevant when working with emissive or HDR colors, whose color channel values are routinely outside of the ordinary 0 to 1 range. - `Color::as_linear_rgba_f32` has been removed. Call `LinearRgba::to_f32_array` instead, converting if needed. - `Color::as_linear_rgba_u32` has been removed. Call `LinearRgba::to_u32` instead, converting if needed. - Several other color conversion methods to transform LCH or HSL colors into float arrays or `Vec` types have been removed. Please reimplement these externally or open a PR to re-add them if you found them particularly useful. - Various methods on `Color` such as `rgb` or `hsl` to convert the color into a specific color space have been removed. Convert into `LinearRgba`, then to the color space of your choice. - Various implicitly-converting color value methods on `Color` such as `r`, `g`, `b` or `h` have been removed. Please convert it into the color space of your choice, then check these properties. - `Color` no longer implements `AsBindGroup`. Store a `LinearRgba` internally instead to avoid conversion costs. --------- Co-authored-by: Alice Cecile <alice.i.cecil@gmail.com> Co-authored-by: Afonso Lage <lage.afonso@gmail.com> Co-authored-by: Rob Parrett <robparrett@gmail.com> Co-authored-by: Zachary Harrold <zac@harrold.com.au>
2024-02-29 19:35:12 +00:00
pub color: LinearRgba,
bevy_pbr2: Improve lighting units and documentation (#2704) # Objective A question was raised on Discord about the units of the `PointLight` `intensity` member. After digging around in the bevy_pbr2 source code and [Google Filament documentation](https://google.github.io/filament/Filament.html#mjx-eqn-pointLightLuminousPower) I discovered that the intention by Filament was that the 'intensity' value for point lights would be in lumens. This makes a lot of sense as these are quite relatable units given basically all light bulbs I've seen sold over the past years are rated in lumens as people move away from thinking about how bright a bulb is relative to a non-halogen incandescent bulb. However, it seems that the derivation of the conversion between luminous power (lumens, denoted `Φ` in the Filament formulae) and luminous intensity (lumens per steradian, `I` in the Filament formulae) was missed and I can see why as it is tucked right under equation 58 at the link above. As such, while the formula states that for a point light, `I = Φ / 4 π` we have been using `intensity` as if it were luminous intensity `I`. Before this PR, the intensity field is luminous intensity in lumens per steradian. After this PR, the intensity field is luminous power in lumens, [as suggested by Filament](https://google.github.io/filament/Filament.html#table_lighttypesunits) (unfortunately the link jumps to the table's caption so scroll up to see the actual table). I appreciate that it may be confusing to call this an intensity, but I think this is intended as more of a non-scientific, human-relatable general term with a bit of hand waving so that most light types can just have an intensity field and for most of them it works in the same way or at least with some relatable value. I'm inclined to think this is reasonable rather than throwing terms like luminous power, luminous intensity, blah at users. ## Solution - Documented the `PointLight` `intensity` member as 'luminous power' in units of lumens. - Added a table of examples relating from various types of household lighting to lumen values. - Added in the mapping from luminous power to luminous intensity when premultiplying the intensity into the colour before it is made into a graphics uniform. - Updated the documentation in `pbr.wgsl` to clarify the earlier confusion about the missing `/ 4 π`. - Bumped the intensity of the point lights in `3d_scene_pipelined` to 1600 lumens. Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2021-08-23 23:48:11 +00:00
/// luminous intensity in lumens per steradian
pub intensity: f32,
pub range: f32,
pub radius: f32,
pub transform: GlobalTransform,
pub shadows_enabled: bool,
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
pub soft_shadows_enabled: bool,
pub shadow_depth_bias: f32,
pub shadow_normal_bias: f32,
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
pub shadow_map_near_z: f32,
pub spot_light_angles: Option<(f32, f32)>,
pub volumetric: bool,
}
#[derive(Component, Debug)]
pub struct ExtractedDirectionalLight {
Migrate from `LegacyColor` to `bevy_color::Color` (#12163) # Objective - As part of the migration process we need to a) see the end effect of the migration on user ergonomics b) check for serious perf regressions c) actually migrate the code - To accomplish this, I'm going to attempt to migrate all of the remaining user-facing usages of `LegacyColor` in one PR, being careful to keep a clean commit history. - Fixes #12056. ## Solution I've chosen to use the polymorphic `Color` type as our standard user-facing API. - [x] Migrate `bevy_gizmos`. - [x] Take `impl Into<Color>` in all `bevy_gizmos` APIs - [x] Migrate sprites - [x] Migrate UI - [x] Migrate `ColorMaterial` - [x] Migrate `MaterialMesh2D` - [x] Migrate fog - [x] Migrate lights - [x] Migrate StandardMaterial - [x] Migrate wireframes - [x] Migrate clear color - [x] Migrate text - [x] Migrate gltf loader - [x] Register color types for reflection - [x] Remove `LegacyColor` - [x] Make sure CI passes Incidental improvements to ease migration: - added `Color::srgba_u8`, `Color::srgba_from_array` and friends - added `set_alpha`, `is_fully_transparent` and `is_fully_opaque` to the `Alpha` trait - add and immediately deprecate (lol) `Color::rgb` and friends in favor of more explicit and consistent `Color::srgb` - standardized on white and black for most example text colors - added vector field traits to `LinearRgba`: ~~`Add`, `Sub`, `AddAssign`, `SubAssign`,~~ `Mul<f32>` and `Div<f32>`. Multiplications and divisions do not scale alpha. `Add` and `Sub` have been cut from this PR. - added `LinearRgba` and `Srgba` `RED/GREEN/BLUE` - added `LinearRgba_to_f32_array` and `LinearRgba::to_u32` ## Migration Guide Bevy's color types have changed! Wherever you used a `bevy::render::Color`, a `bevy::color::Color` is used instead. These are quite similar! Both are enums storing a color in a specific color space (or to be more precise, using a specific color model). However, each of the different color models now has its own type. TODO... - `Color::rgba`, `Color::rgb`, `Color::rbga_u8`, `Color::rgb_u8`, `Color::rgb_from_array` are now `Color::srgba`, `Color::srgb`, `Color::srgba_u8`, `Color::srgb_u8` and `Color::srgb_from_array`. - `Color::set_a` and `Color::a` is now `Color::set_alpha` and `Color::alpha`. These are part of the `Alpha` trait in `bevy_color`. - `Color::is_fully_transparent` is now part of the `Alpha` trait in `bevy_color` - `Color::r`, `Color::set_r`, `Color::with_r` and the equivalents for `g`, `b` `h`, `s` and `l` have been removed due to causing silent relatively expensive conversions. Convert your `Color` into the desired color space, perform your operations there, and then convert it back into a polymorphic `Color` enum. - `Color::hex` is now `Srgba::hex`. Call `.into` or construct a `Color::Srgba` variant manually to convert it. - `WireframeMaterial`, `ExtractedUiNode`, `ExtractedDirectionalLight`, `ExtractedPointLight`, `ExtractedSpotLight` and `ExtractedSprite` now store a `LinearRgba`, rather than a polymorphic `Color` - `Color::rgb_linear` and `Color::rgba_linear` are now `Color::linear_rgb` and `Color::linear_rgba` - The various CSS color constants are no longer stored directly on `Color`. Instead, they're defined in the `Srgba` color space, and accessed via `bevy::color::palettes::css`. Call `.into()` on them to convert them into a `Color` for quick debugging use, and consider using the much prettier `tailwind` palette for prototyping. - The `LIME_GREEN` color has been renamed to `LIMEGREEN` to comply with the standard naming. - Vector field arithmetic operations on `Color` (add, subtract, multiply and divide by a f32) have been removed. Instead, convert your colors into `LinearRgba` space, and perform your operations explicitly there. This is particularly relevant when working with emissive or HDR colors, whose color channel values are routinely outside of the ordinary 0 to 1 range. - `Color::as_linear_rgba_f32` has been removed. Call `LinearRgba::to_f32_array` instead, converting if needed. - `Color::as_linear_rgba_u32` has been removed. Call `LinearRgba::to_u32` instead, converting if needed. - Several other color conversion methods to transform LCH or HSL colors into float arrays or `Vec` types have been removed. Please reimplement these externally or open a PR to re-add them if you found them particularly useful. - Various methods on `Color` such as `rgb` or `hsl` to convert the color into a specific color space have been removed. Convert into `LinearRgba`, then to the color space of your choice. - Various implicitly-converting color value methods on `Color` such as `r`, `g`, `b` or `h` have been removed. Please convert it into the color space of your choice, then check these properties. - `Color` no longer implements `AsBindGroup`. Store a `LinearRgba` internally instead to avoid conversion costs. --------- Co-authored-by: Alice Cecile <alice.i.cecil@gmail.com> Co-authored-by: Afonso Lage <lage.afonso@gmail.com> Co-authored-by: Rob Parrett <robparrett@gmail.com> Co-authored-by: Zachary Harrold <zac@harrold.com.au>
2024-02-29 19:35:12 +00:00
pub color: LinearRgba,
pub illuminance: f32,
pub transform: GlobalTransform,
pub shadows_enabled: bool,
Implement volumetric fog and volumetric lighting, also known as light shafts or god rays. (#13057) This commit implements a more physically-accurate, but slower, form of fog than the `bevy_pbr::fog` module does. Notably, this *volumetric fog* allows for light beams from directional lights to shine through, creating what is known as *light shafts* or *god rays*. To add volumetric fog to a scene, add `VolumetricFogSettings` to the camera, and add `VolumetricLight` to directional lights that you wish to be volumetric. `VolumetricFogSettings` has numerous settings that allow you to define the accuracy of the simulation, as well as the look of the fog. Currently, only interaction with directional lights that have shadow maps is supported. Note that the overhead of the effect scales directly with the number of directional lights in use, so apply `VolumetricLight` sparingly for the best results. The overall algorithm, which is implemented as a postprocessing effect, is a combination of the techniques described in [Scratchapixel] and [this blog post]. It uses raymarching in screen space, transformed into shadow map space for sampling and combined with physically-based modeling of absorption and scattering. Bevy employs the widely-used [Henyey-Greenstein phase function] to model asymmetry; this essentially allows light shafts to fade into and out of existence as the user views them. Volumetric rendering is a huge subject, and I deliberately kept the scope of this commit small. Possible follow-ups include: 1. Raymarching at a lower resolution. 2. A post-processing blur (especially useful when combined with (1)). 3. Supporting point lights and spot lights. 4. Supporting lights with no shadow maps. 5. Supporting irradiance volumes and reflection probes. 6. Voxel components that reuse the volumetric fog code to create voxel shapes. 7. *Horizon: Zero Dawn*-style clouds. These are all useful, but out of scope of this patch for now, to keep things tidy and easy to review. A new example, `volumetric_fog`, has been added to demonstrate the effect. ## Changelog ### Added * A new component, `VolumetricFog`, is available, to allow for a more physically-accurate, but more resource-intensive, form of fog. * A new component, `VolumetricLight`, can be placed on directional lights to make them interact with `VolumetricFog`. Notably, this allows such lights to emit light shafts/god rays. ![Screenshot 2024-04-21 162808](https://github.com/bevyengine/bevy/assets/157897/7a1fc81d-eed5-4735-9419-286c496391a9) ![Screenshot 2024-04-21 132005](https://github.com/bevyengine/bevy/assets/157897/e6d3b5ca-8f59-488d-a3de-15e95aaf4995) [Scratchapixel]: https://www.scratchapixel.com/lessons/3d-basic-rendering/volume-rendering-for-developers/intro-volume-rendering.html [this blog post]: https://www.alexandre-pestana.com/volumetric-lights/ [Henyey-Greenstein phase function]: https://www.pbr-book.org/4ed/Volume_Scattering/Phase_Functions#TheHenyeyndashGreensteinPhaseFunction
2024-05-16 17:13:18 +00:00
pub volumetric: bool,
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
pub soft_shadow_size: Option<f32>,
pub shadow_depth_bias: f32,
pub shadow_normal_bias: f32,
pub cascade_shadow_config: CascadeShadowConfig,
pub cascades: EntityHashMap<Vec<Cascade>>,
pub frusta: EntityHashMap<Vec<Frustum>>,
pub render_layers: RenderLayers,
2021-06-02 02:59:17 +00:00
}
// NOTE: These must match the bit flags in bevy_pbr/src/render/mesh_view_types.wgsl!
bitflags::bitflags! {
#[repr(transparent)]
struct PointLightFlags: u32 {
const SHADOWS_ENABLED = 1 << 0;
const SPOT_LIGHT_Y_NEGATIVE = 1 << 1;
const VOLUMETRIC = 1 << 2;
const NONE = 0;
const UNINITIALIZED = 0xFFFF;
}
}
#[derive(Copy, Clone, ShaderType, Default, Debug)]
pub struct GpuDirectionalCascade {
Normalise matrix naming (#13489) # Objective - Fixes #10909 - Fixes #8492 ## Solution - Name all matrices `x_from_y`, for example `world_from_view`. ## Testing - I've tested most of the 3D examples. The `lighting` example particularly should hit a lot of the changes and appears to run fine. --- ## Changelog - Renamed matrices across the engine to follow a `y_from_x` naming, making the space conversion more obvious. ## Migration Guide - `Frustum`'s `from_view_projection`, `from_view_projection_custom_far` and `from_view_projection_no_far` were renamed to `from_clip_from_world`, `from_clip_from_world_custom_far` and `from_clip_from_world_no_far`. - `ComputedCameraValues::projection_matrix` was renamed to `clip_from_view`. - `CameraProjection::get_projection_matrix` was renamed to `get_clip_from_view` (this affects implementations on `Projection`, `PerspectiveProjection` and `OrthographicProjection`). - `ViewRangefinder3d::from_view_matrix` was renamed to `from_world_from_view`. - `PreviousViewData`'s members were renamed to `view_from_world` and `clip_from_world`. - `ExtractedView`'s `projection`, `transform` and `view_projection` were renamed to `clip_from_view`, `world_from_view` and `clip_from_world`. - `ViewUniform`'s `view_proj`, `unjittered_view_proj`, `inverse_view_proj`, `view`, `inverse_view`, `projection` and `inverse_projection` were renamed to `clip_from_world`, `unjittered_clip_from_world`, `world_from_clip`, `world_from_view`, `view_from_world`, `clip_from_view` and `view_from_clip`. - `GpuDirectionalCascade::view_projection` was renamed to `clip_from_world`. - `MeshTransforms`' `transform` and `previous_transform` were renamed to `world_from_local` and `previous_world_from_local`. - `MeshUniform`'s `transform`, `previous_transform`, `inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed to `world_from_local`, `previous_world_from_local`, `local_from_world_transpose_a` and `local_from_world_transpose_b` (the `Mesh` type in WGSL mirrors this, however `transform` and `previous_transform` were named `model` and `previous_model`). - `Mesh2dTransforms::transform` was renamed to `world_from_local`. - `Mesh2dUniform`'s `transform`, `inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed to `world_from_local`, `local_from_world_transpose_a` and `local_from_world_transpose_b` (the `Mesh2d` type in WGSL mirrors this). - In WGSL, in `bevy_pbr::mesh_functions`, `get_model_matrix` and `get_previous_model_matrix` were renamed to `get_world_from_local` and `get_previous_world_from_local`. - In WGSL, `bevy_sprite::mesh2d_functions::get_model_matrix` was renamed to `get_world_from_local`.
2024-06-03 16:56:53 +00:00
clip_from_world: Mat4,
texel_size: f32,
far_bound: f32,
}
#[derive(Copy, Clone, ShaderType, Default, Debug)]
pub struct GpuDirectionalLight {
cascades: [GpuDirectionalCascade; MAX_CASCADES_PER_LIGHT],
color: Vec4,
dir_to_light: Vec3,
flags: u32,
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
soft_shadow_size: f32,
shadow_depth_bias: f32,
shadow_normal_bias: f32,
num_cascades: u32,
cascades_overlap_proportion: f32,
depth_texture_base_index: u32,
skip: u32,
2021-06-02 02:59:17 +00:00
}
// NOTE: These must match the bit flags in bevy_pbr/src/render/mesh_view_types.wgsl!
bitflags::bitflags! {
#[repr(transparent)]
struct DirectionalLightFlags: u32 {
const SHADOWS_ENABLED = 1 << 0;
Implement volumetric fog and volumetric lighting, also known as light shafts or god rays. (#13057) This commit implements a more physically-accurate, but slower, form of fog than the `bevy_pbr::fog` module does. Notably, this *volumetric fog* allows for light beams from directional lights to shine through, creating what is known as *light shafts* or *god rays*. To add volumetric fog to a scene, add `VolumetricFogSettings` to the camera, and add `VolumetricLight` to directional lights that you wish to be volumetric. `VolumetricFogSettings` has numerous settings that allow you to define the accuracy of the simulation, as well as the look of the fog. Currently, only interaction with directional lights that have shadow maps is supported. Note that the overhead of the effect scales directly with the number of directional lights in use, so apply `VolumetricLight` sparingly for the best results. The overall algorithm, which is implemented as a postprocessing effect, is a combination of the techniques described in [Scratchapixel] and [this blog post]. It uses raymarching in screen space, transformed into shadow map space for sampling and combined with physically-based modeling of absorption and scattering. Bevy employs the widely-used [Henyey-Greenstein phase function] to model asymmetry; this essentially allows light shafts to fade into and out of existence as the user views them. Volumetric rendering is a huge subject, and I deliberately kept the scope of this commit small. Possible follow-ups include: 1. Raymarching at a lower resolution. 2. A post-processing blur (especially useful when combined with (1)). 3. Supporting point lights and spot lights. 4. Supporting lights with no shadow maps. 5. Supporting irradiance volumes and reflection probes. 6. Voxel components that reuse the volumetric fog code to create voxel shapes. 7. *Horizon: Zero Dawn*-style clouds. These are all useful, but out of scope of this patch for now, to keep things tidy and easy to review. A new example, `volumetric_fog`, has been added to demonstrate the effect. ## Changelog ### Added * A new component, `VolumetricFog`, is available, to allow for a more physically-accurate, but more resource-intensive, form of fog. * A new component, `VolumetricLight`, can be placed on directional lights to make them interact with `VolumetricFog`. Notably, this allows such lights to emit light shafts/god rays. ![Screenshot 2024-04-21 162808](https://github.com/bevyengine/bevy/assets/157897/7a1fc81d-eed5-4735-9419-286c496391a9) ![Screenshot 2024-04-21 132005](https://github.com/bevyengine/bevy/assets/157897/e6d3b5ca-8f59-488d-a3de-15e95aaf4995) [Scratchapixel]: https://www.scratchapixel.com/lessons/3d-basic-rendering/volume-rendering-for-developers/intro-volume-rendering.html [this blog post]: https://www.alexandre-pestana.com/volumetric-lights/ [Henyey-Greenstein phase function]: https://www.pbr-book.org/4ed/Volume_Scattering/Phase_Functions#TheHenyeyndashGreensteinPhaseFunction
2024-05-16 17:13:18 +00:00
const VOLUMETRIC = 1 << 1;
const NONE = 0;
const UNINITIALIZED = 0xFFFF;
}
}
#[derive(Copy, Clone, Debug, ShaderType)]
2021-06-02 02:59:17 +00:00
pub struct GpuLights {
directional_lights: [GpuDirectionalLight; MAX_DIRECTIONAL_LIGHTS],
bevy_pbr2: Add support for most of the StandardMaterial textures (#4) * bevy_pbr2: Add support for most of the StandardMaterial textures Normal maps are not included here as they require tangents in a vertex attribute. * bevy_pbr2: Ensure RenderCommandQueue is ready for PbrShaders init * texture_pipelined: Add a light to the scene so we can see stuff * WIP bevy_pbr2: back to front sorting hack * bevy_pbr2: Uniform control flow for texture sampling in pbr.frag From 'fintelia' on the Bevy Render Rework Round 2 discussion: "My understanding is that GPUs these days never use the "execute both branches and select the result" strategy. Rather, what they do is evaluate the branch condition on all threads of a warp, and jump over it if all of them evaluate to false. If even a single thread needs to execute the if statement body, however, then the remaining threads are paused until that is completed." * bevy_pbr2: Simplify texture and sampler names The StandardMaterial_ prefix is no longer needed * bevy_pbr2: Match default 'AmbientColor' of current bevy_pbr for now * bevy_pbr2: Convert from non-linear to linear sRGB for the color uniform * bevy_pbr2: Add pbr_pipelined example * Fix view vector in pbr frag to work in ortho * bevy_pbr2: Use a 90 degree y fov and light range projection for lights * bevy_pbr2: Add AmbientLight resource * bevy_pbr2: Convert PointLight color to linear sRGB for use in fragment shader * bevy_pbr2: pbr.frag: Rename PointLight.projection to view_projection The uniform contains the view_projection matrix so this was incorrect. * bevy_pbr2: PointLight is an OmniLight as it has a radius * bevy_pbr2: Factoring out duplicated code * bevy_pbr2: Implement RenderAsset for StandardMaterial * Remove unnecessary texture and sampler clones * fix comment formatting * remove redundant Buffer:from * Don't extract meshes when their material textures aren't ready * make missing textures in the queue step an error Co-authored-by: Aevyrie <aevyrie@gmail.com> Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2021-06-27 23:10:23 +00:00
ambient_color: Vec4,
// xyz are x/y/z cluster dimensions and w is the number of clusters
Clustered forward rendering (#3153) # Objective Implement clustered-forward rendering. ## Solution ~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works. * The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+). * WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers. * Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct * The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space. * Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers. * Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR. * I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times. Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on. * Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is. * Improve the z-slicing to use a larger first slice. * More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …) * Switch to using a texture instead of uniform buffer * Figure out the / a better story for shadows I will also probably add an example that demonstrates some of the issues: * What situations exhaust the space available in the uniform buffers * Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters * Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light * Perhaps some performance issues * How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
cluster_dimensions: UVec4,
// xy are vec2<f32>(cluster_dimensions.xy) / vec2<f32>(view.width, view.height)
// z is cluster_dimensions.z / log(far / near)
// w is cluster_dimensions.z * log(near) / log(far / near)
cluster_factors: Vec4,
n_directional_lights: u32,
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
// offset from spot light's light index to spot light's shadow map index
spot_light_shadowmap_offset: i32,
2021-06-02 02:59:17 +00:00
}
// NOTE: When running bevy on Adreno GPU chipsets in WebGL, any value above 1 will result in a crash
// when loading the wgsl "pbr_functions.wgsl" in the function apply_fog.
Update to wgpu 0.19 and raw-window-handle 0.6 (#11280) # Objective Keep core dependencies up to date. ## Solution Update the dependencies. wgpu 0.19 only supports raw-window-handle (rwh) 0.6, so bumping that was included in this. The rwh 0.6 version bump is just the simplest way of doing it. There might be a way we can take advantage of wgpu's new safe surface creation api, but I'm not familiar enough with bevy's window management to untangle it and my attempt ended up being a mess of lifetimes and rustc complaining about missing trait impls (that were implemented). Thanks to @MiniaczQ for the (much simpler) rwh 0.6 version bump code. Unblocks https://github.com/bevyengine/bevy/pull/9172 and https://github.com/bevyengine/bevy/pull/10812 ~~This might be blocked on cpal and oboe updating their ndk versions to 0.8, as they both currently target ndk 0.7 which uses rwh 0.5.2~~ Tested on android, and everything seems to work correctly (audio properly stops when minimized, and plays when re-focusing the app). --- ## Changelog - `wgpu` has been updated to 0.19! The long awaited arcanization has been merged (for more info, see https://gfx-rs.github.io/2023/11/24/arcanization.html), and Vulkan should now be working again on Intel GPUs. - Targeting WebGPU now requires that you add the new `webgpu` feature (setting the `RUSTFLAGS` environment variable to `--cfg=web_sys_unstable_apis` is still required). This feature currently overrides the `webgl2` feature if you have both enabled (the `webgl2` feature is enabled by default), so it is not recommended to add it as a default feature to libraries without putting it behind a flag that allows library users to opt out of it! In the future we plan on supporting wasm binaries that can target both webgl2 and webgpu now that wgpu added support for doing so (see https://github.com/bevyengine/bevy/issues/11505). - `raw-window-handle` has been updated to version 0.6. ## Migration Guide - `bevy_render::instance_index::get_instance_index()` has been removed as the webgl2 workaround is no longer required as it was fixed upstream in wgpu. The `BASE_INSTANCE_WORKAROUND` shaderdef has also been removed. - WebGPU now requires the new `webgpu` feature to be enabled. The `webgpu` feature currently overrides the `webgl2` feature so you no longer need to disable all default features and re-add them all when targeting `webgpu`, but binaries built with both the `webgpu` and `webgl2` features will only target the webgpu backend, and will only work on browsers that support WebGPU. - Places where you conditionally compiled things for webgl2 need to be updated because of this change, eg: - `#[cfg(any(not(feature = "webgl"), not(target_arch = "wasm32")))]` becomes `#[cfg(any(not(feature = "webgl") ,not(target_arch = "wasm32"), feature = "webgpu"))]` - `#[cfg(all(feature = "webgl", target_arch = "wasm32"))]` becomes `#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]` - `if cfg!(all(feature = "webgl", target_arch = "wasm32"))` becomes `if cfg!(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))` - `create_texture_with_data` now also takes a `TextureDataOrder`. You can probably just set this to `TextureDataOrder::default()` - `TextureFormat`'s `block_size` has been renamed to `block_copy_size` - See the `wgpu` changelog for anything I might've missed: https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md --------- Co-authored-by: François <mockersf@gmail.com>
2024-01-26 18:14:21 +00:00
#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]
pub const MAX_DIRECTIONAL_LIGHTS: usize = 1;
Update to wgpu 0.19 and raw-window-handle 0.6 (#11280) # Objective Keep core dependencies up to date. ## Solution Update the dependencies. wgpu 0.19 only supports raw-window-handle (rwh) 0.6, so bumping that was included in this. The rwh 0.6 version bump is just the simplest way of doing it. There might be a way we can take advantage of wgpu's new safe surface creation api, but I'm not familiar enough with bevy's window management to untangle it and my attempt ended up being a mess of lifetimes and rustc complaining about missing trait impls (that were implemented). Thanks to @MiniaczQ for the (much simpler) rwh 0.6 version bump code. Unblocks https://github.com/bevyengine/bevy/pull/9172 and https://github.com/bevyengine/bevy/pull/10812 ~~This might be blocked on cpal and oboe updating their ndk versions to 0.8, as they both currently target ndk 0.7 which uses rwh 0.5.2~~ Tested on android, and everything seems to work correctly (audio properly stops when minimized, and plays when re-focusing the app). --- ## Changelog - `wgpu` has been updated to 0.19! The long awaited arcanization has been merged (for more info, see https://gfx-rs.github.io/2023/11/24/arcanization.html), and Vulkan should now be working again on Intel GPUs. - Targeting WebGPU now requires that you add the new `webgpu` feature (setting the `RUSTFLAGS` environment variable to `--cfg=web_sys_unstable_apis` is still required). This feature currently overrides the `webgl2` feature if you have both enabled (the `webgl2` feature is enabled by default), so it is not recommended to add it as a default feature to libraries without putting it behind a flag that allows library users to opt out of it! In the future we plan on supporting wasm binaries that can target both webgl2 and webgpu now that wgpu added support for doing so (see https://github.com/bevyengine/bevy/issues/11505). - `raw-window-handle` has been updated to version 0.6. ## Migration Guide - `bevy_render::instance_index::get_instance_index()` has been removed as the webgl2 workaround is no longer required as it was fixed upstream in wgpu. The `BASE_INSTANCE_WORKAROUND` shaderdef has also been removed. - WebGPU now requires the new `webgpu` feature to be enabled. The `webgpu` feature currently overrides the `webgl2` feature so you no longer need to disable all default features and re-add them all when targeting `webgpu`, but binaries built with both the `webgpu` and `webgl2` features will only target the webgpu backend, and will only work on browsers that support WebGPU. - Places where you conditionally compiled things for webgl2 need to be updated because of this change, eg: - `#[cfg(any(not(feature = "webgl"), not(target_arch = "wasm32")))]` becomes `#[cfg(any(not(feature = "webgl") ,not(target_arch = "wasm32"), feature = "webgpu"))]` - `#[cfg(all(feature = "webgl", target_arch = "wasm32"))]` becomes `#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]` - `if cfg!(all(feature = "webgl", target_arch = "wasm32"))` becomes `if cfg!(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))` - `create_texture_with_data` now also takes a `TextureDataOrder`. You can probably just set this to `TextureDataOrder::default()` - `TextureFormat`'s `block_size` has been renamed to `block_copy_size` - See the `wgpu` changelog for anything I might've missed: https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md --------- Co-authored-by: François <mockersf@gmail.com>
2024-01-26 18:14:21 +00:00
#[cfg(any(
not(feature = "webgl"),
not(target_arch = "wasm32"),
feature = "webgpu"
))]
pub const MAX_DIRECTIONAL_LIGHTS: usize = 10;
Update to wgpu 0.19 and raw-window-handle 0.6 (#11280) # Objective Keep core dependencies up to date. ## Solution Update the dependencies. wgpu 0.19 only supports raw-window-handle (rwh) 0.6, so bumping that was included in this. The rwh 0.6 version bump is just the simplest way of doing it. There might be a way we can take advantage of wgpu's new safe surface creation api, but I'm not familiar enough with bevy's window management to untangle it and my attempt ended up being a mess of lifetimes and rustc complaining about missing trait impls (that were implemented). Thanks to @MiniaczQ for the (much simpler) rwh 0.6 version bump code. Unblocks https://github.com/bevyengine/bevy/pull/9172 and https://github.com/bevyengine/bevy/pull/10812 ~~This might be blocked on cpal and oboe updating their ndk versions to 0.8, as they both currently target ndk 0.7 which uses rwh 0.5.2~~ Tested on android, and everything seems to work correctly (audio properly stops when minimized, and plays when re-focusing the app). --- ## Changelog - `wgpu` has been updated to 0.19! The long awaited arcanization has been merged (for more info, see https://gfx-rs.github.io/2023/11/24/arcanization.html), and Vulkan should now be working again on Intel GPUs. - Targeting WebGPU now requires that you add the new `webgpu` feature (setting the `RUSTFLAGS` environment variable to `--cfg=web_sys_unstable_apis` is still required). This feature currently overrides the `webgl2` feature if you have both enabled (the `webgl2` feature is enabled by default), so it is not recommended to add it as a default feature to libraries without putting it behind a flag that allows library users to opt out of it! In the future we plan on supporting wasm binaries that can target both webgl2 and webgpu now that wgpu added support for doing so (see https://github.com/bevyengine/bevy/issues/11505). - `raw-window-handle` has been updated to version 0.6. ## Migration Guide - `bevy_render::instance_index::get_instance_index()` has been removed as the webgl2 workaround is no longer required as it was fixed upstream in wgpu. The `BASE_INSTANCE_WORKAROUND` shaderdef has also been removed. - WebGPU now requires the new `webgpu` feature to be enabled. The `webgpu` feature currently overrides the `webgl2` feature so you no longer need to disable all default features and re-add them all when targeting `webgpu`, but binaries built with both the `webgpu` and `webgl2` features will only target the webgpu backend, and will only work on browsers that support WebGPU. - Places where you conditionally compiled things for webgl2 need to be updated because of this change, eg: - `#[cfg(any(not(feature = "webgl"), not(target_arch = "wasm32")))]` becomes `#[cfg(any(not(feature = "webgl") ,not(target_arch = "wasm32"), feature = "webgpu"))]` - `#[cfg(all(feature = "webgl", target_arch = "wasm32"))]` becomes `#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]` - `if cfg!(all(feature = "webgl", target_arch = "wasm32"))` becomes `if cfg!(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))` - `create_texture_with_data` now also takes a `TextureDataOrder`. You can probably just set this to `TextureDataOrder::default()` - `TextureFormat`'s `block_size` has been renamed to `block_copy_size` - See the `wgpu` changelog for anything I might've missed: https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md --------- Co-authored-by: François <mockersf@gmail.com>
2024-01-26 18:14:21 +00:00
#[cfg(any(
not(feature = "webgl"),
not(target_arch = "wasm32"),
feature = "webgpu"
))]
pub const MAX_CASCADES_PER_LIGHT: usize = 4;
Update to wgpu 0.19 and raw-window-handle 0.6 (#11280) # Objective Keep core dependencies up to date. ## Solution Update the dependencies. wgpu 0.19 only supports raw-window-handle (rwh) 0.6, so bumping that was included in this. The rwh 0.6 version bump is just the simplest way of doing it. There might be a way we can take advantage of wgpu's new safe surface creation api, but I'm not familiar enough with bevy's window management to untangle it and my attempt ended up being a mess of lifetimes and rustc complaining about missing trait impls (that were implemented). Thanks to @MiniaczQ for the (much simpler) rwh 0.6 version bump code. Unblocks https://github.com/bevyengine/bevy/pull/9172 and https://github.com/bevyengine/bevy/pull/10812 ~~This might be blocked on cpal and oboe updating their ndk versions to 0.8, as they both currently target ndk 0.7 which uses rwh 0.5.2~~ Tested on android, and everything seems to work correctly (audio properly stops when minimized, and plays when re-focusing the app). --- ## Changelog - `wgpu` has been updated to 0.19! The long awaited arcanization has been merged (for more info, see https://gfx-rs.github.io/2023/11/24/arcanization.html), and Vulkan should now be working again on Intel GPUs. - Targeting WebGPU now requires that you add the new `webgpu` feature (setting the `RUSTFLAGS` environment variable to `--cfg=web_sys_unstable_apis` is still required). This feature currently overrides the `webgl2` feature if you have both enabled (the `webgl2` feature is enabled by default), so it is not recommended to add it as a default feature to libraries without putting it behind a flag that allows library users to opt out of it! In the future we plan on supporting wasm binaries that can target both webgl2 and webgpu now that wgpu added support for doing so (see https://github.com/bevyengine/bevy/issues/11505). - `raw-window-handle` has been updated to version 0.6. ## Migration Guide - `bevy_render::instance_index::get_instance_index()` has been removed as the webgl2 workaround is no longer required as it was fixed upstream in wgpu. The `BASE_INSTANCE_WORKAROUND` shaderdef has also been removed. - WebGPU now requires the new `webgpu` feature to be enabled. The `webgpu` feature currently overrides the `webgl2` feature so you no longer need to disable all default features and re-add them all when targeting `webgpu`, but binaries built with both the `webgpu` and `webgl2` features will only target the webgpu backend, and will only work on browsers that support WebGPU. - Places where you conditionally compiled things for webgl2 need to be updated because of this change, eg: - `#[cfg(any(not(feature = "webgl"), not(target_arch = "wasm32")))]` becomes `#[cfg(any(not(feature = "webgl") ,not(target_arch = "wasm32"), feature = "webgpu"))]` - `#[cfg(all(feature = "webgl", target_arch = "wasm32"))]` becomes `#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]` - `if cfg!(all(feature = "webgl", target_arch = "wasm32"))` becomes `if cfg!(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))` - `create_texture_with_data` now also takes a `TextureDataOrder`. You can probably just set this to `TextureDataOrder::default()` - `TextureFormat`'s `block_size` has been renamed to `block_copy_size` - See the `wgpu` changelog for anything I might've missed: https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md --------- Co-authored-by: François <mockersf@gmail.com>
2024-01-26 18:14:21 +00:00
#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]
pub const MAX_CASCADES_PER_LIGHT: usize = 1;
2021-06-02 02:59:17 +00:00
#[derive(Resource, Clone)]
pub struct ShadowSamplers {
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
pub point_light_comparison_sampler: Sampler,
pub point_light_linear_sampler: Sampler,
pub directional_light_comparison_sampler: Sampler,
pub directional_light_linear_sampler: Sampler,
2021-06-02 02:59:17 +00:00
}
// TODO: this pattern for initializing the shaders / pipeline isn't ideal. this should be handled by the asset system
impl FromWorld for ShadowSamplers {
2021-06-02 02:59:17 +00:00
fn from_world(world: &mut World) -> Self {
let render_device = world.resource::<RenderDevice>();
2021-06-21 23:28:52 +00:00
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
let base_sampler_descriptor = SamplerDescriptor {
address_mode_u: AddressMode::ClampToEdge,
address_mode_v: AddressMode::ClampToEdge,
address_mode_w: AddressMode::ClampToEdge,
mag_filter: FilterMode::Linear,
min_filter: FilterMode::Linear,
mipmap_filter: FilterMode::Nearest,
..default()
};
ShadowSamplers {
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
point_light_comparison_sampler: render_device.create_sampler(&SamplerDescriptor {
compare: Some(CompareFunction::GreaterEqual),
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
..base_sampler_descriptor
}),
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
point_light_linear_sampler: render_device.create_sampler(&base_sampler_descriptor),
directional_light_comparison_sampler: render_device.create_sampler(
&SamplerDescriptor {
compare: Some(CompareFunction::GreaterEqual),
..base_sampler_descriptor
},
),
directional_light_linear_sampler: render_device
.create_sampler(&base_sampler_descriptor),
}
}
}
Use storage buffers for clustered forward point lights (#3989) # Objective - Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene - Fixes #3605 - Based on top of #4079 This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR. ## Solution - Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability - Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time. - Appropriate shader defs in the shader code to handle the two cases ## Context on some decisions / open questions - I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately. - Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants? Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-04-07 16:16:35 +00:00
#[allow(clippy::too_many_arguments)]
2021-06-02 02:59:17 +00:00
pub fn extract_lights(
mut commands: Commands,
Make `RenderStage::Extract` run on the render world (#4402) # Objective - Currently, the `Extract` `RenderStage` is executed on the main world, with the render world available as a resource. - However, when needing access to resources in the render world (e.g. to mutate them), the only way to do so was to get exclusive access to the whole `RenderWorld` resource. - This meant that effectively only one extract which wrote to resources could run at a time. - We didn't previously make `Extract`ing writing to the world a non-happy path, even though we want to discourage that. ## Solution - Move the extract stage to run on the render world. - Add the main world as a `MainWorld` resource. - Add an `Extract` `SystemParam` as a convenience to access a (read only) `SystemParam` in the main world during `Extract`. ## Future work It should be possible to avoid needing to use `get_or_spawn` for the render commands, since now the `Commands`' `Entities` matches up with the world being executed on. We need to determine how this interacts with https://github.com/bevyengine/bevy/pull/3519 It's theoretically possible to remove the need for the `value` method on `Extract`. However, that requires slightly changing the `SystemParam` interface, which would make it more complicated. That would probably mess up the `SystemState` api too. ## Todo I still need to add doc comments to `Extract`. --- ## Changelog ### Changed - The `Extract` `RenderStage` now runs on the render world (instead of the main world as before). You must use the `Extract` `SystemParam` to access the main world during the extract phase. Resources on the render world can now be accessed using `ResMut` during extract. ### Removed - `Commands::spawn_and_forget`. Use `Commands::get_or_spawn(e).insert_bundle(bundle)` instead ## Migration Guide The `Extract` `RenderStage` now runs on the render world (instead of the main world as before). You must use the `Extract` `SystemParam` to access the main world during the extract phase. `Extract` takes a single type parameter, which is any system parameter (such as `Res`, `Query` etc.). It will extract this from the main world, and returns the result of this extraction when `value` is called on it. For example, if previously your extract system looked like: ```rust fn extract_clouds(mut commands: Commands, clouds: Query<Entity, With<Cloud>>) { for cloud in clouds.iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` the new version would be: ```rust fn extract_clouds(mut commands: Commands, mut clouds: Extract<Query<Entity, With<Cloud>>>) { for cloud in clouds.value().iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` The diff is: ```diff --- a/src/clouds.rs +++ b/src/clouds.rs @@ -1,5 +1,5 @@ -fn extract_clouds(mut commands: Commands, clouds: Query<Entity, With<Cloud>>) { - for cloud in clouds.iter() { +fn extract_clouds(mut commands: Commands, mut clouds: Extract<Query<Entity, With<Cloud>>>) { + for cloud in clouds.value().iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` You can now also access resources from the render world using the normal system parameters during `Extract`: ```rust fn extract_assets(mut render_assets: ResMut<MyAssets>, source_assets: Extract<Res<MyAssets>>) { *render_assets = source_assets.clone(); } ``` Please note that all existing extract systems need to be updated to match this new style; even if they currently compile they will not run as expected. A warning will be emitted on a best-effort basis if this is not met. Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-07-08 23:56:33 +00:00
point_light_shadow_map: Extract<Res<PointLightShadowMap>>,
directional_light_shadow_map: Extract<Res<DirectionalLightShadowMap>>,
global_point_lights: Extract<Res<GlobalVisibleClusterableObjects>>,
Visibilty Inheritance, universal ComputedVisibility and RenderLayers support (#5310) # Objective Fixes #4907. Fixes #838. Fixes #5089. Supersedes #5146. Supersedes #2087. Supersedes #865. Supersedes #5114 Visibility is currently entirely local. Set a parent entity to be invisible, and the children are still visible. This makes it hard for users to hide entire hierarchies of entities. Additionally, the semantics of `Visibility` vs `ComputedVisibility` are inconsistent across entity types. 3D meshes use `ComputedVisibility` as the "definitive" visibility component, with `Visibility` being just one data source. Sprites just use `Visibility`, which means they can't feed off of `ComputedVisibility` data, such as culling information, RenderLayers, and (added in this pr) visibility inheritance information. ## Solution Splits `ComputedVisibilty::is_visible` into `ComputedVisibilty::is_visible_in_view` and `ComputedVisibilty::is_visible_in_hierarchy`. For each visible entity, `is_visible_in_hierarchy` is computed by propagating visibility down the hierarchy. The `ComputedVisibility::is_visible()` function combines these two booleans for the canonical "is this entity visible" function. Additionally, all entities that have `Visibility` now also have `ComputedVisibility`. Sprites, Lights, and UI entities now use `ComputedVisibility` when appropriate. This means that in addition to visibility inheritance, everything using Visibility now also supports RenderLayers. Notably, Sprites (and other 2d objects) now support `RenderLayers` and work properly across multiple views. Also note that this does increase the amount of work done per sprite. Bevymark with 100,000 sprites on `main` runs in `0.017612` seconds and this runs in `0.01902`. That is certainly a gap, but I believe the api consistency and extra functionality this buys us is worth it. See [this thread](https://github.com/bevyengine/bevy/pull/5146#issuecomment-1182783452) for more info. Note that #5146 in combination with #5114 _are_ a viable alternative to this PR and _would_ perform better, but that comes at the cost of api inconsistencies and doing visibility calculations in the "wrong" place. The current visibility system does have potential for performance improvements. I would prefer to evolve that one system as a whole rather than doing custom hacks / different behaviors for each feature slice. Here is a "split screen" example where the left camera uses RenderLayers to filter out the blue sprite. ![image](https://user-images.githubusercontent.com/2694663/178814868-2e9a2173-bf8c-4c79-8815-633899d492c3.png) Note that this builds directly on #5146 and that @james7132 deserves the credit for the baseline visibility inheritance work. This pr moves the inherited visibility field into `ComputedVisibility`, then does the additional work of porting everything to `ComputedVisibility`. See my [comments here](https://github.com/bevyengine/bevy/pull/5146#issuecomment-1182783452) for rationale. ## Follow up work * Now that lights use ComputedVisibility, VisibleEntities now includes "visible lights" in the entity list. Functionally not a problem as we use queries to filter the list down in the desired context. But we should consider splitting this out into a separate`VisibleLights` collection for both clarity and performance reasons. And _maybe_ even consider scoping `VisibleEntities` down to `VisibleMeshes`?. * Investigate alternative sprite rendering impls (in combination with visibility system tweaks) that avoid re-generating a per-view fixedbitset of visible entities every frame, then checking each ExtractedEntity. This is where most of the performance overhead lives. Ex: we could generate ExtractedEntities per-view using the VisibleEntities list, avoiding the need for the bitset. * Should ComputedVisibility use bitflags under the hood? This would cut down on the size of the component, potentially speed up the `is_visible()` function, and allow us to cheaply expand ComputedVisibility with more data (ex: split out local visibility and parent visibility, add more culling classes, etc). --- ## Changelog * ComputedVisibility now takes hierarchy visibility into account. * 2D, UI and Light entities now use the ComputedVisibility component. ## Migration Guide If you were previously reading `Visibility::is_visible` as the "actual visibility" for sprites or lights, use `ComputedVisibilty::is_visible()` instead: ```rust // before (0.7) fn system(query: Query<&Visibility>) { for visibility in query.iter() { if visibility.is_visible { log!("found visible entity"); } } } // after (0.8) fn system(query: Query<&ComputedVisibility>) { for visibility in query.iter() { if visibility.is_visible() { log!("found visible entity"); } } } ``` Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-07-15 23:24:42 +00:00
point_lights: Extract<
Query<(
&PointLight,
&CubemapVisibleEntities,
&GlobalTransform,
Split `ComputedVisibility` into two components to allow for accurate change detection and speed up visibility propagation (#9497) # Objective Fix #8267. Fixes half of #7840. The `ComputedVisibility` component contains two flags: hierarchy visibility, and view visibility (whether its visible to any cameras). Due to the modular and open-ended way that view visibility is computed, it triggers change detection every single frame, even when the value does not change. Since hierarchy visibility is stored in the same component as view visibility, this means that change detection for inherited visibility is completely broken. At the company I work for, this has become a real issue. We are using change detection to only re-render scenes when necessary. The broken state of change detection for computed visibility means that we have to to rely on the non-inherited `Visibility` component for now. This is workable in the early stages of our project, but since we will inevitably want to use the hierarchy, we will have to either: 1. Roll our own solution for computed visibility. 2. Fix the issue for everyone. ## Solution Split the `ComputedVisibility` component into two: `InheritedVisibilty` and `ViewVisibility`. This allows change detection to behave properly for `InheritedVisibility`. View visiblity is still erratic, although it is less useful to be able to detect changes for this flavor of visibility. Overall, this actually simplifies the API. Since the visibility system consists of self-explaining components, it is much easier to document the behavior and usage. This approach is more modular and "ECS-like" -- one could strip out the `ViewVisibility` component entirely if it's not needed, and rely only on inherited visibility. --- ## Changelog - `ComputedVisibility` has been removed in favor of: `InheritedVisibility` and `ViewVisiblity`. ## Migration Guide The `ComputedVisibilty` component has been split into `InheritedVisiblity` and `ViewVisibility`. Replace any usages of `ComputedVisibility::is_visible_in_hierarchy` with `InheritedVisibility::get`, and replace `ComputedVisibility::is_visible_in_view` with `ViewVisibility::get`. ```rust // Before: commands.spawn(VisibilityBundle { visibility: Visibility::Inherited, computed_visibility: ComputedVisibility::default(), }); // After: commands.spawn(VisibilityBundle { visibility: Visibility::Inherited, inherited_visibility: InheritedVisibility::default(), view_visibility: ViewVisibility::default(), }); ``` ```rust // Before: fn my_system(q: Query<&ComputedVisibilty>) { for vis in &q { if vis.is_visible_in_hierarchy() { // After: fn my_system(q: Query<&InheritedVisibility>) { for inherited_visibility in &q { if inherited_visibility.get() { ``` ```rust // Before: fn my_system(q: Query<&ComputedVisibilty>) { for vis in &q { if vis.is_visible_in_view() { // After: fn my_system(q: Query<&ViewVisibility>) { for view_visibility in &q { if view_visibility.get() { ``` ```rust // Before: fn my_system(mut q: Query<&mut ComputedVisibilty>) { for vis in &mut q { vis.set_visible_in_view(); // After: fn my_system(mut q: Query<&mut ViewVisibility>) { for view_visibility in &mut q { view_visibility.set(); ``` --------- Co-authored-by: Robert Swain <robert.swain@gmail.com>
2023-09-01 13:00:18 +00:00
&ViewVisibility,
&CubemapFrusta,
Option<&VolumetricLight>,
Visibilty Inheritance, universal ComputedVisibility and RenderLayers support (#5310) # Objective Fixes #4907. Fixes #838. Fixes #5089. Supersedes #5146. Supersedes #2087. Supersedes #865. Supersedes #5114 Visibility is currently entirely local. Set a parent entity to be invisible, and the children are still visible. This makes it hard for users to hide entire hierarchies of entities. Additionally, the semantics of `Visibility` vs `ComputedVisibility` are inconsistent across entity types. 3D meshes use `ComputedVisibility` as the "definitive" visibility component, with `Visibility` being just one data source. Sprites just use `Visibility`, which means they can't feed off of `ComputedVisibility` data, such as culling information, RenderLayers, and (added in this pr) visibility inheritance information. ## Solution Splits `ComputedVisibilty::is_visible` into `ComputedVisibilty::is_visible_in_view` and `ComputedVisibilty::is_visible_in_hierarchy`. For each visible entity, `is_visible_in_hierarchy` is computed by propagating visibility down the hierarchy. The `ComputedVisibility::is_visible()` function combines these two booleans for the canonical "is this entity visible" function. Additionally, all entities that have `Visibility` now also have `ComputedVisibility`. Sprites, Lights, and UI entities now use `ComputedVisibility` when appropriate. This means that in addition to visibility inheritance, everything using Visibility now also supports RenderLayers. Notably, Sprites (and other 2d objects) now support `RenderLayers` and work properly across multiple views. Also note that this does increase the amount of work done per sprite. Bevymark with 100,000 sprites on `main` runs in `0.017612` seconds and this runs in `0.01902`. That is certainly a gap, but I believe the api consistency and extra functionality this buys us is worth it. See [this thread](https://github.com/bevyengine/bevy/pull/5146#issuecomment-1182783452) for more info. Note that #5146 in combination with #5114 _are_ a viable alternative to this PR and _would_ perform better, but that comes at the cost of api inconsistencies and doing visibility calculations in the "wrong" place. The current visibility system does have potential for performance improvements. I would prefer to evolve that one system as a whole rather than doing custom hacks / different behaviors for each feature slice. Here is a "split screen" example where the left camera uses RenderLayers to filter out the blue sprite. ![image](https://user-images.githubusercontent.com/2694663/178814868-2e9a2173-bf8c-4c79-8815-633899d492c3.png) Note that this builds directly on #5146 and that @james7132 deserves the credit for the baseline visibility inheritance work. This pr moves the inherited visibility field into `ComputedVisibility`, then does the additional work of porting everything to `ComputedVisibility`. See my [comments here](https://github.com/bevyengine/bevy/pull/5146#issuecomment-1182783452) for rationale. ## Follow up work * Now that lights use ComputedVisibility, VisibleEntities now includes "visible lights" in the entity list. Functionally not a problem as we use queries to filter the list down in the desired context. But we should consider splitting this out into a separate`VisibleLights` collection for both clarity and performance reasons. And _maybe_ even consider scoping `VisibleEntities` down to `VisibleMeshes`?. * Investigate alternative sprite rendering impls (in combination with visibility system tweaks) that avoid re-generating a per-view fixedbitset of visible entities every frame, then checking each ExtractedEntity. This is where most of the performance overhead lives. Ex: we could generate ExtractedEntities per-view using the VisibleEntities list, avoiding the need for the bitset. * Should ComputedVisibility use bitflags under the hood? This would cut down on the size of the component, potentially speed up the `is_visible()` function, and allow us to cheaply expand ComputedVisibility with more data (ex: split out local visibility and parent visibility, add more culling classes, etc). --- ## Changelog * ComputedVisibility now takes hierarchy visibility into account. * 2D, UI and Light entities now use the ComputedVisibility component. ## Migration Guide If you were previously reading `Visibility::is_visible` as the "actual visibility" for sprites or lights, use `ComputedVisibilty::is_visible()` instead: ```rust // before (0.7) fn system(query: Query<&Visibility>) { for visibility in query.iter() { if visibility.is_visible { log!("found visible entity"); } } } // after (0.8) fn system(query: Query<&ComputedVisibility>) { for visibility in query.iter() { if visibility.is_visible() { log!("found visible entity"); } } } ``` Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-07-15 23:24:42 +00:00
)>,
>,
spot_lights: Extract<
Query<(
&SpotLight,
&VisibleMeshEntities,
Visibilty Inheritance, universal ComputedVisibility and RenderLayers support (#5310) # Objective Fixes #4907. Fixes #838. Fixes #5089. Supersedes #5146. Supersedes #2087. Supersedes #865. Supersedes #5114 Visibility is currently entirely local. Set a parent entity to be invisible, and the children are still visible. This makes it hard for users to hide entire hierarchies of entities. Additionally, the semantics of `Visibility` vs `ComputedVisibility` are inconsistent across entity types. 3D meshes use `ComputedVisibility` as the "definitive" visibility component, with `Visibility` being just one data source. Sprites just use `Visibility`, which means they can't feed off of `ComputedVisibility` data, such as culling information, RenderLayers, and (added in this pr) visibility inheritance information. ## Solution Splits `ComputedVisibilty::is_visible` into `ComputedVisibilty::is_visible_in_view` and `ComputedVisibilty::is_visible_in_hierarchy`. For each visible entity, `is_visible_in_hierarchy` is computed by propagating visibility down the hierarchy. The `ComputedVisibility::is_visible()` function combines these two booleans for the canonical "is this entity visible" function. Additionally, all entities that have `Visibility` now also have `ComputedVisibility`. Sprites, Lights, and UI entities now use `ComputedVisibility` when appropriate. This means that in addition to visibility inheritance, everything using Visibility now also supports RenderLayers. Notably, Sprites (and other 2d objects) now support `RenderLayers` and work properly across multiple views. Also note that this does increase the amount of work done per sprite. Bevymark with 100,000 sprites on `main` runs in `0.017612` seconds and this runs in `0.01902`. That is certainly a gap, but I believe the api consistency and extra functionality this buys us is worth it. See [this thread](https://github.com/bevyengine/bevy/pull/5146#issuecomment-1182783452) for more info. Note that #5146 in combination with #5114 _are_ a viable alternative to this PR and _would_ perform better, but that comes at the cost of api inconsistencies and doing visibility calculations in the "wrong" place. The current visibility system does have potential for performance improvements. I would prefer to evolve that one system as a whole rather than doing custom hacks / different behaviors for each feature slice. Here is a "split screen" example where the left camera uses RenderLayers to filter out the blue sprite. ![image](https://user-images.githubusercontent.com/2694663/178814868-2e9a2173-bf8c-4c79-8815-633899d492c3.png) Note that this builds directly on #5146 and that @james7132 deserves the credit for the baseline visibility inheritance work. This pr moves the inherited visibility field into `ComputedVisibility`, then does the additional work of porting everything to `ComputedVisibility`. See my [comments here](https://github.com/bevyengine/bevy/pull/5146#issuecomment-1182783452) for rationale. ## Follow up work * Now that lights use ComputedVisibility, VisibleEntities now includes "visible lights" in the entity list. Functionally not a problem as we use queries to filter the list down in the desired context. But we should consider splitting this out into a separate`VisibleLights` collection for both clarity and performance reasons. And _maybe_ even consider scoping `VisibleEntities` down to `VisibleMeshes`?. * Investigate alternative sprite rendering impls (in combination with visibility system tweaks) that avoid re-generating a per-view fixedbitset of visible entities every frame, then checking each ExtractedEntity. This is where most of the performance overhead lives. Ex: we could generate ExtractedEntities per-view using the VisibleEntities list, avoiding the need for the bitset. * Should ComputedVisibility use bitflags under the hood? This would cut down on the size of the component, potentially speed up the `is_visible()` function, and allow us to cheaply expand ComputedVisibility with more data (ex: split out local visibility and parent visibility, add more culling classes, etc). --- ## Changelog * ComputedVisibility now takes hierarchy visibility into account. * 2D, UI and Light entities now use the ComputedVisibility component. ## Migration Guide If you were previously reading `Visibility::is_visible` as the "actual visibility" for sprites or lights, use `ComputedVisibilty::is_visible()` instead: ```rust // before (0.7) fn system(query: Query<&Visibility>) { for visibility in query.iter() { if visibility.is_visible { log!("found visible entity"); } } } // after (0.8) fn system(query: Query<&ComputedVisibility>) { for visibility in query.iter() { if visibility.is_visible() { log!("found visible entity"); } } } ``` Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-07-15 23:24:42 +00:00
&GlobalTransform,
Split `ComputedVisibility` into two components to allow for accurate change detection and speed up visibility propagation (#9497) # Objective Fix #8267. Fixes half of #7840. The `ComputedVisibility` component contains two flags: hierarchy visibility, and view visibility (whether its visible to any cameras). Due to the modular and open-ended way that view visibility is computed, it triggers change detection every single frame, even when the value does not change. Since hierarchy visibility is stored in the same component as view visibility, this means that change detection for inherited visibility is completely broken. At the company I work for, this has become a real issue. We are using change detection to only re-render scenes when necessary. The broken state of change detection for computed visibility means that we have to to rely on the non-inherited `Visibility` component for now. This is workable in the early stages of our project, but since we will inevitably want to use the hierarchy, we will have to either: 1. Roll our own solution for computed visibility. 2. Fix the issue for everyone. ## Solution Split the `ComputedVisibility` component into two: `InheritedVisibilty` and `ViewVisibility`. This allows change detection to behave properly for `InheritedVisibility`. View visiblity is still erratic, although it is less useful to be able to detect changes for this flavor of visibility. Overall, this actually simplifies the API. Since the visibility system consists of self-explaining components, it is much easier to document the behavior and usage. This approach is more modular and "ECS-like" -- one could strip out the `ViewVisibility` component entirely if it's not needed, and rely only on inherited visibility. --- ## Changelog - `ComputedVisibility` has been removed in favor of: `InheritedVisibility` and `ViewVisiblity`. ## Migration Guide The `ComputedVisibilty` component has been split into `InheritedVisiblity` and `ViewVisibility`. Replace any usages of `ComputedVisibility::is_visible_in_hierarchy` with `InheritedVisibility::get`, and replace `ComputedVisibility::is_visible_in_view` with `ViewVisibility::get`. ```rust // Before: commands.spawn(VisibilityBundle { visibility: Visibility::Inherited, computed_visibility: ComputedVisibility::default(), }); // After: commands.spawn(VisibilityBundle { visibility: Visibility::Inherited, inherited_visibility: InheritedVisibility::default(), view_visibility: ViewVisibility::default(), }); ``` ```rust // Before: fn my_system(q: Query<&ComputedVisibilty>) { for vis in &q { if vis.is_visible_in_hierarchy() { // After: fn my_system(q: Query<&InheritedVisibility>) { for inherited_visibility in &q { if inherited_visibility.get() { ``` ```rust // Before: fn my_system(q: Query<&ComputedVisibilty>) { for vis in &q { if vis.is_visible_in_view() { // After: fn my_system(q: Query<&ViewVisibility>) { for view_visibility in &q { if view_visibility.get() { ``` ```rust // Before: fn my_system(mut q: Query<&mut ComputedVisibilty>) { for vis in &mut q { vis.set_visible_in_view(); // After: fn my_system(mut q: Query<&mut ViewVisibility>) { for view_visibility in &mut q { view_visibility.set(); ``` --------- Co-authored-by: Robert Swain <robert.swain@gmail.com>
2023-09-01 13:00:18 +00:00
&ViewVisibility,
&Frustum,
Option<&VolumetricLight>,
Visibilty Inheritance, universal ComputedVisibility and RenderLayers support (#5310) # Objective Fixes #4907. Fixes #838. Fixes #5089. Supersedes #5146. Supersedes #2087. Supersedes #865. Supersedes #5114 Visibility is currently entirely local. Set a parent entity to be invisible, and the children are still visible. This makes it hard for users to hide entire hierarchies of entities. Additionally, the semantics of `Visibility` vs `ComputedVisibility` are inconsistent across entity types. 3D meshes use `ComputedVisibility` as the "definitive" visibility component, with `Visibility` being just one data source. Sprites just use `Visibility`, which means they can't feed off of `ComputedVisibility` data, such as culling information, RenderLayers, and (added in this pr) visibility inheritance information. ## Solution Splits `ComputedVisibilty::is_visible` into `ComputedVisibilty::is_visible_in_view` and `ComputedVisibilty::is_visible_in_hierarchy`. For each visible entity, `is_visible_in_hierarchy` is computed by propagating visibility down the hierarchy. The `ComputedVisibility::is_visible()` function combines these two booleans for the canonical "is this entity visible" function. Additionally, all entities that have `Visibility` now also have `ComputedVisibility`. Sprites, Lights, and UI entities now use `ComputedVisibility` when appropriate. This means that in addition to visibility inheritance, everything using Visibility now also supports RenderLayers. Notably, Sprites (and other 2d objects) now support `RenderLayers` and work properly across multiple views. Also note that this does increase the amount of work done per sprite. Bevymark with 100,000 sprites on `main` runs in `0.017612` seconds and this runs in `0.01902`. That is certainly a gap, but I believe the api consistency and extra functionality this buys us is worth it. See [this thread](https://github.com/bevyengine/bevy/pull/5146#issuecomment-1182783452) for more info. Note that #5146 in combination with #5114 _are_ a viable alternative to this PR and _would_ perform better, but that comes at the cost of api inconsistencies and doing visibility calculations in the "wrong" place. The current visibility system does have potential for performance improvements. I would prefer to evolve that one system as a whole rather than doing custom hacks / different behaviors for each feature slice. Here is a "split screen" example where the left camera uses RenderLayers to filter out the blue sprite. ![image](https://user-images.githubusercontent.com/2694663/178814868-2e9a2173-bf8c-4c79-8815-633899d492c3.png) Note that this builds directly on #5146 and that @james7132 deserves the credit for the baseline visibility inheritance work. This pr moves the inherited visibility field into `ComputedVisibility`, then does the additional work of porting everything to `ComputedVisibility`. See my [comments here](https://github.com/bevyengine/bevy/pull/5146#issuecomment-1182783452) for rationale. ## Follow up work * Now that lights use ComputedVisibility, VisibleEntities now includes "visible lights" in the entity list. Functionally not a problem as we use queries to filter the list down in the desired context. But we should consider splitting this out into a separate`VisibleLights` collection for both clarity and performance reasons. And _maybe_ even consider scoping `VisibleEntities` down to `VisibleMeshes`?. * Investigate alternative sprite rendering impls (in combination with visibility system tweaks) that avoid re-generating a per-view fixedbitset of visible entities every frame, then checking each ExtractedEntity. This is where most of the performance overhead lives. Ex: we could generate ExtractedEntities per-view using the VisibleEntities list, avoiding the need for the bitset. * Should ComputedVisibility use bitflags under the hood? This would cut down on the size of the component, potentially speed up the `is_visible()` function, and allow us to cheaply expand ComputedVisibility with more data (ex: split out local visibility and parent visibility, add more culling classes, etc). --- ## Changelog * ComputedVisibility now takes hierarchy visibility into account. * 2D, UI and Light entities now use the ComputedVisibility component. ## Migration Guide If you were previously reading `Visibility::is_visible` as the "actual visibility" for sprites or lights, use `ComputedVisibilty::is_visible()` instead: ```rust // before (0.7) fn system(query: Query<&Visibility>) { for visibility in query.iter() { if visibility.is_visible { log!("found visible entity"); } } } // after (0.8) fn system(query: Query<&ComputedVisibility>) { for visibility in query.iter() { if visibility.is_visible() { log!("found visible entity"); } } } ``` Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-07-15 23:24:42 +00:00
)>,
>,
Make `RenderStage::Extract` run on the render world (#4402) # Objective - Currently, the `Extract` `RenderStage` is executed on the main world, with the render world available as a resource. - However, when needing access to resources in the render world (e.g. to mutate them), the only way to do so was to get exclusive access to the whole `RenderWorld` resource. - This meant that effectively only one extract which wrote to resources could run at a time. - We didn't previously make `Extract`ing writing to the world a non-happy path, even though we want to discourage that. ## Solution - Move the extract stage to run on the render world. - Add the main world as a `MainWorld` resource. - Add an `Extract` `SystemParam` as a convenience to access a (read only) `SystemParam` in the main world during `Extract`. ## Future work It should be possible to avoid needing to use `get_or_spawn` for the render commands, since now the `Commands`' `Entities` matches up with the world being executed on. We need to determine how this interacts with https://github.com/bevyengine/bevy/pull/3519 It's theoretically possible to remove the need for the `value` method on `Extract`. However, that requires slightly changing the `SystemParam` interface, which would make it more complicated. That would probably mess up the `SystemState` api too. ## Todo I still need to add doc comments to `Extract`. --- ## Changelog ### Changed - The `Extract` `RenderStage` now runs on the render world (instead of the main world as before). You must use the `Extract` `SystemParam` to access the main world during the extract phase. Resources on the render world can now be accessed using `ResMut` during extract. ### Removed - `Commands::spawn_and_forget`. Use `Commands::get_or_spawn(e).insert_bundle(bundle)` instead ## Migration Guide The `Extract` `RenderStage` now runs on the render world (instead of the main world as before). You must use the `Extract` `SystemParam` to access the main world during the extract phase. `Extract` takes a single type parameter, which is any system parameter (such as `Res`, `Query` etc.). It will extract this from the main world, and returns the result of this extraction when `value` is called on it. For example, if previously your extract system looked like: ```rust fn extract_clouds(mut commands: Commands, clouds: Query<Entity, With<Cloud>>) { for cloud in clouds.iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` the new version would be: ```rust fn extract_clouds(mut commands: Commands, mut clouds: Extract<Query<Entity, With<Cloud>>>) { for cloud in clouds.value().iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` The diff is: ```diff --- a/src/clouds.rs +++ b/src/clouds.rs @@ -1,5 +1,5 @@ -fn extract_clouds(mut commands: Commands, clouds: Query<Entity, With<Cloud>>) { - for cloud in clouds.iter() { +fn extract_clouds(mut commands: Commands, mut clouds: Extract<Query<Entity, With<Cloud>>>) { + for cloud in clouds.value().iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` You can now also access resources from the render world using the normal system parameters during `Extract`: ```rust fn extract_assets(mut render_assets: ResMut<MyAssets>, source_assets: Extract<Res<MyAssets>>) { *render_assets = source_assets.clone(); } ``` Please note that all existing extract systems need to be updated to match this new style; even if they currently compile they will not run as expected. A warning will be emitted on a best-effort basis if this is not met. Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-07-08 23:56:33 +00:00
directional_lights: Extract<
Query<
(
Entity,
&DirectionalLight,
&CascadesVisibleEntities,
&Cascades,
&CascadeShadowConfig,
&CascadesFrusta,
Make `RenderStage::Extract` run on the render world (#4402) # Objective - Currently, the `Extract` `RenderStage` is executed on the main world, with the render world available as a resource. - However, when needing access to resources in the render world (e.g. to mutate them), the only way to do so was to get exclusive access to the whole `RenderWorld` resource. - This meant that effectively only one extract which wrote to resources could run at a time. - We didn't previously make `Extract`ing writing to the world a non-happy path, even though we want to discourage that. ## Solution - Move the extract stage to run on the render world. - Add the main world as a `MainWorld` resource. - Add an `Extract` `SystemParam` as a convenience to access a (read only) `SystemParam` in the main world during `Extract`. ## Future work It should be possible to avoid needing to use `get_or_spawn` for the render commands, since now the `Commands`' `Entities` matches up with the world being executed on. We need to determine how this interacts with https://github.com/bevyengine/bevy/pull/3519 It's theoretically possible to remove the need for the `value` method on `Extract`. However, that requires slightly changing the `SystemParam` interface, which would make it more complicated. That would probably mess up the `SystemState` api too. ## Todo I still need to add doc comments to `Extract`. --- ## Changelog ### Changed - The `Extract` `RenderStage` now runs on the render world (instead of the main world as before). You must use the `Extract` `SystemParam` to access the main world during the extract phase. Resources on the render world can now be accessed using `ResMut` during extract. ### Removed - `Commands::spawn_and_forget`. Use `Commands::get_or_spawn(e).insert_bundle(bundle)` instead ## Migration Guide The `Extract` `RenderStage` now runs on the render world (instead of the main world as before). You must use the `Extract` `SystemParam` to access the main world during the extract phase. `Extract` takes a single type parameter, which is any system parameter (such as `Res`, `Query` etc.). It will extract this from the main world, and returns the result of this extraction when `value` is called on it. For example, if previously your extract system looked like: ```rust fn extract_clouds(mut commands: Commands, clouds: Query<Entity, With<Cloud>>) { for cloud in clouds.iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` the new version would be: ```rust fn extract_clouds(mut commands: Commands, mut clouds: Extract<Query<Entity, With<Cloud>>>) { for cloud in clouds.value().iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` The diff is: ```diff --- a/src/clouds.rs +++ b/src/clouds.rs @@ -1,5 +1,5 @@ -fn extract_clouds(mut commands: Commands, clouds: Query<Entity, With<Cloud>>) { - for cloud in clouds.iter() { +fn extract_clouds(mut commands: Commands, mut clouds: Extract<Query<Entity, With<Cloud>>>) { + for cloud in clouds.value().iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` You can now also access resources from the render world using the normal system parameters during `Extract`: ```rust fn extract_assets(mut render_assets: ResMut<MyAssets>, source_assets: Extract<Res<MyAssets>>) { *render_assets = source_assets.clone(); } ``` Please note that all existing extract systems need to be updated to match this new style; even if they currently compile they will not run as expected. A warning will be emitted on a best-effort basis if this is not met. Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-07-08 23:56:33 +00:00
&GlobalTransform,
Split `ComputedVisibility` into two components to allow for accurate change detection and speed up visibility propagation (#9497) # Objective Fix #8267. Fixes half of #7840. The `ComputedVisibility` component contains two flags: hierarchy visibility, and view visibility (whether its visible to any cameras). Due to the modular and open-ended way that view visibility is computed, it triggers change detection every single frame, even when the value does not change. Since hierarchy visibility is stored in the same component as view visibility, this means that change detection for inherited visibility is completely broken. At the company I work for, this has become a real issue. We are using change detection to only re-render scenes when necessary. The broken state of change detection for computed visibility means that we have to to rely on the non-inherited `Visibility` component for now. This is workable in the early stages of our project, but since we will inevitably want to use the hierarchy, we will have to either: 1. Roll our own solution for computed visibility. 2. Fix the issue for everyone. ## Solution Split the `ComputedVisibility` component into two: `InheritedVisibilty` and `ViewVisibility`. This allows change detection to behave properly for `InheritedVisibility`. View visiblity is still erratic, although it is less useful to be able to detect changes for this flavor of visibility. Overall, this actually simplifies the API. Since the visibility system consists of self-explaining components, it is much easier to document the behavior and usage. This approach is more modular and "ECS-like" -- one could strip out the `ViewVisibility` component entirely if it's not needed, and rely only on inherited visibility. --- ## Changelog - `ComputedVisibility` has been removed in favor of: `InheritedVisibility` and `ViewVisiblity`. ## Migration Guide The `ComputedVisibilty` component has been split into `InheritedVisiblity` and `ViewVisibility`. Replace any usages of `ComputedVisibility::is_visible_in_hierarchy` with `InheritedVisibility::get`, and replace `ComputedVisibility::is_visible_in_view` with `ViewVisibility::get`. ```rust // Before: commands.spawn(VisibilityBundle { visibility: Visibility::Inherited, computed_visibility: ComputedVisibility::default(), }); // After: commands.spawn(VisibilityBundle { visibility: Visibility::Inherited, inherited_visibility: InheritedVisibility::default(), view_visibility: ViewVisibility::default(), }); ``` ```rust // Before: fn my_system(q: Query<&ComputedVisibilty>) { for vis in &q { if vis.is_visible_in_hierarchy() { // After: fn my_system(q: Query<&InheritedVisibility>) { for inherited_visibility in &q { if inherited_visibility.get() { ``` ```rust // Before: fn my_system(q: Query<&ComputedVisibilty>) { for vis in &q { if vis.is_visible_in_view() { // After: fn my_system(q: Query<&ViewVisibility>) { for view_visibility in &q { if view_visibility.get() { ``` ```rust // Before: fn my_system(mut q: Query<&mut ComputedVisibilty>) { for vis in &mut q { vis.set_visible_in_view(); // After: fn my_system(mut q: Query<&mut ViewVisibility>) { for view_visibility in &mut q { view_visibility.set(); ``` --------- Co-authored-by: Robert Swain <robert.swain@gmail.com>
2023-09-01 13:00:18 +00:00
&ViewVisibility,
light renderlayers (#10742) # Objective add `RenderLayers` awareness to lights. lights default to `RenderLayers::layer(0)`, and must intersect the camera entity's `RenderLayers` in order to affect the camera's output. note that lights already use renderlayers to filter meshes for shadow casting. this adds filtering lights per view based on intersection of camera layers and light layers. fixes #3462 ## Solution PointLights and SpotLights are assigned to individual views in `assign_lights_to_clusters`, so we simply cull the lights which don't match the view layers in that function. DirectionalLights are global, so we - add the light layers to the `DirectionalLight` struct - add the view layers to the `ViewUniform` struct - check for intersection before processing the light in `apply_pbr_lighting` potential issue: when mesh/light layers are smaller than the view layers weird results can occur. e.g: camera = layers 1+2 light = layers 1 mesh = layers 2 the mesh does not cast shadows wrt the light as (1 & 2) == 0. the light affects the view as (1+2 & 1) != 0. the view renders the mesh as (1+2 & 2) != 0. so the mesh is rendered and lit, but does not cast a shadow. this could be fixed (so that the light would not affect the mesh in that view) by adding the light layers to the point and spot light structs, but i think the setup is pretty unusual, and space is at a premium in those structs (adding 4 bytes more would reduce the webgl point+spot light max count to 240 from 256). I think typical usage is for cameras to have a single layer, and meshes/lights to maybe have multiple layers to render to e.g. minimaps as well as primary views. if there is a good use case for the above setup and we should support it, please let me know. --- ## Migration Guide Lights no longer affect all `RenderLayers` by default, now like cameras and meshes they default to `RenderLayers::layer(0)`. To recover the previous behaviour and have all lights affect all views, add a `RenderLayers::all()` component to the light entity.
2023-12-12 19:45:37 +00:00
Option<&RenderLayers>,
Implement volumetric fog and volumetric lighting, also known as light shafts or god rays. (#13057) This commit implements a more physically-accurate, but slower, form of fog than the `bevy_pbr::fog` module does. Notably, this *volumetric fog* allows for light beams from directional lights to shine through, creating what is known as *light shafts* or *god rays*. To add volumetric fog to a scene, add `VolumetricFogSettings` to the camera, and add `VolumetricLight` to directional lights that you wish to be volumetric. `VolumetricFogSettings` has numerous settings that allow you to define the accuracy of the simulation, as well as the look of the fog. Currently, only interaction with directional lights that have shadow maps is supported. Note that the overhead of the effect scales directly with the number of directional lights in use, so apply `VolumetricLight` sparingly for the best results. The overall algorithm, which is implemented as a postprocessing effect, is a combination of the techniques described in [Scratchapixel] and [this blog post]. It uses raymarching in screen space, transformed into shadow map space for sampling and combined with physically-based modeling of absorption and scattering. Bevy employs the widely-used [Henyey-Greenstein phase function] to model asymmetry; this essentially allows light shafts to fade into and out of existence as the user views them. Volumetric rendering is a huge subject, and I deliberately kept the scope of this commit small. Possible follow-ups include: 1. Raymarching at a lower resolution. 2. A post-processing blur (especially useful when combined with (1)). 3. Supporting point lights and spot lights. 4. Supporting lights with no shadow maps. 5. Supporting irradiance volumes and reflection probes. 6. Voxel components that reuse the volumetric fog code to create voxel shapes. 7. *Horizon: Zero Dawn*-style clouds. These are all useful, but out of scope of this patch for now, to keep things tidy and easy to review. A new example, `volumetric_fog`, has been added to demonstrate the effect. ## Changelog ### Added * A new component, `VolumetricFog`, is available, to allow for a more physically-accurate, but more resource-intensive, form of fog. * A new component, `VolumetricLight`, can be placed on directional lights to make them interact with `VolumetricFog`. Notably, this allows such lights to emit light shafts/god rays. ![Screenshot 2024-04-21 162808](https://github.com/bevyengine/bevy/assets/157897/7a1fc81d-eed5-4735-9419-286c496391a9) ![Screenshot 2024-04-21 132005](https://github.com/bevyengine/bevy/assets/157897/e6d3b5ca-8f59-488d-a3de-15e95aaf4995) [Scratchapixel]: https://www.scratchapixel.com/lessons/3d-basic-rendering/volume-rendering-for-developers/intro-volume-rendering.html [this blog post]: https://www.alexandre-pestana.com/volumetric-lights/ [Henyey-Greenstein phase function]: https://www.pbr-book.org/4ed/Volume_Scattering/Phase_Functions#TheHenyeyndashGreensteinPhaseFunction
2024-05-16 17:13:18 +00:00
Option<&VolumetricLight>,
Make `RenderStage::Extract` run on the render world (#4402) # Objective - Currently, the `Extract` `RenderStage` is executed on the main world, with the render world available as a resource. - However, when needing access to resources in the render world (e.g. to mutate them), the only way to do so was to get exclusive access to the whole `RenderWorld` resource. - This meant that effectively only one extract which wrote to resources could run at a time. - We didn't previously make `Extract`ing writing to the world a non-happy path, even though we want to discourage that. ## Solution - Move the extract stage to run on the render world. - Add the main world as a `MainWorld` resource. - Add an `Extract` `SystemParam` as a convenience to access a (read only) `SystemParam` in the main world during `Extract`. ## Future work It should be possible to avoid needing to use `get_or_spawn` for the render commands, since now the `Commands`' `Entities` matches up with the world being executed on. We need to determine how this interacts with https://github.com/bevyengine/bevy/pull/3519 It's theoretically possible to remove the need for the `value` method on `Extract`. However, that requires slightly changing the `SystemParam` interface, which would make it more complicated. That would probably mess up the `SystemState` api too. ## Todo I still need to add doc comments to `Extract`. --- ## Changelog ### Changed - The `Extract` `RenderStage` now runs on the render world (instead of the main world as before). You must use the `Extract` `SystemParam` to access the main world during the extract phase. Resources on the render world can now be accessed using `ResMut` during extract. ### Removed - `Commands::spawn_and_forget`. Use `Commands::get_or_spawn(e).insert_bundle(bundle)` instead ## Migration Guide The `Extract` `RenderStage` now runs on the render world (instead of the main world as before). You must use the `Extract` `SystemParam` to access the main world during the extract phase. `Extract` takes a single type parameter, which is any system parameter (such as `Res`, `Query` etc.). It will extract this from the main world, and returns the result of this extraction when `value` is called on it. For example, if previously your extract system looked like: ```rust fn extract_clouds(mut commands: Commands, clouds: Query<Entity, With<Cloud>>) { for cloud in clouds.iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` the new version would be: ```rust fn extract_clouds(mut commands: Commands, mut clouds: Extract<Query<Entity, With<Cloud>>>) { for cloud in clouds.value().iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` The diff is: ```diff --- a/src/clouds.rs +++ b/src/clouds.rs @@ -1,5 +1,5 @@ -fn extract_clouds(mut commands: Commands, clouds: Query<Entity, With<Cloud>>) { - for cloud in clouds.iter() { +fn extract_clouds(mut commands: Commands, mut clouds: Extract<Query<Entity, With<Cloud>>>) { + for cloud in clouds.value().iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` You can now also access resources from the render world using the normal system parameters during `Extract`: ```rust fn extract_assets(mut render_assets: ResMut<MyAssets>, source_assets: Extract<Res<MyAssets>>) { *render_assets = source_assets.clone(); } ``` Please note that all existing extract systems need to be updated to match this new style; even if they currently compile they will not run as expected. A warning will be emitted on a best-effort basis if this is not met. Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-07-08 23:56:33 +00:00
),
Without<SpotLight>,
>,
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
>,
Use storage buffers for clustered forward point lights (#3989) # Objective - Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene - Fixes #3605 - Based on top of #4079 This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR. ## Solution - Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability - Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time. - Appropriate shader defs in the shader code to handle the two cases ## Context on some decisions / open questions - I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately. - Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants? Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-04-07 16:16:35 +00:00
mut previous_point_lights_len: Local<usize>,
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
mut previous_spot_lights_len: Local<usize>,
2021-06-02 02:59:17 +00:00
) {
// NOTE: These shadow map resources are extracted here as they are used here too so this avoids
// races between scheduling of ExtractResourceSystems and this system.
if point_light_shadow_map.is_changed() {
commands.insert_resource(point_light_shadow_map.clone());
}
if directional_light_shadow_map.is_changed() {
commands.insert_resource(directional_light_shadow_map.clone());
}
Scale normal bias by texel size (#26) * 3d_scene_pipelined: Use a shallower directional light angle to provoke acne * cornell_box_pipelined: Remove bias tweaks * bevy_pbr2: Simplify shadow biases by moving them to linear depth * bevy_pbr2: Do not use DepthBiasState * bevy_pbr2: Do not use bilinear filtering for sampling depth textures * pbr.wgsl: Remove unnecessary comment * bevy_pbr2: Do manual shadow map depth comparisons for more flexibility * examples: Add shadow_biases_pipelined example This is useful for stress testing biases. * bevy_pbr2: Scale the point light normal bias by the shadow map texel size This allows the normal bias to be small close to the light source where the shadow map texel to screen texel ratio is high, but is appropriately large further away from the light source where the shadow map texel can easily cover multiple screen texels. * shadow_biases_pipelined: Add support for toggling directional / point light * shadow_biases_pipelined: Cleanup * bevy_pbr2: Scale the directional light normal bias by the shadow map texel size * shadow_biases_pipelined: Fit the orthographic projection around the scene * bevy_pbr2: Directional lights should have no shadows outside their projection Before this change, sampling a fragment position from outside the ndc volume would result in the return sample being clamped to the edge in x,y or possibly always casting a shadow for fragment positions past the orthographic projection's far plane. * bevy_pbr2: Fix the default directional light normal bias * Revert "bevy_pbr2: Do manual shadow map depth comparisons for more flexibility" This reverts commit 7df1bab38a42d8a33bc50ca583d4be37bd9c9f0d. * shadow_biases_pipelined: Adjust directional light normal bias in 0.1 increments * pbr.wgsl: Add a couple of clarifying comments * Revert "bevy_pbr2: Do not use bilinear filtering for sampling depth textures" This reverts commit f53baab0232ce218866a45cad6902b470f4cf2c4. * shadow_biases_pipelined: Print usage to terminal
2021-07-19 19:20:59 +00:00
// This is the point light shadow map texel size for one face of the cube as a distance of 1.0
// world unit from the light.
// point_light_texel_size = 2.0 * 1.0 * tan(PI / 4.0) / cube face width in texels
// PI / 4.0 is half the cube face fov, tan(PI / 4.0) = 1.0, so this simplifies to:
// point_light_texel_size = 2.0 / cube face width in texels
// NOTE: When using various PCF kernel sizes, this will need to be adjusted, according to:
// https://catlikecoding.com/unity/tutorials/custom-srp/point-and-spot-shadows/
let point_light_texel_size = 2.0 / point_light_shadow_map.size as f32;
Clustered forward rendering (#3153) # Objective Implement clustered-forward rendering. ## Solution ~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works. * The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+). * WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers. * Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct * The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space. * Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers. * Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR. * I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times. Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on. * Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is. * Improve the z-slicing to use a larger first slice. * More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …) * Switch to using a texture instead of uniform buffer * Figure out the / a better story for shadows I will also probably add an example that demonstrates some of the issues: * What situations exhaust the space available in the uniform buffers * Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters * Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light * Perhaps some performance issues * How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
Use storage buffers for clustered forward point lights (#3989) # Objective - Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene - Fixes #3605 - Based on top of #4079 This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR. ## Solution - Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability - Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time. - Appropriate shader defs in the shader code to handle the two cases ## Context on some decisions / open questions - I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately. - Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants? Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-04-07 16:16:35 +00:00
let mut point_lights_values = Vec::with_capacity(*previous_point_lights_len);
Clustered forward rendering (#3153) # Objective Implement clustered-forward rendering. ## Solution ~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works. * The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+). * WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers. * Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct * The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space. * Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers. * Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR. * I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times. Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on. * Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is. * Improve the z-slicing to use a larger first slice. * More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …) * Switch to using a texture instead of uniform buffer * Figure out the / a better story for shadows I will also probably add an example that demonstrates some of the issues: * What situations exhaust the space available in the uniform buffers * Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters * Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light * Perhaps some performance issues * How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
for entity in global_point_lights.iter().copied() {
let Ok((
point_light,
cubemap_visible_entities,
transform,
view_visibility,
frusta,
volumetric_light,
)) = point_lights.get(entity)
else {
continue;
};
if !view_visibility.get() {
continue;
Clustered forward rendering (#3153) # Objective Implement clustered-forward rendering. ## Solution ~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works. * The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+). * WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers. * Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct * The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space. * Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers. * Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR. * I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times. Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on. * Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is. * Improve the z-slicing to use a larger first slice. * More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …) * Switch to using a texture instead of uniform buffer * Figure out the / a better story for shadows I will also probably add an example that demonstrates some of the issues: * What situations exhaust the space available in the uniform buffers * Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters * Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light * Perhaps some performance issues * How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
}
// TODO: This is very much not ideal. We should be able to re-use the vector memory.
// However, since exclusive access to the main world in extract is ill-advised, we just clone here.
let render_cubemap_visible_entities = cubemap_visible_entities.clone();
let extracted_point_light = ExtractedPointLight {
Migrate from `LegacyColor` to `bevy_color::Color` (#12163) # Objective - As part of the migration process we need to a) see the end effect of the migration on user ergonomics b) check for serious perf regressions c) actually migrate the code - To accomplish this, I'm going to attempt to migrate all of the remaining user-facing usages of `LegacyColor` in one PR, being careful to keep a clean commit history. - Fixes #12056. ## Solution I've chosen to use the polymorphic `Color` type as our standard user-facing API. - [x] Migrate `bevy_gizmos`. - [x] Take `impl Into<Color>` in all `bevy_gizmos` APIs - [x] Migrate sprites - [x] Migrate UI - [x] Migrate `ColorMaterial` - [x] Migrate `MaterialMesh2D` - [x] Migrate fog - [x] Migrate lights - [x] Migrate StandardMaterial - [x] Migrate wireframes - [x] Migrate clear color - [x] Migrate text - [x] Migrate gltf loader - [x] Register color types for reflection - [x] Remove `LegacyColor` - [x] Make sure CI passes Incidental improvements to ease migration: - added `Color::srgba_u8`, `Color::srgba_from_array` and friends - added `set_alpha`, `is_fully_transparent` and `is_fully_opaque` to the `Alpha` trait - add and immediately deprecate (lol) `Color::rgb` and friends in favor of more explicit and consistent `Color::srgb` - standardized on white and black for most example text colors - added vector field traits to `LinearRgba`: ~~`Add`, `Sub`, `AddAssign`, `SubAssign`,~~ `Mul<f32>` and `Div<f32>`. Multiplications and divisions do not scale alpha. `Add` and `Sub` have been cut from this PR. - added `LinearRgba` and `Srgba` `RED/GREEN/BLUE` - added `LinearRgba_to_f32_array` and `LinearRgba::to_u32` ## Migration Guide Bevy's color types have changed! Wherever you used a `bevy::render::Color`, a `bevy::color::Color` is used instead. These are quite similar! Both are enums storing a color in a specific color space (or to be more precise, using a specific color model). However, each of the different color models now has its own type. TODO... - `Color::rgba`, `Color::rgb`, `Color::rbga_u8`, `Color::rgb_u8`, `Color::rgb_from_array` are now `Color::srgba`, `Color::srgb`, `Color::srgba_u8`, `Color::srgb_u8` and `Color::srgb_from_array`. - `Color::set_a` and `Color::a` is now `Color::set_alpha` and `Color::alpha`. These are part of the `Alpha` trait in `bevy_color`. - `Color::is_fully_transparent` is now part of the `Alpha` trait in `bevy_color` - `Color::r`, `Color::set_r`, `Color::with_r` and the equivalents for `g`, `b` `h`, `s` and `l` have been removed due to causing silent relatively expensive conversions. Convert your `Color` into the desired color space, perform your operations there, and then convert it back into a polymorphic `Color` enum. - `Color::hex` is now `Srgba::hex`. Call `.into` or construct a `Color::Srgba` variant manually to convert it. - `WireframeMaterial`, `ExtractedUiNode`, `ExtractedDirectionalLight`, `ExtractedPointLight`, `ExtractedSpotLight` and `ExtractedSprite` now store a `LinearRgba`, rather than a polymorphic `Color` - `Color::rgb_linear` and `Color::rgba_linear` are now `Color::linear_rgb` and `Color::linear_rgba` - The various CSS color constants are no longer stored directly on `Color`. Instead, they're defined in the `Srgba` color space, and accessed via `bevy::color::palettes::css`. Call `.into()` on them to convert them into a `Color` for quick debugging use, and consider using the much prettier `tailwind` palette for prototyping. - The `LIME_GREEN` color has been renamed to `LIMEGREEN` to comply with the standard naming. - Vector field arithmetic operations on `Color` (add, subtract, multiply and divide by a f32) have been removed. Instead, convert your colors into `LinearRgba` space, and perform your operations explicitly there. This is particularly relevant when working with emissive or HDR colors, whose color channel values are routinely outside of the ordinary 0 to 1 range. - `Color::as_linear_rgba_f32` has been removed. Call `LinearRgba::to_f32_array` instead, converting if needed. - `Color::as_linear_rgba_u32` has been removed. Call `LinearRgba::to_u32` instead, converting if needed. - Several other color conversion methods to transform LCH or HSL colors into float arrays or `Vec` types have been removed. Please reimplement these externally or open a PR to re-add them if you found them particularly useful. - Various methods on `Color` such as `rgb` or `hsl` to convert the color into a specific color space have been removed. Convert into `LinearRgba`, then to the color space of your choice. - Various implicitly-converting color value methods on `Color` such as `r`, `g`, `b` or `h` have been removed. Please convert it into the color space of your choice, then check these properties. - `Color` no longer implements `AsBindGroup`. Store a `LinearRgba` internally instead to avoid conversion costs. --------- Co-authored-by: Alice Cecile <alice.i.cecil@gmail.com> Co-authored-by: Afonso Lage <lage.afonso@gmail.com> Co-authored-by: Rob Parrett <robparrett@gmail.com> Co-authored-by: Zachary Harrold <zac@harrold.com.au>
2024-02-29 19:35:12 +00:00
color: point_light.color.into(),
// NOTE: Map from luminous power in lumens to luminous intensity in lumens per steradian
// for a point light. See https://google.github.io/filament/Filament.html#mjx-eqn-pointLightLuminousPower
// for details.
Add `core` and `alloc` over `std` Lints (#15281) # Objective - Fixes #6370 - Closes #6581 ## Solution - Added the following lints to the workspace: - `std_instead_of_core` - `std_instead_of_alloc` - `alloc_instead_of_core` - Used `cargo +nightly fmt` with [item level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A) to split all `use` statements into single items. - Used `cargo clippy --workspace --all-targets --all-features --fix --allow-dirty` to _attempt_ to resolve the new linting issues, and intervened where the lint was unable to resolve the issue automatically (usually due to needing an `extern crate alloc;` statement in a crate root). - Manually removed certain uses of `std` where negative feature gating prevented `--all-features` from finding the offending uses. - Used `cargo +nightly fmt` with [crate level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A) to re-merge all `use` statements matching Bevy's previous styling. - Manually fixed cases where the `fmt` tool could not re-merge `use` statements due to conditional compilation attributes. ## Testing - Ran CI locally ## Migration Guide The MSRV is now 1.81. Please update to this version or higher. ## Notes - This is a _massive_ change to try and push through, which is why I've outlined the semi-automatic steps I used to create this PR, in case this fails and someone else tries again in the future. - Making this change has no impact on user code, but does mean Bevy contributors will be warned to use `core` and `alloc` instead of `std` where possible. - This lint is a critical first step towards investigating `no_std` options for Bevy. --------- Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-09-27 00:59:59 +00:00
intensity: point_light.intensity / (4.0 * core::f32::consts::PI),
range: point_light.range,
radius: point_light.radius,
transform: *transform,
shadows_enabled: point_light.shadows_enabled,
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
soft_shadows_enabled: point_light.soft_shadows_enabled,
shadow_depth_bias: point_light.shadow_depth_bias,
// The factor of SQRT_2 is for the worst-case diagonal offset
shadow_normal_bias: point_light.shadow_normal_bias
* point_light_texel_size
Add `core` and `alloc` over `std` Lints (#15281) # Objective - Fixes #6370 - Closes #6581 ## Solution - Added the following lints to the workspace: - `std_instead_of_core` - `std_instead_of_alloc` - `alloc_instead_of_core` - Used `cargo +nightly fmt` with [item level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A) to split all `use` statements into single items. - Used `cargo clippy --workspace --all-targets --all-features --fix --allow-dirty` to _attempt_ to resolve the new linting issues, and intervened where the lint was unable to resolve the issue automatically (usually due to needing an `extern crate alloc;` statement in a crate root). - Manually removed certain uses of `std` where negative feature gating prevented `--all-features` from finding the offending uses. - Used `cargo +nightly fmt` with [crate level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A) to re-merge all `use` statements matching Bevy's previous styling. - Manually fixed cases where the `fmt` tool could not re-merge `use` statements due to conditional compilation attributes. ## Testing - Ran CI locally ## Migration Guide The MSRV is now 1.81. Please update to this version or higher. ## Notes - This is a _massive_ change to try and push through, which is why I've outlined the semi-automatic steps I used to create this PR, in case this fails and someone else tries again in the future. - Making this change has no impact on user code, but does mean Bevy contributors will be warned to use `core` and `alloc` instead of `std` where possible. - This lint is a critical first step towards investigating `no_std` options for Bevy. --------- Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-09-27 00:59:59 +00:00
* core::f32::consts::SQRT_2,
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
shadow_map_near_z: point_light.shadow_map_near_z,
spot_light_angles: None,
volumetric: volumetric_light.is_some(),
};
point_lights_values.push((
entity,
(
extracted_point_light,
render_cubemap_visible_entities,
(*frusta).clone(),
),
));
2021-06-02 02:59:17 +00:00
}
Use storage buffers for clustered forward point lights (#3989) # Objective - Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene - Fixes #3605 - Based on top of #4079 This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR. ## Solution - Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability - Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time. - Appropriate shader defs in the shader code to handle the two cases ## Context on some decisions / open questions - I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately. - Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants? Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-04-07 16:16:35 +00:00
*previous_point_lights_len = point_lights_values.len();
commands.insert_or_spawn_batch(point_lights_values);
Clustered forward rendering (#3153) # Objective Implement clustered-forward rendering. ## Solution ~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works. * The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+). * WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers. * Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct * The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space. * Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers. * Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR. * I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times. Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on. * Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is. * Improve the z-slicing to use a larger first slice. * More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …) * Switch to using a texture instead of uniform buffer * Figure out the / a better story for shadows I will also probably add an example that demonstrates some of the issues: * What situations exhaust the space available in the uniform buffers * Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters * Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light * Perhaps some performance issues * How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
let mut spot_lights_values = Vec::with_capacity(*previous_spot_lights_len);
for entity in global_point_lights.iter().copied() {
if let Ok((
spot_light,
visible_entities,
transform,
view_visibility,
frustum,
volumetric_light,
)) = spot_lights.get(entity)
Split `ComputedVisibility` into two components to allow for accurate change detection and speed up visibility propagation (#9497) # Objective Fix #8267. Fixes half of #7840. The `ComputedVisibility` component contains two flags: hierarchy visibility, and view visibility (whether its visible to any cameras). Due to the modular and open-ended way that view visibility is computed, it triggers change detection every single frame, even when the value does not change. Since hierarchy visibility is stored in the same component as view visibility, this means that change detection for inherited visibility is completely broken. At the company I work for, this has become a real issue. We are using change detection to only re-render scenes when necessary. The broken state of change detection for computed visibility means that we have to to rely on the non-inherited `Visibility` component for now. This is workable in the early stages of our project, but since we will inevitably want to use the hierarchy, we will have to either: 1. Roll our own solution for computed visibility. 2. Fix the issue for everyone. ## Solution Split the `ComputedVisibility` component into two: `InheritedVisibilty` and `ViewVisibility`. This allows change detection to behave properly for `InheritedVisibility`. View visiblity is still erratic, although it is less useful to be able to detect changes for this flavor of visibility. Overall, this actually simplifies the API. Since the visibility system consists of self-explaining components, it is much easier to document the behavior and usage. This approach is more modular and "ECS-like" -- one could strip out the `ViewVisibility` component entirely if it's not needed, and rely only on inherited visibility. --- ## Changelog - `ComputedVisibility` has been removed in favor of: `InheritedVisibility` and `ViewVisiblity`. ## Migration Guide The `ComputedVisibilty` component has been split into `InheritedVisiblity` and `ViewVisibility`. Replace any usages of `ComputedVisibility::is_visible_in_hierarchy` with `InheritedVisibility::get`, and replace `ComputedVisibility::is_visible_in_view` with `ViewVisibility::get`. ```rust // Before: commands.spawn(VisibilityBundle { visibility: Visibility::Inherited, computed_visibility: ComputedVisibility::default(), }); // After: commands.spawn(VisibilityBundle { visibility: Visibility::Inherited, inherited_visibility: InheritedVisibility::default(), view_visibility: ViewVisibility::default(), }); ``` ```rust // Before: fn my_system(q: Query<&ComputedVisibilty>) { for vis in &q { if vis.is_visible_in_hierarchy() { // After: fn my_system(q: Query<&InheritedVisibility>) { for inherited_visibility in &q { if inherited_visibility.get() { ``` ```rust // Before: fn my_system(q: Query<&ComputedVisibilty>) { for vis in &q { if vis.is_visible_in_view() { // After: fn my_system(q: Query<&ViewVisibility>) { for view_visibility in &q { if view_visibility.get() { ``` ```rust // Before: fn my_system(mut q: Query<&mut ComputedVisibilty>) { for vis in &mut q { vis.set_visible_in_view(); // After: fn my_system(mut q: Query<&mut ViewVisibility>) { for view_visibility in &mut q { view_visibility.set(); ``` --------- Co-authored-by: Robert Swain <robert.swain@gmail.com>
2023-09-01 13:00:18 +00:00
{
if !view_visibility.get() {
Visibilty Inheritance, universal ComputedVisibility and RenderLayers support (#5310) # Objective Fixes #4907. Fixes #838. Fixes #5089. Supersedes #5146. Supersedes #2087. Supersedes #865. Supersedes #5114 Visibility is currently entirely local. Set a parent entity to be invisible, and the children are still visible. This makes it hard for users to hide entire hierarchies of entities. Additionally, the semantics of `Visibility` vs `ComputedVisibility` are inconsistent across entity types. 3D meshes use `ComputedVisibility` as the "definitive" visibility component, with `Visibility` being just one data source. Sprites just use `Visibility`, which means they can't feed off of `ComputedVisibility` data, such as culling information, RenderLayers, and (added in this pr) visibility inheritance information. ## Solution Splits `ComputedVisibilty::is_visible` into `ComputedVisibilty::is_visible_in_view` and `ComputedVisibilty::is_visible_in_hierarchy`. For each visible entity, `is_visible_in_hierarchy` is computed by propagating visibility down the hierarchy. The `ComputedVisibility::is_visible()` function combines these two booleans for the canonical "is this entity visible" function. Additionally, all entities that have `Visibility` now also have `ComputedVisibility`. Sprites, Lights, and UI entities now use `ComputedVisibility` when appropriate. This means that in addition to visibility inheritance, everything using Visibility now also supports RenderLayers. Notably, Sprites (and other 2d objects) now support `RenderLayers` and work properly across multiple views. Also note that this does increase the amount of work done per sprite. Bevymark with 100,000 sprites on `main` runs in `0.017612` seconds and this runs in `0.01902`. That is certainly a gap, but I believe the api consistency and extra functionality this buys us is worth it. See [this thread](https://github.com/bevyengine/bevy/pull/5146#issuecomment-1182783452) for more info. Note that #5146 in combination with #5114 _are_ a viable alternative to this PR and _would_ perform better, but that comes at the cost of api inconsistencies and doing visibility calculations in the "wrong" place. The current visibility system does have potential for performance improvements. I would prefer to evolve that one system as a whole rather than doing custom hacks / different behaviors for each feature slice. Here is a "split screen" example where the left camera uses RenderLayers to filter out the blue sprite. ![image](https://user-images.githubusercontent.com/2694663/178814868-2e9a2173-bf8c-4c79-8815-633899d492c3.png) Note that this builds directly on #5146 and that @james7132 deserves the credit for the baseline visibility inheritance work. This pr moves the inherited visibility field into `ComputedVisibility`, then does the additional work of porting everything to `ComputedVisibility`. See my [comments here](https://github.com/bevyengine/bevy/pull/5146#issuecomment-1182783452) for rationale. ## Follow up work * Now that lights use ComputedVisibility, VisibleEntities now includes "visible lights" in the entity list. Functionally not a problem as we use queries to filter the list down in the desired context. But we should consider splitting this out into a separate`VisibleLights` collection for both clarity and performance reasons. And _maybe_ even consider scoping `VisibleEntities` down to `VisibleMeshes`?. * Investigate alternative sprite rendering impls (in combination with visibility system tweaks) that avoid re-generating a per-view fixedbitset of visible entities every frame, then checking each ExtractedEntity. This is where most of the performance overhead lives. Ex: we could generate ExtractedEntities per-view using the VisibleEntities list, avoiding the need for the bitset. * Should ComputedVisibility use bitflags under the hood? This would cut down on the size of the component, potentially speed up the `is_visible()` function, and allow us to cheaply expand ComputedVisibility with more data (ex: split out local visibility and parent visibility, add more culling classes, etc). --- ## Changelog * ComputedVisibility now takes hierarchy visibility into account. * 2D, UI and Light entities now use the ComputedVisibility component. ## Migration Guide If you were previously reading `Visibility::is_visible` as the "actual visibility" for sprites or lights, use `ComputedVisibilty::is_visible()` instead: ```rust // before (0.7) fn system(query: Query<&Visibility>) { for visibility in query.iter() { if visibility.is_visible { log!("found visible entity"); } } } // after (0.8) fn system(query: Query<&ComputedVisibility>) { for visibility in query.iter() { if visibility.is_visible() { log!("found visible entity"); } } } ``` Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-07-15 23:24:42 +00:00
continue;
}
Make `RenderStage::Extract` run on the render world (#4402) # Objective - Currently, the `Extract` `RenderStage` is executed on the main world, with the render world available as a resource. - However, when needing access to resources in the render world (e.g. to mutate them), the only way to do so was to get exclusive access to the whole `RenderWorld` resource. - This meant that effectively only one extract which wrote to resources could run at a time. - We didn't previously make `Extract`ing writing to the world a non-happy path, even though we want to discourage that. ## Solution - Move the extract stage to run on the render world. - Add the main world as a `MainWorld` resource. - Add an `Extract` `SystemParam` as a convenience to access a (read only) `SystemParam` in the main world during `Extract`. ## Future work It should be possible to avoid needing to use `get_or_spawn` for the render commands, since now the `Commands`' `Entities` matches up with the world being executed on. We need to determine how this interacts with https://github.com/bevyengine/bevy/pull/3519 It's theoretically possible to remove the need for the `value` method on `Extract`. However, that requires slightly changing the `SystemParam` interface, which would make it more complicated. That would probably mess up the `SystemState` api too. ## Todo I still need to add doc comments to `Extract`. --- ## Changelog ### Changed - The `Extract` `RenderStage` now runs on the render world (instead of the main world as before). You must use the `Extract` `SystemParam` to access the main world during the extract phase. Resources on the render world can now be accessed using `ResMut` during extract. ### Removed - `Commands::spawn_and_forget`. Use `Commands::get_or_spawn(e).insert_bundle(bundle)` instead ## Migration Guide The `Extract` `RenderStage` now runs on the render world (instead of the main world as before). You must use the `Extract` `SystemParam` to access the main world during the extract phase. `Extract` takes a single type parameter, which is any system parameter (such as `Res`, `Query` etc.). It will extract this from the main world, and returns the result of this extraction when `value` is called on it. For example, if previously your extract system looked like: ```rust fn extract_clouds(mut commands: Commands, clouds: Query<Entity, With<Cloud>>) { for cloud in clouds.iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` the new version would be: ```rust fn extract_clouds(mut commands: Commands, mut clouds: Extract<Query<Entity, With<Cloud>>>) { for cloud in clouds.value().iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` The diff is: ```diff --- a/src/clouds.rs +++ b/src/clouds.rs @@ -1,5 +1,5 @@ -fn extract_clouds(mut commands: Commands, clouds: Query<Entity, With<Cloud>>) { - for cloud in clouds.iter() { +fn extract_clouds(mut commands: Commands, mut clouds: Extract<Query<Entity, With<Cloud>>>) { + for cloud in clouds.value().iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` You can now also access resources from the render world using the normal system parameters during `Extract`: ```rust fn extract_assets(mut render_assets: ResMut<MyAssets>, source_assets: Extract<Res<MyAssets>>) { *render_assets = source_assets.clone(); } ``` Please note that all existing extract systems need to be updated to match this new style; even if they currently compile they will not run as expected. A warning will be emitted on a best-effort basis if this is not met. Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-07-08 23:56:33 +00:00
// TODO: This is very much not ideal. We should be able to re-use the vector memory.
// However, since exclusive access to the main world in extract is ill-advised, we just clone here.
let render_visible_entities = visible_entities.clone();
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
let texel_size =
2.0 * ops::tan(spot_light.outer_angle) / directional_light_shadow_map.size as f32;
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
spot_lights_values.push((
entity,
(
ExtractedPointLight {
Migrate from `LegacyColor` to `bevy_color::Color` (#12163) # Objective - As part of the migration process we need to a) see the end effect of the migration on user ergonomics b) check for serious perf regressions c) actually migrate the code - To accomplish this, I'm going to attempt to migrate all of the remaining user-facing usages of `LegacyColor` in one PR, being careful to keep a clean commit history. - Fixes #12056. ## Solution I've chosen to use the polymorphic `Color` type as our standard user-facing API. - [x] Migrate `bevy_gizmos`. - [x] Take `impl Into<Color>` in all `bevy_gizmos` APIs - [x] Migrate sprites - [x] Migrate UI - [x] Migrate `ColorMaterial` - [x] Migrate `MaterialMesh2D` - [x] Migrate fog - [x] Migrate lights - [x] Migrate StandardMaterial - [x] Migrate wireframes - [x] Migrate clear color - [x] Migrate text - [x] Migrate gltf loader - [x] Register color types for reflection - [x] Remove `LegacyColor` - [x] Make sure CI passes Incidental improvements to ease migration: - added `Color::srgba_u8`, `Color::srgba_from_array` and friends - added `set_alpha`, `is_fully_transparent` and `is_fully_opaque` to the `Alpha` trait - add and immediately deprecate (lol) `Color::rgb` and friends in favor of more explicit and consistent `Color::srgb` - standardized on white and black for most example text colors - added vector field traits to `LinearRgba`: ~~`Add`, `Sub`, `AddAssign`, `SubAssign`,~~ `Mul<f32>` and `Div<f32>`. Multiplications and divisions do not scale alpha. `Add` and `Sub` have been cut from this PR. - added `LinearRgba` and `Srgba` `RED/GREEN/BLUE` - added `LinearRgba_to_f32_array` and `LinearRgba::to_u32` ## Migration Guide Bevy's color types have changed! Wherever you used a `bevy::render::Color`, a `bevy::color::Color` is used instead. These are quite similar! Both are enums storing a color in a specific color space (or to be more precise, using a specific color model). However, each of the different color models now has its own type. TODO... - `Color::rgba`, `Color::rgb`, `Color::rbga_u8`, `Color::rgb_u8`, `Color::rgb_from_array` are now `Color::srgba`, `Color::srgb`, `Color::srgba_u8`, `Color::srgb_u8` and `Color::srgb_from_array`. - `Color::set_a` and `Color::a` is now `Color::set_alpha` and `Color::alpha`. These are part of the `Alpha` trait in `bevy_color`. - `Color::is_fully_transparent` is now part of the `Alpha` trait in `bevy_color` - `Color::r`, `Color::set_r`, `Color::with_r` and the equivalents for `g`, `b` `h`, `s` and `l` have been removed due to causing silent relatively expensive conversions. Convert your `Color` into the desired color space, perform your operations there, and then convert it back into a polymorphic `Color` enum. - `Color::hex` is now `Srgba::hex`. Call `.into` or construct a `Color::Srgba` variant manually to convert it. - `WireframeMaterial`, `ExtractedUiNode`, `ExtractedDirectionalLight`, `ExtractedPointLight`, `ExtractedSpotLight` and `ExtractedSprite` now store a `LinearRgba`, rather than a polymorphic `Color` - `Color::rgb_linear` and `Color::rgba_linear` are now `Color::linear_rgb` and `Color::linear_rgba` - The various CSS color constants are no longer stored directly on `Color`. Instead, they're defined in the `Srgba` color space, and accessed via `bevy::color::palettes::css`. Call `.into()` on them to convert them into a `Color` for quick debugging use, and consider using the much prettier `tailwind` palette for prototyping. - The `LIME_GREEN` color has been renamed to `LIMEGREEN` to comply with the standard naming. - Vector field arithmetic operations on `Color` (add, subtract, multiply and divide by a f32) have been removed. Instead, convert your colors into `LinearRgba` space, and perform your operations explicitly there. This is particularly relevant when working with emissive or HDR colors, whose color channel values are routinely outside of the ordinary 0 to 1 range. - `Color::as_linear_rgba_f32` has been removed. Call `LinearRgba::to_f32_array` instead, converting if needed. - `Color::as_linear_rgba_u32` has been removed. Call `LinearRgba::to_u32` instead, converting if needed. - Several other color conversion methods to transform LCH or HSL colors into float arrays or `Vec` types have been removed. Please reimplement these externally or open a PR to re-add them if you found them particularly useful. - Various methods on `Color` such as `rgb` or `hsl` to convert the color into a specific color space have been removed. Convert into `LinearRgba`, then to the color space of your choice. - Various implicitly-converting color value methods on `Color` such as `r`, `g`, `b` or `h` have been removed. Please convert it into the color space of your choice, then check these properties. - `Color` no longer implements `AsBindGroup`. Store a `LinearRgba` internally instead to avoid conversion costs. --------- Co-authored-by: Alice Cecile <alice.i.cecil@gmail.com> Co-authored-by: Afonso Lage <lage.afonso@gmail.com> Co-authored-by: Rob Parrett <robparrett@gmail.com> Co-authored-by: Zachary Harrold <zac@harrold.com.au>
2024-02-29 19:35:12 +00:00
color: spot_light.color.into(),
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
// NOTE: Map from luminous power in lumens to luminous intensity in lumens per steradian
// for a point light. See https://google.github.io/filament/Filament.html#mjx-eqn-pointLightLuminousPower
// for details.
// Note: Filament uses a divisor of PI for spot lights. We choose to use the same 4*PI divisor
// in both cases so that toggling between point light and spot light keeps lit areas lit equally,
// which seems least surprising for users
Add `core` and `alloc` over `std` Lints (#15281) # Objective - Fixes #6370 - Closes #6581 ## Solution - Added the following lints to the workspace: - `std_instead_of_core` - `std_instead_of_alloc` - `alloc_instead_of_core` - Used `cargo +nightly fmt` with [item level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A) to split all `use` statements into single items. - Used `cargo clippy --workspace --all-targets --all-features --fix --allow-dirty` to _attempt_ to resolve the new linting issues, and intervened where the lint was unable to resolve the issue automatically (usually due to needing an `extern crate alloc;` statement in a crate root). - Manually removed certain uses of `std` where negative feature gating prevented `--all-features` from finding the offending uses. - Used `cargo +nightly fmt` with [crate level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A) to re-merge all `use` statements matching Bevy's previous styling. - Manually fixed cases where the `fmt` tool could not re-merge `use` statements due to conditional compilation attributes. ## Testing - Ran CI locally ## Migration Guide The MSRV is now 1.81. Please update to this version or higher. ## Notes - This is a _massive_ change to try and push through, which is why I've outlined the semi-automatic steps I used to create this PR, in case this fails and someone else tries again in the future. - Making this change has no impact on user code, but does mean Bevy contributors will be warned to use `core` and `alloc` instead of `std` where possible. - This lint is a critical first step towards investigating `no_std` options for Bevy. --------- Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-09-27 00:59:59 +00:00
intensity: spot_light.intensity / (4.0 * core::f32::consts::PI),
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
range: spot_light.range,
radius: spot_light.radius,
transform: *transform,
shadows_enabled: spot_light.shadows_enabled,
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
soft_shadows_enabled: spot_light.soft_shadows_enabled,
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
shadow_depth_bias: spot_light.shadow_depth_bias,
// The factor of SQRT_2 is for the worst-case diagonal offset
shadow_normal_bias: spot_light.shadow_normal_bias
* texel_size
Add `core` and `alloc` over `std` Lints (#15281) # Objective - Fixes #6370 - Closes #6581 ## Solution - Added the following lints to the workspace: - `std_instead_of_core` - `std_instead_of_alloc` - `alloc_instead_of_core` - Used `cargo +nightly fmt` with [item level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A) to split all `use` statements into single items. - Used `cargo clippy --workspace --all-targets --all-features --fix --allow-dirty` to _attempt_ to resolve the new linting issues, and intervened where the lint was unable to resolve the issue automatically (usually due to needing an `extern crate alloc;` statement in a crate root). - Manually removed certain uses of `std` where negative feature gating prevented `--all-features` from finding the offending uses. - Used `cargo +nightly fmt` with [crate level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A) to re-merge all `use` statements matching Bevy's previous styling. - Manually fixed cases where the `fmt` tool could not re-merge `use` statements due to conditional compilation attributes. ## Testing - Ran CI locally ## Migration Guide The MSRV is now 1.81. Please update to this version or higher. ## Notes - This is a _massive_ change to try and push through, which is why I've outlined the semi-automatic steps I used to create this PR, in case this fails and someone else tries again in the future. - Making this change has no impact on user code, but does mean Bevy contributors will be warned to use `core` and `alloc` instead of `std` where possible. - This lint is a critical first step towards investigating `no_std` options for Bevy. --------- Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-09-27 00:59:59 +00:00
* core::f32::consts::SQRT_2,
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
shadow_map_near_z: spot_light.shadow_map_near_z,
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
spot_light_angles: Some((spot_light.inner_angle, spot_light.outer_angle)),
volumetric: volumetric_light.is_some(),
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
},
render_visible_entities,
*frustum,
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
),
));
}
}
*previous_spot_lights_len = spot_lights_values.len();
commands.insert_or_spawn_batch(spot_lights_values);
for (
entity,
directional_light,
visible_entities,
cascades,
cascade_config,
frusta,
transform,
Split `ComputedVisibility` into two components to allow for accurate change detection and speed up visibility propagation (#9497) # Objective Fix #8267. Fixes half of #7840. The `ComputedVisibility` component contains two flags: hierarchy visibility, and view visibility (whether its visible to any cameras). Due to the modular and open-ended way that view visibility is computed, it triggers change detection every single frame, even when the value does not change. Since hierarchy visibility is stored in the same component as view visibility, this means that change detection for inherited visibility is completely broken. At the company I work for, this has become a real issue. We are using change detection to only re-render scenes when necessary. The broken state of change detection for computed visibility means that we have to to rely on the non-inherited `Visibility` component for now. This is workable in the early stages of our project, but since we will inevitably want to use the hierarchy, we will have to either: 1. Roll our own solution for computed visibility. 2. Fix the issue for everyone. ## Solution Split the `ComputedVisibility` component into two: `InheritedVisibilty` and `ViewVisibility`. This allows change detection to behave properly for `InheritedVisibility`. View visiblity is still erratic, although it is less useful to be able to detect changes for this flavor of visibility. Overall, this actually simplifies the API. Since the visibility system consists of self-explaining components, it is much easier to document the behavior and usage. This approach is more modular and "ECS-like" -- one could strip out the `ViewVisibility` component entirely if it's not needed, and rely only on inherited visibility. --- ## Changelog - `ComputedVisibility` has been removed in favor of: `InheritedVisibility` and `ViewVisiblity`. ## Migration Guide The `ComputedVisibilty` component has been split into `InheritedVisiblity` and `ViewVisibility`. Replace any usages of `ComputedVisibility::is_visible_in_hierarchy` with `InheritedVisibility::get`, and replace `ComputedVisibility::is_visible_in_view` with `ViewVisibility::get`. ```rust // Before: commands.spawn(VisibilityBundle { visibility: Visibility::Inherited, computed_visibility: ComputedVisibility::default(), }); // After: commands.spawn(VisibilityBundle { visibility: Visibility::Inherited, inherited_visibility: InheritedVisibility::default(), view_visibility: ViewVisibility::default(), }); ``` ```rust // Before: fn my_system(q: Query<&ComputedVisibilty>) { for vis in &q { if vis.is_visible_in_hierarchy() { // After: fn my_system(q: Query<&InheritedVisibility>) { for inherited_visibility in &q { if inherited_visibility.get() { ``` ```rust // Before: fn my_system(q: Query<&ComputedVisibilty>) { for vis in &q { if vis.is_visible_in_view() { // After: fn my_system(q: Query<&ViewVisibility>) { for view_visibility in &q { if view_visibility.get() { ``` ```rust // Before: fn my_system(mut q: Query<&mut ComputedVisibilty>) { for vis in &mut q { vis.set_visible_in_view(); // After: fn my_system(mut q: Query<&mut ViewVisibility>) { for view_visibility in &mut q { view_visibility.set(); ``` --------- Co-authored-by: Robert Swain <robert.swain@gmail.com>
2023-09-01 13:00:18 +00:00
view_visibility,
light renderlayers (#10742) # Objective add `RenderLayers` awareness to lights. lights default to `RenderLayers::layer(0)`, and must intersect the camera entity's `RenderLayers` in order to affect the camera's output. note that lights already use renderlayers to filter meshes for shadow casting. this adds filtering lights per view based on intersection of camera layers and light layers. fixes #3462 ## Solution PointLights and SpotLights are assigned to individual views in `assign_lights_to_clusters`, so we simply cull the lights which don't match the view layers in that function. DirectionalLights are global, so we - add the light layers to the `DirectionalLight` struct - add the view layers to the `ViewUniform` struct - check for intersection before processing the light in `apply_pbr_lighting` potential issue: when mesh/light layers are smaller than the view layers weird results can occur. e.g: camera = layers 1+2 light = layers 1 mesh = layers 2 the mesh does not cast shadows wrt the light as (1 & 2) == 0. the light affects the view as (1+2 & 1) != 0. the view renders the mesh as (1+2 & 2) != 0. so the mesh is rendered and lit, but does not cast a shadow. this could be fixed (so that the light would not affect the mesh in that view) by adding the light layers to the point and spot light structs, but i think the setup is pretty unusual, and space is at a premium in those structs (adding 4 bytes more would reduce the webgl point+spot light max count to 240 from 256). I think typical usage is for cameras to have a single layer, and meshes/lights to maybe have multiple layers to render to e.g. minimaps as well as primary views. if there is a good use case for the above setup and we should support it, please let me know. --- ## Migration Guide Lights no longer affect all `RenderLayers` by default, now like cameras and meshes they default to `RenderLayers::layer(0)`. To recover the previous behaviour and have all lights affect all views, add a `RenderLayers::all()` component to the light entity.
2023-12-12 19:45:37 +00:00
maybe_layers,
Implement volumetric fog and volumetric lighting, also known as light shafts or god rays. (#13057) This commit implements a more physically-accurate, but slower, form of fog than the `bevy_pbr::fog` module does. Notably, this *volumetric fog* allows for light beams from directional lights to shine through, creating what is known as *light shafts* or *god rays*. To add volumetric fog to a scene, add `VolumetricFogSettings` to the camera, and add `VolumetricLight` to directional lights that you wish to be volumetric. `VolumetricFogSettings` has numerous settings that allow you to define the accuracy of the simulation, as well as the look of the fog. Currently, only interaction with directional lights that have shadow maps is supported. Note that the overhead of the effect scales directly with the number of directional lights in use, so apply `VolumetricLight` sparingly for the best results. The overall algorithm, which is implemented as a postprocessing effect, is a combination of the techniques described in [Scratchapixel] and [this blog post]. It uses raymarching in screen space, transformed into shadow map space for sampling and combined with physically-based modeling of absorption and scattering. Bevy employs the widely-used [Henyey-Greenstein phase function] to model asymmetry; this essentially allows light shafts to fade into and out of existence as the user views them. Volumetric rendering is a huge subject, and I deliberately kept the scope of this commit small. Possible follow-ups include: 1. Raymarching at a lower resolution. 2. A post-processing blur (especially useful when combined with (1)). 3. Supporting point lights and spot lights. 4. Supporting lights with no shadow maps. 5. Supporting irradiance volumes and reflection probes. 6. Voxel components that reuse the volumetric fog code to create voxel shapes. 7. *Horizon: Zero Dawn*-style clouds. These are all useful, but out of scope of this patch for now, to keep things tidy and easy to review. A new example, `volumetric_fog`, has been added to demonstrate the effect. ## Changelog ### Added * A new component, `VolumetricFog`, is available, to allow for a more physically-accurate, but more resource-intensive, form of fog. * A new component, `VolumetricLight`, can be placed on directional lights to make them interact with `VolumetricFog`. Notably, this allows such lights to emit light shafts/god rays. ![Screenshot 2024-04-21 162808](https://github.com/bevyengine/bevy/assets/157897/7a1fc81d-eed5-4735-9419-286c496391a9) ![Screenshot 2024-04-21 132005](https://github.com/bevyengine/bevy/assets/157897/e6d3b5ca-8f59-488d-a3de-15e95aaf4995) [Scratchapixel]: https://www.scratchapixel.com/lessons/3d-basic-rendering/volume-rendering-for-developers/intro-volume-rendering.html [this blog post]: https://www.alexandre-pestana.com/volumetric-lights/ [Henyey-Greenstein phase function]: https://www.pbr-book.org/4ed/Volume_Scattering/Phase_Functions#TheHenyeyndashGreensteinPhaseFunction
2024-05-16 17:13:18 +00:00
volumetric_light,
Split `ComputedVisibility` into two components to allow for accurate change detection and speed up visibility propagation (#9497) # Objective Fix #8267. Fixes half of #7840. The `ComputedVisibility` component contains two flags: hierarchy visibility, and view visibility (whether its visible to any cameras). Due to the modular and open-ended way that view visibility is computed, it triggers change detection every single frame, even when the value does not change. Since hierarchy visibility is stored in the same component as view visibility, this means that change detection for inherited visibility is completely broken. At the company I work for, this has become a real issue. We are using change detection to only re-render scenes when necessary. The broken state of change detection for computed visibility means that we have to to rely on the non-inherited `Visibility` component for now. This is workable in the early stages of our project, but since we will inevitably want to use the hierarchy, we will have to either: 1. Roll our own solution for computed visibility. 2. Fix the issue for everyone. ## Solution Split the `ComputedVisibility` component into two: `InheritedVisibilty` and `ViewVisibility`. This allows change detection to behave properly for `InheritedVisibility`. View visiblity is still erratic, although it is less useful to be able to detect changes for this flavor of visibility. Overall, this actually simplifies the API. Since the visibility system consists of self-explaining components, it is much easier to document the behavior and usage. This approach is more modular and "ECS-like" -- one could strip out the `ViewVisibility` component entirely if it's not needed, and rely only on inherited visibility. --- ## Changelog - `ComputedVisibility` has been removed in favor of: `InheritedVisibility` and `ViewVisiblity`. ## Migration Guide The `ComputedVisibilty` component has been split into `InheritedVisiblity` and `ViewVisibility`. Replace any usages of `ComputedVisibility::is_visible_in_hierarchy` with `InheritedVisibility::get`, and replace `ComputedVisibility::is_visible_in_view` with `ViewVisibility::get`. ```rust // Before: commands.spawn(VisibilityBundle { visibility: Visibility::Inherited, computed_visibility: ComputedVisibility::default(), }); // After: commands.spawn(VisibilityBundle { visibility: Visibility::Inherited, inherited_visibility: InheritedVisibility::default(), view_visibility: ViewVisibility::default(), }); ``` ```rust // Before: fn my_system(q: Query<&ComputedVisibilty>) { for vis in &q { if vis.is_visible_in_hierarchy() { // After: fn my_system(q: Query<&InheritedVisibility>) { for inherited_visibility in &q { if inherited_visibility.get() { ``` ```rust // Before: fn my_system(q: Query<&ComputedVisibilty>) { for vis in &q { if vis.is_visible_in_view() { // After: fn my_system(q: Query<&ViewVisibility>) { for view_visibility in &q { if view_visibility.get() { ``` ```rust // Before: fn my_system(mut q: Query<&mut ComputedVisibilty>) { for vis in &mut q { vis.set_visible_in_view(); // After: fn my_system(mut q: Query<&mut ViewVisibility>) { for view_visibility in &mut q { view_visibility.set(); ``` --------- Co-authored-by: Robert Swain <robert.swain@gmail.com>
2023-09-01 13:00:18 +00:00
) in &directional_lights
{
Split `ComputedVisibility` into two components to allow for accurate change detection and speed up visibility propagation (#9497) # Objective Fix #8267. Fixes half of #7840. The `ComputedVisibility` component contains two flags: hierarchy visibility, and view visibility (whether its visible to any cameras). Due to the modular and open-ended way that view visibility is computed, it triggers change detection every single frame, even when the value does not change. Since hierarchy visibility is stored in the same component as view visibility, this means that change detection for inherited visibility is completely broken. At the company I work for, this has become a real issue. We are using change detection to only re-render scenes when necessary. The broken state of change detection for computed visibility means that we have to to rely on the non-inherited `Visibility` component for now. This is workable in the early stages of our project, but since we will inevitably want to use the hierarchy, we will have to either: 1. Roll our own solution for computed visibility. 2. Fix the issue for everyone. ## Solution Split the `ComputedVisibility` component into two: `InheritedVisibilty` and `ViewVisibility`. This allows change detection to behave properly for `InheritedVisibility`. View visiblity is still erratic, although it is less useful to be able to detect changes for this flavor of visibility. Overall, this actually simplifies the API. Since the visibility system consists of self-explaining components, it is much easier to document the behavior and usage. This approach is more modular and "ECS-like" -- one could strip out the `ViewVisibility` component entirely if it's not needed, and rely only on inherited visibility. --- ## Changelog - `ComputedVisibility` has been removed in favor of: `InheritedVisibility` and `ViewVisiblity`. ## Migration Guide The `ComputedVisibilty` component has been split into `InheritedVisiblity` and `ViewVisibility`. Replace any usages of `ComputedVisibility::is_visible_in_hierarchy` with `InheritedVisibility::get`, and replace `ComputedVisibility::is_visible_in_view` with `ViewVisibility::get`. ```rust // Before: commands.spawn(VisibilityBundle { visibility: Visibility::Inherited, computed_visibility: ComputedVisibility::default(), }); // After: commands.spawn(VisibilityBundle { visibility: Visibility::Inherited, inherited_visibility: InheritedVisibility::default(), view_visibility: ViewVisibility::default(), }); ``` ```rust // Before: fn my_system(q: Query<&ComputedVisibilty>) { for vis in &q { if vis.is_visible_in_hierarchy() { // After: fn my_system(q: Query<&InheritedVisibility>) { for inherited_visibility in &q { if inherited_visibility.get() { ``` ```rust // Before: fn my_system(q: Query<&ComputedVisibilty>) { for vis in &q { if vis.is_visible_in_view() { // After: fn my_system(q: Query<&ViewVisibility>) { for view_visibility in &q { if view_visibility.get() { ``` ```rust // Before: fn my_system(mut q: Query<&mut ComputedVisibilty>) { for vis in &mut q { vis.set_visible_in_view(); // After: fn my_system(mut q: Query<&mut ViewVisibility>) { for view_visibility in &mut q { view_visibility.set(); ``` --------- Co-authored-by: Robert Swain <robert.swain@gmail.com>
2023-09-01 13:00:18 +00:00
if !view_visibility.get() {
continue;
}
Make `RenderStage::Extract` run on the render world (#4402) # Objective - Currently, the `Extract` `RenderStage` is executed on the main world, with the render world available as a resource. - However, when needing access to resources in the render world (e.g. to mutate them), the only way to do so was to get exclusive access to the whole `RenderWorld` resource. - This meant that effectively only one extract which wrote to resources could run at a time. - We didn't previously make `Extract`ing writing to the world a non-happy path, even though we want to discourage that. ## Solution - Move the extract stage to run on the render world. - Add the main world as a `MainWorld` resource. - Add an `Extract` `SystemParam` as a convenience to access a (read only) `SystemParam` in the main world during `Extract`. ## Future work It should be possible to avoid needing to use `get_or_spawn` for the render commands, since now the `Commands`' `Entities` matches up with the world being executed on. We need to determine how this interacts with https://github.com/bevyengine/bevy/pull/3519 It's theoretically possible to remove the need for the `value` method on `Extract`. However, that requires slightly changing the `SystemParam` interface, which would make it more complicated. That would probably mess up the `SystemState` api too. ## Todo I still need to add doc comments to `Extract`. --- ## Changelog ### Changed - The `Extract` `RenderStage` now runs on the render world (instead of the main world as before). You must use the `Extract` `SystemParam` to access the main world during the extract phase. Resources on the render world can now be accessed using `ResMut` during extract. ### Removed - `Commands::spawn_and_forget`. Use `Commands::get_or_spawn(e).insert_bundle(bundle)` instead ## Migration Guide The `Extract` `RenderStage` now runs on the render world (instead of the main world as before). You must use the `Extract` `SystemParam` to access the main world during the extract phase. `Extract` takes a single type parameter, which is any system parameter (such as `Res`, `Query` etc.). It will extract this from the main world, and returns the result of this extraction when `value` is called on it. For example, if previously your extract system looked like: ```rust fn extract_clouds(mut commands: Commands, clouds: Query<Entity, With<Cloud>>) { for cloud in clouds.iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` the new version would be: ```rust fn extract_clouds(mut commands: Commands, mut clouds: Extract<Query<Entity, With<Cloud>>>) { for cloud in clouds.value().iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` The diff is: ```diff --- a/src/clouds.rs +++ b/src/clouds.rs @@ -1,5 +1,5 @@ -fn extract_clouds(mut commands: Commands, clouds: Query<Entity, With<Cloud>>) { - for cloud in clouds.iter() { +fn extract_clouds(mut commands: Commands, mut clouds: Extract<Query<Entity, With<Cloud>>>) { + for cloud in clouds.value().iter() { commands.get_or_spawn(cloud).insert(Cloud); } } ``` You can now also access resources from the render world using the normal system parameters during `Extract`: ```rust fn extract_assets(mut render_assets: ResMut<MyAssets>, source_assets: Extract<Res<MyAssets>>) { *render_assets = source_assets.clone(); } ``` Please note that all existing extract systems need to be updated to match this new style; even if they currently compile they will not run as expected. A warning will be emitted on a best-effort basis if this is not met. Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-07-08 23:56:33 +00:00
// TODO: As above
let render_visible_entities = visible_entities.clone();
Accept Bundles for insert and remove. Deprecate insert/remove_bundle (#6039) # Objective Take advantage of the "impl Bundle for Component" changes in #2975 / add the follow up changes discussed there. ## Solution - Change `insert` and `remove` to accept a Bundle instead of a Component (for both Commands and World) - Deprecate `insert_bundle`, `remove_bundle`, and `remove_bundle_intersection` - Add `remove_intersection` --- ## Changelog - Change `insert` and `remove` now accept a Bundle instead of a Component (for both Commands and World) - `insert_bundle` and `remove_bundle` are deprecated ## Migration Guide Replace `insert_bundle` with `insert`: ```rust // Old (0.8) commands.spawn().insert_bundle(SomeBundle::default()); // New (0.9) commands.spawn().insert(SomeBundle::default()); ``` Replace `remove_bundle` with `remove`: ```rust // Old (0.8) commands.entity(some_entity).remove_bundle::<SomeBundle>(); // New (0.9) commands.entity(some_entity).remove::<SomeBundle>(); ``` Replace `remove_bundle_intersection` with `remove_intersection`: ```rust // Old (0.8) world.entity_mut(some_entity).remove_bundle_intersection::<SomeBundle>(); // New (0.9) world.entity_mut(some_entity).remove_intersection::<SomeBundle>(); ``` Consider consolidating as many operations as possible to improve ergonomics and cut down on archetype moves: ```rust // Old (0.8) commands.spawn() .insert_bundle(SomeBundle::default()) .insert(SomeComponent); // New (0.9) - Option 1 commands.spawn().insert(( SomeBundle::default(), SomeComponent, )) // New (0.9) - Option 2 commands.spawn_bundle(( SomeBundle::default(), SomeComponent, )) ``` ## Next Steps Consider changing `spawn` to accept a bundle and deprecate `spawn_bundle`.
2022-09-21 21:47:53 +00:00
commands.get_or_spawn(entity).insert((
Frustum culling (#2861) # Objective Implement frustum culling for much better performance on more complex scenes. With the Amazon Lumberyard Bistro scene, I was getting roughly 15fps without frustum culling and 60+fps with frustum culling on a MacBook Pro 16 with i9 9980HK 8c/16t CPU and Radeon Pro 5500M. macOS does weird things with vsync so even though vsync was off, it really looked like sometimes other applications or the desktop window compositor were interfering, but the difference could be even more as I even saw up to 90+fps sometimes. ## Solution - Until the https://github.com/bevyengine/rfcs/pull/12 RFC is completed, I wanted to implement at least some of the bounding volume functionality we needed to be able to unblock a bunch of rendering features and optimisations such as frustum culling, fitting the directional light orthographic projection to the relevant meshes in the view, clustered forward rendering, etc. - I have added `Aabb`, `Frustum`, and `Sphere` types with only the necessary intersection tests for the algorithms used. I also added `CubemapFrusta` which contains a `[Frustum; 6]` and can be used by cube maps such as environment maps, and point light shadow maps. - I did do a bit of benchmarking and optimisation on the intersection tests. I compared the [rafx parallel-comparison bitmask approach](https://github.com/aclysma/rafx/blob/c91bd5fcfdfa3f4d1b43507c32d84b94ffdf1b2e/rafx-visibility/src/geometry/frustum.rs#L64-L92) with a naïve loop that has an early-out in case of a bounding volume being outside of any one of the `Frustum` planes and found them to be very similar, so I chose the simpler and more readable option. I also compared using Vec3 and Vec3A and it turned out that promoting Vec3s to Vec3A improved performance of the culling significantly due to Vec3A operations using SIMD optimisations where Vec3 uses plain scalar operations. - When loading glTF models, the vertex attribute accessors generally store the minimum and maximum values, which allows for adding AABBs to meshes loaded from glTF for free. - For meshes without an AABB (`PbrBundle` deliberately does not have an AABB by default), a system is executed that scans over the vertex positions to find the minimum and maximum values along each axis. This is used to construct the AABB. - The `Frustum::intersects_obb` and `Sphere::insersects_obb` algorithm is from Foundations of Game Engine Development 2: Rendering by Eric Lengyel. There is no OBB type, yet, rather an AABB and the model matrix are passed in as arguments. This calculates a 'relative radius' of the AABB with respect to the plane normal (the plane normal in the Sphere case being something I came up with as the direction pointing from the centre of the sphere to the centre of the AABB) such that it can then do a sphere-sphere intersection test in practice. - `RenderLayers` were copied over from the current renderer. - `VisibleEntities` was copied over from the current renderer and a `CubemapVisibleEntities` was added to support `PointLight`s for now. `VisibleEntities` are added to views (cameras and lights) and contain a `Vec<Entity>` that is populated by culling/visibility systems that run in PostUpdate of the app world, and are iterated over in the render world for, for example, queuing up meshes to be drawn by lights for shadow maps and the main pass for cameras. - `Visibility` and `ComputedVisibility` components were added. The `Visibility` component is user-facing so that, for example, the entity can be marked as not visible in an editor. `ComputedVisibility` on the other hand is the result of the culling/visibility systems and takes `Visibility` into account. So if an entity is marked as not being visible in its `Visibility` component, that will skip culling/visibility intersection tests and just mark the `ComputedVisibility` as false. - The `ComputedVisibility` is used to decide which meshes to extract. - I had to add a way to get the far plane from the `CameraProjection` in order to define an explicit far frustum plane for culling. This should perhaps be optional as it is not always desired and in that case, testing 5 planes instead of 6 is a performance win. I think that's about all. I discussed some of the design with @cart on Discord already so hopefully it's not too far from being mergeable. It works well at least. 😄
2021-11-07 21:45:52 +00:00
ExtractedDirectionalLight {
Migrate from `LegacyColor` to `bevy_color::Color` (#12163) # Objective - As part of the migration process we need to a) see the end effect of the migration on user ergonomics b) check for serious perf regressions c) actually migrate the code - To accomplish this, I'm going to attempt to migrate all of the remaining user-facing usages of `LegacyColor` in one PR, being careful to keep a clean commit history. - Fixes #12056. ## Solution I've chosen to use the polymorphic `Color` type as our standard user-facing API. - [x] Migrate `bevy_gizmos`. - [x] Take `impl Into<Color>` in all `bevy_gizmos` APIs - [x] Migrate sprites - [x] Migrate UI - [x] Migrate `ColorMaterial` - [x] Migrate `MaterialMesh2D` - [x] Migrate fog - [x] Migrate lights - [x] Migrate StandardMaterial - [x] Migrate wireframes - [x] Migrate clear color - [x] Migrate text - [x] Migrate gltf loader - [x] Register color types for reflection - [x] Remove `LegacyColor` - [x] Make sure CI passes Incidental improvements to ease migration: - added `Color::srgba_u8`, `Color::srgba_from_array` and friends - added `set_alpha`, `is_fully_transparent` and `is_fully_opaque` to the `Alpha` trait - add and immediately deprecate (lol) `Color::rgb` and friends in favor of more explicit and consistent `Color::srgb` - standardized on white and black for most example text colors - added vector field traits to `LinearRgba`: ~~`Add`, `Sub`, `AddAssign`, `SubAssign`,~~ `Mul<f32>` and `Div<f32>`. Multiplications and divisions do not scale alpha. `Add` and `Sub` have been cut from this PR. - added `LinearRgba` and `Srgba` `RED/GREEN/BLUE` - added `LinearRgba_to_f32_array` and `LinearRgba::to_u32` ## Migration Guide Bevy's color types have changed! Wherever you used a `bevy::render::Color`, a `bevy::color::Color` is used instead. These are quite similar! Both are enums storing a color in a specific color space (or to be more precise, using a specific color model). However, each of the different color models now has its own type. TODO... - `Color::rgba`, `Color::rgb`, `Color::rbga_u8`, `Color::rgb_u8`, `Color::rgb_from_array` are now `Color::srgba`, `Color::srgb`, `Color::srgba_u8`, `Color::srgb_u8` and `Color::srgb_from_array`. - `Color::set_a` and `Color::a` is now `Color::set_alpha` and `Color::alpha`. These are part of the `Alpha` trait in `bevy_color`. - `Color::is_fully_transparent` is now part of the `Alpha` trait in `bevy_color` - `Color::r`, `Color::set_r`, `Color::with_r` and the equivalents for `g`, `b` `h`, `s` and `l` have been removed due to causing silent relatively expensive conversions. Convert your `Color` into the desired color space, perform your operations there, and then convert it back into a polymorphic `Color` enum. - `Color::hex` is now `Srgba::hex`. Call `.into` or construct a `Color::Srgba` variant manually to convert it. - `WireframeMaterial`, `ExtractedUiNode`, `ExtractedDirectionalLight`, `ExtractedPointLight`, `ExtractedSpotLight` and `ExtractedSprite` now store a `LinearRgba`, rather than a polymorphic `Color` - `Color::rgb_linear` and `Color::rgba_linear` are now `Color::linear_rgb` and `Color::linear_rgba` - The various CSS color constants are no longer stored directly on `Color`. Instead, they're defined in the `Srgba` color space, and accessed via `bevy::color::palettes::css`. Call `.into()` on them to convert them into a `Color` for quick debugging use, and consider using the much prettier `tailwind` palette for prototyping. - The `LIME_GREEN` color has been renamed to `LIMEGREEN` to comply with the standard naming. - Vector field arithmetic operations on `Color` (add, subtract, multiply and divide by a f32) have been removed. Instead, convert your colors into `LinearRgba` space, and perform your operations explicitly there. This is particularly relevant when working with emissive or HDR colors, whose color channel values are routinely outside of the ordinary 0 to 1 range. - `Color::as_linear_rgba_f32` has been removed. Call `LinearRgba::to_f32_array` instead, converting if needed. - `Color::as_linear_rgba_u32` has been removed. Call `LinearRgba::to_u32` instead, converting if needed. - Several other color conversion methods to transform LCH or HSL colors into float arrays or `Vec` types have been removed. Please reimplement these externally or open a PR to re-add them if you found them particularly useful. - Various methods on `Color` such as `rgb` or `hsl` to convert the color into a specific color space have been removed. Convert into `LinearRgba`, then to the color space of your choice. - Various implicitly-converting color value methods on `Color` such as `r`, `g`, `b` or `h` have been removed. Please convert it into the color space of your choice, then check these properties. - `Color` no longer implements `AsBindGroup`. Store a `LinearRgba` internally instead to avoid conversion costs. --------- Co-authored-by: Alice Cecile <alice.i.cecil@gmail.com> Co-authored-by: Afonso Lage <lage.afonso@gmail.com> Co-authored-by: Rob Parrett <robparrett@gmail.com> Co-authored-by: Zachary Harrold <zac@harrold.com.au>
2024-02-29 19:35:12 +00:00
color: directional_light.color.into(),
illuminance: directional_light.illuminance,
Take DirectionalLight's GlobalTransform into account when calculating shadow map volume (not just direction) (#6384) # Objective This PR fixes #5789, by enabling movable (and scalable) directional light shadow volumes. ## Solution This PR changes `ExtractedDirectionalLight` to hold a copy of the `DirectionalLight` entity's `GlobalTransform`, instead of just a `direction` vector. This allows the shadow map volume (as defined by the light's `shadow_projection` field) to be transformed honoring translation _and_ scale transforms, and not just rotation. It also augments the texel size calculation (used to determine the `shadow_normal_bias`) so that it now takes into account the upper bound of the x/y/z scale of the `GlobalTransform`. This change makes the directional light extraction code more consistent with point and spot lights (that already use `transform`), and allows easily moving and scaling the shadow volume along with a player entity based on camera distance/angle, immediately enabling more real world use cases until we have a more sophisticated adaptive implementation, such as the one described in #3629. **Note:** While it was previously possible to update the projection achieving a similar effect, depending on the light direction and distance to the origin, the fact that the shadow map camera was always positioned at the origin with a hardcoded `Vec3::Y` up value meant you would get sub-optimal or inconsistent/incorrect results. --- ## Changelog ### Changed - `DirectionalLight` shadow volumes now honor translation and scale transforms ## Migration Guide - If your directional lights were positioned at the origin and not scaled (the default, most common scenario) no changes are needed on your part; it just works as before; - If you previously had a system for dynamically updating directional light shadow projections, you might now be able to simplify your code by updating the directional light entity's transform instead; - In the unlikely scenario that a scene with directional lights that previously rendered shadows correctly has missing shadows, make sure your directional lights are positioned at (0, 0, 0) and are not scaled to a size that's too large or too small.
2022-11-04 20:12:26 +00:00
transform: *transform,
Implement volumetric fog and volumetric lighting, also known as light shafts or god rays. (#13057) This commit implements a more physically-accurate, but slower, form of fog than the `bevy_pbr::fog` module does. Notably, this *volumetric fog* allows for light beams from directional lights to shine through, creating what is known as *light shafts* or *god rays*. To add volumetric fog to a scene, add `VolumetricFogSettings` to the camera, and add `VolumetricLight` to directional lights that you wish to be volumetric. `VolumetricFogSettings` has numerous settings that allow you to define the accuracy of the simulation, as well as the look of the fog. Currently, only interaction with directional lights that have shadow maps is supported. Note that the overhead of the effect scales directly with the number of directional lights in use, so apply `VolumetricLight` sparingly for the best results. The overall algorithm, which is implemented as a postprocessing effect, is a combination of the techniques described in [Scratchapixel] and [this blog post]. It uses raymarching in screen space, transformed into shadow map space for sampling and combined with physically-based modeling of absorption and scattering. Bevy employs the widely-used [Henyey-Greenstein phase function] to model asymmetry; this essentially allows light shafts to fade into and out of existence as the user views them. Volumetric rendering is a huge subject, and I deliberately kept the scope of this commit small. Possible follow-ups include: 1. Raymarching at a lower resolution. 2. A post-processing blur (especially useful when combined with (1)). 3. Supporting point lights and spot lights. 4. Supporting lights with no shadow maps. 5. Supporting irradiance volumes and reflection probes. 6. Voxel components that reuse the volumetric fog code to create voxel shapes. 7. *Horizon: Zero Dawn*-style clouds. These are all useful, but out of scope of this patch for now, to keep things tidy and easy to review. A new example, `volumetric_fog`, has been added to demonstrate the effect. ## Changelog ### Added * A new component, `VolumetricFog`, is available, to allow for a more physically-accurate, but more resource-intensive, form of fog. * A new component, `VolumetricLight`, can be placed on directional lights to make them interact with `VolumetricFog`. Notably, this allows such lights to emit light shafts/god rays. ![Screenshot 2024-04-21 162808](https://github.com/bevyengine/bevy/assets/157897/7a1fc81d-eed5-4735-9419-286c496391a9) ![Screenshot 2024-04-21 132005](https://github.com/bevyengine/bevy/assets/157897/e6d3b5ca-8f59-488d-a3de-15e95aaf4995) [Scratchapixel]: https://www.scratchapixel.com/lessons/3d-basic-rendering/volume-rendering-for-developers/intro-volume-rendering.html [this blog post]: https://www.alexandre-pestana.com/volumetric-lights/ [Henyey-Greenstein phase function]: https://www.pbr-book.org/4ed/Volume_Scattering/Phase_Functions#TheHenyeyndashGreensteinPhaseFunction
2024-05-16 17:13:18 +00:00
volumetric: volumetric_light.is_some(),
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
soft_shadow_size: directional_light.soft_shadow_size,
shadows_enabled: directional_light.shadows_enabled,
shadow_depth_bias: directional_light.shadow_depth_bias,
// The factor of SQRT_2 is for the worst-case diagonal offset
Add `core` and `alloc` over `std` Lints (#15281) # Objective - Fixes #6370 - Closes #6581 ## Solution - Added the following lints to the workspace: - `std_instead_of_core` - `std_instead_of_alloc` - `alloc_instead_of_core` - Used `cargo +nightly fmt` with [item level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A) to split all `use` statements into single items. - Used `cargo clippy --workspace --all-targets --all-features --fix --allow-dirty` to _attempt_ to resolve the new linting issues, and intervened where the lint was unable to resolve the issue automatically (usually due to needing an `extern crate alloc;` statement in a crate root). - Manually removed certain uses of `std` where negative feature gating prevented `--all-features` from finding the offending uses. - Used `cargo +nightly fmt` with [crate level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A) to re-merge all `use` statements matching Bevy's previous styling. - Manually fixed cases where the `fmt` tool could not re-merge `use` statements due to conditional compilation attributes. ## Testing - Ran CI locally ## Migration Guide The MSRV is now 1.81. Please update to this version or higher. ## Notes - This is a _massive_ change to try and push through, which is why I've outlined the semi-automatic steps I used to create this PR, in case this fails and someone else tries again in the future. - Making this change has no impact on user code, but does mean Bevy contributors will be warned to use `core` and `alloc` instead of `std` where possible. - This lint is a critical first step towards investigating `no_std` options for Bevy. --------- Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-09-27 00:59:59 +00:00
shadow_normal_bias: directional_light.shadow_normal_bias
* core::f32::consts::SQRT_2,
cascade_shadow_config: cascade_config.clone(),
cascades: cascades.cascades.clone(),
frusta: frusta.frusta.clone(),
render_layers: maybe_layers.unwrap_or_default().clone(),
Frustum culling (#2861) # Objective Implement frustum culling for much better performance on more complex scenes. With the Amazon Lumberyard Bistro scene, I was getting roughly 15fps without frustum culling and 60+fps with frustum culling on a MacBook Pro 16 with i9 9980HK 8c/16t CPU and Radeon Pro 5500M. macOS does weird things with vsync so even though vsync was off, it really looked like sometimes other applications or the desktop window compositor were interfering, but the difference could be even more as I even saw up to 90+fps sometimes. ## Solution - Until the https://github.com/bevyengine/rfcs/pull/12 RFC is completed, I wanted to implement at least some of the bounding volume functionality we needed to be able to unblock a bunch of rendering features and optimisations such as frustum culling, fitting the directional light orthographic projection to the relevant meshes in the view, clustered forward rendering, etc. - I have added `Aabb`, `Frustum`, and `Sphere` types with only the necessary intersection tests for the algorithms used. I also added `CubemapFrusta` which contains a `[Frustum; 6]` and can be used by cube maps such as environment maps, and point light shadow maps. - I did do a bit of benchmarking and optimisation on the intersection tests. I compared the [rafx parallel-comparison bitmask approach](https://github.com/aclysma/rafx/blob/c91bd5fcfdfa3f4d1b43507c32d84b94ffdf1b2e/rafx-visibility/src/geometry/frustum.rs#L64-L92) with a naïve loop that has an early-out in case of a bounding volume being outside of any one of the `Frustum` planes and found them to be very similar, so I chose the simpler and more readable option. I also compared using Vec3 and Vec3A and it turned out that promoting Vec3s to Vec3A improved performance of the culling significantly due to Vec3A operations using SIMD optimisations where Vec3 uses plain scalar operations. - When loading glTF models, the vertex attribute accessors generally store the minimum and maximum values, which allows for adding AABBs to meshes loaded from glTF for free. - For meshes without an AABB (`PbrBundle` deliberately does not have an AABB by default), a system is executed that scans over the vertex positions to find the minimum and maximum values along each axis. This is used to construct the AABB. - The `Frustum::intersects_obb` and `Sphere::insersects_obb` algorithm is from Foundations of Game Engine Development 2: Rendering by Eric Lengyel. There is no OBB type, yet, rather an AABB and the model matrix are passed in as arguments. This calculates a 'relative radius' of the AABB with respect to the plane normal (the plane normal in the Sphere case being something I came up with as the direction pointing from the centre of the sphere to the centre of the AABB) such that it can then do a sphere-sphere intersection test in practice. - `RenderLayers` were copied over from the current renderer. - `VisibleEntities` was copied over from the current renderer and a `CubemapVisibleEntities` was added to support `PointLight`s for now. `VisibleEntities` are added to views (cameras and lights) and contain a `Vec<Entity>` that is populated by culling/visibility systems that run in PostUpdate of the app world, and are iterated over in the render world for, for example, queuing up meshes to be drawn by lights for shadow maps and the main pass for cameras. - `Visibility` and `ComputedVisibility` components were added. The `Visibility` component is user-facing so that, for example, the entity can be marked as not visible in an editor. `ComputedVisibility` on the other hand is the result of the culling/visibility systems and takes `Visibility` into account. So if an entity is marked as not being visible in its `Visibility` component, that will skip culling/visibility intersection tests and just mark the `ComputedVisibility` as false. - The `ComputedVisibility` is used to decide which meshes to extract. - I had to add a way to get the far plane from the `CameraProjection` in order to define an explicit far frustum plane for culling. This should perhaps be optional as it is not always desired and in that case, testing 5 planes instead of 6 is a performance win. I think that's about all. I discussed some of the design with @cart on Discord already so hopefully it's not too far from being mergeable. It works well at least. 😄
2021-11-07 21:45:52 +00:00
},
render_visible_entities,
));
}
2021-06-02 02:59:17 +00:00
}
Frustum culling (#2861) # Objective Implement frustum culling for much better performance on more complex scenes. With the Amazon Lumberyard Bistro scene, I was getting roughly 15fps without frustum culling and 60+fps with frustum culling on a MacBook Pro 16 with i9 9980HK 8c/16t CPU and Radeon Pro 5500M. macOS does weird things with vsync so even though vsync was off, it really looked like sometimes other applications or the desktop window compositor were interfering, but the difference could be even more as I even saw up to 90+fps sometimes. ## Solution - Until the https://github.com/bevyengine/rfcs/pull/12 RFC is completed, I wanted to implement at least some of the bounding volume functionality we needed to be able to unblock a bunch of rendering features and optimisations such as frustum culling, fitting the directional light orthographic projection to the relevant meshes in the view, clustered forward rendering, etc. - I have added `Aabb`, `Frustum`, and `Sphere` types with only the necessary intersection tests for the algorithms used. I also added `CubemapFrusta` which contains a `[Frustum; 6]` and can be used by cube maps such as environment maps, and point light shadow maps. - I did do a bit of benchmarking and optimisation on the intersection tests. I compared the [rafx parallel-comparison bitmask approach](https://github.com/aclysma/rafx/blob/c91bd5fcfdfa3f4d1b43507c32d84b94ffdf1b2e/rafx-visibility/src/geometry/frustum.rs#L64-L92) with a naïve loop that has an early-out in case of a bounding volume being outside of any one of the `Frustum` planes and found them to be very similar, so I chose the simpler and more readable option. I also compared using Vec3 and Vec3A and it turned out that promoting Vec3s to Vec3A improved performance of the culling significantly due to Vec3A operations using SIMD optimisations where Vec3 uses plain scalar operations. - When loading glTF models, the vertex attribute accessors generally store the minimum and maximum values, which allows for adding AABBs to meshes loaded from glTF for free. - For meshes without an AABB (`PbrBundle` deliberately does not have an AABB by default), a system is executed that scans over the vertex positions to find the minimum and maximum values along each axis. This is used to construct the AABB. - The `Frustum::intersects_obb` and `Sphere::insersects_obb` algorithm is from Foundations of Game Engine Development 2: Rendering by Eric Lengyel. There is no OBB type, yet, rather an AABB and the model matrix are passed in as arguments. This calculates a 'relative radius' of the AABB with respect to the plane normal (the plane normal in the Sphere case being something I came up with as the direction pointing from the centre of the sphere to the centre of the AABB) such that it can then do a sphere-sphere intersection test in practice. - `RenderLayers` were copied over from the current renderer. - `VisibleEntities` was copied over from the current renderer and a `CubemapVisibleEntities` was added to support `PointLight`s for now. `VisibleEntities` are added to views (cameras and lights) and contain a `Vec<Entity>` that is populated by culling/visibility systems that run in PostUpdate of the app world, and are iterated over in the render world for, for example, queuing up meshes to be drawn by lights for shadow maps and the main pass for cameras. - `Visibility` and `ComputedVisibility` components were added. The `Visibility` component is user-facing so that, for example, the entity can be marked as not visible in an editor. `ComputedVisibility` on the other hand is the result of the culling/visibility systems and takes `Visibility` into account. So if an entity is marked as not being visible in its `Visibility` component, that will skip culling/visibility intersection tests and just mark the `ComputedVisibility` as false. - The `ComputedVisibility` is used to decide which meshes to extract. - I had to add a way to get the far plane from the `CameraProjection` in order to define an explicit far frustum plane for culling. This should perhaps be optional as it is not always desired and in that case, testing 5 planes instead of 6 is a performance win. I think that's about all. I discussed some of the design with @cart on Discord already so hopefully it's not too far from being mergeable. It works well at least. 😄
2021-11-07 21:45:52 +00:00
pub(crate) struct CubeMapFace {
pub(crate) target: Vec3,
pub(crate) up: Vec3,
2021-07-01 23:48:55 +00:00
}
// Cubemap faces are [+X, -X, +Y, -Y, +Z, -Z], per https://www.w3.org/TR/webgpu/#texture-view-creation
// Note: Cubemap coordinates are left-handed y-up, unlike the rest of Bevy.
// See https://registry.khronos.org/vulkan/specs/1.2/html/chap16.html#_cube_map_face_selection
//
// For each cubemap face, we take care to specify the appropriate target/up axis such that the rendered
// texture using Bevy's right-handed y-up coordinate space matches the expected cubemap face in
// left-handed y-up cubemap coordinates.
Frustum culling (#2861) # Objective Implement frustum culling for much better performance on more complex scenes. With the Amazon Lumberyard Bistro scene, I was getting roughly 15fps without frustum culling and 60+fps with frustum culling on a MacBook Pro 16 with i9 9980HK 8c/16t CPU and Radeon Pro 5500M. macOS does weird things with vsync so even though vsync was off, it really looked like sometimes other applications or the desktop window compositor were interfering, but the difference could be even more as I even saw up to 90+fps sometimes. ## Solution - Until the https://github.com/bevyengine/rfcs/pull/12 RFC is completed, I wanted to implement at least some of the bounding volume functionality we needed to be able to unblock a bunch of rendering features and optimisations such as frustum culling, fitting the directional light orthographic projection to the relevant meshes in the view, clustered forward rendering, etc. - I have added `Aabb`, `Frustum`, and `Sphere` types with only the necessary intersection tests for the algorithms used. I also added `CubemapFrusta` which contains a `[Frustum; 6]` and can be used by cube maps such as environment maps, and point light shadow maps. - I did do a bit of benchmarking and optimisation on the intersection tests. I compared the [rafx parallel-comparison bitmask approach](https://github.com/aclysma/rafx/blob/c91bd5fcfdfa3f4d1b43507c32d84b94ffdf1b2e/rafx-visibility/src/geometry/frustum.rs#L64-L92) with a naïve loop that has an early-out in case of a bounding volume being outside of any one of the `Frustum` planes and found them to be very similar, so I chose the simpler and more readable option. I also compared using Vec3 and Vec3A and it turned out that promoting Vec3s to Vec3A improved performance of the culling significantly due to Vec3A operations using SIMD optimisations where Vec3 uses plain scalar operations. - When loading glTF models, the vertex attribute accessors generally store the minimum and maximum values, which allows for adding AABBs to meshes loaded from glTF for free. - For meshes without an AABB (`PbrBundle` deliberately does not have an AABB by default), a system is executed that scans over the vertex positions to find the minimum and maximum values along each axis. This is used to construct the AABB. - The `Frustum::intersects_obb` and `Sphere::insersects_obb` algorithm is from Foundations of Game Engine Development 2: Rendering by Eric Lengyel. There is no OBB type, yet, rather an AABB and the model matrix are passed in as arguments. This calculates a 'relative radius' of the AABB with respect to the plane normal (the plane normal in the Sphere case being something I came up with as the direction pointing from the centre of the sphere to the centre of the AABB) such that it can then do a sphere-sphere intersection test in practice. - `RenderLayers` were copied over from the current renderer. - `VisibleEntities` was copied over from the current renderer and a `CubemapVisibleEntities` was added to support `PointLight`s for now. `VisibleEntities` are added to views (cameras and lights) and contain a `Vec<Entity>` that is populated by culling/visibility systems that run in PostUpdate of the app world, and are iterated over in the render world for, for example, queuing up meshes to be drawn by lights for shadow maps and the main pass for cameras. - `Visibility` and `ComputedVisibility` components were added. The `Visibility` component is user-facing so that, for example, the entity can be marked as not visible in an editor. `ComputedVisibility` on the other hand is the result of the culling/visibility systems and takes `Visibility` into account. So if an entity is marked as not being visible in its `Visibility` component, that will skip culling/visibility intersection tests and just mark the `ComputedVisibility` as false. - The `ComputedVisibility` is used to decide which meshes to extract. - I had to add a way to get the far plane from the `CameraProjection` in order to define an explicit far frustum plane for culling. This should perhaps be optional as it is not always desired and in that case, testing 5 planes instead of 6 is a performance win. I think that's about all. I discussed some of the design with @cart on Discord already so hopefully it's not too far from being mergeable. It works well at least. 😄
2021-11-07 21:45:52 +00:00
pub(crate) const CUBE_MAP_FACES: [CubeMapFace; 6] = [
// +X
2021-07-01 23:48:55 +00:00
CubeMapFace {
target: Vec3::X,
up: Vec3::Y,
2021-07-01 23:48:55 +00:00
},
// -X
2021-07-01 23:48:55 +00:00
CubeMapFace {
target: Vec3::NEG_X,
up: Vec3::Y,
2021-07-01 23:48:55 +00:00
},
// +Y
2021-07-01 23:48:55 +00:00
CubeMapFace {
target: Vec3::Y,
2021-07-01 23:48:55 +00:00
up: Vec3::Z,
},
// -Y
2021-07-01 23:48:55 +00:00
CubeMapFace {
target: Vec3::NEG_Y,
up: Vec3::NEG_Z,
2021-07-01 23:48:55 +00:00
},
// +Z (with left-handed conventions, pointing forwards)
2021-07-01 23:48:55 +00:00
CubeMapFace {
target: Vec3::NEG_Z,
up: Vec3::Y,
2021-07-01 23:48:55 +00:00
},
// -Z (with left-handed conventions, pointing backwards)
2021-07-01 23:48:55 +00:00
CubeMapFace {
target: Vec3::Z,
up: Vec3::Y,
2021-07-01 23:48:55 +00:00
},
];
fn face_index_to_name(face_index: usize) -> &'static str {
match face_index {
0 => "+x",
1 => "-x",
2 => "+y",
3 => "-y",
4 => "+z",
5 => "-z",
_ => "invalid",
}
}
2021-11-22 23:16:36 +00:00
#[derive(Component)]
pub struct ShadowView {
Keep track of when a texture is first cleared (#10325) # Objective - Custom render passes, or future passes in the engine (such as https://github.com/bevyengine/bevy/pull/10164) need a better way to know and indicate to the core passes whether the view color/depth/prepass attachments have been cleared or not yet this frame, to know if they should clear it themselves or load it. ## Solution - For all render targets (depth textures, shadow textures, prepass textures, main textures) use an atomic bool to track whether or not each texture has been cleared this frame. Abstracted away in the new ColorAttachment and DepthAttachment wrappers. --- ## Changelog - Changed `ViewTarget::get_color_attachment()`, removed arguments. - Changed `ViewTarget::get_unsampled_color_attachment()`, removed arguments. - Removed `Camera3d::clear_color`. - Removed `Camera2d::clear_color`. - Added `Camera::clear_color`. - Added `ExtractedCamera::clear_color`. - Added `ColorAttachment` and `DepthAttachment` wrappers. - Moved `ClearColor` and `ClearColorConfig` from `bevy::core_pipeline::clear_color` to `bevy::render::camera`. - Core render passes now track when a texture is first bound as an attachment in order to decide whether to clear or load it. ## Migration Guide - Remove arguments to `ViewTarget::get_color_attachment()` and `ViewTarget::get_unsampled_color_attachment()`. - Configure clear color on `Camera` instead of on `Camera3d` and `Camera2d`. - Moved `ClearColor` and `ClearColorConfig` from `bevy::core_pipeline::clear_color` to `bevy::render::camera`. - `ViewDepthTexture` must now be created via the `new()` method --------- Co-authored-by: vero <email@atlasdostal.com> Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2023-12-31 00:37:37 +00:00
pub depth_attachment: DepthAttachment,
pub pass_name: String,
2021-06-02 02:59:17 +00:00
}
2021-11-22 23:16:36 +00:00
#[derive(Component)]
pub struct ViewShadowBindings {
pub point_light_depth_texture: Texture,
pub point_light_depth_texture_view: TextureView,
pub directional_light_depth_texture: Texture,
pub directional_light_depth_texture_view: TextureView,
}
2021-11-22 23:16:36 +00:00
#[derive(Component)]
pub struct ViewLightEntities {
2021-06-02 02:59:17 +00:00
pub lights: Vec<Entity>,
}
2021-11-22 23:16:36 +00:00
#[derive(Component)]
pub struct ViewLightsUniformOffset {
pub offset: u32,
2021-06-02 02:59:17 +00:00
}
Make `Resource` trait opt-in, requiring `#[derive(Resource)]` V2 (#5577) *This PR description is an edited copy of #5007, written by @alice-i-cecile.* # Objective Follow-up to https://github.com/bevyengine/bevy/pull/2254. The `Resource` trait currently has a blanket implementation for all types that meet its bounds. While ergonomic, this results in several drawbacks: * it is possible to make confusing, silent mistakes such as inserting a function pointer (Foo) rather than a value (Foo::Bar) as a resource * it is challenging to discover if a type is intended to be used as a resource * we cannot later add customization options (see the [RFC](https://github.com/bevyengine/rfcs/blob/main/rfcs/27-derive-component.md) for the equivalent choice for Component). * dependencies can use the same Rust type as a resource in invisibly conflicting ways * raw Rust types used as resources cannot preserve privacy appropriately, as anyone able to access that type can read and write to internal values * we cannot capture a definitive list of possible resources to display to users in an editor ## Notes to reviewers * Review this commit-by-commit; there's effectively no back-tracking and there's a lot of churn in some of these commits. *ira: My commits are not as well organized :')* * I've relaxed the bound on Local to Send + Sync + 'static: I don't think these concerns apply there, so this can keep things simple. Storing e.g. a u32 in a Local is fine, because there's a variable name attached explaining what it does. * I think this is a bad place for the Resource trait to live, but I've left it in place to make reviewing easier. IMO that's best tackled with https://github.com/bevyengine/bevy/issues/4981. ## Changelog `Resource` is no longer automatically implemented for all matching types. Instead, use the new `#[derive(Resource)]` macro. ## Migration Guide Add `#[derive(Resource)]` to all types you are using as a resource. If you are using a third party type as a resource, wrap it in a tuple struct to bypass orphan rules. Consider deriving `Deref` and `DerefMut` to improve ergonomics. `ClearColor` no longer implements `Component`. Using `ClearColor` as a component in 0.8 did nothing. Use the `ClearColorConfig` in the `Camera3d` and `Camera2d` components instead. Co-authored-by: Alice <alice.i.cecile@gmail.com> Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: devil-ira <justthecooldude@gmail.com> Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-08-08 21:36:35 +00:00
#[derive(Resource, Default)]
2021-06-02 02:59:17 +00:00
pub struct LightMeta {
pub view_gpu_lights: DynamicUniformBuffer<GpuLights>,
2021-06-02 02:59:17 +00:00
}
2021-11-22 23:16:36 +00:00
#[derive(Component)]
Frustum culling (#2861) # Objective Implement frustum culling for much better performance on more complex scenes. With the Amazon Lumberyard Bistro scene, I was getting roughly 15fps without frustum culling and 60+fps with frustum culling on a MacBook Pro 16 with i9 9980HK 8c/16t CPU and Radeon Pro 5500M. macOS does weird things with vsync so even though vsync was off, it really looked like sometimes other applications or the desktop window compositor were interfering, but the difference could be even more as I even saw up to 90+fps sometimes. ## Solution - Until the https://github.com/bevyengine/rfcs/pull/12 RFC is completed, I wanted to implement at least some of the bounding volume functionality we needed to be able to unblock a bunch of rendering features and optimisations such as frustum culling, fitting the directional light orthographic projection to the relevant meshes in the view, clustered forward rendering, etc. - I have added `Aabb`, `Frustum`, and `Sphere` types with only the necessary intersection tests for the algorithms used. I also added `CubemapFrusta` which contains a `[Frustum; 6]` and can be used by cube maps such as environment maps, and point light shadow maps. - I did do a bit of benchmarking and optimisation on the intersection tests. I compared the [rafx parallel-comparison bitmask approach](https://github.com/aclysma/rafx/blob/c91bd5fcfdfa3f4d1b43507c32d84b94ffdf1b2e/rafx-visibility/src/geometry/frustum.rs#L64-L92) with a naïve loop that has an early-out in case of a bounding volume being outside of any one of the `Frustum` planes and found them to be very similar, so I chose the simpler and more readable option. I also compared using Vec3 and Vec3A and it turned out that promoting Vec3s to Vec3A improved performance of the culling significantly due to Vec3A operations using SIMD optimisations where Vec3 uses plain scalar operations. - When loading glTF models, the vertex attribute accessors generally store the minimum and maximum values, which allows for adding AABBs to meshes loaded from glTF for free. - For meshes without an AABB (`PbrBundle` deliberately does not have an AABB by default), a system is executed that scans over the vertex positions to find the minimum and maximum values along each axis. This is used to construct the AABB. - The `Frustum::intersects_obb` and `Sphere::insersects_obb` algorithm is from Foundations of Game Engine Development 2: Rendering by Eric Lengyel. There is no OBB type, yet, rather an AABB and the model matrix are passed in as arguments. This calculates a 'relative radius' of the AABB with respect to the plane normal (the plane normal in the Sphere case being something I came up with as the direction pointing from the centre of the sphere to the centre of the AABB) such that it can then do a sphere-sphere intersection test in practice. - `RenderLayers` were copied over from the current renderer. - `VisibleEntities` was copied over from the current renderer and a `CubemapVisibleEntities` was added to support `PointLight`s for now. `VisibleEntities` are added to views (cameras and lights) and contain a `Vec<Entity>` that is populated by culling/visibility systems that run in PostUpdate of the app world, and are iterated over in the render world for, for example, queuing up meshes to be drawn by lights for shadow maps and the main pass for cameras. - `Visibility` and `ComputedVisibility` components were added. The `Visibility` component is user-facing so that, for example, the entity can be marked as not visible in an editor. `ComputedVisibility` on the other hand is the result of the culling/visibility systems and takes `Visibility` into account. So if an entity is marked as not being visible in its `Visibility` component, that will skip culling/visibility intersection tests and just mark the `ComputedVisibility` as false. - The `ComputedVisibility` is used to decide which meshes to extract. - I had to add a way to get the far plane from the `CameraProjection` in order to define an explicit far frustum plane for culling. This should perhaps be optional as it is not always desired and in that case, testing 5 planes instead of 6 is a performance win. I think that's about all. I discussed some of the design with @cart on Discord already so hopefully it's not too far from being mergeable. It works well at least. 😄
2021-11-07 21:45:52 +00:00
pub enum LightEntity {
Directional {
light_entity: Entity,
cascade_index: usize,
Frustum culling (#2861) # Objective Implement frustum culling for much better performance on more complex scenes. With the Amazon Lumberyard Bistro scene, I was getting roughly 15fps without frustum culling and 60+fps with frustum culling on a MacBook Pro 16 with i9 9980HK 8c/16t CPU and Radeon Pro 5500M. macOS does weird things with vsync so even though vsync was off, it really looked like sometimes other applications or the desktop window compositor were interfering, but the difference could be even more as I even saw up to 90+fps sometimes. ## Solution - Until the https://github.com/bevyengine/rfcs/pull/12 RFC is completed, I wanted to implement at least some of the bounding volume functionality we needed to be able to unblock a bunch of rendering features and optimisations such as frustum culling, fitting the directional light orthographic projection to the relevant meshes in the view, clustered forward rendering, etc. - I have added `Aabb`, `Frustum`, and `Sphere` types with only the necessary intersection tests for the algorithms used. I also added `CubemapFrusta` which contains a `[Frustum; 6]` and can be used by cube maps such as environment maps, and point light shadow maps. - I did do a bit of benchmarking and optimisation on the intersection tests. I compared the [rafx parallel-comparison bitmask approach](https://github.com/aclysma/rafx/blob/c91bd5fcfdfa3f4d1b43507c32d84b94ffdf1b2e/rafx-visibility/src/geometry/frustum.rs#L64-L92) with a naïve loop that has an early-out in case of a bounding volume being outside of any one of the `Frustum` planes and found them to be very similar, so I chose the simpler and more readable option. I also compared using Vec3 and Vec3A and it turned out that promoting Vec3s to Vec3A improved performance of the culling significantly due to Vec3A operations using SIMD optimisations where Vec3 uses plain scalar operations. - When loading glTF models, the vertex attribute accessors generally store the minimum and maximum values, which allows for adding AABBs to meshes loaded from glTF for free. - For meshes without an AABB (`PbrBundle` deliberately does not have an AABB by default), a system is executed that scans over the vertex positions to find the minimum and maximum values along each axis. This is used to construct the AABB. - The `Frustum::intersects_obb` and `Sphere::insersects_obb` algorithm is from Foundations of Game Engine Development 2: Rendering by Eric Lengyel. There is no OBB type, yet, rather an AABB and the model matrix are passed in as arguments. This calculates a 'relative radius' of the AABB with respect to the plane normal (the plane normal in the Sphere case being something I came up with as the direction pointing from the centre of the sphere to the centre of the AABB) such that it can then do a sphere-sphere intersection test in practice. - `RenderLayers` were copied over from the current renderer. - `VisibleEntities` was copied over from the current renderer and a `CubemapVisibleEntities` was added to support `PointLight`s for now. `VisibleEntities` are added to views (cameras and lights) and contain a `Vec<Entity>` that is populated by culling/visibility systems that run in PostUpdate of the app world, and are iterated over in the render world for, for example, queuing up meshes to be drawn by lights for shadow maps and the main pass for cameras. - `Visibility` and `ComputedVisibility` components were added. The `Visibility` component is user-facing so that, for example, the entity can be marked as not visible in an editor. `ComputedVisibility` on the other hand is the result of the culling/visibility systems and takes `Visibility` into account. So if an entity is marked as not being visible in its `Visibility` component, that will skip culling/visibility intersection tests and just mark the `ComputedVisibility` as false. - The `ComputedVisibility` is used to decide which meshes to extract. - I had to add a way to get the far plane from the `CameraProjection` in order to define an explicit far frustum plane for culling. This should perhaps be optional as it is not always desired and in that case, testing 5 planes instead of 6 is a performance win. I think that's about all. I discussed some of the design with @cart on Discord already so hopefully it's not too far from being mergeable. It works well at least. 😄
2021-11-07 21:45:52 +00:00
},
Point {
light_entity: Entity,
face_index: usize,
},
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
Spot {
light_entity: Entity,
},
Frustum culling (#2861) # Objective Implement frustum culling for much better performance on more complex scenes. With the Amazon Lumberyard Bistro scene, I was getting roughly 15fps without frustum culling and 60+fps with frustum culling on a MacBook Pro 16 with i9 9980HK 8c/16t CPU and Radeon Pro 5500M. macOS does weird things with vsync so even though vsync was off, it really looked like sometimes other applications or the desktop window compositor were interfering, but the difference could be even more as I even saw up to 90+fps sometimes. ## Solution - Until the https://github.com/bevyengine/rfcs/pull/12 RFC is completed, I wanted to implement at least some of the bounding volume functionality we needed to be able to unblock a bunch of rendering features and optimisations such as frustum culling, fitting the directional light orthographic projection to the relevant meshes in the view, clustered forward rendering, etc. - I have added `Aabb`, `Frustum`, and `Sphere` types with only the necessary intersection tests for the algorithms used. I also added `CubemapFrusta` which contains a `[Frustum; 6]` and can be used by cube maps such as environment maps, and point light shadow maps. - I did do a bit of benchmarking and optimisation on the intersection tests. I compared the [rafx parallel-comparison bitmask approach](https://github.com/aclysma/rafx/blob/c91bd5fcfdfa3f4d1b43507c32d84b94ffdf1b2e/rafx-visibility/src/geometry/frustum.rs#L64-L92) with a naïve loop that has an early-out in case of a bounding volume being outside of any one of the `Frustum` planes and found them to be very similar, so I chose the simpler and more readable option. I also compared using Vec3 and Vec3A and it turned out that promoting Vec3s to Vec3A improved performance of the culling significantly due to Vec3A operations using SIMD optimisations where Vec3 uses plain scalar operations. - When loading glTF models, the vertex attribute accessors generally store the minimum and maximum values, which allows for adding AABBs to meshes loaded from glTF for free. - For meshes without an AABB (`PbrBundle` deliberately does not have an AABB by default), a system is executed that scans over the vertex positions to find the minimum and maximum values along each axis. This is used to construct the AABB. - The `Frustum::intersects_obb` and `Sphere::insersects_obb` algorithm is from Foundations of Game Engine Development 2: Rendering by Eric Lengyel. There is no OBB type, yet, rather an AABB and the model matrix are passed in as arguments. This calculates a 'relative radius' of the AABB with respect to the plane normal (the plane normal in the Sphere case being something I came up with as the direction pointing from the centre of the sphere to the centre of the AABB) such that it can then do a sphere-sphere intersection test in practice. - `RenderLayers` were copied over from the current renderer. - `VisibleEntities` was copied over from the current renderer and a `CubemapVisibleEntities` was added to support `PointLight`s for now. `VisibleEntities` are added to views (cameras and lights) and contain a `Vec<Entity>` that is populated by culling/visibility systems that run in PostUpdate of the app world, and are iterated over in the render world for, for example, queuing up meshes to be drawn by lights for shadow maps and the main pass for cameras. - `Visibility` and `ComputedVisibility` components were added. The `Visibility` component is user-facing so that, for example, the entity can be marked as not visible in an editor. `ComputedVisibility` on the other hand is the result of the culling/visibility systems and takes `Visibility` into account. So if an entity is marked as not being visible in its `Visibility` component, that will skip culling/visibility intersection tests and just mark the `ComputedVisibility` as false. - The `ComputedVisibility` is used to decide which meshes to extract. - I had to add a way to get the far plane from the `CameraProjection` in order to define an explicit far frustum plane for culling. This should perhaps be optional as it is not always desired and in that case, testing 5 planes instead of 6 is a performance win. I think that's about all. I discussed some of the design with @cart on Discord already so hopefully it's not too far from being mergeable. It works well at least. 😄
2021-11-07 21:45:52 +00:00
}
bevy_pbr2: Fix clustering for orthographic projections (#3316) # Objective PBR lighting was broken in the new renderer when using orthographic projections due to the way the depth slicing works for the clusters. Fix it. ## Solution - The default orthographic projection near plane is 0.0. The perspective projection depth slicing does a division by the near plane which gives a floating point NaN and the clustering all breaks down. - Orthographic projections have a linear depth mapping, so it made intuitive sense to me to do depth slicing with a linear mapping too. The alternative I saw was to try to handle the near plane being at 0.0 and using the exponential depth slicing, but that felt like a hack that didn't make sense. - As such, I have added code that detects whether the projection is orthographic based on `projection[3][3] == 1.0` and then implemented the orthographic mapping case throughout (when computing cluster AABBs, and when mapping a view space position (or light) to a cluster id in both the rust and shader code). ## Screenshots Before: ![before](https://user-images.githubusercontent.com/302146/145847278-5b1bca74-fbad-4cc5-8b49-384f6a377fdc.png) After: <img width="1392" alt="Screenshot 2021-12-13 at 16 36 53" src="https://user-images.githubusercontent.com/302146/145847314-6f3a2035-5d87-4896-8032-0c3e35e15b7d.png"> Old renderer (slightly lighter due to slight difference in configured intensity): <img width="1392" alt="Screenshot 2021-12-13 at 16 42 23" src="https://user-images.githubusercontent.com/302146/145847391-6a5e6fe0-22da-4fc1-a6c7-440543689a63.png">
2021-12-14 23:42:35 +00:00
pub fn calculate_cluster_factors(
near: f32,
far: f32,
z_slices: f32,
is_orthographic: bool,
) -> Vec2 {
if is_orthographic {
Vec2::new(-near, z_slices / (-far - -near))
} else {
let z_slices_of_ln_zfar_over_znear = (z_slices - 1.0) / ops::ln(far / near);
bevy_pbr2: Fix clustering for orthographic projections (#3316) # Objective PBR lighting was broken in the new renderer when using orthographic projections due to the way the depth slicing works for the clusters. Fix it. ## Solution - The default orthographic projection near plane is 0.0. The perspective projection depth slicing does a division by the near plane which gives a floating point NaN and the clustering all breaks down. - Orthographic projections have a linear depth mapping, so it made intuitive sense to me to do depth slicing with a linear mapping too. The alternative I saw was to try to handle the near plane being at 0.0 and using the exponential depth slicing, but that felt like a hack that didn't make sense. - As such, I have added code that detects whether the projection is orthographic based on `projection[3][3] == 1.0` and then implemented the orthographic mapping case throughout (when computing cluster AABBs, and when mapping a view space position (or light) to a cluster id in both the rust and shader code). ## Screenshots Before: ![before](https://user-images.githubusercontent.com/302146/145847278-5b1bca74-fbad-4cc5-8b49-384f6a377fdc.png) After: <img width="1392" alt="Screenshot 2021-12-13 at 16 36 53" src="https://user-images.githubusercontent.com/302146/145847314-6f3a2035-5d87-4896-8032-0c3e35e15b7d.png"> Old renderer (slightly lighter due to slight difference in configured intensity): <img width="1392" alt="Screenshot 2021-12-13 at 16 42 23" src="https://user-images.githubusercontent.com/302146/145847391-6a5e6fe0-22da-4fc1-a6c7-440543689a63.png">
2021-12-14 23:42:35 +00:00
Vec2::new(
z_slices_of_ln_zfar_over_znear,
ops::ln(near) * z_slices_of_ln_zfar_over_znear,
bevy_pbr2: Fix clustering for orthographic projections (#3316) # Objective PBR lighting was broken in the new renderer when using orthographic projections due to the way the depth slicing works for the clusters. Fix it. ## Solution - The default orthographic projection near plane is 0.0. The perspective projection depth slicing does a division by the near plane which gives a floating point NaN and the clustering all breaks down. - Orthographic projections have a linear depth mapping, so it made intuitive sense to me to do depth slicing with a linear mapping too. The alternative I saw was to try to handle the near plane being at 0.0 and using the exponential depth slicing, but that felt like a hack that didn't make sense. - As such, I have added code that detects whether the projection is orthographic based on `projection[3][3] == 1.0` and then implemented the orthographic mapping case throughout (when computing cluster AABBs, and when mapping a view space position (or light) to a cluster id in both the rust and shader code). ## Screenshots Before: ![before](https://user-images.githubusercontent.com/302146/145847278-5b1bca74-fbad-4cc5-8b49-384f6a377fdc.png) After: <img width="1392" alt="Screenshot 2021-12-13 at 16 36 53" src="https://user-images.githubusercontent.com/302146/145847314-6f3a2035-5d87-4896-8032-0c3e35e15b7d.png"> Old renderer (slightly lighter due to slight difference in configured intensity): <img width="1392" alt="Screenshot 2021-12-13 at 16 42 23" src="https://user-images.githubusercontent.com/302146/145847391-6a5e6fe0-22da-4fc1-a6c7-440543689a63.png">
2021-12-14 23:42:35 +00:00
)
}
}
Frustum culling (#2861) # Objective Implement frustum culling for much better performance on more complex scenes. With the Amazon Lumberyard Bistro scene, I was getting roughly 15fps without frustum culling and 60+fps with frustum culling on a MacBook Pro 16 with i9 9980HK 8c/16t CPU and Radeon Pro 5500M. macOS does weird things with vsync so even though vsync was off, it really looked like sometimes other applications or the desktop window compositor were interfering, but the difference could be even more as I even saw up to 90+fps sometimes. ## Solution - Until the https://github.com/bevyengine/rfcs/pull/12 RFC is completed, I wanted to implement at least some of the bounding volume functionality we needed to be able to unblock a bunch of rendering features and optimisations such as frustum culling, fitting the directional light orthographic projection to the relevant meshes in the view, clustered forward rendering, etc. - I have added `Aabb`, `Frustum`, and `Sphere` types with only the necessary intersection tests for the algorithms used. I also added `CubemapFrusta` which contains a `[Frustum; 6]` and can be used by cube maps such as environment maps, and point light shadow maps. - I did do a bit of benchmarking and optimisation on the intersection tests. I compared the [rafx parallel-comparison bitmask approach](https://github.com/aclysma/rafx/blob/c91bd5fcfdfa3f4d1b43507c32d84b94ffdf1b2e/rafx-visibility/src/geometry/frustum.rs#L64-L92) with a naïve loop that has an early-out in case of a bounding volume being outside of any one of the `Frustum` planes and found them to be very similar, so I chose the simpler and more readable option. I also compared using Vec3 and Vec3A and it turned out that promoting Vec3s to Vec3A improved performance of the culling significantly due to Vec3A operations using SIMD optimisations where Vec3 uses plain scalar operations. - When loading glTF models, the vertex attribute accessors generally store the minimum and maximum values, which allows for adding AABBs to meshes loaded from glTF for free. - For meshes without an AABB (`PbrBundle` deliberately does not have an AABB by default), a system is executed that scans over the vertex positions to find the minimum and maximum values along each axis. This is used to construct the AABB. - The `Frustum::intersects_obb` and `Sphere::insersects_obb` algorithm is from Foundations of Game Engine Development 2: Rendering by Eric Lengyel. There is no OBB type, yet, rather an AABB and the model matrix are passed in as arguments. This calculates a 'relative radius' of the AABB with respect to the plane normal (the plane normal in the Sphere case being something I came up with as the direction pointing from the centre of the sphere to the centre of the AABB) such that it can then do a sphere-sphere intersection test in practice. - `RenderLayers` were copied over from the current renderer. - `VisibleEntities` was copied over from the current renderer and a `CubemapVisibleEntities` was added to support `PointLight`s for now. `VisibleEntities` are added to views (cameras and lights) and contain a `Vec<Entity>` that is populated by culling/visibility systems that run in PostUpdate of the app world, and are iterated over in the render world for, for example, queuing up meshes to be drawn by lights for shadow maps and the main pass for cameras. - `Visibility` and `ComputedVisibility` components were added. The `Visibility` component is user-facing so that, for example, the entity can be marked as not visible in an editor. `ComputedVisibility` on the other hand is the result of the culling/visibility systems and takes `Visibility` into account. So if an entity is marked as not being visible in its `Visibility` component, that will skip culling/visibility intersection tests and just mark the `ComputedVisibility` as false. - The `ComputedVisibility` is used to decide which meshes to extract. - I had to add a way to get the far plane from the `CameraProjection` in order to define an explicit far frustum plane for culling. This should perhaps be optional as it is not always desired and in that case, testing 5 planes instead of 6 is a performance win. I think that's about all. I discussed some of the design with @cart on Discord already so hopefully it's not too far from being mergeable. It works well at least. 😄
2021-11-07 21:45:52 +00:00
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
// this method of constructing a basis from a vec3 is used by glam::Vec3::any_orthonormal_pair
// we will also construct it in the fragment shader and need our implementations to match,
// so we reproduce it here to avoid a mismatch if glam changes. we also switch the handedness
// could move this onto transform but it's pretty niche
Normalise matrix naming (#13489) # Objective - Fixes #10909 - Fixes #8492 ## Solution - Name all matrices `x_from_y`, for example `world_from_view`. ## Testing - I've tested most of the 3D examples. The `lighting` example particularly should hit a lot of the changes and appears to run fine. --- ## Changelog - Renamed matrices across the engine to follow a `y_from_x` naming, making the space conversion more obvious. ## Migration Guide - `Frustum`'s `from_view_projection`, `from_view_projection_custom_far` and `from_view_projection_no_far` were renamed to `from_clip_from_world`, `from_clip_from_world_custom_far` and `from_clip_from_world_no_far`. - `ComputedCameraValues::projection_matrix` was renamed to `clip_from_view`. - `CameraProjection::get_projection_matrix` was renamed to `get_clip_from_view` (this affects implementations on `Projection`, `PerspectiveProjection` and `OrthographicProjection`). - `ViewRangefinder3d::from_view_matrix` was renamed to `from_world_from_view`. - `PreviousViewData`'s members were renamed to `view_from_world` and `clip_from_world`. - `ExtractedView`'s `projection`, `transform` and `view_projection` were renamed to `clip_from_view`, `world_from_view` and `clip_from_world`. - `ViewUniform`'s `view_proj`, `unjittered_view_proj`, `inverse_view_proj`, `view`, `inverse_view`, `projection` and `inverse_projection` were renamed to `clip_from_world`, `unjittered_clip_from_world`, `world_from_clip`, `world_from_view`, `view_from_world`, `clip_from_view` and `view_from_clip`. - `GpuDirectionalCascade::view_projection` was renamed to `clip_from_world`. - `MeshTransforms`' `transform` and `previous_transform` were renamed to `world_from_local` and `previous_world_from_local`. - `MeshUniform`'s `transform`, `previous_transform`, `inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed to `world_from_local`, `previous_world_from_local`, `local_from_world_transpose_a` and `local_from_world_transpose_b` (the `Mesh` type in WGSL mirrors this, however `transform` and `previous_transform` were named `model` and `previous_model`). - `Mesh2dTransforms::transform` was renamed to `world_from_local`. - `Mesh2dUniform`'s `transform`, `inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed to `world_from_local`, `local_from_world_transpose_a` and `local_from_world_transpose_b` (the `Mesh2d` type in WGSL mirrors this). - In WGSL, in `bevy_pbr::mesh_functions`, `get_model_matrix` and `get_previous_model_matrix` were renamed to `get_world_from_local` and `get_previous_world_from_local`. - In WGSL, `bevy_sprite::mesh2d_functions::get_model_matrix` was renamed to `get_world_from_local`.
2024-06-03 16:56:53 +00:00
pub(crate) fn spot_light_world_from_view(transform: &GlobalTransform) -> Mat4 {
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
// the matrix z_local (opposite of transform.forward())
let fwd_dir = transform.back().extend(0.0);
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
let sign = 1f32.copysign(fwd_dir.z);
let a = -1.0 / (fwd_dir.z + sign);
let b = fwd_dir.x * fwd_dir.y * a;
let up_dir = Vec4::new(
1.0 + sign * fwd_dir.x * fwd_dir.x * a,
sign * b,
-sign * fwd_dir.x,
0.0,
);
let right_dir = Vec4::new(-b, -sign - fwd_dir.y * fwd_dir.y * a, fwd_dir.y, 0.0);
Mat4::from_cols(
right_dir,
up_dir,
fwd_dir,
transform.translation().extend(1.0),
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
)
}
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
pub(crate) fn spot_light_clip_from_view(angle: f32, near_z: f32) -> Mat4 {
// spot light projection FOV is 2x the angle from spot light center to outer edge
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
Mat4::perspective_infinite_reverse_rh(angle * 2.0, 1.0, near_z)
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
}
2021-07-08 03:01:41 +00:00
#[allow(clippy::too_many_arguments)]
2021-06-02 02:59:17 +00:00
pub fn prepare_lights(
mut commands: Commands,
mut texture_cache: ResMut<TextureCache>,
2021-06-21 23:28:52 +00:00
render_device: Res<RenderDevice>,
Modular Rendering (#2831) This changes how render logic is composed to make it much more modular. Previously, all extraction logic was centralized for a given "type" of rendered thing. For example, we extracted meshes into a vector of ExtractedMesh, which contained the mesh and material asset handles, the transform, etc. We looked up bindings for "drawn things" using their index in the `Vec<ExtractedMesh>`. This worked fine for built in rendering, but made it hard to reuse logic for "custom" rendering. It also prevented us from reusing things like "extracted transforms" across contexts. To make rendering more modular, I made a number of changes: * Entities now drive rendering: * We extract "render components" from "app components" and store them _on_ entities. No more centralized uber lists! We now have true "ECS-driven rendering" * To make this perform well, I implemented #2673 in upstream Bevy for fast batch insertions into specific entities. This was merged into the `pipelined-rendering` branch here: #2815 * Reworked the `Draw` abstraction: * Generic `PhaseItems`: each draw phase can define its own type of "rendered thing", which can define its own "sort key" * Ported the 2d, 3d, and shadow phases to the new PhaseItem impl (currently Transparent2d, Transparent3d, and Shadow PhaseItems) * `Draw` trait and and `DrawFunctions` are now generic on PhaseItem * Modular / Ergonomic `DrawFunctions` via `RenderCommands` * RenderCommand is a trait that runs an ECS query and produces one or more RenderPass calls. Types implementing this trait can be composed to create a final DrawFunction. For example the DrawPbr DrawFunction is created from the following DrawCommand tuple. Const generics are used to set specific bind group locations: ```rust pub type DrawPbr = ( SetPbrPipeline, SetMeshViewBindGroup<0>, SetStandardMaterialBindGroup<1>, SetTransformBindGroup<2>, DrawMesh, ); ``` * The new `custom_shader_pipelined` example illustrates how the commands above can be reused to create a custom draw function: ```rust type DrawCustom = ( SetCustomMaterialPipeline, SetMeshViewBindGroup<0>, SetTransformBindGroup<2>, DrawMesh, ); ``` * ExtractComponentPlugin and UniformComponentPlugin: * Simple, standardized ways to easily extract individual components and write them to GPU buffers * Ported PBR and Sprite rendering to the new primitives above. * Removed staging buffer from UniformVec in favor of direct Queue usage * Makes UniformVec much easier to use and more ergonomic. Completely removes the need for custom render graph nodes in these contexts (see the PbrNode and view Node removals and the much simpler call patterns in the relevant Prepare systems). * Added a many_cubes_pipelined example to benchmark baseline 3d rendering performance and ensure there were no major regressions during this port. Avoiding regressions was challenging given that the old approach of extracting into centralized vectors is basically the "optimal" approach. However thanks to a various ECS optimizations and render logic rephrasing, we pretty much break even on this benchmark! * Lifetimeless SystemParams: this will be a bit divisive, but as we continue to embrace "trait driven systems" (ex: ExtractComponentPlugin, UniformComponentPlugin, DrawCommand), the ergonomics of `(Query<'static, 'static, (&'static A, &'static B, &'static)>, Res<'static, C>)` were getting very hard to bear. As a compromise, I added "static type aliases" for the relevant SystemParams. The previous example can now be expressed like this: `(SQuery<(Read<A>, Read<B>)>, SRes<C>)`. If anyone has better ideas / conflicting opinions, please let me know! * RunSystem trait: a way to define Systems via a trait with a SystemParam associated type. This is used to implement the various plugins mentioned above. I also added SystemParamItem and QueryItem type aliases to make "trait stye" ecs interactions nicer on the eyes (and fingers). * RenderAsset retrying: ensures that render assets are only created when they are "ready" and allows us to create bind groups directly inside render assets (which significantly simplified the StandardMaterial code). I think ultimately we should swap this out on "asset dependency" events to wait for dependencies to load, but this will require significant asset system changes. * Updated some built in shaders to account for missing MeshUniform fields
2021-09-23 06:16:11 +00:00
render_queue: Res<RenderQueue>,
mut global_light_meta: ResMut<GlobalClusterableObjectMeta>,
2021-06-02 02:59:17 +00:00
mut light_meta: ResMut<LightMeta>,
views: Query<
(
Entity,
&ExtractedView,
&ExtractedClusterConfig,
Option<&RenderLayers>,
),
With<Camera3d>,
>,
ambient_light: Res<AmbientLight>,
point_light_shadow_map: Res<PointLightShadowMap>,
directional_light_shadow_map: Res<DirectionalLightShadowMap>,
mut shadow_render_phases: ResMut<ViewBinnedRenderPhases<Shadow>>,
mut max_directional_lights_warning_emitted: Local<bool>,
mut max_cascades_per_light_warning_emitted: Local<bool>,
point_lights: Query<(
Entity,
&ExtractedPointLight,
AnyOf<(&CubemapFrusta, &Frustum)>,
)>,
Frustum culling (#2861) # Objective Implement frustum culling for much better performance on more complex scenes. With the Amazon Lumberyard Bistro scene, I was getting roughly 15fps without frustum culling and 60+fps with frustum culling on a MacBook Pro 16 with i9 9980HK 8c/16t CPU and Radeon Pro 5500M. macOS does weird things with vsync so even though vsync was off, it really looked like sometimes other applications or the desktop window compositor were interfering, but the difference could be even more as I even saw up to 90+fps sometimes. ## Solution - Until the https://github.com/bevyengine/rfcs/pull/12 RFC is completed, I wanted to implement at least some of the bounding volume functionality we needed to be able to unblock a bunch of rendering features and optimisations such as frustum culling, fitting the directional light orthographic projection to the relevant meshes in the view, clustered forward rendering, etc. - I have added `Aabb`, `Frustum`, and `Sphere` types with only the necessary intersection tests for the algorithms used. I also added `CubemapFrusta` which contains a `[Frustum; 6]` and can be used by cube maps such as environment maps, and point light shadow maps. - I did do a bit of benchmarking and optimisation on the intersection tests. I compared the [rafx parallel-comparison bitmask approach](https://github.com/aclysma/rafx/blob/c91bd5fcfdfa3f4d1b43507c32d84b94ffdf1b2e/rafx-visibility/src/geometry/frustum.rs#L64-L92) with a naïve loop that has an early-out in case of a bounding volume being outside of any one of the `Frustum` planes and found them to be very similar, so I chose the simpler and more readable option. I also compared using Vec3 and Vec3A and it turned out that promoting Vec3s to Vec3A improved performance of the culling significantly due to Vec3A operations using SIMD optimisations where Vec3 uses plain scalar operations. - When loading glTF models, the vertex attribute accessors generally store the minimum and maximum values, which allows for adding AABBs to meshes loaded from glTF for free. - For meshes without an AABB (`PbrBundle` deliberately does not have an AABB by default), a system is executed that scans over the vertex positions to find the minimum and maximum values along each axis. This is used to construct the AABB. - The `Frustum::intersects_obb` and `Sphere::insersects_obb` algorithm is from Foundations of Game Engine Development 2: Rendering by Eric Lengyel. There is no OBB type, yet, rather an AABB and the model matrix are passed in as arguments. This calculates a 'relative radius' of the AABB with respect to the plane normal (the plane normal in the Sphere case being something I came up with as the direction pointing from the centre of the sphere to the centre of the AABB) such that it can then do a sphere-sphere intersection test in practice. - `RenderLayers` were copied over from the current renderer. - `VisibleEntities` was copied over from the current renderer and a `CubemapVisibleEntities` was added to support `PointLight`s for now. `VisibleEntities` are added to views (cameras and lights) and contain a `Vec<Entity>` that is populated by culling/visibility systems that run in PostUpdate of the app world, and are iterated over in the render world for, for example, queuing up meshes to be drawn by lights for shadow maps and the main pass for cameras. - `Visibility` and `ComputedVisibility` components were added. The `Visibility` component is user-facing so that, for example, the entity can be marked as not visible in an editor. `ComputedVisibility` on the other hand is the result of the culling/visibility systems and takes `Visibility` into account. So if an entity is marked as not being visible in its `Visibility` component, that will skip culling/visibility intersection tests and just mark the `ComputedVisibility` as false. - The `ComputedVisibility` is used to decide which meshes to extract. - I had to add a way to get the far plane from the `CameraProjection` in order to define an explicit far frustum plane for culling. This should perhaps be optional as it is not always desired and in that case, testing 5 planes instead of 6 is a performance win. I think that's about all. I discussed some of the design with @cart on Discord already so hopefully it's not too far from being mergeable. It works well at least. 😄
2021-11-07 21:45:52 +00:00
directional_lights: Query<(Entity, &ExtractedDirectionalLight)>,
mut live_shadow_mapping_lights: Local<EntityHashSet>,
2021-06-02 02:59:17 +00:00
) {
let views_iter = views.iter();
let views_count = views_iter.len();
let Some(mut view_gpu_lights_writer) =
light_meta
.view_gpu_lights
.get_writer(views_count, &render_device, &render_queue)
else {
return;
};
2021-06-02 02:59:17 +00:00
Frustum culling (#2861) # Objective Implement frustum culling for much better performance on more complex scenes. With the Amazon Lumberyard Bistro scene, I was getting roughly 15fps without frustum culling and 60+fps with frustum culling on a MacBook Pro 16 with i9 9980HK 8c/16t CPU and Radeon Pro 5500M. macOS does weird things with vsync so even though vsync was off, it really looked like sometimes other applications or the desktop window compositor were interfering, but the difference could be even more as I even saw up to 90+fps sometimes. ## Solution - Until the https://github.com/bevyengine/rfcs/pull/12 RFC is completed, I wanted to implement at least some of the bounding volume functionality we needed to be able to unblock a bunch of rendering features and optimisations such as frustum culling, fitting the directional light orthographic projection to the relevant meshes in the view, clustered forward rendering, etc. - I have added `Aabb`, `Frustum`, and `Sphere` types with only the necessary intersection tests for the algorithms used. I also added `CubemapFrusta` which contains a `[Frustum; 6]` and can be used by cube maps such as environment maps, and point light shadow maps. - I did do a bit of benchmarking and optimisation on the intersection tests. I compared the [rafx parallel-comparison bitmask approach](https://github.com/aclysma/rafx/blob/c91bd5fcfdfa3f4d1b43507c32d84b94ffdf1b2e/rafx-visibility/src/geometry/frustum.rs#L64-L92) with a naïve loop that has an early-out in case of a bounding volume being outside of any one of the `Frustum` planes and found them to be very similar, so I chose the simpler and more readable option. I also compared using Vec3 and Vec3A and it turned out that promoting Vec3s to Vec3A improved performance of the culling significantly due to Vec3A operations using SIMD optimisations where Vec3 uses plain scalar operations. - When loading glTF models, the vertex attribute accessors generally store the minimum and maximum values, which allows for adding AABBs to meshes loaded from glTF for free. - For meshes without an AABB (`PbrBundle` deliberately does not have an AABB by default), a system is executed that scans over the vertex positions to find the minimum and maximum values along each axis. This is used to construct the AABB. - The `Frustum::intersects_obb` and `Sphere::insersects_obb` algorithm is from Foundations of Game Engine Development 2: Rendering by Eric Lengyel. There is no OBB type, yet, rather an AABB and the model matrix are passed in as arguments. This calculates a 'relative radius' of the AABB with respect to the plane normal (the plane normal in the Sphere case being something I came up with as the direction pointing from the centre of the sphere to the centre of the AABB) such that it can then do a sphere-sphere intersection test in practice. - `RenderLayers` were copied over from the current renderer. - `VisibleEntities` was copied over from the current renderer and a `CubemapVisibleEntities` was added to support `PointLight`s for now. `VisibleEntities` are added to views (cameras and lights) and contain a `Vec<Entity>` that is populated by culling/visibility systems that run in PostUpdate of the app world, and are iterated over in the render world for, for example, queuing up meshes to be drawn by lights for shadow maps and the main pass for cameras. - `Visibility` and `ComputedVisibility` components were added. The `Visibility` component is user-facing so that, for example, the entity can be marked as not visible in an editor. `ComputedVisibility` on the other hand is the result of the culling/visibility systems and takes `Visibility` into account. So if an entity is marked as not being visible in its `Visibility` component, that will skip culling/visibility intersection tests and just mark the `ComputedVisibility` as false. - The `ComputedVisibility` is used to decide which meshes to extract. - I had to add a way to get the far plane from the `CameraProjection` in order to define an explicit far frustum plane for culling. This should perhaps be optional as it is not always desired and in that case, testing 5 planes instead of 6 is a performance win. I think that's about all. I discussed some of the design with @cart on Discord already so hopefully it's not too far from being mergeable. It works well at least. 😄
2021-11-07 21:45:52 +00:00
// Pre-calculate for PointLights
let cube_face_rotations = CUBE_MAP_FACES
.iter()
.map(|CubeMapFace { target, up }| Transform::IDENTITY.looking_at(*target, *up))
Frustum culling (#2861) # Objective Implement frustum culling for much better performance on more complex scenes. With the Amazon Lumberyard Bistro scene, I was getting roughly 15fps without frustum culling and 60+fps with frustum culling on a MacBook Pro 16 with i9 9980HK 8c/16t CPU and Radeon Pro 5500M. macOS does weird things with vsync so even though vsync was off, it really looked like sometimes other applications or the desktop window compositor were interfering, but the difference could be even more as I even saw up to 90+fps sometimes. ## Solution - Until the https://github.com/bevyengine/rfcs/pull/12 RFC is completed, I wanted to implement at least some of the bounding volume functionality we needed to be able to unblock a bunch of rendering features and optimisations such as frustum culling, fitting the directional light orthographic projection to the relevant meshes in the view, clustered forward rendering, etc. - I have added `Aabb`, `Frustum`, and `Sphere` types with only the necessary intersection tests for the algorithms used. I also added `CubemapFrusta` which contains a `[Frustum; 6]` and can be used by cube maps such as environment maps, and point light shadow maps. - I did do a bit of benchmarking and optimisation on the intersection tests. I compared the [rafx parallel-comparison bitmask approach](https://github.com/aclysma/rafx/blob/c91bd5fcfdfa3f4d1b43507c32d84b94ffdf1b2e/rafx-visibility/src/geometry/frustum.rs#L64-L92) with a naïve loop that has an early-out in case of a bounding volume being outside of any one of the `Frustum` planes and found them to be very similar, so I chose the simpler and more readable option. I also compared using Vec3 and Vec3A and it turned out that promoting Vec3s to Vec3A improved performance of the culling significantly due to Vec3A operations using SIMD optimisations where Vec3 uses plain scalar operations. - When loading glTF models, the vertex attribute accessors generally store the minimum and maximum values, which allows for adding AABBs to meshes loaded from glTF for free. - For meshes without an AABB (`PbrBundle` deliberately does not have an AABB by default), a system is executed that scans over the vertex positions to find the minimum and maximum values along each axis. This is used to construct the AABB. - The `Frustum::intersects_obb` and `Sphere::insersects_obb` algorithm is from Foundations of Game Engine Development 2: Rendering by Eric Lengyel. There is no OBB type, yet, rather an AABB and the model matrix are passed in as arguments. This calculates a 'relative radius' of the AABB with respect to the plane normal (the plane normal in the Sphere case being something I came up with as the direction pointing from the centre of the sphere to the centre of the AABB) such that it can then do a sphere-sphere intersection test in practice. - `RenderLayers` were copied over from the current renderer. - `VisibleEntities` was copied over from the current renderer and a `CubemapVisibleEntities` was added to support `PointLight`s for now. `VisibleEntities` are added to views (cameras and lights) and contain a `Vec<Entity>` that is populated by culling/visibility systems that run in PostUpdate of the app world, and are iterated over in the render world for, for example, queuing up meshes to be drawn by lights for shadow maps and the main pass for cameras. - `Visibility` and `ComputedVisibility` components were added. The `Visibility` component is user-facing so that, for example, the entity can be marked as not visible in an editor. `ComputedVisibility` on the other hand is the result of the culling/visibility systems and takes `Visibility` into account. So if an entity is marked as not being visible in its `Visibility` component, that will skip culling/visibility intersection tests and just mark the `ComputedVisibility` as false. - The `ComputedVisibility` is used to decide which meshes to extract. - I had to add a way to get the far plane from the `CameraProjection` in order to define an explicit far frustum plane for culling. This should perhaps be optional as it is not always desired and in that case, testing 5 planes instead of 6 is a performance win. I think that's about all. I discussed some of the design with @cart on Discord already so hopefully it's not too far from being mergeable. It works well at least. 😄
2021-11-07 21:45:52 +00:00
.collect::<Vec<_>>();
Clustered forward rendering (#3153) # Objective Implement clustered-forward rendering. ## Solution ~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works. * The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+). * WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers. * Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct * The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space. * Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers. * Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR. * I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times. Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on. * Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is. * Improve the z-slicing to use a larger first slice. * More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …) * Switch to using a texture instead of uniform buffer * Figure out the / a better story for shadows I will also probably add an example that demonstrates some of the issues: * What situations exhaust the space available in the uniform buffers * Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters * Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light * Perhaps some performance issues * How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
global_light_meta.entity_to_index.clear();
let mut point_lights: Vec<_> = point_lights.iter().collect::<Vec<_>>();
let mut directional_lights: Vec<_> = directional_lights.iter().collect::<Vec<_>>();
Update to wgpu 0.19 and raw-window-handle 0.6 (#11280) # Objective Keep core dependencies up to date. ## Solution Update the dependencies. wgpu 0.19 only supports raw-window-handle (rwh) 0.6, so bumping that was included in this. The rwh 0.6 version bump is just the simplest way of doing it. There might be a way we can take advantage of wgpu's new safe surface creation api, but I'm not familiar enough with bevy's window management to untangle it and my attempt ended up being a mess of lifetimes and rustc complaining about missing trait impls (that were implemented). Thanks to @MiniaczQ for the (much simpler) rwh 0.6 version bump code. Unblocks https://github.com/bevyengine/bevy/pull/9172 and https://github.com/bevyengine/bevy/pull/10812 ~~This might be blocked on cpal and oboe updating their ndk versions to 0.8, as they both currently target ndk 0.7 which uses rwh 0.5.2~~ Tested on android, and everything seems to work correctly (audio properly stops when minimized, and plays when re-focusing the app). --- ## Changelog - `wgpu` has been updated to 0.19! The long awaited arcanization has been merged (for more info, see https://gfx-rs.github.io/2023/11/24/arcanization.html), and Vulkan should now be working again on Intel GPUs. - Targeting WebGPU now requires that you add the new `webgpu` feature (setting the `RUSTFLAGS` environment variable to `--cfg=web_sys_unstable_apis` is still required). This feature currently overrides the `webgl2` feature if you have both enabled (the `webgl2` feature is enabled by default), so it is not recommended to add it as a default feature to libraries without putting it behind a flag that allows library users to opt out of it! In the future we plan on supporting wasm binaries that can target both webgl2 and webgpu now that wgpu added support for doing so (see https://github.com/bevyengine/bevy/issues/11505). - `raw-window-handle` has been updated to version 0.6. ## Migration Guide - `bevy_render::instance_index::get_instance_index()` has been removed as the webgl2 workaround is no longer required as it was fixed upstream in wgpu. The `BASE_INSTANCE_WORKAROUND` shaderdef has also been removed. - WebGPU now requires the new `webgpu` feature to be enabled. The `webgpu` feature currently overrides the `webgl2` feature so you no longer need to disable all default features and re-add them all when targeting `webgpu`, but binaries built with both the `webgpu` and `webgl2` features will only target the webgpu backend, and will only work on browsers that support WebGPU. - Places where you conditionally compiled things for webgl2 need to be updated because of this change, eg: - `#[cfg(any(not(feature = "webgl"), not(target_arch = "wasm32")))]` becomes `#[cfg(any(not(feature = "webgl") ,not(target_arch = "wasm32"), feature = "webgpu"))]` - `#[cfg(all(feature = "webgl", target_arch = "wasm32"))]` becomes `#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]` - `if cfg!(all(feature = "webgl", target_arch = "wasm32"))` becomes `if cfg!(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))` - `create_texture_with_data` now also takes a `TextureDataOrder`. You can probably just set this to `TextureDataOrder::default()` - `TextureFormat`'s `block_size` has been renamed to `block_copy_size` - See the `wgpu` changelog for anything I might've missed: https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md --------- Co-authored-by: François <mockersf@gmail.com>
2024-01-26 18:14:21 +00:00
#[cfg(any(
not(feature = "webgl"),
not(target_arch = "wasm32"),
feature = "webgpu"
))]
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
let max_texture_array_layers = render_device.limits().max_texture_array_layers as usize;
Update to wgpu 0.19 and raw-window-handle 0.6 (#11280) # Objective Keep core dependencies up to date. ## Solution Update the dependencies. wgpu 0.19 only supports raw-window-handle (rwh) 0.6, so bumping that was included in this. The rwh 0.6 version bump is just the simplest way of doing it. There might be a way we can take advantage of wgpu's new safe surface creation api, but I'm not familiar enough with bevy's window management to untangle it and my attempt ended up being a mess of lifetimes and rustc complaining about missing trait impls (that were implemented). Thanks to @MiniaczQ for the (much simpler) rwh 0.6 version bump code. Unblocks https://github.com/bevyengine/bevy/pull/9172 and https://github.com/bevyengine/bevy/pull/10812 ~~This might be blocked on cpal and oboe updating their ndk versions to 0.8, as they both currently target ndk 0.7 which uses rwh 0.5.2~~ Tested on android, and everything seems to work correctly (audio properly stops when minimized, and plays when re-focusing the app). --- ## Changelog - `wgpu` has been updated to 0.19! The long awaited arcanization has been merged (for more info, see https://gfx-rs.github.io/2023/11/24/arcanization.html), and Vulkan should now be working again on Intel GPUs. - Targeting WebGPU now requires that you add the new `webgpu` feature (setting the `RUSTFLAGS` environment variable to `--cfg=web_sys_unstable_apis` is still required). This feature currently overrides the `webgl2` feature if you have both enabled (the `webgl2` feature is enabled by default), so it is not recommended to add it as a default feature to libraries without putting it behind a flag that allows library users to opt out of it! In the future we plan on supporting wasm binaries that can target both webgl2 and webgpu now that wgpu added support for doing so (see https://github.com/bevyengine/bevy/issues/11505). - `raw-window-handle` has been updated to version 0.6. ## Migration Guide - `bevy_render::instance_index::get_instance_index()` has been removed as the webgl2 workaround is no longer required as it was fixed upstream in wgpu. The `BASE_INSTANCE_WORKAROUND` shaderdef has also been removed. - WebGPU now requires the new `webgpu` feature to be enabled. The `webgpu` feature currently overrides the `webgl2` feature so you no longer need to disable all default features and re-add them all when targeting `webgpu`, but binaries built with both the `webgpu` and `webgl2` features will only target the webgpu backend, and will only work on browsers that support WebGPU. - Places where you conditionally compiled things for webgl2 need to be updated because of this change, eg: - `#[cfg(any(not(feature = "webgl"), not(target_arch = "wasm32")))]` becomes `#[cfg(any(not(feature = "webgl") ,not(target_arch = "wasm32"), feature = "webgpu"))]` - `#[cfg(all(feature = "webgl", target_arch = "wasm32"))]` becomes `#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]` - `if cfg!(all(feature = "webgl", target_arch = "wasm32"))` becomes `if cfg!(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))` - `create_texture_with_data` now also takes a `TextureDataOrder`. You can probably just set this to `TextureDataOrder::default()` - `TextureFormat`'s `block_size` has been renamed to `block_copy_size` - See the `wgpu` changelog for anything I might've missed: https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md --------- Co-authored-by: François <mockersf@gmail.com>
2024-01-26 18:14:21 +00:00
#[cfg(any(
not(feature = "webgl"),
not(target_arch = "wasm32"),
feature = "webgpu"
))]
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
let max_texture_cubes = max_texture_array_layers / 6;
Update to wgpu 0.19 and raw-window-handle 0.6 (#11280) # Objective Keep core dependencies up to date. ## Solution Update the dependencies. wgpu 0.19 only supports raw-window-handle (rwh) 0.6, so bumping that was included in this. The rwh 0.6 version bump is just the simplest way of doing it. There might be a way we can take advantage of wgpu's new safe surface creation api, but I'm not familiar enough with bevy's window management to untangle it and my attempt ended up being a mess of lifetimes and rustc complaining about missing trait impls (that were implemented). Thanks to @MiniaczQ for the (much simpler) rwh 0.6 version bump code. Unblocks https://github.com/bevyengine/bevy/pull/9172 and https://github.com/bevyengine/bevy/pull/10812 ~~This might be blocked on cpal and oboe updating their ndk versions to 0.8, as they both currently target ndk 0.7 which uses rwh 0.5.2~~ Tested on android, and everything seems to work correctly (audio properly stops when minimized, and plays when re-focusing the app). --- ## Changelog - `wgpu` has been updated to 0.19! The long awaited arcanization has been merged (for more info, see https://gfx-rs.github.io/2023/11/24/arcanization.html), and Vulkan should now be working again on Intel GPUs. - Targeting WebGPU now requires that you add the new `webgpu` feature (setting the `RUSTFLAGS` environment variable to `--cfg=web_sys_unstable_apis` is still required). This feature currently overrides the `webgl2` feature if you have both enabled (the `webgl2` feature is enabled by default), so it is not recommended to add it as a default feature to libraries without putting it behind a flag that allows library users to opt out of it! In the future we plan on supporting wasm binaries that can target both webgl2 and webgpu now that wgpu added support for doing so (see https://github.com/bevyengine/bevy/issues/11505). - `raw-window-handle` has been updated to version 0.6. ## Migration Guide - `bevy_render::instance_index::get_instance_index()` has been removed as the webgl2 workaround is no longer required as it was fixed upstream in wgpu. The `BASE_INSTANCE_WORKAROUND` shaderdef has also been removed. - WebGPU now requires the new `webgpu` feature to be enabled. The `webgpu` feature currently overrides the `webgl2` feature so you no longer need to disable all default features and re-add them all when targeting `webgpu`, but binaries built with both the `webgpu` and `webgl2` features will only target the webgpu backend, and will only work on browsers that support WebGPU. - Places where you conditionally compiled things for webgl2 need to be updated because of this change, eg: - `#[cfg(any(not(feature = "webgl"), not(target_arch = "wasm32")))]` becomes `#[cfg(any(not(feature = "webgl") ,not(target_arch = "wasm32"), feature = "webgpu"))]` - `#[cfg(all(feature = "webgl", target_arch = "wasm32"))]` becomes `#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]` - `if cfg!(all(feature = "webgl", target_arch = "wasm32"))` becomes `if cfg!(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))` - `create_texture_with_data` now also takes a `TextureDataOrder`. You can probably just set this to `TextureDataOrder::default()` - `TextureFormat`'s `block_size` has been renamed to `block_copy_size` - See the `wgpu` changelog for anything I might've missed: https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md --------- Co-authored-by: François <mockersf@gmail.com>
2024-01-26 18:14:21 +00:00
#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
let max_texture_array_layers = 1;
Update to wgpu 0.19 and raw-window-handle 0.6 (#11280) # Objective Keep core dependencies up to date. ## Solution Update the dependencies. wgpu 0.19 only supports raw-window-handle (rwh) 0.6, so bumping that was included in this. The rwh 0.6 version bump is just the simplest way of doing it. There might be a way we can take advantage of wgpu's new safe surface creation api, but I'm not familiar enough with bevy's window management to untangle it and my attempt ended up being a mess of lifetimes and rustc complaining about missing trait impls (that were implemented). Thanks to @MiniaczQ for the (much simpler) rwh 0.6 version bump code. Unblocks https://github.com/bevyengine/bevy/pull/9172 and https://github.com/bevyengine/bevy/pull/10812 ~~This might be blocked on cpal and oboe updating their ndk versions to 0.8, as they both currently target ndk 0.7 which uses rwh 0.5.2~~ Tested on android, and everything seems to work correctly (audio properly stops when minimized, and plays when re-focusing the app). --- ## Changelog - `wgpu` has been updated to 0.19! The long awaited arcanization has been merged (for more info, see https://gfx-rs.github.io/2023/11/24/arcanization.html), and Vulkan should now be working again on Intel GPUs. - Targeting WebGPU now requires that you add the new `webgpu` feature (setting the `RUSTFLAGS` environment variable to `--cfg=web_sys_unstable_apis` is still required). This feature currently overrides the `webgl2` feature if you have both enabled (the `webgl2` feature is enabled by default), so it is not recommended to add it as a default feature to libraries without putting it behind a flag that allows library users to opt out of it! In the future we plan on supporting wasm binaries that can target both webgl2 and webgpu now that wgpu added support for doing so (see https://github.com/bevyengine/bevy/issues/11505). - `raw-window-handle` has been updated to version 0.6. ## Migration Guide - `bevy_render::instance_index::get_instance_index()` has been removed as the webgl2 workaround is no longer required as it was fixed upstream in wgpu. The `BASE_INSTANCE_WORKAROUND` shaderdef has also been removed. - WebGPU now requires the new `webgpu` feature to be enabled. The `webgpu` feature currently overrides the `webgl2` feature so you no longer need to disable all default features and re-add them all when targeting `webgpu`, but binaries built with both the `webgpu` and `webgl2` features will only target the webgpu backend, and will only work on browsers that support WebGPU. - Places where you conditionally compiled things for webgl2 need to be updated because of this change, eg: - `#[cfg(any(not(feature = "webgl"), not(target_arch = "wasm32")))]` becomes `#[cfg(any(not(feature = "webgl") ,not(target_arch = "wasm32"), feature = "webgpu"))]` - `#[cfg(all(feature = "webgl", target_arch = "wasm32"))]` becomes `#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]` - `if cfg!(all(feature = "webgl", target_arch = "wasm32"))` becomes `if cfg!(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))` - `create_texture_with_data` now also takes a `TextureDataOrder`. You can probably just set this to `TextureDataOrder::default()` - `TextureFormat`'s `block_size` has been renamed to `block_copy_size` - See the `wgpu` changelog for anything I might've missed: https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md --------- Co-authored-by: François <mockersf@gmail.com>
2024-01-26 18:14:21 +00:00
#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
let max_texture_cubes = 1;
if !*max_directional_lights_warning_emitted && directional_lights.len() > MAX_DIRECTIONAL_LIGHTS
{
warn!(
"The amount of directional lights of {} is exceeding the supported limit of {}.",
directional_lights.len(),
MAX_DIRECTIONAL_LIGHTS
);
*max_directional_lights_warning_emitted = true;
}
if !*max_cascades_per_light_warning_emitted
&& directional_lights
.iter()
.any(|(_, light)| light.cascade_shadow_config.bounds.len() > MAX_CASCADES_PER_LIGHT)
{
warn!(
"The number of cascades configured for a directional light exceeds the supported limit of {}.",
MAX_CASCADES_PER_LIGHT
);
*max_cascades_per_light_warning_emitted = true;
}
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
let point_light_count = point_lights
.iter()
.filter(|light| light.1.spot_light_angles.is_none())
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
.count();
let point_light_volumetric_enabled_count = point_lights
.iter()
.filter(|(_, light, _)| light.volumetric && light.spot_light_angles.is_none())
.count()
.min(max_texture_cubes);
let point_light_shadow_maps_count = point_lights
.iter()
.filter(|light| light.1.shadows_enabled && light.1.spot_light_angles.is_none())
.count()
.min(max_texture_cubes);
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
Implement volumetric fog and volumetric lighting, also known as light shafts or god rays. (#13057) This commit implements a more physically-accurate, but slower, form of fog than the `bevy_pbr::fog` module does. Notably, this *volumetric fog* allows for light beams from directional lights to shine through, creating what is known as *light shafts* or *god rays*. To add volumetric fog to a scene, add `VolumetricFogSettings` to the camera, and add `VolumetricLight` to directional lights that you wish to be volumetric. `VolumetricFogSettings` has numerous settings that allow you to define the accuracy of the simulation, as well as the look of the fog. Currently, only interaction with directional lights that have shadow maps is supported. Note that the overhead of the effect scales directly with the number of directional lights in use, so apply `VolumetricLight` sparingly for the best results. The overall algorithm, which is implemented as a postprocessing effect, is a combination of the techniques described in [Scratchapixel] and [this blog post]. It uses raymarching in screen space, transformed into shadow map space for sampling and combined with physically-based modeling of absorption and scattering. Bevy employs the widely-used [Henyey-Greenstein phase function] to model asymmetry; this essentially allows light shafts to fade into and out of existence as the user views them. Volumetric rendering is a huge subject, and I deliberately kept the scope of this commit small. Possible follow-ups include: 1. Raymarching at a lower resolution. 2. A post-processing blur (especially useful when combined with (1)). 3. Supporting point lights and spot lights. 4. Supporting lights with no shadow maps. 5. Supporting irradiance volumes and reflection probes. 6. Voxel components that reuse the volumetric fog code to create voxel shapes. 7. *Horizon: Zero Dawn*-style clouds. These are all useful, but out of scope of this patch for now, to keep things tidy and easy to review. A new example, `volumetric_fog`, has been added to demonstrate the effect. ## Changelog ### Added * A new component, `VolumetricFog`, is available, to allow for a more physically-accurate, but more resource-intensive, form of fog. * A new component, `VolumetricLight`, can be placed on directional lights to make them interact with `VolumetricFog`. Notably, this allows such lights to emit light shafts/god rays. ![Screenshot 2024-04-21 162808](https://github.com/bevyengine/bevy/assets/157897/7a1fc81d-eed5-4735-9419-286c496391a9) ![Screenshot 2024-04-21 132005](https://github.com/bevyengine/bevy/assets/157897/e6d3b5ca-8f59-488d-a3de-15e95aaf4995) [Scratchapixel]: https://www.scratchapixel.com/lessons/3d-basic-rendering/volume-rendering-for-developers/intro-volume-rendering.html [this blog post]: https://www.alexandre-pestana.com/volumetric-lights/ [Henyey-Greenstein phase function]: https://www.pbr-book.org/4ed/Volume_Scattering/Phase_Functions#TheHenyeyndashGreensteinPhaseFunction
2024-05-16 17:13:18 +00:00
let directional_volumetric_enabled_count = directional_lights
.iter()
.take(MAX_DIRECTIONAL_LIGHTS)
.filter(|(_, light)| light.volumetric)
.count()
.min(max_texture_array_layers / MAX_CASCADES_PER_LIGHT);
let directional_shadow_enabled_count = directional_lights
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
.iter()
.take(MAX_DIRECTIONAL_LIGHTS)
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
.filter(|(_, light)| light.shadows_enabled)
.count()
.min(max_texture_array_layers / MAX_CASCADES_PER_LIGHT);
let spot_light_volumetric_enabled_count = point_lights
.iter()
.filter(|(_, light, _)| light.volumetric && light.spot_light_angles.is_some())
.count()
.min(max_texture_array_layers - directional_shadow_enabled_count * MAX_CASCADES_PER_LIGHT);
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
let spot_light_shadow_maps_count = point_lights
.iter()
.filter(|(_, light, _)| light.shadows_enabled && light.spot_light_angles.is_some())
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
.count()
.min(max_texture_array_layers - directional_shadow_enabled_count * MAX_CASCADES_PER_LIGHT);
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
// Sort lights by
// - point-light vs spot-light, so that we can iterate point lights and spot lights in contiguous blocks in the fragment shader,
// - then those with shadows enabled first, so that the index can be used to render at most `point_light_shadow_maps_count`
// point light shadows and `spot_light_shadow_maps_count` spot light shadow maps,
// - then by entity as a stable key to ensure that a consistent set of lights are chosen if the light count limit is exceeded.
point_lights.sort_by(|(entity_1, light_1, _), (entity_2, light_2, _)| {
clusterable_object_order(
ClusterableObjectOrderData {
entity: entity_1,
shadows_enabled: &light_1.shadows_enabled,
is_volumetric_light: &light_1.volumetric,
is_spot_light: &light_1.spot_light_angles.is_some(),
},
ClusterableObjectOrderData {
entity: entity_2,
shadows_enabled: &light_2.shadows_enabled,
is_volumetric_light: &light_2.volumetric,
is_spot_light: &light_2.spot_light_angles.is_some(),
},
)
});
// Sort lights by
Implement volumetric fog and volumetric lighting, also known as light shafts or god rays. (#13057) This commit implements a more physically-accurate, but slower, form of fog than the `bevy_pbr::fog` module does. Notably, this *volumetric fog* allows for light beams from directional lights to shine through, creating what is known as *light shafts* or *god rays*. To add volumetric fog to a scene, add `VolumetricFogSettings` to the camera, and add `VolumetricLight` to directional lights that you wish to be volumetric. `VolumetricFogSettings` has numerous settings that allow you to define the accuracy of the simulation, as well as the look of the fog. Currently, only interaction with directional lights that have shadow maps is supported. Note that the overhead of the effect scales directly with the number of directional lights in use, so apply `VolumetricLight` sparingly for the best results. The overall algorithm, which is implemented as a postprocessing effect, is a combination of the techniques described in [Scratchapixel] and [this blog post]. It uses raymarching in screen space, transformed into shadow map space for sampling and combined with physically-based modeling of absorption and scattering. Bevy employs the widely-used [Henyey-Greenstein phase function] to model asymmetry; this essentially allows light shafts to fade into and out of existence as the user views them. Volumetric rendering is a huge subject, and I deliberately kept the scope of this commit small. Possible follow-ups include: 1. Raymarching at a lower resolution. 2. A post-processing blur (especially useful when combined with (1)). 3. Supporting point lights and spot lights. 4. Supporting lights with no shadow maps. 5. Supporting irradiance volumes and reflection probes. 6. Voxel components that reuse the volumetric fog code to create voxel shapes. 7. *Horizon: Zero Dawn*-style clouds. These are all useful, but out of scope of this patch for now, to keep things tidy and easy to review. A new example, `volumetric_fog`, has been added to demonstrate the effect. ## Changelog ### Added * A new component, `VolumetricFog`, is available, to allow for a more physically-accurate, but more resource-intensive, form of fog. * A new component, `VolumetricLight`, can be placed on directional lights to make them interact with `VolumetricFog`. Notably, this allows such lights to emit light shafts/god rays. ![Screenshot 2024-04-21 162808](https://github.com/bevyengine/bevy/assets/157897/7a1fc81d-eed5-4735-9419-286c496391a9) ![Screenshot 2024-04-21 132005](https://github.com/bevyengine/bevy/assets/157897/e6d3b5ca-8f59-488d-a3de-15e95aaf4995) [Scratchapixel]: https://www.scratchapixel.com/lessons/3d-basic-rendering/volume-rendering-for-developers/intro-volume-rendering.html [this blog post]: https://www.alexandre-pestana.com/volumetric-lights/ [Henyey-Greenstein phase function]: https://www.pbr-book.org/4ed/Volume_Scattering/Phase_Functions#TheHenyeyndashGreensteinPhaseFunction
2024-05-16 17:13:18 +00:00
// - those with volumetric (and shadows) enabled first, so that the
// volumetric lighting pass can quickly find the volumetric lights;
// - then those with shadows enabled second, so that the index can be used
// to render at most `directional_light_shadow_maps_count` directional light
// shadows
// - then by entity as a stable key to ensure that a consistent set of
// lights are chosen if the light count limit is exceeded.
directional_lights.sort_by(|(entity_1, light_1), (entity_2, light_2)| {
directional_light_order(
Implement volumetric fog and volumetric lighting, also known as light shafts or god rays. (#13057) This commit implements a more physically-accurate, but slower, form of fog than the `bevy_pbr::fog` module does. Notably, this *volumetric fog* allows for light beams from directional lights to shine through, creating what is known as *light shafts* or *god rays*. To add volumetric fog to a scene, add `VolumetricFogSettings` to the camera, and add `VolumetricLight` to directional lights that you wish to be volumetric. `VolumetricFogSettings` has numerous settings that allow you to define the accuracy of the simulation, as well as the look of the fog. Currently, only interaction with directional lights that have shadow maps is supported. Note that the overhead of the effect scales directly with the number of directional lights in use, so apply `VolumetricLight` sparingly for the best results. The overall algorithm, which is implemented as a postprocessing effect, is a combination of the techniques described in [Scratchapixel] and [this blog post]. It uses raymarching in screen space, transformed into shadow map space for sampling and combined with physically-based modeling of absorption and scattering. Bevy employs the widely-used [Henyey-Greenstein phase function] to model asymmetry; this essentially allows light shafts to fade into and out of existence as the user views them. Volumetric rendering is a huge subject, and I deliberately kept the scope of this commit small. Possible follow-ups include: 1. Raymarching at a lower resolution. 2. A post-processing blur (especially useful when combined with (1)). 3. Supporting point lights and spot lights. 4. Supporting lights with no shadow maps. 5. Supporting irradiance volumes and reflection probes. 6. Voxel components that reuse the volumetric fog code to create voxel shapes. 7. *Horizon: Zero Dawn*-style clouds. These are all useful, but out of scope of this patch for now, to keep things tidy and easy to review. A new example, `volumetric_fog`, has been added to demonstrate the effect. ## Changelog ### Added * A new component, `VolumetricFog`, is available, to allow for a more physically-accurate, but more resource-intensive, form of fog. * A new component, `VolumetricLight`, can be placed on directional lights to make them interact with `VolumetricFog`. Notably, this allows such lights to emit light shafts/god rays. ![Screenshot 2024-04-21 162808](https://github.com/bevyengine/bevy/assets/157897/7a1fc81d-eed5-4735-9419-286c496391a9) ![Screenshot 2024-04-21 132005](https://github.com/bevyengine/bevy/assets/157897/e6d3b5ca-8f59-488d-a3de-15e95aaf4995) [Scratchapixel]: https://www.scratchapixel.com/lessons/3d-basic-rendering/volume-rendering-for-developers/intro-volume-rendering.html [this blog post]: https://www.alexandre-pestana.com/volumetric-lights/ [Henyey-Greenstein phase function]: https://www.pbr-book.org/4ed/Volume_Scattering/Phase_Functions#TheHenyeyndashGreensteinPhaseFunction
2024-05-16 17:13:18 +00:00
(entity_1, &light_1.volumetric, &light_1.shadows_enabled),
(entity_2, &light_2.volumetric, &light_2.shadows_enabled),
)
});
if global_light_meta.entity_to_index.capacity() < point_lights.len() {
global_light_meta
.entity_to_index
.reserve(point_lights.len());
Clustered forward rendering (#3153) # Objective Implement clustered-forward rendering. ## Solution ~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works. * The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+). * WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers. * Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct * The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space. * Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers. * Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR. * I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times. Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on. * Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is. * Improve the z-slicing to use a larger first slice. * More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …) * Switch to using a texture instead of uniform buffer * Figure out the / a better story for shadows I will also probably add an example that demonstrates some of the issues: * What situations exhaust the space available in the uniform buffers * Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters * Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light * Perhaps some performance issues * How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
}
Use storage buffers for clustered forward point lights (#3989) # Objective - Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene - Fixes #3605 - Based on top of #4079 This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR. ## Solution - Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability - Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time. - Appropriate shader defs in the shader code to handle the two cases ## Context on some decisions / open questions - I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately. - Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants? Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-04-07 16:16:35 +00:00
let mut gpu_point_lights = Vec::new();
for (index, &(entity, light, _)) in point_lights.iter().enumerate() {
Clustered forward rendering (#3153) # Objective Implement clustered-forward rendering. ## Solution ~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works. * The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+). * WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers. * Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct * The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space. * Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers. * Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR. * I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times. Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on. * Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is. * Improve the z-slicing to use a larger first slice. * More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …) * Switch to using a texture instead of uniform buffer * Figure out the / a better story for shadows I will also probably add an example that demonstrates some of the issues: * What situations exhaust the space available in the uniform buffers * Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters * Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light * Perhaps some performance issues * How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
let mut flags = PointLightFlags::NONE;
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
// Lights are sorted, shadow enabled lights are first
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
if light.shadows_enabled
&& (index < point_light_shadow_maps_count
|| (light.spot_light_angles.is_some()
&& index - point_light_count < spot_light_shadow_maps_count))
{
Clustered forward rendering (#3153) # Objective Implement clustered-forward rendering. ## Solution ~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works. * The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+). * WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers. * Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct * The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space. * Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers. * Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR. * I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times. Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on. * Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is. * Improve the z-slicing to use a larger first slice. * More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …) * Switch to using a texture instead of uniform buffer * Figure out the / a better story for shadows I will also probably add an example that demonstrates some of the issues: * What situations exhaust the space available in the uniform buffers * Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters * Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light * Perhaps some performance issues * How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
flags |= PointLightFlags::SHADOWS_ENABLED;
}
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
let cube_face_projection = Mat4::perspective_infinite_reverse_rh(
Add `core` and `alloc` over `std` Lints (#15281) # Objective - Fixes #6370 - Closes #6581 ## Solution - Added the following lints to the workspace: - `std_instead_of_core` - `std_instead_of_alloc` - `alloc_instead_of_core` - Used `cargo +nightly fmt` with [item level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A) to split all `use` statements into single items. - Used `cargo clippy --workspace --all-targets --all-features --fix --allow-dirty` to _attempt_ to resolve the new linting issues, and intervened where the lint was unable to resolve the issue automatically (usually due to needing an `extern crate alloc;` statement in a crate root). - Manually removed certain uses of `std` where negative feature gating prevented `--all-features` from finding the offending uses. - Used `cargo +nightly fmt` with [crate level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A) to re-merge all `use` statements matching Bevy's previous styling. - Manually fixed cases where the `fmt` tool could not re-merge `use` statements due to conditional compilation attributes. ## Testing - Ran CI locally ## Migration Guide The MSRV is now 1.81. Please update to this version or higher. ## Notes - This is a _massive_ change to try and push through, which is why I've outlined the semi-automatic steps I used to create this PR, in case this fails and someone else tries again in the future. - Making this change has no impact on user code, but does mean Bevy contributors will be warned to use `core` and `alloc` instead of `std` where possible. - This lint is a critical first step towards investigating `no_std` options for Bevy. --------- Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-09-27 00:59:59 +00:00
core::f32::consts::FRAC_PI_2,
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
1.0,
light.shadow_map_near_z,
);
if light.shadows_enabled
&& light.volumetric
&& (index < point_light_volumetric_enabled_count
|| (light.spot_light_angles.is_some()
&& index - point_light_count < spot_light_volumetric_enabled_count))
{
flags |= PointLightFlags::VOLUMETRIC;
}
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
let (light_custom_data, spot_light_tan_angle) = match light.spot_light_angles {
Some((inner, outer)) => {
let light_direction = light.transform.forward();
if light_direction.y.is_sign_negative() {
flags |= PointLightFlags::SPOT_LIGHT_Y_NEGATIVE;
}
let cos_outer = ops::cos(outer);
let spot_scale = 1.0 / f32::max(ops::cos(inner) - cos_outer, 1e-4);
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
let spot_offset = -cos_outer * spot_scale;
(
// For spot lights: the direction (x,z), spot_scale and spot_offset
light_direction.xz().extend(spot_scale).extend(spot_offset),
ops::tan(outer),
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
)
}
None => {
(
// For point lights: the lower-right 2x2 values of the projection matrix [2][2] [2][3] [3][2] [3][3]
Vec4::new(
cube_face_projection.z_axis.z,
cube_face_projection.z_axis.w,
cube_face_projection.w_axis.z,
cube_face_projection.w_axis.w,
),
// unused
0.0,
)
}
};
gpu_point_lights.push(GpuClusterableObject {
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
light_custom_data,
Clustered forward rendering (#3153) # Objective Implement clustered-forward rendering. ## Solution ~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works. * The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+). * WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers. * Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct * The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space. * Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers. * Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR. * I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times. Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on. * Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is. * Improve the z-slicing to use a larger first slice. * More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …) * Switch to using a texture instead of uniform buffer * Figure out the / a better story for shadows I will also probably add an example that demonstrates some of the issues: * What situations exhaust the space available in the uniform buffers * Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters * Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light * Perhaps some performance issues * How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
// premultiply color by intensity
// we don't use the alpha at all, so no reason to multiply only [0..3]
Migrate from `LegacyColor` to `bevy_color::Color` (#12163) # Objective - As part of the migration process we need to a) see the end effect of the migration on user ergonomics b) check for serious perf regressions c) actually migrate the code - To accomplish this, I'm going to attempt to migrate all of the remaining user-facing usages of `LegacyColor` in one PR, being careful to keep a clean commit history. - Fixes #12056. ## Solution I've chosen to use the polymorphic `Color` type as our standard user-facing API. - [x] Migrate `bevy_gizmos`. - [x] Take `impl Into<Color>` in all `bevy_gizmos` APIs - [x] Migrate sprites - [x] Migrate UI - [x] Migrate `ColorMaterial` - [x] Migrate `MaterialMesh2D` - [x] Migrate fog - [x] Migrate lights - [x] Migrate StandardMaterial - [x] Migrate wireframes - [x] Migrate clear color - [x] Migrate text - [x] Migrate gltf loader - [x] Register color types for reflection - [x] Remove `LegacyColor` - [x] Make sure CI passes Incidental improvements to ease migration: - added `Color::srgba_u8`, `Color::srgba_from_array` and friends - added `set_alpha`, `is_fully_transparent` and `is_fully_opaque` to the `Alpha` trait - add and immediately deprecate (lol) `Color::rgb` and friends in favor of more explicit and consistent `Color::srgb` - standardized on white and black for most example text colors - added vector field traits to `LinearRgba`: ~~`Add`, `Sub`, `AddAssign`, `SubAssign`,~~ `Mul<f32>` and `Div<f32>`. Multiplications and divisions do not scale alpha. `Add` and `Sub` have been cut from this PR. - added `LinearRgba` and `Srgba` `RED/GREEN/BLUE` - added `LinearRgba_to_f32_array` and `LinearRgba::to_u32` ## Migration Guide Bevy's color types have changed! Wherever you used a `bevy::render::Color`, a `bevy::color::Color` is used instead. These are quite similar! Both are enums storing a color in a specific color space (or to be more precise, using a specific color model). However, each of the different color models now has its own type. TODO... - `Color::rgba`, `Color::rgb`, `Color::rbga_u8`, `Color::rgb_u8`, `Color::rgb_from_array` are now `Color::srgba`, `Color::srgb`, `Color::srgba_u8`, `Color::srgb_u8` and `Color::srgb_from_array`. - `Color::set_a` and `Color::a` is now `Color::set_alpha` and `Color::alpha`. These are part of the `Alpha` trait in `bevy_color`. - `Color::is_fully_transparent` is now part of the `Alpha` trait in `bevy_color` - `Color::r`, `Color::set_r`, `Color::with_r` and the equivalents for `g`, `b` `h`, `s` and `l` have been removed due to causing silent relatively expensive conversions. Convert your `Color` into the desired color space, perform your operations there, and then convert it back into a polymorphic `Color` enum. - `Color::hex` is now `Srgba::hex`. Call `.into` or construct a `Color::Srgba` variant manually to convert it. - `WireframeMaterial`, `ExtractedUiNode`, `ExtractedDirectionalLight`, `ExtractedPointLight`, `ExtractedSpotLight` and `ExtractedSprite` now store a `LinearRgba`, rather than a polymorphic `Color` - `Color::rgb_linear` and `Color::rgba_linear` are now `Color::linear_rgb` and `Color::linear_rgba` - The various CSS color constants are no longer stored directly on `Color`. Instead, they're defined in the `Srgba` color space, and accessed via `bevy::color::palettes::css`. Call `.into()` on them to convert them into a `Color` for quick debugging use, and consider using the much prettier `tailwind` palette for prototyping. - The `LIME_GREEN` color has been renamed to `LIMEGREEN` to comply with the standard naming. - Vector field arithmetic operations on `Color` (add, subtract, multiply and divide by a f32) have been removed. Instead, convert your colors into `LinearRgba` space, and perform your operations explicitly there. This is particularly relevant when working with emissive or HDR colors, whose color channel values are routinely outside of the ordinary 0 to 1 range. - `Color::as_linear_rgba_f32` has been removed. Call `LinearRgba::to_f32_array` instead, converting if needed. - `Color::as_linear_rgba_u32` has been removed. Call `LinearRgba::to_u32` instead, converting if needed. - Several other color conversion methods to transform LCH or HSL colors into float arrays or `Vec` types have been removed. Please reimplement these externally or open a PR to re-add them if you found them particularly useful. - Various methods on `Color` such as `rgb` or `hsl` to convert the color into a specific color space have been removed. Convert into `LinearRgba`, then to the color space of your choice. - Various implicitly-converting color value methods on `Color` such as `r`, `g`, `b` or `h` have been removed. Please convert it into the color space of your choice, then check these properties. - `Color` no longer implements `AsBindGroup`. Store a `LinearRgba` internally instead to avoid conversion costs. --------- Co-authored-by: Alice Cecile <alice.i.cecil@gmail.com> Co-authored-by: Afonso Lage <lage.afonso@gmail.com> Co-authored-by: Rob Parrett <robparrett@gmail.com> Co-authored-by: Zachary Harrold <zac@harrold.com.au>
2024-02-29 19:35:12 +00:00
color_inverse_square_range: (Vec4::from_slice(&light.color.to_f32_array())
Clustered forward rendering (#3153) # Objective Implement clustered-forward rendering. ## Solution ~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works. * The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+). * WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers. * Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct * The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space. * Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers. * Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR. * I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times. Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on. * Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is. * Improve the z-slicing to use a larger first slice. * More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …) * Switch to using a texture instead of uniform buffer * Figure out the / a better story for shadows I will also probably add an example that demonstrates some of the issues: * What situations exhaust the space available in the uniform buffers * Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters * Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light * Perhaps some performance issues * How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
* light.intensity)
.xyz()
.extend(1.0 / (light.range * light.range)),
position_radius: light.transform.translation().extend(light.radius),
flags: flags.bits(),
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
soft_shadow_size: if light.soft_shadows_enabled {
light.radius
} else {
0.0
},
Clustered forward rendering (#3153) # Objective Implement clustered-forward rendering. ## Solution ~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works. * The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+). * WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers. * Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct * The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space. * Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers. * Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR. * I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times. Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on. * Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is. * Improve the z-slicing to use a larger first slice. * More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …) * Switch to using a texture instead of uniform buffer * Figure out the / a better story for shadows I will also probably add an example that demonstrates some of the issues: * What situations exhaust the space available in the uniform buffers * Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters * Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light * Perhaps some performance issues * How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
shadow_depth_bias: light.shadow_depth_bias,
shadow_normal_bias: light.shadow_normal_bias,
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
shadow_map_near_z: light.shadow_map_near_z,
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
spot_light_tan_angle,
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
pad_a: 0.0,
pad_b: 0.0,
Use storage buffers for clustered forward point lights (#3989) # Objective - Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene - Fixes #3605 - Based on top of #4079 This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR. ## Solution - Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability - Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time. - Appropriate shader defs in the shader code to handle the two cases ## Context on some decisions / open questions - I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately. - Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants? Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-04-07 16:16:35 +00:00
});
Clustered forward rendering (#3153) # Objective Implement clustered-forward rendering. ## Solution ~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works. * The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+). * WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers. * Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct * The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space. * Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers. * Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR. * I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times. Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on. * Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is. * Improve the z-slicing to use a larger first slice. * More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …) * Switch to using a texture instead of uniform buffer * Figure out the / a better story for shadows I will also probably add an example that demonstrates some of the issues: * What situations exhaust the space available in the uniform buffers * Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters * Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light * Perhaps some performance issues * How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
global_light_meta.entity_to_index.insert(entity, index);
}
let mut gpu_directional_lights = [GpuDirectionalLight::default(); MAX_DIRECTIONAL_LIGHTS];
let mut num_directional_cascades_enabled = 0usize;
for (index, (_light_entity, light)) in directional_lights
.iter()
.enumerate()
.take(MAX_DIRECTIONAL_LIGHTS)
{
let mut flags = DirectionalLightFlags::NONE;
Implement volumetric fog and volumetric lighting, also known as light shafts or god rays. (#13057) This commit implements a more physically-accurate, but slower, form of fog than the `bevy_pbr::fog` module does. Notably, this *volumetric fog* allows for light beams from directional lights to shine through, creating what is known as *light shafts* or *god rays*. To add volumetric fog to a scene, add `VolumetricFogSettings` to the camera, and add `VolumetricLight` to directional lights that you wish to be volumetric. `VolumetricFogSettings` has numerous settings that allow you to define the accuracy of the simulation, as well as the look of the fog. Currently, only interaction with directional lights that have shadow maps is supported. Note that the overhead of the effect scales directly with the number of directional lights in use, so apply `VolumetricLight` sparingly for the best results. The overall algorithm, which is implemented as a postprocessing effect, is a combination of the techniques described in [Scratchapixel] and [this blog post]. It uses raymarching in screen space, transformed into shadow map space for sampling and combined with physically-based modeling of absorption and scattering. Bevy employs the widely-used [Henyey-Greenstein phase function] to model asymmetry; this essentially allows light shafts to fade into and out of existence as the user views them. Volumetric rendering is a huge subject, and I deliberately kept the scope of this commit small. Possible follow-ups include: 1. Raymarching at a lower resolution. 2. A post-processing blur (especially useful when combined with (1)). 3. Supporting point lights and spot lights. 4. Supporting lights with no shadow maps. 5. Supporting irradiance volumes and reflection probes. 6. Voxel components that reuse the volumetric fog code to create voxel shapes. 7. *Horizon: Zero Dawn*-style clouds. These are all useful, but out of scope of this patch for now, to keep things tidy and easy to review. A new example, `volumetric_fog`, has been added to demonstrate the effect. ## Changelog ### Added * A new component, `VolumetricFog`, is available, to allow for a more physically-accurate, but more resource-intensive, form of fog. * A new component, `VolumetricLight`, can be placed on directional lights to make them interact with `VolumetricFog`. Notably, this allows such lights to emit light shafts/god rays. ![Screenshot 2024-04-21 162808](https://github.com/bevyengine/bevy/assets/157897/7a1fc81d-eed5-4735-9419-286c496391a9) ![Screenshot 2024-04-21 132005](https://github.com/bevyengine/bevy/assets/157897/e6d3b5ca-8f59-488d-a3de-15e95aaf4995) [Scratchapixel]: https://www.scratchapixel.com/lessons/3d-basic-rendering/volume-rendering-for-developers/intro-volume-rendering.html [this blog post]: https://www.alexandre-pestana.com/volumetric-lights/ [Henyey-Greenstein phase function]: https://www.pbr-book.org/4ed/Volume_Scattering/Phase_Functions#TheHenyeyndashGreensteinPhaseFunction
2024-05-16 17:13:18 +00:00
// Lights are sorted, volumetric and shadow enabled lights are first
if light.volumetric
&& light.shadows_enabled
&& (index < directional_volumetric_enabled_count)
{
flags |= DirectionalLightFlags::VOLUMETRIC;
}
// Shadow enabled lights are second
if light.shadows_enabled && (index < directional_shadow_enabled_count) {
flags |= DirectionalLightFlags::SHADOWS_ENABLED;
}
let num_cascades = light
.cascade_shadow_config
.bounds
.len()
.min(MAX_CASCADES_PER_LIGHT);
gpu_directional_lights[index] = GpuDirectionalLight {
// Set to true later when necessary.
skip: 0u32,
// Filled in later.
cascades: [GpuDirectionalCascade::default(); MAX_CASCADES_PER_LIGHT],
// premultiply color by illuminance
// we don't use the alpha at all, so no reason to multiply only [0..3]
Migrate from `LegacyColor` to `bevy_color::Color` (#12163) # Objective - As part of the migration process we need to a) see the end effect of the migration on user ergonomics b) check for serious perf regressions c) actually migrate the code - To accomplish this, I'm going to attempt to migrate all of the remaining user-facing usages of `LegacyColor` in one PR, being careful to keep a clean commit history. - Fixes #12056. ## Solution I've chosen to use the polymorphic `Color` type as our standard user-facing API. - [x] Migrate `bevy_gizmos`. - [x] Take `impl Into<Color>` in all `bevy_gizmos` APIs - [x] Migrate sprites - [x] Migrate UI - [x] Migrate `ColorMaterial` - [x] Migrate `MaterialMesh2D` - [x] Migrate fog - [x] Migrate lights - [x] Migrate StandardMaterial - [x] Migrate wireframes - [x] Migrate clear color - [x] Migrate text - [x] Migrate gltf loader - [x] Register color types for reflection - [x] Remove `LegacyColor` - [x] Make sure CI passes Incidental improvements to ease migration: - added `Color::srgba_u8`, `Color::srgba_from_array` and friends - added `set_alpha`, `is_fully_transparent` and `is_fully_opaque` to the `Alpha` trait - add and immediately deprecate (lol) `Color::rgb` and friends in favor of more explicit and consistent `Color::srgb` - standardized on white and black for most example text colors - added vector field traits to `LinearRgba`: ~~`Add`, `Sub`, `AddAssign`, `SubAssign`,~~ `Mul<f32>` and `Div<f32>`. Multiplications and divisions do not scale alpha. `Add` and `Sub` have been cut from this PR. - added `LinearRgba` and `Srgba` `RED/GREEN/BLUE` - added `LinearRgba_to_f32_array` and `LinearRgba::to_u32` ## Migration Guide Bevy's color types have changed! Wherever you used a `bevy::render::Color`, a `bevy::color::Color` is used instead. These are quite similar! Both are enums storing a color in a specific color space (or to be more precise, using a specific color model). However, each of the different color models now has its own type. TODO... - `Color::rgba`, `Color::rgb`, `Color::rbga_u8`, `Color::rgb_u8`, `Color::rgb_from_array` are now `Color::srgba`, `Color::srgb`, `Color::srgba_u8`, `Color::srgb_u8` and `Color::srgb_from_array`. - `Color::set_a` and `Color::a` is now `Color::set_alpha` and `Color::alpha`. These are part of the `Alpha` trait in `bevy_color`. - `Color::is_fully_transparent` is now part of the `Alpha` trait in `bevy_color` - `Color::r`, `Color::set_r`, `Color::with_r` and the equivalents for `g`, `b` `h`, `s` and `l` have been removed due to causing silent relatively expensive conversions. Convert your `Color` into the desired color space, perform your operations there, and then convert it back into a polymorphic `Color` enum. - `Color::hex` is now `Srgba::hex`. Call `.into` or construct a `Color::Srgba` variant manually to convert it. - `WireframeMaterial`, `ExtractedUiNode`, `ExtractedDirectionalLight`, `ExtractedPointLight`, `ExtractedSpotLight` and `ExtractedSprite` now store a `LinearRgba`, rather than a polymorphic `Color` - `Color::rgb_linear` and `Color::rgba_linear` are now `Color::linear_rgb` and `Color::linear_rgba` - The various CSS color constants are no longer stored directly on `Color`. Instead, they're defined in the `Srgba` color space, and accessed via `bevy::color::palettes::css`. Call `.into()` on them to convert them into a `Color` for quick debugging use, and consider using the much prettier `tailwind` palette for prototyping. - The `LIME_GREEN` color has been renamed to `LIMEGREEN` to comply with the standard naming. - Vector field arithmetic operations on `Color` (add, subtract, multiply and divide by a f32) have been removed. Instead, convert your colors into `LinearRgba` space, and perform your operations explicitly there. This is particularly relevant when working with emissive or HDR colors, whose color channel values are routinely outside of the ordinary 0 to 1 range. - `Color::as_linear_rgba_f32` has been removed. Call `LinearRgba::to_f32_array` instead, converting if needed. - `Color::as_linear_rgba_u32` has been removed. Call `LinearRgba::to_u32` instead, converting if needed. - Several other color conversion methods to transform LCH or HSL colors into float arrays or `Vec` types have been removed. Please reimplement these externally or open a PR to re-add them if you found them particularly useful. - Various methods on `Color` such as `rgb` or `hsl` to convert the color into a specific color space have been removed. Convert into `LinearRgba`, then to the color space of your choice. - Various implicitly-converting color value methods on `Color` such as `r`, `g`, `b` or `h` have been removed. Please convert it into the color space of your choice, then check these properties. - `Color` no longer implements `AsBindGroup`. Store a `LinearRgba` internally instead to avoid conversion costs. --------- Co-authored-by: Alice Cecile <alice.i.cecil@gmail.com> Co-authored-by: Afonso Lage <lage.afonso@gmail.com> Co-authored-by: Rob Parrett <robparrett@gmail.com> Co-authored-by: Zachary Harrold <zac@harrold.com.au>
2024-02-29 19:35:12 +00:00
color: Vec4::from_slice(&light.color.to_f32_array()) * light.illuminance,
// direction is negated to be ready for N.L
dir_to_light: light.transform.back().into(),
flags: flags.bits(),
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
soft_shadow_size: light.soft_shadow_size.unwrap_or_default(),
shadow_depth_bias: light.shadow_depth_bias,
shadow_normal_bias: light.shadow_normal_bias,
num_cascades: num_cascades as u32,
cascades_overlap_proportion: light.cascade_shadow_config.overlap_proportion,
depth_texture_base_index: num_directional_cascades_enabled as u32,
};
if index < directional_shadow_enabled_count {
num_directional_cascades_enabled += num_cascades;
}
}
Clustered forward rendering (#3153) # Objective Implement clustered-forward rendering. ## Solution ~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works. * The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+). * WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers. * Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct * The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space. * Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers. * Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR. * I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times. Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on. * Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is. * Improve the z-slicing to use a larger first slice. * More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …) * Switch to using a texture instead of uniform buffer * Figure out the / a better story for shadows I will also probably add an example that demonstrates some of the issues: * What situations exhaust the space available in the uniform buffers * Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters * Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light * Perhaps some performance issues * How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
global_light_meta
.gpu_clusterable_objects
.set(gpu_point_lights);
global_light_meta
.gpu_clusterable_objects
Clustered forward rendering (#3153) # Objective Implement clustered-forward rendering. ## Solution ~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works. * The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+). * WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers. * Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct * The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space. * Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers. * Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR. * I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times. Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on. * Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is. * Improve the z-slicing to use a larger first slice. * More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …) * Switch to using a texture instead of uniform buffer * Figure out the / a better story for shadows I will also probably add an example that demonstrates some of the issues: * What situations exhaust the space available in the uniform buffers * Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters * Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light * Perhaps some performance issues * How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
.write_buffer(&render_device, &render_queue);
live_shadow_mapping_lights.clear();
2021-06-02 02:59:17 +00:00
// set up light data for each view
for (entity, extracted_view, clusters, maybe_layers) in &views {
let point_light_depth_texture = texture_cache.get(
&render_device,
TextureDescriptor {
size: Extent3d {
width: point_light_shadow_map.size as u32,
height: point_light_shadow_map.size as u32,
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
depth_or_array_layers: point_light_shadow_maps_count.max(1) as u32 * 6,
},
mip_level_count: 1,
sample_count: 1,
dimension: TextureDimension::D2,
Deferred Renderer (#9258) # Objective - Add a [Deferred Renderer](https://en.wikipedia.org/wiki/Deferred_shading) to Bevy. - This allows subsequent passes to access per pixel material information before/during shading. - Accessing this per pixel material information is needed for some features, like GI. It also makes other features (ex. Decals) simpler to implement and/or improves their capability. There are multiple approaches to accomplishing this. The deferred shading approach works well given the limitations of WebGPU and WebGL2. Motivation: [I'm working on a GI solution for Bevy](https://youtu.be/eH1AkL-mwhI) # Solution - The deferred renderer is implemented with a prepass and a deferred lighting pass. - The prepass renders opaque objects into the Gbuffer attachment (`Rgba32Uint`). The PBR shader generates a `PbrInput` in mostly the same way as the forward implementation and then [packs it into the Gbuffer](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/render/pbr.wgsl#L168). - The deferred lighting pass unpacks the `PbrInput` and [feeds it into the pbr() function](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/deferred/deferred_lighting.wgsl#L65), then outputs the shaded color data. - There is now a resource [DefaultOpaqueRendererMethod](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/material.rs#L599) that can be used to set the default render method for opaque materials. If materials return `None` from [opaque_render_method()](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/material.rs#L131) the `DefaultOpaqueRendererMethod` will be used. Otherwise, custom materials can also explicitly choose to only support Deferred or Forward by returning the respective [OpaqueRendererMethod](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/material.rs#L603) - Deferred materials can be used seamlessly along with both opaque and transparent forward rendered materials in the same scene. The [deferred rendering example](https://github.com/DGriffin91/bevy/blob/deferred/examples/3d/deferred_rendering.rs) does this. - The deferred renderer does not support MSAA. If any deferred materials are used, MSAA must be disabled. Both TAA and FXAA are supported. - Deferred rendering supports WebGL2/WebGPU. ## Custom deferred materials - Custom materials can support both deferred and forward at the same time. The [StandardMaterial](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/render/pbr.wgsl#L166) does this. So does [this example](https://github.com/DGriffin91/bevy_glowy_orb_tutorial/blob/deferred/assets/shaders/glowy.wgsl#L56). - Custom deferred materials that require PBR lighting can create a `PbrInput`, write it to the deferred GBuffer and let it be rendered by the `PBRDeferredLightingPlugin`. - Custom deferred materials that require custom lighting have two options: 1. Use the base_color channel of the `PbrInput` combined with the `STANDARD_MATERIAL_FLAGS_UNLIT_BIT` flag. [Example.](https://github.com/DGriffin91/bevy_glowy_orb_tutorial/blob/deferred/assets/shaders/glowy.wgsl#L56) (If the unlit bit is set, the base_color is stored as RGB9E5 for extra precision) 2. A Custom Deferred Lighting pass can be created, either overriding the default, or running in addition. The a depth buffer is used to limit rendering to only the required fragments for each deferred lighting pass. Materials can set their respective depth id via the [deferred_lighting_pass_id](https://github.com/DGriffin91/bevy/blob/b79182d2a32cac28c4213c2457a53ac2cc885332/crates/bevy_pbr/src/prepass/prepass_io.wgsl#L95) attachment. The custom deferred lighting pass plugin can then set [its corresponding depth](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/deferred/deferred_lighting.wgsl#L37). Then with the lighting pass using [CompareFunction::Equal](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/deferred/mod.rs#L335), only the fragments with a depth that equal the corresponding depth written in the material will be rendered. Custom deferred lighting plugins can also be created to render the StandardMaterial. The default deferred lighting plugin can be bypassed with `DefaultPlugins.set(PBRDeferredLightingPlugin { bypass: true })` --------- Co-authored-by: nickrart <nickolas.g.russell@gmail.com>
2023-10-12 22:10:38 +00:00
format: CORE_3D_DEPTH_FORMAT,
label: Some("point_light_shadow_map_texture"),
usage: TextureUsages::RENDER_ATTACHMENT | TextureUsages::TEXTURE_BINDING,
Wgpu 0.15 (#7356) # Objective Update Bevy to wgpu 0.15. ## Changelog - Update to wgpu 0.15, wgpu-hal 0.15.1, and naga 0.11 - Users can now use the [DirectX Shader Compiler](https://github.com/microsoft/DirectXShaderCompiler) (DXC) on Windows with DX12 for faster shader compilation and ShaderModel 6.0+ support (requires `dxcompiler.dll` and `dxil.dll`, which are included in DXC downloads from [here](https://github.com/microsoft/DirectXShaderCompiler/releases/latest)) ## Migration Guide ### WGSL Top-Level `let` is now `const` All top level constants are now declared with `const`, catching up with the wgsl spec. `let` is no longer allowed at the global scope, only within functions. ```diff -let SOME_CONSTANT = 12.0; +const SOME_CONSTANT = 12.0; ``` #### `TextureDescriptor` and `SurfaceConfiguration` now requires a `view_formats` field The new `view_formats` field in the `TextureDescriptor` is used to specify a list of formats the texture can be re-interpreted to in a texture view. Currently only changing srgb-ness is allowed (ex. `Rgba8Unorm` <=> `Rgba8UnormSrgb`). You should set `view_formats` to `&[]` (empty) unless you have a specific reason not to. #### The DirectX Shader Compiler (DXC) is now supported on DX12 DXC is now the default shader compiler when using the DX12 backend. DXC is Microsoft's replacement for their legacy FXC compiler, and is faster, less buggy, and allows for modern shader features to be used (ShaderModel 6.0+). DXC requires `dxcompiler.dll` and `dxil.dll` to be available, otherwise it will log a warning and fall back to FXC. You can get `dxcompiler.dll` and `dxil.dll` by downloading the latest release from [Microsoft's DirectXShaderCompiler github repo](https://github.com/microsoft/DirectXShaderCompiler/releases/latest) and copying them into your project's root directory. These must be included when you distribute your Bevy game/app/etc if you plan on supporting the DX12 backend and are using DXC. `WgpuSettings` now has a `dx12_shader_compiler` field which can be used to choose between either FXC or DXC (if you pass None for the paths for DXC, it will check for the .dlls in the working directory).
2023-01-29 20:27:30 +00:00
view_formats: &[],
},
);
let directional_light_depth_texture = texture_cache.get(
2021-06-21 23:28:52 +00:00
&render_device,
2021-06-02 02:59:17 +00:00
TextureDescriptor {
size: Extent3d {
width: (directional_light_shadow_map.size as u32)
.min(render_device.limits().max_texture_dimension_2d),
height: (directional_light_shadow_map.size as u32)
.min(render_device.limits().max_texture_dimension_2d),
depth_or_array_layers: (num_directional_cascades_enabled
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
+ spot_light_shadow_maps_count)
.max(1) as u32,
},
2021-06-02 02:59:17 +00:00
mip_level_count: 1,
sample_count: 1,
dimension: TextureDimension::D2,
Deferred Renderer (#9258) # Objective - Add a [Deferred Renderer](https://en.wikipedia.org/wiki/Deferred_shading) to Bevy. - This allows subsequent passes to access per pixel material information before/during shading. - Accessing this per pixel material information is needed for some features, like GI. It also makes other features (ex. Decals) simpler to implement and/or improves their capability. There are multiple approaches to accomplishing this. The deferred shading approach works well given the limitations of WebGPU and WebGL2. Motivation: [I'm working on a GI solution for Bevy](https://youtu.be/eH1AkL-mwhI) # Solution - The deferred renderer is implemented with a prepass and a deferred lighting pass. - The prepass renders opaque objects into the Gbuffer attachment (`Rgba32Uint`). The PBR shader generates a `PbrInput` in mostly the same way as the forward implementation and then [packs it into the Gbuffer](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/render/pbr.wgsl#L168). - The deferred lighting pass unpacks the `PbrInput` and [feeds it into the pbr() function](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/deferred/deferred_lighting.wgsl#L65), then outputs the shaded color data. - There is now a resource [DefaultOpaqueRendererMethod](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/material.rs#L599) that can be used to set the default render method for opaque materials. If materials return `None` from [opaque_render_method()](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/material.rs#L131) the `DefaultOpaqueRendererMethod` will be used. Otherwise, custom materials can also explicitly choose to only support Deferred or Forward by returning the respective [OpaqueRendererMethod](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/material.rs#L603) - Deferred materials can be used seamlessly along with both opaque and transparent forward rendered materials in the same scene. The [deferred rendering example](https://github.com/DGriffin91/bevy/blob/deferred/examples/3d/deferred_rendering.rs) does this. - The deferred renderer does not support MSAA. If any deferred materials are used, MSAA must be disabled. Both TAA and FXAA are supported. - Deferred rendering supports WebGL2/WebGPU. ## Custom deferred materials - Custom materials can support both deferred and forward at the same time. The [StandardMaterial](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/render/pbr.wgsl#L166) does this. So does [this example](https://github.com/DGriffin91/bevy_glowy_orb_tutorial/blob/deferred/assets/shaders/glowy.wgsl#L56). - Custom deferred materials that require PBR lighting can create a `PbrInput`, write it to the deferred GBuffer and let it be rendered by the `PBRDeferredLightingPlugin`. - Custom deferred materials that require custom lighting have two options: 1. Use the base_color channel of the `PbrInput` combined with the `STANDARD_MATERIAL_FLAGS_UNLIT_BIT` flag. [Example.](https://github.com/DGriffin91/bevy_glowy_orb_tutorial/blob/deferred/assets/shaders/glowy.wgsl#L56) (If the unlit bit is set, the base_color is stored as RGB9E5 for extra precision) 2. A Custom Deferred Lighting pass can be created, either overriding the default, or running in addition. The a depth buffer is used to limit rendering to only the required fragments for each deferred lighting pass. Materials can set their respective depth id via the [deferred_lighting_pass_id](https://github.com/DGriffin91/bevy/blob/b79182d2a32cac28c4213c2457a53ac2cc885332/crates/bevy_pbr/src/prepass/prepass_io.wgsl#L95) attachment. The custom deferred lighting pass plugin can then set [its corresponding depth](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/deferred/deferred_lighting.wgsl#L37). Then with the lighting pass using [CompareFunction::Equal](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/deferred/mod.rs#L335), only the fragments with a depth that equal the corresponding depth written in the material will be rendered. Custom deferred lighting plugins can also be created to render the StandardMaterial. The default deferred lighting plugin can be bypassed with `DefaultPlugins.set(PBRDeferredLightingPlugin { bypass: true })` --------- Co-authored-by: nickrart <nickolas.g.russell@gmail.com>
2023-10-12 22:10:38 +00:00
format: CORE_3D_DEPTH_FORMAT,
label: Some("directional_light_shadow_map_texture"),
usage: TextureUsages::RENDER_ATTACHMENT | TextureUsages::TEXTURE_BINDING,
Wgpu 0.15 (#7356) # Objective Update Bevy to wgpu 0.15. ## Changelog - Update to wgpu 0.15, wgpu-hal 0.15.1, and naga 0.11 - Users can now use the [DirectX Shader Compiler](https://github.com/microsoft/DirectXShaderCompiler) (DXC) on Windows with DX12 for faster shader compilation and ShaderModel 6.0+ support (requires `dxcompiler.dll` and `dxil.dll`, which are included in DXC downloads from [here](https://github.com/microsoft/DirectXShaderCompiler/releases/latest)) ## Migration Guide ### WGSL Top-Level `let` is now `const` All top level constants are now declared with `const`, catching up with the wgsl spec. `let` is no longer allowed at the global scope, only within functions. ```diff -let SOME_CONSTANT = 12.0; +const SOME_CONSTANT = 12.0; ``` #### `TextureDescriptor` and `SurfaceConfiguration` now requires a `view_formats` field The new `view_formats` field in the `TextureDescriptor` is used to specify a list of formats the texture can be re-interpreted to in a texture view. Currently only changing srgb-ness is allowed (ex. `Rgba8Unorm` <=> `Rgba8UnormSrgb`). You should set `view_formats` to `&[]` (empty) unless you have a specific reason not to. #### The DirectX Shader Compiler (DXC) is now supported on DX12 DXC is now the default shader compiler when using the DX12 backend. DXC is Microsoft's replacement for their legacy FXC compiler, and is faster, less buggy, and allows for modern shader features to be used (ShaderModel 6.0+). DXC requires `dxcompiler.dll` and `dxil.dll` to be available, otherwise it will log a warning and fall back to FXC. You can get `dxcompiler.dll` and `dxil.dll` by downloading the latest release from [Microsoft's DirectXShaderCompiler github repo](https://github.com/microsoft/DirectXShaderCompiler/releases/latest) and copying them into your project's root directory. These must be included when you distribute your Bevy game/app/etc if you plan on supporting the DX12 backend and are using DXC. `WgpuSettings` now has a `dx12_shader_compiler` field which can be used to choose between either FXC or DXC (if you pass None for the paths for DXC, it will check for the .dlls in the working directory).
2023-01-29 20:27:30 +00:00
view_formats: &[],
2021-06-02 02:59:17 +00:00
},
);
let mut view_lights = Vec::new();
Normalise matrix naming (#13489) # Objective - Fixes #10909 - Fixes #8492 ## Solution - Name all matrices `x_from_y`, for example `world_from_view`. ## Testing - I've tested most of the 3D examples. The `lighting` example particularly should hit a lot of the changes and appears to run fine. --- ## Changelog - Renamed matrices across the engine to follow a `y_from_x` naming, making the space conversion more obvious. ## Migration Guide - `Frustum`'s `from_view_projection`, `from_view_projection_custom_far` and `from_view_projection_no_far` were renamed to `from_clip_from_world`, `from_clip_from_world_custom_far` and `from_clip_from_world_no_far`. - `ComputedCameraValues::projection_matrix` was renamed to `clip_from_view`. - `CameraProjection::get_projection_matrix` was renamed to `get_clip_from_view` (this affects implementations on `Projection`, `PerspectiveProjection` and `OrthographicProjection`). - `ViewRangefinder3d::from_view_matrix` was renamed to `from_world_from_view`. - `PreviousViewData`'s members were renamed to `view_from_world` and `clip_from_world`. - `ExtractedView`'s `projection`, `transform` and `view_projection` were renamed to `clip_from_view`, `world_from_view` and `clip_from_world`. - `ViewUniform`'s `view_proj`, `unjittered_view_proj`, `inverse_view_proj`, `view`, `inverse_view`, `projection` and `inverse_projection` were renamed to `clip_from_world`, `unjittered_clip_from_world`, `world_from_clip`, `world_from_view`, `view_from_world`, `clip_from_view` and `view_from_clip`. - `GpuDirectionalCascade::view_projection` was renamed to `clip_from_world`. - `MeshTransforms`' `transform` and `previous_transform` were renamed to `world_from_local` and `previous_world_from_local`. - `MeshUniform`'s `transform`, `previous_transform`, `inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed to `world_from_local`, `previous_world_from_local`, `local_from_world_transpose_a` and `local_from_world_transpose_b` (the `Mesh` type in WGSL mirrors this, however `transform` and `previous_transform` were named `model` and `previous_model`). - `Mesh2dTransforms::transform` was renamed to `world_from_local`. - `Mesh2dUniform`'s `transform`, `inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed to `world_from_local`, `local_from_world_transpose_a` and `local_from_world_transpose_b` (the `Mesh2d` type in WGSL mirrors this). - In WGSL, in `bevy_pbr::mesh_functions`, `get_model_matrix` and `get_previous_model_matrix` were renamed to `get_world_from_local` and `get_previous_world_from_local`. - In WGSL, `bevy_sprite::mesh2d_functions::get_model_matrix` was renamed to `get_world_from_local`.
2024-06-03 16:56:53 +00:00
let is_orthographic = extracted_view.clip_from_view.w_axis.w == 1.0;
bevy_pbr2: Fix clustering for orthographic projections (#3316) # Objective PBR lighting was broken in the new renderer when using orthographic projections due to the way the depth slicing works for the clusters. Fix it. ## Solution - The default orthographic projection near plane is 0.0. The perspective projection depth slicing does a division by the near plane which gives a floating point NaN and the clustering all breaks down. - Orthographic projections have a linear depth mapping, so it made intuitive sense to me to do depth slicing with a linear mapping too. The alternative I saw was to try to handle the near plane being at 0.0 and using the exponential depth slicing, but that felt like a hack that didn't make sense. - As such, I have added code that detects whether the projection is orthographic based on `projection[3][3] == 1.0` and then implemented the orthographic mapping case throughout (when computing cluster AABBs, and when mapping a view space position (or light) to a cluster id in both the rust and shader code). ## Screenshots Before: ![before](https://user-images.githubusercontent.com/302146/145847278-5b1bca74-fbad-4cc5-8b49-384f6a377fdc.png) After: <img width="1392" alt="Screenshot 2021-12-13 at 16 36 53" src="https://user-images.githubusercontent.com/302146/145847314-6f3a2035-5d87-4896-8032-0c3e35e15b7d.png"> Old renderer (slightly lighter due to slight difference in configured intensity): <img width="1392" alt="Screenshot 2021-12-13 at 16 42 23" src="https://user-images.githubusercontent.com/302146/145847391-6a5e6fe0-22da-4fc1-a6c7-440543689a63.png">
2021-12-14 23:42:35 +00:00
let cluster_factors_zw = calculate_cluster_factors(
clusters.near,
Dynamic light clusters (#3968) # Objective provide some customisation for default cluster setup avoid "cluster index lists is full" in all cases (using a strategy outlined by @superdump) ## Solution Add ClusterConfig enum (which can be inserted into a view at any time) to allow specifying cluster setup with variants: - None (do not do any light assignment - for views which do not require light info, e.g. minimaps etc) - Single (one cluster) - XYZ (explicit cluster counts in each dimension) - FixedZ (most similar to current - specify Z-slices and total, then x and y counts are dynamically determined to give approximately square clusters based on current aspect ratio) Defaults to FixedZ { total: 4096, z: 24 } which is similar to the current setup. Per frame, estimate the number of indices that would be required for the current config and decrease the cluster counts / increase the cluster sizes in the x and y dimensions if the index list would be too small. notes: - I didn't put ClusterConfig in the camera bundles to avoid introducing a dependency from bevy_render to bevy_pbr. the ClusterConfig enum comes with a pbr-centric impl block so i didn't want to move that into bevy_render either. - ~Might want to add None variant to cluster config for views that don't care about lights?~ - Not well tested for orthographic - ~there's a cluster_muck branch on my repo which includes some diagnostics / a modified lighting example which may be useful for tyre-kicking~ (outdated, i will bring it up to date if required) anecdotal timings: FPS on the lighting demo is negligibly better (~5%), maybe due to a small optimisation constraining the light aabb to be in front of the camera FPS on the lighting demo with 100 extra lights added is ~33% faster, and also renders correctly as the cluster index count is no longer exceeded
2022-03-08 04:56:42 +00:00
clusters.far,
clusters.dimensions.z as f32,
bevy_pbr2: Fix clustering for orthographic projections (#3316) # Objective PBR lighting was broken in the new renderer when using orthographic projections due to the way the depth slicing works for the clusters. Fix it. ## Solution - The default orthographic projection near plane is 0.0. The perspective projection depth slicing does a division by the near plane which gives a floating point NaN and the clustering all breaks down. - Orthographic projections have a linear depth mapping, so it made intuitive sense to me to do depth slicing with a linear mapping too. The alternative I saw was to try to handle the near plane being at 0.0 and using the exponential depth slicing, but that felt like a hack that didn't make sense. - As such, I have added code that detects whether the projection is orthographic based on `projection[3][3] == 1.0` and then implemented the orthographic mapping case throughout (when computing cluster AABBs, and when mapping a view space position (or light) to a cluster id in both the rust and shader code). ## Screenshots Before: ![before](https://user-images.githubusercontent.com/302146/145847278-5b1bca74-fbad-4cc5-8b49-384f6a377fdc.png) After: <img width="1392" alt="Screenshot 2021-12-13 at 16 36 53" src="https://user-images.githubusercontent.com/302146/145847314-6f3a2035-5d87-4896-8032-0c3e35e15b7d.png"> Old renderer (slightly lighter due to slight difference in configured intensity): <img width="1392" alt="Screenshot 2021-12-13 at 16 42 23" src="https://user-images.githubusercontent.com/302146/145847391-6a5e6fe0-22da-4fc1-a6c7-440543689a63.png">
2021-12-14 23:42:35 +00:00
is_orthographic,
);
let n_clusters = clusters.dimensions.x * clusters.dimensions.y * clusters.dimensions.z;
let mut gpu_lights = GpuLights {
directional_lights: gpu_directional_lights,
Migrate from `LegacyColor` to `bevy_color::Color` (#12163) # Objective - As part of the migration process we need to a) see the end effect of the migration on user ergonomics b) check for serious perf regressions c) actually migrate the code - To accomplish this, I'm going to attempt to migrate all of the remaining user-facing usages of `LegacyColor` in one PR, being careful to keep a clean commit history. - Fixes #12056. ## Solution I've chosen to use the polymorphic `Color` type as our standard user-facing API. - [x] Migrate `bevy_gizmos`. - [x] Take `impl Into<Color>` in all `bevy_gizmos` APIs - [x] Migrate sprites - [x] Migrate UI - [x] Migrate `ColorMaterial` - [x] Migrate `MaterialMesh2D` - [x] Migrate fog - [x] Migrate lights - [x] Migrate StandardMaterial - [x] Migrate wireframes - [x] Migrate clear color - [x] Migrate text - [x] Migrate gltf loader - [x] Register color types for reflection - [x] Remove `LegacyColor` - [x] Make sure CI passes Incidental improvements to ease migration: - added `Color::srgba_u8`, `Color::srgba_from_array` and friends - added `set_alpha`, `is_fully_transparent` and `is_fully_opaque` to the `Alpha` trait - add and immediately deprecate (lol) `Color::rgb` and friends in favor of more explicit and consistent `Color::srgb` - standardized on white and black for most example text colors - added vector field traits to `LinearRgba`: ~~`Add`, `Sub`, `AddAssign`, `SubAssign`,~~ `Mul<f32>` and `Div<f32>`. Multiplications and divisions do not scale alpha. `Add` and `Sub` have been cut from this PR. - added `LinearRgba` and `Srgba` `RED/GREEN/BLUE` - added `LinearRgba_to_f32_array` and `LinearRgba::to_u32` ## Migration Guide Bevy's color types have changed! Wherever you used a `bevy::render::Color`, a `bevy::color::Color` is used instead. These are quite similar! Both are enums storing a color in a specific color space (or to be more precise, using a specific color model). However, each of the different color models now has its own type. TODO... - `Color::rgba`, `Color::rgb`, `Color::rbga_u8`, `Color::rgb_u8`, `Color::rgb_from_array` are now `Color::srgba`, `Color::srgb`, `Color::srgba_u8`, `Color::srgb_u8` and `Color::srgb_from_array`. - `Color::set_a` and `Color::a` is now `Color::set_alpha` and `Color::alpha`. These are part of the `Alpha` trait in `bevy_color`. - `Color::is_fully_transparent` is now part of the `Alpha` trait in `bevy_color` - `Color::r`, `Color::set_r`, `Color::with_r` and the equivalents for `g`, `b` `h`, `s` and `l` have been removed due to causing silent relatively expensive conversions. Convert your `Color` into the desired color space, perform your operations there, and then convert it back into a polymorphic `Color` enum. - `Color::hex` is now `Srgba::hex`. Call `.into` or construct a `Color::Srgba` variant manually to convert it. - `WireframeMaterial`, `ExtractedUiNode`, `ExtractedDirectionalLight`, `ExtractedPointLight`, `ExtractedSpotLight` and `ExtractedSprite` now store a `LinearRgba`, rather than a polymorphic `Color` - `Color::rgb_linear` and `Color::rgba_linear` are now `Color::linear_rgb` and `Color::linear_rgba` - The various CSS color constants are no longer stored directly on `Color`. Instead, they're defined in the `Srgba` color space, and accessed via `bevy::color::palettes::css`. Call `.into()` on them to convert them into a `Color` for quick debugging use, and consider using the much prettier `tailwind` palette for prototyping. - The `LIME_GREEN` color has been renamed to `LIMEGREEN` to comply with the standard naming. - Vector field arithmetic operations on `Color` (add, subtract, multiply and divide by a f32) have been removed. Instead, convert your colors into `LinearRgba` space, and perform your operations explicitly there. This is particularly relevant when working with emissive or HDR colors, whose color channel values are routinely outside of the ordinary 0 to 1 range. - `Color::as_linear_rgba_f32` has been removed. Call `LinearRgba::to_f32_array` instead, converting if needed. - `Color::as_linear_rgba_u32` has been removed. Call `LinearRgba::to_u32` instead, converting if needed. - Several other color conversion methods to transform LCH or HSL colors into float arrays or `Vec` types have been removed. Please reimplement these externally or open a PR to re-add them if you found them particularly useful. - Various methods on `Color` such as `rgb` or `hsl` to convert the color into a specific color space have been removed. Convert into `LinearRgba`, then to the color space of your choice. - Various implicitly-converting color value methods on `Color` such as `r`, `g`, `b` or `h` have been removed. Please convert it into the color space of your choice, then check these properties. - `Color` no longer implements `AsBindGroup`. Store a `LinearRgba` internally instead to avoid conversion costs. --------- Co-authored-by: Alice Cecile <alice.i.cecil@gmail.com> Co-authored-by: Afonso Lage <lage.afonso@gmail.com> Co-authored-by: Rob Parrett <robparrett@gmail.com> Co-authored-by: Zachary Harrold <zac@harrold.com.au>
2024-02-29 19:35:12 +00:00
ambient_color: Vec4::from_slice(&LinearRgba::from(ambient_light.color).to_f32_array())
* ambient_light.brightness,
Clustered forward rendering (#3153) # Objective Implement clustered-forward rendering. ## Solution ~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works. * The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+). * WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers. * Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct * The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space. * Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers. * Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR. * I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times. Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on. * Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is. * Improve the z-slicing to use a larger first slice. * More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …) * Switch to using a texture instead of uniform buffer * Figure out the / a better story for shadows I will also probably add an example that demonstrates some of the issues: * What situations exhaust the space available in the uniform buffers * Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters * Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light * Perhaps some performance issues * How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
cluster_factors: Vec4::new(
clusters.dimensions.x as f32 / extracted_view.viewport.z as f32,
clusters.dimensions.y as f32 / extracted_view.viewport.w as f32,
bevy_pbr2: Fix clustering for orthographic projections (#3316) # Objective PBR lighting was broken in the new renderer when using orthographic projections due to the way the depth slicing works for the clusters. Fix it. ## Solution - The default orthographic projection near plane is 0.0. The perspective projection depth slicing does a division by the near plane which gives a floating point NaN and the clustering all breaks down. - Orthographic projections have a linear depth mapping, so it made intuitive sense to me to do depth slicing with a linear mapping too. The alternative I saw was to try to handle the near plane being at 0.0 and using the exponential depth slicing, but that felt like a hack that didn't make sense. - As such, I have added code that detects whether the projection is orthographic based on `projection[3][3] == 1.0` and then implemented the orthographic mapping case throughout (when computing cluster AABBs, and when mapping a view space position (or light) to a cluster id in both the rust and shader code). ## Screenshots Before: ![before](https://user-images.githubusercontent.com/302146/145847278-5b1bca74-fbad-4cc5-8b49-384f6a377fdc.png) After: <img width="1392" alt="Screenshot 2021-12-13 at 16 36 53" src="https://user-images.githubusercontent.com/302146/145847314-6f3a2035-5d87-4896-8032-0c3e35e15b7d.png"> Old renderer (slightly lighter due to slight difference in configured intensity): <img width="1392" alt="Screenshot 2021-12-13 at 16 42 23" src="https://user-images.githubusercontent.com/302146/145847391-6a5e6fe0-22da-4fc1-a6c7-440543689a63.png">
2021-12-14 23:42:35 +00:00
cluster_factors_zw.x,
cluster_factors_zw.y,
Clustered forward rendering (#3153) # Objective Implement clustered-forward rendering. ## Solution ~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works. * The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+). * WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers. * Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct * The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space. * Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers. * Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR. * I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times. Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on. * Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is. * Improve the z-slicing to use a larger first slice. * More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …) * Switch to using a texture instead of uniform buffer * Figure out the / a better story for shadows I will also probably add an example that demonstrates some of the issues: * What situations exhaust the space available in the uniform buffers * Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters * Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light * Perhaps some performance issues * How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
),
cluster_dimensions: clusters.dimensions.extend(n_clusters),
n_directional_lights: directional_lights.iter().len().min(MAX_DIRECTIONAL_LIGHTS)
as u32,
// spotlight shadow maps are stored in the directional light array, starting at num_directional_cascades_enabled.
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
// the spot lights themselves start in the light array at point_light_count. so to go from light
// index to shadow map index, we need to subtract point light count and add directional shadowmap count.
spot_light_shadowmap_offset: num_directional_cascades_enabled as i32
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
- point_light_count as i32,
2021-06-02 02:59:17 +00:00
};
// TODO: this should select lights based on relevance to the view instead of the first ones that show up in a query
for &(light_entity, light, (point_light_frusta, _)) in point_lights
.iter()
// Lights are sorted, shadow enabled lights are first
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
.take(point_light_shadow_maps_count)
.filter(|(_, light, _)| light.shadows_enabled)
{
let light_index = *global_light_meta
.entity_to_index
.get(&light_entity)
.unwrap();
// ignore scale because we don't want to effectively scale light radius and range
// by applying those as a view transform to shadow map rendering of objects
// and ignore rotation because we want the shadow map projections to align with the axes
let view_translation = GlobalTransform::from_translation(light.transform.translation());
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
let cube_face_projection = Mat4::perspective_infinite_reverse_rh(
Add `core` and `alloc` over `std` Lints (#15281) # Objective - Fixes #6370 - Closes #6581 ## Solution - Added the following lints to the workspace: - `std_instead_of_core` - `std_instead_of_alloc` - `alloc_instead_of_core` - Used `cargo +nightly fmt` with [item level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A) to split all `use` statements into single items. - Used `cargo clippy --workspace --all-targets --all-features --fix --allow-dirty` to _attempt_ to resolve the new linting issues, and intervened where the lint was unable to resolve the issue automatically (usually due to needing an `extern crate alloc;` statement in a crate root). - Manually removed certain uses of `std` where negative feature gating prevented `--all-features` from finding the offending uses. - Used `cargo +nightly fmt` with [crate level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A) to re-merge all `use` statements matching Bevy's previous styling. - Manually fixed cases where the `fmt` tool could not re-merge `use` statements due to conditional compilation attributes. ## Testing - Ran CI locally ## Migration Guide The MSRV is now 1.81. Please update to this version or higher. ## Notes - This is a _massive_ change to try and push through, which is why I've outlined the semi-automatic steps I used to create this PR, in case this fails and someone else tries again in the future. - Making this change has no impact on user code, but does mean Bevy contributors will be warned to use `core` and `alloc` instead of `std` where possible. - This lint is a critical first step towards investigating `no_std` options for Bevy. --------- Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-09-27 00:59:59 +00:00
core::f32::consts::FRAC_PI_2,
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
1.0,
light.shadow_map_near_z,
);
for (face_index, (view_rotation, frustum)) in cube_face_rotations
.iter()
.zip(&point_light_frusta.unwrap().frusta)
.enumerate()
{
let depth_texture_view =
point_light_depth_texture
.texture
.create_view(&TextureViewDescriptor {
label: Some("point_light_shadow_map_texture_view"),
format: None,
dimension: Some(TextureViewDimension::D2),
aspect: TextureAspect::All,
base_mip_level: 0,
mip_level_count: None,
base_array_layer: (light_index * 6 + face_index) as u32,
array_layer_count: Some(1u32),
});
let view_light_entity = commands
Spawn now takes a Bundle (#6054) # Objective Now that we can consolidate Bundles and Components under a single insert (thanks to #2975 and #6039), almost 100% of world spawns now look like `world.spawn().insert((Some, Tuple, Here))`. Spawning an entity without any components is an extremely uncommon pattern, so it makes sense to give spawn the "first class" ergonomic api. This consolidated api should be made consistent across all spawn apis (such as World and Commands). ## Solution All `spawn` apis (`World::spawn`, `Commands:;spawn`, `ChildBuilder::spawn`, and `WorldChildBuilder::spawn`) now accept a bundle as input: ```rust // before: commands .spawn() .insert((A, B, C)); world .spawn() .insert((A, B, C); // after commands.spawn((A, B, C)); world.spawn((A, B, C)); ``` All existing instances of `spawn_bundle` have been deprecated in favor of the new `spawn` api. A new `spawn_empty` has been added, replacing the old `spawn` api. By allowing `world.spawn(some_bundle)` to replace `world.spawn().insert(some_bundle)`, this opened the door to removing the initial entity allocation in the "empty" archetype / table done in `spawn()` (and subsequent move to the actual archetype in `.insert(some_bundle)`). This improves spawn performance by over 10%: ![image](https://user-images.githubusercontent.com/2694663/191627587-4ab2f949-4ccd-4231-80eb-80dd4d9ad6b9.png) To take this measurement, I added a new `world_spawn` benchmark. Unfortunately, optimizing `Commands::spawn` is slightly less trivial, as Commands expose the Entity id of spawned entities prior to actually spawning. Doing the optimization would (naively) require assurances that the `spawn(some_bundle)` command is applied before all other commands involving the entity (which would not necessarily be true, if memory serves). Optimizing `Commands::spawn` this way does feel possible, but it will require careful thought (and maybe some additional checks), which deserves its own PR. For now, it has the same performance characteristics of the current `Commands::spawn_bundle` on main. **Note that 99% of this PR is simple renames and refactors. The only code that needs careful scrutiny is the new `World::spawn()` impl, which is relatively straightforward, but it has some new unsafe code (which re-uses battle tested BundlerSpawner code path).** --- ## Changelog - All `spawn` apis (`World::spawn`, `Commands:;spawn`, `ChildBuilder::spawn`, and `WorldChildBuilder::spawn`) now accept a bundle as input - All instances of `spawn_bundle` have been deprecated in favor of the new `spawn` api - World and Commands now have `spawn_empty()`, which is equivalent to the old `spawn()` behavior. ## Migration Guide ```rust // Old (0.8): commands .spawn() .insert_bundle((A, B, C)); // New (0.9) commands.spawn((A, B, C)); // Old (0.8): commands.spawn_bundle((A, B, C)); // New (0.9) commands.spawn((A, B, C)); // Old (0.8): let entity = commands.spawn().id(); // New (0.9) let entity = commands.spawn_empty().id(); // Old (0.8) let entity = world.spawn().id(); // New (0.9) let entity = world.spawn_empty(); ```
2022-09-23 19:55:54 +00:00
.spawn((
ShadowView {
Keep track of when a texture is first cleared (#10325) # Objective - Custom render passes, or future passes in the engine (such as https://github.com/bevyengine/bevy/pull/10164) need a better way to know and indicate to the core passes whether the view color/depth/prepass attachments have been cleared or not yet this frame, to know if they should clear it themselves or load it. ## Solution - For all render targets (depth textures, shadow textures, prepass textures, main textures) use an atomic bool to track whether or not each texture has been cleared this frame. Abstracted away in the new ColorAttachment and DepthAttachment wrappers. --- ## Changelog - Changed `ViewTarget::get_color_attachment()`, removed arguments. - Changed `ViewTarget::get_unsampled_color_attachment()`, removed arguments. - Removed `Camera3d::clear_color`. - Removed `Camera2d::clear_color`. - Added `Camera::clear_color`. - Added `ExtractedCamera::clear_color`. - Added `ColorAttachment` and `DepthAttachment` wrappers. - Moved `ClearColor` and `ClearColorConfig` from `bevy::core_pipeline::clear_color` to `bevy::render::camera`. - Core render passes now track when a texture is first bound as an attachment in order to decide whether to clear or load it. ## Migration Guide - Remove arguments to `ViewTarget::get_color_attachment()` and `ViewTarget::get_unsampled_color_attachment()`. - Configure clear color on `Camera` instead of on `Camera3d` and `Camera2d`. - Moved `ClearColor` and `ClearColorConfig` from `bevy::core_pipeline::clear_color` to `bevy::render::camera`. - `ViewDepthTexture` must now be created via the `new()` method --------- Co-authored-by: vero <email@atlasdostal.com> Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2023-12-31 00:37:37 +00:00
depth_attachment: DepthAttachment::new(depth_texture_view, Some(0.0)),
pass_name: format!(
"shadow pass point light {} {}",
light_index,
face_index_to_name(face_index)
),
},
ExtractedView {
viewport: UVec4::new(
0,
0,
point_light_shadow_map.size as u32,
point_light_shadow_map.size as u32,
),
Normalise matrix naming (#13489) # Objective - Fixes #10909 - Fixes #8492 ## Solution - Name all matrices `x_from_y`, for example `world_from_view`. ## Testing - I've tested most of the 3D examples. The `lighting` example particularly should hit a lot of the changes and appears to run fine. --- ## Changelog - Renamed matrices across the engine to follow a `y_from_x` naming, making the space conversion more obvious. ## Migration Guide - `Frustum`'s `from_view_projection`, `from_view_projection_custom_far` and `from_view_projection_no_far` were renamed to `from_clip_from_world`, `from_clip_from_world_custom_far` and `from_clip_from_world_no_far`. - `ComputedCameraValues::projection_matrix` was renamed to `clip_from_view`. - `CameraProjection::get_projection_matrix` was renamed to `get_clip_from_view` (this affects implementations on `Projection`, `PerspectiveProjection` and `OrthographicProjection`). - `ViewRangefinder3d::from_view_matrix` was renamed to `from_world_from_view`. - `PreviousViewData`'s members were renamed to `view_from_world` and `clip_from_world`. - `ExtractedView`'s `projection`, `transform` and `view_projection` were renamed to `clip_from_view`, `world_from_view` and `clip_from_world`. - `ViewUniform`'s `view_proj`, `unjittered_view_proj`, `inverse_view_proj`, `view`, `inverse_view`, `projection` and `inverse_projection` were renamed to `clip_from_world`, `unjittered_clip_from_world`, `world_from_clip`, `world_from_view`, `view_from_world`, `clip_from_view` and `view_from_clip`. - `GpuDirectionalCascade::view_projection` was renamed to `clip_from_world`. - `MeshTransforms`' `transform` and `previous_transform` were renamed to `world_from_local` and `previous_world_from_local`. - `MeshUniform`'s `transform`, `previous_transform`, `inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed to `world_from_local`, `previous_world_from_local`, `local_from_world_transpose_a` and `local_from_world_transpose_b` (the `Mesh` type in WGSL mirrors this, however `transform` and `previous_transform` were named `model` and `previous_model`). - `Mesh2dTransforms::transform` was renamed to `world_from_local`. - `Mesh2dUniform`'s `transform`, `inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed to `world_from_local`, `local_from_world_transpose_a` and `local_from_world_transpose_b` (the `Mesh2d` type in WGSL mirrors this). - In WGSL, in `bevy_pbr::mesh_functions`, `get_model_matrix` and `get_previous_model_matrix` were renamed to `get_world_from_local` and `get_previous_world_from_local`. - In WGSL, `bevy_sprite::mesh2d_functions::get_model_matrix` was renamed to `get_world_from_local`.
2024-06-03 16:56:53 +00:00
world_from_view: view_translation * *view_rotation,
clip_from_world: None,
clip_from_view: cube_face_projection,
separate tonemapping and upscaling passes (#3425) Attempt to make features like bloom https://github.com/bevyengine/bevy/pull/2876 easier to implement. **This PR:** - Moves the tonemapping from `pbr.wgsl` into a separate pass - also add a separate upscaling pass after the tonemapping which writes to the swap chain (enables resolution-independant rendering and post-processing after tonemapping) - adds a `hdr` bool to the camera which controls whether the pbr and sprite shaders render into a `Rgba16Float` texture **Open questions:** - ~should the 2d graph work the same as the 3d one?~ it is the same now - ~The current solution is a bit inflexible because while you can add a post processing pass that writes to e.g. the `hdr_texture`, you can't write to a separate `user_postprocess_texture` while reading the `hdr_texture` and tell the tone mapping pass to read from the `user_postprocess_texture` instead. If the tonemapping and upscaling render graph nodes were to take in a `TextureView` instead of the view entity this would almost work, but the bind groups for their respective input textures are already created in the `Queue` render stage in the hardcoded order.~ solved by creating bind groups in render node **New render graph:** ![render_graph](https://user-images.githubusercontent.com/22177966/147767249-57dd4229-cfab-4ec5-9bf3-dc76dccf8e8b.png) <details> <summary>Before</summary> ![render_graph_old](https://user-images.githubusercontent.com/22177966/147284579-c895fdbd-4028-41cf-914c-e1ffef60e44e.png) </details> Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-10-26 20:13:59 +00:00
hdr: false,
color_grading: Default::default(),
},
*frustum,
LightEntity::Point {
light_entity,
face_index,
},
))
.id();
view_lights.push(view_light_entity);
shadow_render_phases.insert_or_clear(view_light_entity);
live_shadow_mapping_lights.insert(view_light_entity);
}
2021-06-02 02:59:17 +00:00
}
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
// spot lights
for (light_index, &(light_entity, light, (_, spot_light_frustum))) in point_lights
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
.iter()
.skip(point_light_count)
.take(spot_light_shadow_maps_count)
.enumerate()
{
Normalise matrix naming (#13489) # Objective - Fixes #10909 - Fixes #8492 ## Solution - Name all matrices `x_from_y`, for example `world_from_view`. ## Testing - I've tested most of the 3D examples. The `lighting` example particularly should hit a lot of the changes and appears to run fine. --- ## Changelog - Renamed matrices across the engine to follow a `y_from_x` naming, making the space conversion more obvious. ## Migration Guide - `Frustum`'s `from_view_projection`, `from_view_projection_custom_far` and `from_view_projection_no_far` were renamed to `from_clip_from_world`, `from_clip_from_world_custom_far` and `from_clip_from_world_no_far`. - `ComputedCameraValues::projection_matrix` was renamed to `clip_from_view`. - `CameraProjection::get_projection_matrix` was renamed to `get_clip_from_view` (this affects implementations on `Projection`, `PerspectiveProjection` and `OrthographicProjection`). - `ViewRangefinder3d::from_view_matrix` was renamed to `from_world_from_view`. - `PreviousViewData`'s members were renamed to `view_from_world` and `clip_from_world`. - `ExtractedView`'s `projection`, `transform` and `view_projection` were renamed to `clip_from_view`, `world_from_view` and `clip_from_world`. - `ViewUniform`'s `view_proj`, `unjittered_view_proj`, `inverse_view_proj`, `view`, `inverse_view`, `projection` and `inverse_projection` were renamed to `clip_from_world`, `unjittered_clip_from_world`, `world_from_clip`, `world_from_view`, `view_from_world`, `clip_from_view` and `view_from_clip`. - `GpuDirectionalCascade::view_projection` was renamed to `clip_from_world`. - `MeshTransforms`' `transform` and `previous_transform` were renamed to `world_from_local` and `previous_world_from_local`. - `MeshUniform`'s `transform`, `previous_transform`, `inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed to `world_from_local`, `previous_world_from_local`, `local_from_world_transpose_a` and `local_from_world_transpose_b` (the `Mesh` type in WGSL mirrors this, however `transform` and `previous_transform` were named `model` and `previous_model`). - `Mesh2dTransforms::transform` was renamed to `world_from_local`. - `Mesh2dUniform`'s `transform`, `inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed to `world_from_local`, `local_from_world_transpose_a` and `local_from_world_transpose_b` (the `Mesh2d` type in WGSL mirrors this). - In WGSL, in `bevy_pbr::mesh_functions`, `get_model_matrix` and `get_previous_model_matrix` were renamed to `get_world_from_local` and `get_previous_world_from_local`. - In WGSL, `bevy_sprite::mesh2d_functions::get_model_matrix` was renamed to `get_world_from_local`.
2024-06-03 16:56:53 +00:00
let spot_world_from_view = spot_light_world_from_view(&light.transform);
let spot_world_from_view = spot_world_from_view.into();
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
let angle = light.spot_light_angles.expect("lights should be sorted so that \
[point_light_count..point_light_count + spot_light_shadow_maps_count] are spot lights").1;
Implement percentage-closer soft shadows (PCSS). (#13497) [*Percentage-closer soft shadows*] are a technique from 2004 that allow shadows to become blurrier farther from the objects that cast them. It works by introducing a *blocker search* step that runs before the normal shadow map sampling. The blocker search step detects the difference between the depth of the fragment being rasterized and the depth of the nearby samples in the depth buffer. Larger depth differences result in a larger penumbra and therefore a blurrier shadow. To enable PCSS, fill in the `soft_shadow_size` value in `DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This shadow size value represents the size of the light and should be tuned as appropriate for your scene. Higher values result in a wider penumbra (i.e. blurrier shadows). When using PCSS, temporal shadow maps (`ShadowFilteringMethod::Temporal`) are recommended. If you don't use `ShadowFilteringMethod::Temporal` and instead use `ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as `Temporal`, but the result won't vary over time. This produces a rather noisy result. Doing better would likely require downsampling the shadow map, which would be complex and slower (and would require PR #13003 to land first). In addition to PCSS, this commit makes the near Z plane for the shadow map configurable on a per-light basis. Previously, it had been hardcoded to 0.1 meters. This change was necessary to make the point light shadow map in the example look reasonable, as otherwise the shadows appeared far too aliased. A new example, `pcss`, has been added. It demonstrates the percentage-closer soft shadow technique with directional lights, point lights, spot lights, non-temporal operation, and temporal operation. The assets are my original work. Both temporal and non-temporal shadows are rather noisy in the example, and, as mentioned before, this is unavoidable without downsampling the depth buffer, which we can't do yet. Note also that the shadows don't look particularly great for point lights; the example simply isn't an ideal scene for them. Nevertheless, I felt that the benefits of the ability to do a side-by-side comparison of directional and point lights outweighed the unsightliness of the point light shadows in that example, so I kept the point light feature in. Fixes #3631. [*Percentage-closer soft shadows*]: https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf ## Changelog ### Added * Percentage-closer soft shadows (PCSS) are now supported, allowing shadows to become blurrier as they stretch away from objects. To use them, set the `soft_shadow_size` field in `DirectionalLight`, `PointLight`, or `SpotLight`, as applicable. * The near Z value for shadow maps is now customizable via the `shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and `SpotLight`. ## Screenshots PCSS off: ![Screenshot 2024-05-24 120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86) PCSS on: ![Screenshot 2024-05-24 115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-09-18 18:07:17 +00:00
let spot_projection = spot_light_clip_from_view(angle, light.shadow_map_near_z);
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
let depth_texture_view =
directional_light_depth_texture
.texture
.create_view(&TextureViewDescriptor {
label: Some("spot_light_shadow_map_texture_view"),
format: None,
dimension: Some(TextureViewDimension::D2),
aspect: TextureAspect::All,
base_mip_level: 0,
mip_level_count: None,
base_array_layer: (num_directional_cascades_enabled + light_index) as u32,
array_layer_count: Some(1u32),
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
});
let view_light_entity = commands
Spawn now takes a Bundle (#6054) # Objective Now that we can consolidate Bundles and Components under a single insert (thanks to #2975 and #6039), almost 100% of world spawns now look like `world.spawn().insert((Some, Tuple, Here))`. Spawning an entity without any components is an extremely uncommon pattern, so it makes sense to give spawn the "first class" ergonomic api. This consolidated api should be made consistent across all spawn apis (such as World and Commands). ## Solution All `spawn` apis (`World::spawn`, `Commands:;spawn`, `ChildBuilder::spawn`, and `WorldChildBuilder::spawn`) now accept a bundle as input: ```rust // before: commands .spawn() .insert((A, B, C)); world .spawn() .insert((A, B, C); // after commands.spawn((A, B, C)); world.spawn((A, B, C)); ``` All existing instances of `spawn_bundle` have been deprecated in favor of the new `spawn` api. A new `spawn_empty` has been added, replacing the old `spawn` api. By allowing `world.spawn(some_bundle)` to replace `world.spawn().insert(some_bundle)`, this opened the door to removing the initial entity allocation in the "empty" archetype / table done in `spawn()` (and subsequent move to the actual archetype in `.insert(some_bundle)`). This improves spawn performance by over 10%: ![image](https://user-images.githubusercontent.com/2694663/191627587-4ab2f949-4ccd-4231-80eb-80dd4d9ad6b9.png) To take this measurement, I added a new `world_spawn` benchmark. Unfortunately, optimizing `Commands::spawn` is slightly less trivial, as Commands expose the Entity id of spawned entities prior to actually spawning. Doing the optimization would (naively) require assurances that the `spawn(some_bundle)` command is applied before all other commands involving the entity (which would not necessarily be true, if memory serves). Optimizing `Commands::spawn` this way does feel possible, but it will require careful thought (and maybe some additional checks), which deserves its own PR. For now, it has the same performance characteristics of the current `Commands::spawn_bundle` on main. **Note that 99% of this PR is simple renames and refactors. The only code that needs careful scrutiny is the new `World::spawn()` impl, which is relatively straightforward, but it has some new unsafe code (which re-uses battle tested BundlerSpawner code path).** --- ## Changelog - All `spawn` apis (`World::spawn`, `Commands:;spawn`, `ChildBuilder::spawn`, and `WorldChildBuilder::spawn`) now accept a bundle as input - All instances of `spawn_bundle` have been deprecated in favor of the new `spawn` api - World and Commands now have `spawn_empty()`, which is equivalent to the old `spawn()` behavior. ## Migration Guide ```rust // Old (0.8): commands .spawn() .insert_bundle((A, B, C)); // New (0.9) commands.spawn((A, B, C)); // Old (0.8): commands.spawn_bundle((A, B, C)); // New (0.9) commands.spawn((A, B, C)); // Old (0.8): let entity = commands.spawn().id(); // New (0.9) let entity = commands.spawn_empty().id(); // Old (0.8) let entity = world.spawn().id(); // New (0.9) let entity = world.spawn_empty(); ```
2022-09-23 19:55:54 +00:00
.spawn((
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
ShadowView {
Keep track of when a texture is first cleared (#10325) # Objective - Custom render passes, or future passes in the engine (such as https://github.com/bevyengine/bevy/pull/10164) need a better way to know and indicate to the core passes whether the view color/depth/prepass attachments have been cleared or not yet this frame, to know if they should clear it themselves or load it. ## Solution - For all render targets (depth textures, shadow textures, prepass textures, main textures) use an atomic bool to track whether or not each texture has been cleared this frame. Abstracted away in the new ColorAttachment and DepthAttachment wrappers. --- ## Changelog - Changed `ViewTarget::get_color_attachment()`, removed arguments. - Changed `ViewTarget::get_unsampled_color_attachment()`, removed arguments. - Removed `Camera3d::clear_color`. - Removed `Camera2d::clear_color`. - Added `Camera::clear_color`. - Added `ExtractedCamera::clear_color`. - Added `ColorAttachment` and `DepthAttachment` wrappers. - Moved `ClearColor` and `ClearColorConfig` from `bevy::core_pipeline::clear_color` to `bevy::render::camera`. - Core render passes now track when a texture is first bound as an attachment in order to decide whether to clear or load it. ## Migration Guide - Remove arguments to `ViewTarget::get_color_attachment()` and `ViewTarget::get_unsampled_color_attachment()`. - Configure clear color on `Camera` instead of on `Camera3d` and `Camera2d`. - Moved `ClearColor` and `ClearColorConfig` from `bevy::core_pipeline::clear_color` to `bevy::render::camera`. - `ViewDepthTexture` must now be created via the `new()` method --------- Co-authored-by: vero <email@atlasdostal.com> Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2023-12-31 00:37:37 +00:00
depth_attachment: DepthAttachment::new(depth_texture_view, Some(0.0)),
pass_name: format!("shadow pass spot light {light_index}"),
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
},
ExtractedView {
viewport: UVec4::new(
0,
0,
directional_light_shadow_map.size as u32,
directional_light_shadow_map.size as u32,
),
Normalise matrix naming (#13489) # Objective - Fixes #10909 - Fixes #8492 ## Solution - Name all matrices `x_from_y`, for example `world_from_view`. ## Testing - I've tested most of the 3D examples. The `lighting` example particularly should hit a lot of the changes and appears to run fine. --- ## Changelog - Renamed matrices across the engine to follow a `y_from_x` naming, making the space conversion more obvious. ## Migration Guide - `Frustum`'s `from_view_projection`, `from_view_projection_custom_far` and `from_view_projection_no_far` were renamed to `from_clip_from_world`, `from_clip_from_world_custom_far` and `from_clip_from_world_no_far`. - `ComputedCameraValues::projection_matrix` was renamed to `clip_from_view`. - `CameraProjection::get_projection_matrix` was renamed to `get_clip_from_view` (this affects implementations on `Projection`, `PerspectiveProjection` and `OrthographicProjection`). - `ViewRangefinder3d::from_view_matrix` was renamed to `from_world_from_view`. - `PreviousViewData`'s members were renamed to `view_from_world` and `clip_from_world`. - `ExtractedView`'s `projection`, `transform` and `view_projection` were renamed to `clip_from_view`, `world_from_view` and `clip_from_world`. - `ViewUniform`'s `view_proj`, `unjittered_view_proj`, `inverse_view_proj`, `view`, `inverse_view`, `projection` and `inverse_projection` were renamed to `clip_from_world`, `unjittered_clip_from_world`, `world_from_clip`, `world_from_view`, `view_from_world`, `clip_from_view` and `view_from_clip`. - `GpuDirectionalCascade::view_projection` was renamed to `clip_from_world`. - `MeshTransforms`' `transform` and `previous_transform` were renamed to `world_from_local` and `previous_world_from_local`. - `MeshUniform`'s `transform`, `previous_transform`, `inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed to `world_from_local`, `previous_world_from_local`, `local_from_world_transpose_a` and `local_from_world_transpose_b` (the `Mesh` type in WGSL mirrors this, however `transform` and `previous_transform` were named `model` and `previous_model`). - `Mesh2dTransforms::transform` was renamed to `world_from_local`. - `Mesh2dUniform`'s `transform`, `inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed to `world_from_local`, `local_from_world_transpose_a` and `local_from_world_transpose_b` (the `Mesh2d` type in WGSL mirrors this). - In WGSL, in `bevy_pbr::mesh_functions`, `get_model_matrix` and `get_previous_model_matrix` were renamed to `get_world_from_local` and `get_previous_world_from_local`. - In WGSL, `bevy_sprite::mesh2d_functions::get_model_matrix` was renamed to `get_world_from_local`.
2024-06-03 16:56:53 +00:00
world_from_view: spot_world_from_view,
clip_from_view: spot_projection,
clip_from_world: None,
separate tonemapping and upscaling passes (#3425) Attempt to make features like bloom https://github.com/bevyengine/bevy/pull/2876 easier to implement. **This PR:** - Moves the tonemapping from `pbr.wgsl` into a separate pass - also add a separate upscaling pass after the tonemapping which writes to the swap chain (enables resolution-independant rendering and post-processing after tonemapping) - adds a `hdr` bool to the camera which controls whether the pbr and sprite shaders render into a `Rgba16Float` texture **Open questions:** - ~should the 2d graph work the same as the 3d one?~ it is the same now - ~The current solution is a bit inflexible because while you can add a post processing pass that writes to e.g. the `hdr_texture`, you can't write to a separate `user_postprocess_texture` while reading the `hdr_texture` and tell the tone mapping pass to read from the `user_postprocess_texture` instead. If the tonemapping and upscaling render graph nodes were to take in a `TextureView` instead of the view entity this would almost work, but the bind groups for their respective input textures are already created in the `Queue` render stage in the hardcoded order.~ solved by creating bind groups in render node **New render graph:** ![render_graph](https://user-images.githubusercontent.com/22177966/147767249-57dd4229-cfab-4ec5-9bf3-dc76dccf8e8b.png) <details> <summary>Before</summary> ![render_graph_old](https://user-images.githubusercontent.com/22177966/147284579-c895fdbd-4028-41cf-914c-e1ffef60e44e.png) </details> Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-10-26 20:13:59 +00:00
hdr: false,
color_grading: Default::default(),
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
},
*spot_light_frustum.unwrap(),
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
LightEntity::Spot { light_entity },
))
.id();
view_lights.push(view_light_entity);
shadow_render_phases.insert_or_clear(view_light_entity);
live_shadow_mapping_lights.insert(view_light_entity);
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
}
// directional lights
let mut directional_depth_texture_array_index = 0u32;
let view_layers = maybe_layers.unwrap_or_default();
for (light_index, &(light_entity, light)) in directional_lights
.iter()
.enumerate()
.take(MAX_DIRECTIONAL_LIGHTS)
{
let gpu_light = &mut gpu_lights.directional_lights[light_index];
// Check if the light intersects with the view.
if !view_layers.intersects(&light.render_layers) {
gpu_light.skip = 1u32;
continue;
}
// Only deal with cascades when shadows are enabled.
if (gpu_light.flags & DirectionalLightFlags::SHADOWS_ENABLED.bits()) == 0u32 {
continue;
}
let cascades = light
.cascades
.get(&entity)
.unwrap()
.iter()
.take(MAX_CASCADES_PER_LIGHT);
let frusta = light
.frusta
.get(&entity)
.unwrap()
.iter()
.take(MAX_CASCADES_PER_LIGHT);
for (cascade_index, ((cascade, frustum), bound)) in cascades
.zip(frusta)
.zip(&light.cascade_shadow_config.bounds)
.enumerate()
{
gpu_lights.directional_lights[light_index].cascades[cascade_index] =
GpuDirectionalCascade {
Normalise matrix naming (#13489) # Objective - Fixes #10909 - Fixes #8492 ## Solution - Name all matrices `x_from_y`, for example `world_from_view`. ## Testing - I've tested most of the 3D examples. The `lighting` example particularly should hit a lot of the changes and appears to run fine. --- ## Changelog - Renamed matrices across the engine to follow a `y_from_x` naming, making the space conversion more obvious. ## Migration Guide - `Frustum`'s `from_view_projection`, `from_view_projection_custom_far` and `from_view_projection_no_far` were renamed to `from_clip_from_world`, `from_clip_from_world_custom_far` and `from_clip_from_world_no_far`. - `ComputedCameraValues::projection_matrix` was renamed to `clip_from_view`. - `CameraProjection::get_projection_matrix` was renamed to `get_clip_from_view` (this affects implementations on `Projection`, `PerspectiveProjection` and `OrthographicProjection`). - `ViewRangefinder3d::from_view_matrix` was renamed to `from_world_from_view`. - `PreviousViewData`'s members were renamed to `view_from_world` and `clip_from_world`. - `ExtractedView`'s `projection`, `transform` and `view_projection` were renamed to `clip_from_view`, `world_from_view` and `clip_from_world`. - `ViewUniform`'s `view_proj`, `unjittered_view_proj`, `inverse_view_proj`, `view`, `inverse_view`, `projection` and `inverse_projection` were renamed to `clip_from_world`, `unjittered_clip_from_world`, `world_from_clip`, `world_from_view`, `view_from_world`, `clip_from_view` and `view_from_clip`. - `GpuDirectionalCascade::view_projection` was renamed to `clip_from_world`. - `MeshTransforms`' `transform` and `previous_transform` were renamed to `world_from_local` and `previous_world_from_local`. - `MeshUniform`'s `transform`, `previous_transform`, `inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed to `world_from_local`, `previous_world_from_local`, `local_from_world_transpose_a` and `local_from_world_transpose_b` (the `Mesh` type in WGSL mirrors this, however `transform` and `previous_transform` were named `model` and `previous_model`). - `Mesh2dTransforms::transform` was renamed to `world_from_local`. - `Mesh2dUniform`'s `transform`, `inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed to `world_from_local`, `local_from_world_transpose_a` and `local_from_world_transpose_b` (the `Mesh2d` type in WGSL mirrors this). - In WGSL, in `bevy_pbr::mesh_functions`, `get_model_matrix` and `get_previous_model_matrix` were renamed to `get_world_from_local` and `get_previous_world_from_local`. - In WGSL, `bevy_sprite::mesh2d_functions::get_model_matrix` was renamed to `get_world_from_local`.
2024-06-03 16:56:53 +00:00
clip_from_world: cascade.clip_from_world,
texel_size: cascade.texel_size,
far_bound: *bound,
};
let depth_texture_view =
directional_light_depth_texture
.texture
.create_view(&TextureViewDescriptor {
label: Some("directional_light_shadow_map_array_texture_view"),
format: None,
dimension: Some(TextureViewDimension::D2),
aspect: TextureAspect::All,
base_mip_level: 0,
mip_level_count: None,
base_array_layer: directional_depth_texture_array_index,
array_layer_count: Some(1u32),
});
directional_depth_texture_array_index += 1;
let mut frustum = *frustum;
// Push the near clip plane out to infinity for directional lights
frustum.half_spaces[4] =
HalfSpace::new(frustum.half_spaces[4].normal().extend(f32::INFINITY));
let view_light_entity = commands
.spawn((
ShadowView {
Keep track of when a texture is first cleared (#10325) # Objective - Custom render passes, or future passes in the engine (such as https://github.com/bevyengine/bevy/pull/10164) need a better way to know and indicate to the core passes whether the view color/depth/prepass attachments have been cleared or not yet this frame, to know if they should clear it themselves or load it. ## Solution - For all render targets (depth textures, shadow textures, prepass textures, main textures) use an atomic bool to track whether or not each texture has been cleared this frame. Abstracted away in the new ColorAttachment and DepthAttachment wrappers. --- ## Changelog - Changed `ViewTarget::get_color_attachment()`, removed arguments. - Changed `ViewTarget::get_unsampled_color_attachment()`, removed arguments. - Removed `Camera3d::clear_color`. - Removed `Camera2d::clear_color`. - Added `Camera::clear_color`. - Added `ExtractedCamera::clear_color`. - Added `ColorAttachment` and `DepthAttachment` wrappers. - Moved `ClearColor` and `ClearColorConfig` from `bevy::core_pipeline::clear_color` to `bevy::render::camera`. - Core render passes now track when a texture is first bound as an attachment in order to decide whether to clear or load it. ## Migration Guide - Remove arguments to `ViewTarget::get_color_attachment()` and `ViewTarget::get_unsampled_color_attachment()`. - Configure clear color on `Camera` instead of on `Camera3d` and `Camera2d`. - Moved `ClearColor` and `ClearColorConfig` from `bevy::core_pipeline::clear_color` to `bevy::render::camera`. - `ViewDepthTexture` must now be created via the `new()` method --------- Co-authored-by: vero <email@atlasdostal.com> Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2023-12-31 00:37:37 +00:00
depth_attachment: DepthAttachment::new(depth_texture_view, Some(0.0)),
pass_name: format!(
"shadow pass directional light {light_index} cascade {cascade_index}"),
},
ExtractedView {
viewport: UVec4::new(
0,
0,
directional_light_shadow_map.size as u32,
directional_light_shadow_map.size as u32,
),
Normalise matrix naming (#13489) # Objective - Fixes #10909 - Fixes #8492 ## Solution - Name all matrices `x_from_y`, for example `world_from_view`. ## Testing - I've tested most of the 3D examples. The `lighting` example particularly should hit a lot of the changes and appears to run fine. --- ## Changelog - Renamed matrices across the engine to follow a `y_from_x` naming, making the space conversion more obvious. ## Migration Guide - `Frustum`'s `from_view_projection`, `from_view_projection_custom_far` and `from_view_projection_no_far` were renamed to `from_clip_from_world`, `from_clip_from_world_custom_far` and `from_clip_from_world_no_far`. - `ComputedCameraValues::projection_matrix` was renamed to `clip_from_view`. - `CameraProjection::get_projection_matrix` was renamed to `get_clip_from_view` (this affects implementations on `Projection`, `PerspectiveProjection` and `OrthographicProjection`). - `ViewRangefinder3d::from_view_matrix` was renamed to `from_world_from_view`. - `PreviousViewData`'s members were renamed to `view_from_world` and `clip_from_world`. - `ExtractedView`'s `projection`, `transform` and `view_projection` were renamed to `clip_from_view`, `world_from_view` and `clip_from_world`. - `ViewUniform`'s `view_proj`, `unjittered_view_proj`, `inverse_view_proj`, `view`, `inverse_view`, `projection` and `inverse_projection` were renamed to `clip_from_world`, `unjittered_clip_from_world`, `world_from_clip`, `world_from_view`, `view_from_world`, `clip_from_view` and `view_from_clip`. - `GpuDirectionalCascade::view_projection` was renamed to `clip_from_world`. - `MeshTransforms`' `transform` and `previous_transform` were renamed to `world_from_local` and `previous_world_from_local`. - `MeshUniform`'s `transform`, `previous_transform`, `inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed to `world_from_local`, `previous_world_from_local`, `local_from_world_transpose_a` and `local_from_world_transpose_b` (the `Mesh` type in WGSL mirrors this, however `transform` and `previous_transform` were named `model` and `previous_model`). - `Mesh2dTransforms::transform` was renamed to `world_from_local`. - `Mesh2dUniform`'s `transform`, `inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed to `world_from_local`, `local_from_world_transpose_a` and `local_from_world_transpose_b` (the `Mesh2d` type in WGSL mirrors this). - In WGSL, in `bevy_pbr::mesh_functions`, `get_model_matrix` and `get_previous_model_matrix` were renamed to `get_world_from_local` and `get_previous_world_from_local`. - In WGSL, `bevy_sprite::mesh2d_functions::get_model_matrix` was renamed to `get_world_from_local`.
2024-06-03 16:56:53 +00:00
world_from_view: GlobalTransform::from(cascade.world_from_cascade),
clip_from_view: cascade.clip_from_cascade,
clip_from_world: Some(cascade.clip_from_world),
hdr: false,
color_grading: Default::default(),
},
frustum,
LightEntity::Directional {
light_entity,
cascade_index,
},
))
.id();
view_lights.push(view_light_entity);
shadow_render_phases.insert_or_clear(view_light_entity);
live_shadow_mapping_lights.insert(view_light_entity);
}
}
let point_light_depth_texture_view =
point_light_depth_texture
2021-07-01 23:48:55 +00:00
.texture
.create_view(&TextureViewDescriptor {
label: Some("point_light_shadow_map_array_texture_view"),
2021-07-01 23:48:55 +00:00
format: None,
// NOTE: iOS Simulator is missing CubeArray support so we use Cube instead.
// See https://github.com/bevyengine/bevy/pull/12052 - remove if support is added.
#[cfg(all(
not(feature = "ios_simulator"),
any(
not(feature = "webgl"),
not(target_arch = "wasm32"),
feature = "webgpu"
)
Update to wgpu 0.19 and raw-window-handle 0.6 (#11280) # Objective Keep core dependencies up to date. ## Solution Update the dependencies. wgpu 0.19 only supports raw-window-handle (rwh) 0.6, so bumping that was included in this. The rwh 0.6 version bump is just the simplest way of doing it. There might be a way we can take advantage of wgpu's new safe surface creation api, but I'm not familiar enough with bevy's window management to untangle it and my attempt ended up being a mess of lifetimes and rustc complaining about missing trait impls (that were implemented). Thanks to @MiniaczQ for the (much simpler) rwh 0.6 version bump code. Unblocks https://github.com/bevyengine/bevy/pull/9172 and https://github.com/bevyengine/bevy/pull/10812 ~~This might be blocked on cpal and oboe updating their ndk versions to 0.8, as they both currently target ndk 0.7 which uses rwh 0.5.2~~ Tested on android, and everything seems to work correctly (audio properly stops when minimized, and plays when re-focusing the app). --- ## Changelog - `wgpu` has been updated to 0.19! The long awaited arcanization has been merged (for more info, see https://gfx-rs.github.io/2023/11/24/arcanization.html), and Vulkan should now be working again on Intel GPUs. - Targeting WebGPU now requires that you add the new `webgpu` feature (setting the `RUSTFLAGS` environment variable to `--cfg=web_sys_unstable_apis` is still required). This feature currently overrides the `webgl2` feature if you have both enabled (the `webgl2` feature is enabled by default), so it is not recommended to add it as a default feature to libraries without putting it behind a flag that allows library users to opt out of it! In the future we plan on supporting wasm binaries that can target both webgl2 and webgpu now that wgpu added support for doing so (see https://github.com/bevyengine/bevy/issues/11505). - `raw-window-handle` has been updated to version 0.6. ## Migration Guide - `bevy_render::instance_index::get_instance_index()` has been removed as the webgl2 workaround is no longer required as it was fixed upstream in wgpu. The `BASE_INSTANCE_WORKAROUND` shaderdef has also been removed. - WebGPU now requires the new `webgpu` feature to be enabled. The `webgpu` feature currently overrides the `webgl2` feature so you no longer need to disable all default features and re-add them all when targeting `webgpu`, but binaries built with both the `webgpu` and `webgl2` features will only target the webgpu backend, and will only work on browsers that support WebGPU. - Places where you conditionally compiled things for webgl2 need to be updated because of this change, eg: - `#[cfg(any(not(feature = "webgl"), not(target_arch = "wasm32")))]` becomes `#[cfg(any(not(feature = "webgl") ,not(target_arch = "wasm32"), feature = "webgpu"))]` - `#[cfg(all(feature = "webgl", target_arch = "wasm32"))]` becomes `#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]` - `if cfg!(all(feature = "webgl", target_arch = "wasm32"))` becomes `if cfg!(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))` - `create_texture_with_data` now also takes a `TextureDataOrder`. You can probably just set this to `TextureDataOrder::default()` - `TextureFormat`'s `block_size` has been renamed to `block_copy_size` - See the `wgpu` changelog for anything I might've missed: https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md --------- Co-authored-by: François <mockersf@gmail.com>
2024-01-26 18:14:21 +00:00
))]
2021-07-01 23:48:55 +00:00
dimension: Some(TextureViewDimension::CubeArray),
#[cfg(any(
feature = "ios_simulator",
all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu"))
Update to wgpu 0.19 and raw-window-handle 0.6 (#11280) # Objective Keep core dependencies up to date. ## Solution Update the dependencies. wgpu 0.19 only supports raw-window-handle (rwh) 0.6, so bumping that was included in this. The rwh 0.6 version bump is just the simplest way of doing it. There might be a way we can take advantage of wgpu's new safe surface creation api, but I'm not familiar enough with bevy's window management to untangle it and my attempt ended up being a mess of lifetimes and rustc complaining about missing trait impls (that were implemented). Thanks to @MiniaczQ for the (much simpler) rwh 0.6 version bump code. Unblocks https://github.com/bevyengine/bevy/pull/9172 and https://github.com/bevyengine/bevy/pull/10812 ~~This might be blocked on cpal and oboe updating their ndk versions to 0.8, as they both currently target ndk 0.7 which uses rwh 0.5.2~~ Tested on android, and everything seems to work correctly (audio properly stops when minimized, and plays when re-focusing the app). --- ## Changelog - `wgpu` has been updated to 0.19! The long awaited arcanization has been merged (for more info, see https://gfx-rs.github.io/2023/11/24/arcanization.html), and Vulkan should now be working again on Intel GPUs. - Targeting WebGPU now requires that you add the new `webgpu` feature (setting the `RUSTFLAGS` environment variable to `--cfg=web_sys_unstable_apis` is still required). This feature currently overrides the `webgl2` feature if you have both enabled (the `webgl2` feature is enabled by default), so it is not recommended to add it as a default feature to libraries without putting it behind a flag that allows library users to opt out of it! In the future we plan on supporting wasm binaries that can target both webgl2 and webgpu now that wgpu added support for doing so (see https://github.com/bevyengine/bevy/issues/11505). - `raw-window-handle` has been updated to version 0.6. ## Migration Guide - `bevy_render::instance_index::get_instance_index()` has been removed as the webgl2 workaround is no longer required as it was fixed upstream in wgpu. The `BASE_INSTANCE_WORKAROUND` shaderdef has also been removed. - WebGPU now requires the new `webgpu` feature to be enabled. The `webgpu` feature currently overrides the `webgl2` feature so you no longer need to disable all default features and re-add them all when targeting `webgpu`, but binaries built with both the `webgpu` and `webgl2` features will only target the webgpu backend, and will only work on browsers that support WebGPU. - Places where you conditionally compiled things for webgl2 need to be updated because of this change, eg: - `#[cfg(any(not(feature = "webgl"), not(target_arch = "wasm32")))]` becomes `#[cfg(any(not(feature = "webgl") ,not(target_arch = "wasm32"), feature = "webgpu"))]` - `#[cfg(all(feature = "webgl", target_arch = "wasm32"))]` becomes `#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]` - `if cfg!(all(feature = "webgl", target_arch = "wasm32"))` becomes `if cfg!(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))` - `create_texture_with_data` now also takes a `TextureDataOrder`. You can probably just set this to `TextureDataOrder::default()` - `TextureFormat`'s `block_size` has been renamed to `block_copy_size` - See the `wgpu` changelog for anything I might've missed: https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md --------- Co-authored-by: François <mockersf@gmail.com>
2024-01-26 18:14:21 +00:00
))]
dimension: Some(TextureViewDimension::Cube),
Deferred Renderer (#9258) # Objective - Add a [Deferred Renderer](https://en.wikipedia.org/wiki/Deferred_shading) to Bevy. - This allows subsequent passes to access per pixel material information before/during shading. - Accessing this per pixel material information is needed for some features, like GI. It also makes other features (ex. Decals) simpler to implement and/or improves their capability. There are multiple approaches to accomplishing this. The deferred shading approach works well given the limitations of WebGPU and WebGL2. Motivation: [I'm working on a GI solution for Bevy](https://youtu.be/eH1AkL-mwhI) # Solution - The deferred renderer is implemented with a prepass and a deferred lighting pass. - The prepass renders opaque objects into the Gbuffer attachment (`Rgba32Uint`). The PBR shader generates a `PbrInput` in mostly the same way as the forward implementation and then [packs it into the Gbuffer](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/render/pbr.wgsl#L168). - The deferred lighting pass unpacks the `PbrInput` and [feeds it into the pbr() function](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/deferred/deferred_lighting.wgsl#L65), then outputs the shaded color data. - There is now a resource [DefaultOpaqueRendererMethod](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/material.rs#L599) that can be used to set the default render method for opaque materials. If materials return `None` from [opaque_render_method()](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/material.rs#L131) the `DefaultOpaqueRendererMethod` will be used. Otherwise, custom materials can also explicitly choose to only support Deferred or Forward by returning the respective [OpaqueRendererMethod](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/material.rs#L603) - Deferred materials can be used seamlessly along with both opaque and transparent forward rendered materials in the same scene. The [deferred rendering example](https://github.com/DGriffin91/bevy/blob/deferred/examples/3d/deferred_rendering.rs) does this. - The deferred renderer does not support MSAA. If any deferred materials are used, MSAA must be disabled. Both TAA and FXAA are supported. - Deferred rendering supports WebGL2/WebGPU. ## Custom deferred materials - Custom materials can support both deferred and forward at the same time. The [StandardMaterial](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/render/pbr.wgsl#L166) does this. So does [this example](https://github.com/DGriffin91/bevy_glowy_orb_tutorial/blob/deferred/assets/shaders/glowy.wgsl#L56). - Custom deferred materials that require PBR lighting can create a `PbrInput`, write it to the deferred GBuffer and let it be rendered by the `PBRDeferredLightingPlugin`. - Custom deferred materials that require custom lighting have two options: 1. Use the base_color channel of the `PbrInput` combined with the `STANDARD_MATERIAL_FLAGS_UNLIT_BIT` flag. [Example.](https://github.com/DGriffin91/bevy_glowy_orb_tutorial/blob/deferred/assets/shaders/glowy.wgsl#L56) (If the unlit bit is set, the base_color is stored as RGB9E5 for extra precision) 2. A Custom Deferred Lighting pass can be created, either overriding the default, or running in addition. The a depth buffer is used to limit rendering to only the required fragments for each deferred lighting pass. Materials can set their respective depth id via the [deferred_lighting_pass_id](https://github.com/DGriffin91/bevy/blob/b79182d2a32cac28c4213c2457a53ac2cc885332/crates/bevy_pbr/src/prepass/prepass_io.wgsl#L95) attachment. The custom deferred lighting pass plugin can then set [its corresponding depth](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/deferred/deferred_lighting.wgsl#L37). Then with the lighting pass using [CompareFunction::Equal](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/deferred/mod.rs#L335), only the fragments with a depth that equal the corresponding depth written in the material will be rendered. Custom deferred lighting plugins can also be created to render the StandardMaterial. The default deferred lighting plugin can be bypassed with `DefaultPlugins.set(PBRDeferredLightingPlugin { bypass: true })` --------- Co-authored-by: nickrart <nickolas.g.russell@gmail.com>
2023-10-12 22:10:38 +00:00
aspect: TextureAspect::DepthOnly,
2021-07-01 23:48:55 +00:00
base_mip_level: 0,
mip_level_count: None,
2021-07-02 01:05:20 +00:00
base_array_layer: 0,
2021-07-01 23:48:55 +00:00
array_layer_count: None,
});
let directional_light_depth_texture_view = directional_light_depth_texture
.texture
.create_view(&TextureViewDescriptor {
label: Some("directional_light_shadow_map_array_texture_view"),
format: None,
Update to wgpu 0.19 and raw-window-handle 0.6 (#11280) # Objective Keep core dependencies up to date. ## Solution Update the dependencies. wgpu 0.19 only supports raw-window-handle (rwh) 0.6, so bumping that was included in this. The rwh 0.6 version bump is just the simplest way of doing it. There might be a way we can take advantage of wgpu's new safe surface creation api, but I'm not familiar enough with bevy's window management to untangle it and my attempt ended up being a mess of lifetimes and rustc complaining about missing trait impls (that were implemented). Thanks to @MiniaczQ for the (much simpler) rwh 0.6 version bump code. Unblocks https://github.com/bevyengine/bevy/pull/9172 and https://github.com/bevyengine/bevy/pull/10812 ~~This might be blocked on cpal and oboe updating their ndk versions to 0.8, as they both currently target ndk 0.7 which uses rwh 0.5.2~~ Tested on android, and everything seems to work correctly (audio properly stops when minimized, and plays when re-focusing the app). --- ## Changelog - `wgpu` has been updated to 0.19! The long awaited arcanization has been merged (for more info, see https://gfx-rs.github.io/2023/11/24/arcanization.html), and Vulkan should now be working again on Intel GPUs. - Targeting WebGPU now requires that you add the new `webgpu` feature (setting the `RUSTFLAGS` environment variable to `--cfg=web_sys_unstable_apis` is still required). This feature currently overrides the `webgl2` feature if you have both enabled (the `webgl2` feature is enabled by default), so it is not recommended to add it as a default feature to libraries without putting it behind a flag that allows library users to opt out of it! In the future we plan on supporting wasm binaries that can target both webgl2 and webgpu now that wgpu added support for doing so (see https://github.com/bevyengine/bevy/issues/11505). - `raw-window-handle` has been updated to version 0.6. ## Migration Guide - `bevy_render::instance_index::get_instance_index()` has been removed as the webgl2 workaround is no longer required as it was fixed upstream in wgpu. The `BASE_INSTANCE_WORKAROUND` shaderdef has also been removed. - WebGPU now requires the new `webgpu` feature to be enabled. The `webgpu` feature currently overrides the `webgl2` feature so you no longer need to disable all default features and re-add them all when targeting `webgpu`, but binaries built with both the `webgpu` and `webgl2` features will only target the webgpu backend, and will only work on browsers that support WebGPU. - Places where you conditionally compiled things for webgl2 need to be updated because of this change, eg: - `#[cfg(any(not(feature = "webgl"), not(target_arch = "wasm32")))]` becomes `#[cfg(any(not(feature = "webgl") ,not(target_arch = "wasm32"), feature = "webgpu"))]` - `#[cfg(all(feature = "webgl", target_arch = "wasm32"))]` becomes `#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]` - `if cfg!(all(feature = "webgl", target_arch = "wasm32"))` becomes `if cfg!(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))` - `create_texture_with_data` now also takes a `TextureDataOrder`. You can probably just set this to `TextureDataOrder::default()` - `TextureFormat`'s `block_size` has been renamed to `block_copy_size` - See the `wgpu` changelog for anything I might've missed: https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md --------- Co-authored-by: François <mockersf@gmail.com>
2024-01-26 18:14:21 +00:00
#[cfg(any(
not(feature = "webgl"),
not(target_arch = "wasm32"),
feature = "webgpu"
))]
dimension: Some(TextureViewDimension::D2Array),
Update to wgpu 0.19 and raw-window-handle 0.6 (#11280) # Objective Keep core dependencies up to date. ## Solution Update the dependencies. wgpu 0.19 only supports raw-window-handle (rwh) 0.6, so bumping that was included in this. The rwh 0.6 version bump is just the simplest way of doing it. There might be a way we can take advantage of wgpu's new safe surface creation api, but I'm not familiar enough with bevy's window management to untangle it and my attempt ended up being a mess of lifetimes and rustc complaining about missing trait impls (that were implemented). Thanks to @MiniaczQ for the (much simpler) rwh 0.6 version bump code. Unblocks https://github.com/bevyengine/bevy/pull/9172 and https://github.com/bevyengine/bevy/pull/10812 ~~This might be blocked on cpal and oboe updating their ndk versions to 0.8, as they both currently target ndk 0.7 which uses rwh 0.5.2~~ Tested on android, and everything seems to work correctly (audio properly stops when minimized, and plays when re-focusing the app). --- ## Changelog - `wgpu` has been updated to 0.19! The long awaited arcanization has been merged (for more info, see https://gfx-rs.github.io/2023/11/24/arcanization.html), and Vulkan should now be working again on Intel GPUs. - Targeting WebGPU now requires that you add the new `webgpu` feature (setting the `RUSTFLAGS` environment variable to `--cfg=web_sys_unstable_apis` is still required). This feature currently overrides the `webgl2` feature if you have both enabled (the `webgl2` feature is enabled by default), so it is not recommended to add it as a default feature to libraries without putting it behind a flag that allows library users to opt out of it! In the future we plan on supporting wasm binaries that can target both webgl2 and webgpu now that wgpu added support for doing so (see https://github.com/bevyengine/bevy/issues/11505). - `raw-window-handle` has been updated to version 0.6. ## Migration Guide - `bevy_render::instance_index::get_instance_index()` has been removed as the webgl2 workaround is no longer required as it was fixed upstream in wgpu. The `BASE_INSTANCE_WORKAROUND` shaderdef has also been removed. - WebGPU now requires the new `webgpu` feature to be enabled. The `webgpu` feature currently overrides the `webgl2` feature so you no longer need to disable all default features and re-add them all when targeting `webgpu`, but binaries built with both the `webgpu` and `webgl2` features will only target the webgpu backend, and will only work on browsers that support WebGPU. - Places where you conditionally compiled things for webgl2 need to be updated because of this change, eg: - `#[cfg(any(not(feature = "webgl"), not(target_arch = "wasm32")))]` becomes `#[cfg(any(not(feature = "webgl") ,not(target_arch = "wasm32"), feature = "webgpu"))]` - `#[cfg(all(feature = "webgl", target_arch = "wasm32"))]` becomes `#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]` - `if cfg!(all(feature = "webgl", target_arch = "wasm32"))` becomes `if cfg!(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))` - `create_texture_with_data` now also takes a `TextureDataOrder`. You can probably just set this to `TextureDataOrder::default()` - `TextureFormat`'s `block_size` has been renamed to `block_copy_size` - See the `wgpu` changelog for anything I might've missed: https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md --------- Co-authored-by: François <mockersf@gmail.com>
2024-01-26 18:14:21 +00:00
#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]
dimension: Some(TextureViewDimension::D2),
Deferred Renderer (#9258) # Objective - Add a [Deferred Renderer](https://en.wikipedia.org/wiki/Deferred_shading) to Bevy. - This allows subsequent passes to access per pixel material information before/during shading. - Accessing this per pixel material information is needed for some features, like GI. It also makes other features (ex. Decals) simpler to implement and/or improves their capability. There are multiple approaches to accomplishing this. The deferred shading approach works well given the limitations of WebGPU and WebGL2. Motivation: [I'm working on a GI solution for Bevy](https://youtu.be/eH1AkL-mwhI) # Solution - The deferred renderer is implemented with a prepass and a deferred lighting pass. - The prepass renders opaque objects into the Gbuffer attachment (`Rgba32Uint`). The PBR shader generates a `PbrInput` in mostly the same way as the forward implementation and then [packs it into the Gbuffer](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/render/pbr.wgsl#L168). - The deferred lighting pass unpacks the `PbrInput` and [feeds it into the pbr() function](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/deferred/deferred_lighting.wgsl#L65), then outputs the shaded color data. - There is now a resource [DefaultOpaqueRendererMethod](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/material.rs#L599) that can be used to set the default render method for opaque materials. If materials return `None` from [opaque_render_method()](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/material.rs#L131) the `DefaultOpaqueRendererMethod` will be used. Otherwise, custom materials can also explicitly choose to only support Deferred or Forward by returning the respective [OpaqueRendererMethod](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/material.rs#L603) - Deferred materials can be used seamlessly along with both opaque and transparent forward rendered materials in the same scene. The [deferred rendering example](https://github.com/DGriffin91/bevy/blob/deferred/examples/3d/deferred_rendering.rs) does this. - The deferred renderer does not support MSAA. If any deferred materials are used, MSAA must be disabled. Both TAA and FXAA are supported. - Deferred rendering supports WebGL2/WebGPU. ## Custom deferred materials - Custom materials can support both deferred and forward at the same time. The [StandardMaterial](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/render/pbr.wgsl#L166) does this. So does [this example](https://github.com/DGriffin91/bevy_glowy_orb_tutorial/blob/deferred/assets/shaders/glowy.wgsl#L56). - Custom deferred materials that require PBR lighting can create a `PbrInput`, write it to the deferred GBuffer and let it be rendered by the `PBRDeferredLightingPlugin`. - Custom deferred materials that require custom lighting have two options: 1. Use the base_color channel of the `PbrInput` combined with the `STANDARD_MATERIAL_FLAGS_UNLIT_BIT` flag. [Example.](https://github.com/DGriffin91/bevy_glowy_orb_tutorial/blob/deferred/assets/shaders/glowy.wgsl#L56) (If the unlit bit is set, the base_color is stored as RGB9E5 for extra precision) 2. A Custom Deferred Lighting pass can be created, either overriding the default, or running in addition. The a depth buffer is used to limit rendering to only the required fragments for each deferred lighting pass. Materials can set their respective depth id via the [deferred_lighting_pass_id](https://github.com/DGriffin91/bevy/blob/b79182d2a32cac28c4213c2457a53ac2cc885332/crates/bevy_pbr/src/prepass/prepass_io.wgsl#L95) attachment. The custom deferred lighting pass plugin can then set [its corresponding depth](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/deferred/deferred_lighting.wgsl#L37). Then with the lighting pass using [CompareFunction::Equal](https://github.com/DGriffin91/bevy/blob/ec1465559f2c82001830e908fc02ff1d7c2efe51/crates/bevy_pbr/src/deferred/mod.rs#L335), only the fragments with a depth that equal the corresponding depth written in the material will be rendered. Custom deferred lighting plugins can also be created to render the StandardMaterial. The default deferred lighting plugin can be bypassed with `DefaultPlugins.set(PBRDeferredLightingPlugin { bypass: true })` --------- Co-authored-by: nickrart <nickolas.g.russell@gmail.com>
2023-10-12 22:10:38 +00:00
aspect: TextureAspect::DepthOnly,
base_mip_level: 0,
mip_level_count: None,
base_array_layer: 0,
array_layer_count: None,
});
2021-07-01 23:48:55 +00:00
Accept Bundles for insert and remove. Deprecate insert/remove_bundle (#6039) # Objective Take advantage of the "impl Bundle for Component" changes in #2975 / add the follow up changes discussed there. ## Solution - Change `insert` and `remove` to accept a Bundle instead of a Component (for both Commands and World) - Deprecate `insert_bundle`, `remove_bundle`, and `remove_bundle_intersection` - Add `remove_intersection` --- ## Changelog - Change `insert` and `remove` now accept a Bundle instead of a Component (for both Commands and World) - `insert_bundle` and `remove_bundle` are deprecated ## Migration Guide Replace `insert_bundle` with `insert`: ```rust // Old (0.8) commands.spawn().insert_bundle(SomeBundle::default()); // New (0.9) commands.spawn().insert(SomeBundle::default()); ``` Replace `remove_bundle` with `remove`: ```rust // Old (0.8) commands.entity(some_entity).remove_bundle::<SomeBundle>(); // New (0.9) commands.entity(some_entity).remove::<SomeBundle>(); ``` Replace `remove_bundle_intersection` with `remove_intersection`: ```rust // Old (0.8) world.entity_mut(some_entity).remove_bundle_intersection::<SomeBundle>(); // New (0.9) world.entity_mut(some_entity).remove_intersection::<SomeBundle>(); ``` Consider consolidating as many operations as possible to improve ergonomics and cut down on archetype moves: ```rust // Old (0.8) commands.spawn() .insert_bundle(SomeBundle::default()) .insert(SomeComponent); // New (0.9) - Option 1 commands.spawn().insert(( SomeBundle::default(), SomeComponent, )) // New (0.9) - Option 2 commands.spawn_bundle(( SomeBundle::default(), SomeComponent, )) ``` ## Next Steps Consider changing `spawn` to accept a bundle and deprecate `spawn_bundle`.
2022-09-21 21:47:53 +00:00
commands.entity(entity).insert((
ViewShadowBindings {
point_light_depth_texture: point_light_depth_texture.texture,
point_light_depth_texture_view,
directional_light_depth_texture: directional_light_depth_texture.texture,
directional_light_depth_texture_view,
},
ViewLightEntities {
lights: view_lights,
},
ViewLightsUniformOffset {
offset: view_gpu_lights_writer.write(&gpu_lights),
},
));
2021-06-02 02:59:17 +00:00
}
shadow_render_phases.retain(|entity, _| live_shadow_mapping_lights.contains(entity));
Modular Rendering (#2831) This changes how render logic is composed to make it much more modular. Previously, all extraction logic was centralized for a given "type" of rendered thing. For example, we extracted meshes into a vector of ExtractedMesh, which contained the mesh and material asset handles, the transform, etc. We looked up bindings for "drawn things" using their index in the `Vec<ExtractedMesh>`. This worked fine for built in rendering, but made it hard to reuse logic for "custom" rendering. It also prevented us from reusing things like "extracted transforms" across contexts. To make rendering more modular, I made a number of changes: * Entities now drive rendering: * We extract "render components" from "app components" and store them _on_ entities. No more centralized uber lists! We now have true "ECS-driven rendering" * To make this perform well, I implemented #2673 in upstream Bevy for fast batch insertions into specific entities. This was merged into the `pipelined-rendering` branch here: #2815 * Reworked the `Draw` abstraction: * Generic `PhaseItems`: each draw phase can define its own type of "rendered thing", which can define its own "sort key" * Ported the 2d, 3d, and shadow phases to the new PhaseItem impl (currently Transparent2d, Transparent3d, and Shadow PhaseItems) * `Draw` trait and and `DrawFunctions` are now generic on PhaseItem * Modular / Ergonomic `DrawFunctions` via `RenderCommands` * RenderCommand is a trait that runs an ECS query and produces one or more RenderPass calls. Types implementing this trait can be composed to create a final DrawFunction. For example the DrawPbr DrawFunction is created from the following DrawCommand tuple. Const generics are used to set specific bind group locations: ```rust pub type DrawPbr = ( SetPbrPipeline, SetMeshViewBindGroup<0>, SetStandardMaterialBindGroup<1>, SetTransformBindGroup<2>, DrawMesh, ); ``` * The new `custom_shader_pipelined` example illustrates how the commands above can be reused to create a custom draw function: ```rust type DrawCustom = ( SetCustomMaterialPipeline, SetMeshViewBindGroup<0>, SetTransformBindGroup<2>, DrawMesh, ); ``` * ExtractComponentPlugin and UniformComponentPlugin: * Simple, standardized ways to easily extract individual components and write them to GPU buffers * Ported PBR and Sprite rendering to the new primitives above. * Removed staging buffer from UniformVec in favor of direct Queue usage * Makes UniformVec much easier to use and more ergonomic. Completely removes the need for custom render graph nodes in these contexts (see the PbrNode and view Node removals and the much simpler call patterns in the relevant Prepare systems). * Added a many_cubes_pipelined example to benchmark baseline 3d rendering performance and ensure there were no major regressions during this port. Avoiding regressions was challenging given that the old approach of extracting into centralized vectors is basically the "optimal" approach. However thanks to a various ECS optimizations and render logic rephrasing, we pretty much break even on this benchmark! * Lifetimeless SystemParams: this will be a bit divisive, but as we continue to embrace "trait driven systems" (ex: ExtractComponentPlugin, UniformComponentPlugin, DrawCommand), the ergonomics of `(Query<'static, 'static, (&'static A, &'static B, &'static)>, Res<'static, C>)` were getting very hard to bear. As a compromise, I added "static type aliases" for the relevant SystemParams. The previous example can now be expressed like this: `(SQuery<(Read<A>, Read<B>)>, SRes<C>)`. If anyone has better ideas / conflicting opinions, please let me know! * RunSystem trait: a way to define Systems via a trait with a SystemParam associated type. This is used to implement the various plugins mentioned above. I also added SystemParamItem and QueryItem type aliases to make "trait stye" ecs interactions nicer on the eyes (and fingers). * RenderAsset retrying: ensures that render assets are only created when they are "ready" and allows us to create bind groups directly inside render assets (which significantly simplified the StandardMaterial code). I think ultimately we should swap this out on "asset dependency" events to wait for dependencies to load, but this will require significant asset system changes. * Updated some built in shaders to account for missing MeshUniform fields
2021-09-23 06:16:11 +00:00
}
Generate `MeshUniform`s on the GPU via compute shader where available. (#12773) Currently, `MeshUniform`s are rather large: 160 bytes. They're also somewhat expensive to compute, because they involve taking the inverse of a 3x4 matrix. Finally, if a mesh is present in multiple views, that mesh will have a separate `MeshUniform` for each and every view, which is wasteful. This commit fixes these issues by introducing the concept of a *mesh input uniform* and adding a *mesh uniform building* compute shader pass. The `MeshInputUniform` is simply the minimum amount of data needed for the GPU to compute the full `MeshUniform`. Most of this data is just the transform and is therefore only 64 bytes. `MeshInputUniform`s are computed during the *extraction* phase, much like skins are today, in order to avoid needlessly copying transforms around on CPU. (In fact, the render app has been changed to only store the translation of each mesh; it no longer cares about any other part of the transform, which is stored only on the GPU and the main world.) Before rendering, the `build_mesh_uniforms` pass runs to expand the `MeshInputUniform`s to the full `MeshUniform`. The mesh uniform building pass does the following, all on GPU: 1. Copy the appropriate fields of the `MeshInputUniform` to the `MeshUniform` slot. If a single mesh is present in multiple views, this effectively duplicates it into each view. 2. Compute the inverse transpose of the model transform, used for transforming normals. 3. If applicable, copy the mesh's transform from the previous frame for TAA. To support this, we double-buffer the `MeshInputUniform`s over two frames and swap the buffers each frame. The `MeshInputUniform`s for the current frame contain the index of that mesh's `MeshInputUniform` for the previous frame. This commit produces wins in virtually every CPU part of the pipeline: `extract_meshes`, `queue_material_meshes`, `batch_and_prepare_render_phase`, and especially `write_batched_instance_buffer` are all faster. Shrinking the amount of CPU data that has to be shuffled around speeds up the entire rendering process. | Benchmark | This branch | `main` | Speedup | |------------------------|-------------|---------|---------| | `many_cubes -nfc` | 17.259 | 24.529 | 42.12% | | `many_cubes -nfc -vpi` | 302.116 | 312.123 | 3.31% | | `many_foxes` | 3.227 | 3.515 | 8.92% | Because mesh uniform building requires compute shader, and WebGL 2 has no compute shader, the existing CPU mesh uniform building code has been left as-is. Many types now have both CPU mesh uniform building and GPU mesh uniform building modes. Developers can opt into the old CPU mesh uniform building by setting the `use_gpu_uniform_builder` option on `PbrPlugin` to `false`. Below are graphs of the CPU portions of `many-cubes --no-frustum-culling`. Yellow is this branch, red is `main`. `extract_meshes`: ![Screenshot 2024-04-02 124842](https://github.com/bevyengine/bevy/assets/157897/a6748ea4-dd05-47b6-9254-45d07d33cb10) It's notable that we get a small win even though we're now writing to a GPU buffer. `queue_material_meshes`: ![Screenshot 2024-04-02 124911](https://github.com/bevyengine/bevy/assets/157897/ecb44d78-65dc-448d-ba85-2de91aa2ad94) There's a bit of a regression here; not sure what's causing it. In any case it's very outweighed by the other gains. `batch_and_prepare_render_phase`: ![Screenshot 2024-04-02 125123](https://github.com/bevyengine/bevy/assets/157897/4e20fc86-f9dd-4e5c-8623-837e4258f435) There's a huge win here, enough to make batching basically drop off the profile. `write_batched_instance_buffer`: ![Screenshot 2024-04-02 125237](https://github.com/bevyengine/bevy/assets/157897/401a5c32-9dc1-4991-996d-eb1cac6014b2) There's a massive improvement here, as expected. Note that a lot of it simply comes from the fact that `MeshInputUniform` is `Pod`. (This isn't a maintainability problem in my view because `MeshInputUniform` is so simple: just 16 tightly-packed words.) ## Changelog ### Added * Per-mesh instance data is now generated on GPU with a compute shader instead of CPU, resulting in rendering performance improvements on platforms where compute shaders are supported. ## Migration guide * Custom render phases now need multiple systems beyond just `batch_and_prepare_render_phase`. Code that was previously creating custom render phases should now add a `BinnedRenderPhasePlugin` or `SortedRenderPhasePlugin` as appropriate instead of directly adding `batch_and_prepare_render_phase`.
2024-04-10 05:33:32 +00:00
/// For each shadow cascade, iterates over all the meshes "visible" from it and
/// adds them to [`BinnedRenderPhase`]s or [`SortedRenderPhase`]s as
/// appropriate.
#[allow(clippy::too_many_arguments)]
pub fn queue_shadows<M: Material>(
Modular Rendering (#2831) This changes how render logic is composed to make it much more modular. Previously, all extraction logic was centralized for a given "type" of rendered thing. For example, we extracted meshes into a vector of ExtractedMesh, which contained the mesh and material asset handles, the transform, etc. We looked up bindings for "drawn things" using their index in the `Vec<ExtractedMesh>`. This worked fine for built in rendering, but made it hard to reuse logic for "custom" rendering. It also prevented us from reusing things like "extracted transforms" across contexts. To make rendering more modular, I made a number of changes: * Entities now drive rendering: * We extract "render components" from "app components" and store them _on_ entities. No more centralized uber lists! We now have true "ECS-driven rendering" * To make this perform well, I implemented #2673 in upstream Bevy for fast batch insertions into specific entities. This was merged into the `pipelined-rendering` branch here: #2815 * Reworked the `Draw` abstraction: * Generic `PhaseItems`: each draw phase can define its own type of "rendered thing", which can define its own "sort key" * Ported the 2d, 3d, and shadow phases to the new PhaseItem impl (currently Transparent2d, Transparent3d, and Shadow PhaseItems) * `Draw` trait and and `DrawFunctions` are now generic on PhaseItem * Modular / Ergonomic `DrawFunctions` via `RenderCommands` * RenderCommand is a trait that runs an ECS query and produces one or more RenderPass calls. Types implementing this trait can be composed to create a final DrawFunction. For example the DrawPbr DrawFunction is created from the following DrawCommand tuple. Const generics are used to set specific bind group locations: ```rust pub type DrawPbr = ( SetPbrPipeline, SetMeshViewBindGroup<0>, SetStandardMaterialBindGroup<1>, SetTransformBindGroup<2>, DrawMesh, ); ``` * The new `custom_shader_pipelined` example illustrates how the commands above can be reused to create a custom draw function: ```rust type DrawCustom = ( SetCustomMaterialPipeline, SetMeshViewBindGroup<0>, SetTransformBindGroup<2>, DrawMesh, ); ``` * ExtractComponentPlugin and UniformComponentPlugin: * Simple, standardized ways to easily extract individual components and write them to GPU buffers * Ported PBR and Sprite rendering to the new primitives above. * Removed staging buffer from UniformVec in favor of direct Queue usage * Makes UniformVec much easier to use and more ergonomic. Completely removes the need for custom render graph nodes in these contexts (see the PbrNode and view Node removals and the much simpler call patterns in the relevant Prepare systems). * Added a many_cubes_pipelined example to benchmark baseline 3d rendering performance and ensure there were no major regressions during this port. Avoiding regressions was challenging given that the old approach of extracting into centralized vectors is basically the "optimal" approach. However thanks to a various ECS optimizations and render logic rephrasing, we pretty much break even on this benchmark! * Lifetimeless SystemParams: this will be a bit divisive, but as we continue to embrace "trait driven systems" (ex: ExtractComponentPlugin, UniformComponentPlugin, DrawCommand), the ergonomics of `(Query<'static, 'static, (&'static A, &'static B, &'static)>, Res<'static, C>)` were getting very hard to bear. As a compromise, I added "static type aliases" for the relevant SystemParams. The previous example can now be expressed like this: `(SQuery<(Read<A>, Read<B>)>, SRes<C>)`. If anyone has better ideas / conflicting opinions, please let me know! * RunSystem trait: a way to define Systems via a trait with a SystemParam associated type. This is used to implement the various plugins mentioned above. I also added SystemParamItem and QueryItem type aliases to make "trait stye" ecs interactions nicer on the eyes (and fingers). * RenderAsset retrying: ensures that render assets are only created when they are "ready" and allows us to create bind groups directly inside render assets (which significantly simplified the StandardMaterial code). I think ultimately we should swap this out on "asset dependency" events to wait for dependencies to load, but this will require significant asset system changes. * Updated some built in shaders to account for missing MeshUniform fields
2021-09-23 06:16:11 +00:00
shadow_draw_functions: Res<DrawFunctions<Shadow>>,
prepass_pipeline: Res<PrepassPipeline<M>>,
Pack multiple vertex and index arrays together into growable buffers. (#14257) This commit uses the [`offset-allocator`] crate to combine vertex and index arrays from different meshes into single buffers. Since the primary source of `wgpu` overhead is from validation and synchronization when switching buffers, this significantly improves Bevy's rendering performance on many scenes. This patch is a more flexible version of #13218, which also used slabs. Unlike #13218, which used slabs of a fixed size, this commit implements slabs that start small and can grow. In addition to reducing memory usage, supporting slab growth reduces the number of vertex and index buffer switches that need to happen during rendering, leading to improved performance. To prevent pathological fragmentation behavior, slabs are capped to a maximum size, and mesh arrays that are too large get their own dedicated slabs. As an additional improvement over #13218, this commit allows the application to customize all allocator heuristics. The `MeshAllocatorSettings` resource contains values that adjust the minimum and maximum slab sizes, the cutoff point at which meshes get their own dedicated slabs, and the rate at which slabs grow. Hopefully-sensible defaults have been chosen for each value. Unfortunately, WebGL 2 doesn't support the *base vertex* feature, which is necessary to pack vertex arrays from different meshes into the same buffer. `wgpu` represents this restriction as the downlevel flag `BASE_VERTEX`. This patch detects that bit and ensures that all vertex buffers get dedicated slabs on that platform. Even on WebGL 2, though, we can combine all *index* arrays into single buffers to reduce buffer changes, and we do so. The following measurements are on Bistro: Overall frame time improves from 8.74 ms to 5.53 ms (1.58x speedup): ![Screenshot 2024-07-09 163521](https://github.com/bevyengine/bevy/assets/157897/5d83c824-c0ee-434c-bbaf-218ff7212c48) Render system time improves from 6.57 ms to 3.54 ms (1.86x speedup): ![Screenshot 2024-07-09 163559](https://github.com/bevyengine/bevy/assets/157897/d94e2273-c3a0-496a-9f88-20d394129610) Opaque pass time improves from 4.64 ms to 2.33 ms (1.99x speedup): ![Screenshot 2024-07-09 163536](https://github.com/bevyengine/bevy/assets/157897/e4ef6e48-d60e-44ae-9a71-b9a731c99d9a) ## Migration Guide ### Changed * Vertex and index buffers for meshes may now be packed alongside other buffers, for performance. * `GpuMesh` has been renamed to `RenderMesh`, to reflect the fact that it no longer directly stores handles to GPU objects. * Because meshes no longer have their own vertex and index buffers, the responsibility for the buffers has moved from `GpuMesh` (now called `RenderMesh`) to the `MeshAllocator` resource. To access the vertex data for a mesh, use `MeshAllocator::mesh_vertex_slice`. To access the index data for a mesh, use `MeshAllocator::mesh_index_slice`. [`offset-allocator`]: https://github.com/pcwalton/offset-allocator
2024-07-16 20:33:15 +00:00
render_meshes: Res<RenderAssets<RenderMesh>>,
render_mesh_instances: Res<RenderMeshInstances>,
render_materials: Res<RenderAssets<PreparedMaterial<M>>>,
Use EntityHashMap<Entity, T> for render world entity storage for better performance (#9903) # Objective - Improve rendering performance, particularly by avoiding the large system commands costs of using the ECS in the way that the render world does. ## Solution - Define `EntityHasher` that calculates a hash from the `Entity.to_bits()` by `i | (i.wrapping_mul(0x517cc1b727220a95) << 32)`. `0x517cc1b727220a95` is something like `u64::MAX / N` for N that gives a value close to π and that works well for hashing. Thanks for @SkiFire13 for the suggestion and to @nicopap for alternative suggestions and discussion. This approach comes from `rustc-hash` (a.k.a. `FxHasher`) with some tweaks for the case of hashing an `Entity`. `FxHasher` and `SeaHasher` were also tested but were significantly slower. - Define `EntityHashMap` type that uses the `EntityHashser` - Use `EntityHashMap<Entity, T>` for render world entity storage, including: - `RenderMaterialInstances` - contains the `AssetId<M>` of the material associated with the entity. Also for 2D. - `RenderMeshInstances` - contains mesh transforms, flags and properties about mesh entities. Also for 2D. - `SkinIndices` and `MorphIndices` - contains the skin and morph index for an entity, respectively - `ExtractedSprites` - `ExtractedUiNodes` ## Benchmarks All benchmarks have been conducted on an M1 Max connected to AC power. The tests are run for 1500 frames. The 1000th frame is captured for comparison to check for visual regressions. There were none. ### 2D Meshes `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d` #### `--ordered-z` This test spawns the 2D meshes with z incrementing back to front, which is the ideal arrangement allocation order as it matches the sorted render order which means lookups have a high cache hit rate. <img width="1112" alt="Screenshot 2023-09-27 at 07 50 45" src="https://github.com/bevyengine/bevy/assets/302146/e140bc98-7091-4a3b-8ae1-ab75d16d2ccb"> -39.1% median frame time. #### Random This test spawns the 2D meshes with random z. This not only makes the batching and transparent 2D pass lookups get a lot of cache misses, it also currently means that the meshes are almost certain to not be batchable. <img width="1108" alt="Screenshot 2023-09-27 at 07 51 28" src="https://github.com/bevyengine/bevy/assets/302146/29c2e813-645a-43ce-982a-55df4bf7d8c4"> -7.2% median frame time. ### 3D Meshes `many_cubes --benchmark` <img width="1112" alt="Screenshot 2023-09-27 at 07 51 57" src="https://github.com/bevyengine/bevy/assets/302146/1a729673-3254-4e2a-9072-55e27c69f0fc"> -7.7% median frame time. ### Sprites **NOTE: On `main` sprites are using `SparseSet<Entity, T>`!** `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite` #### `--ordered-z` This test spawns the sprites with z incrementing back to front, which is the ideal arrangement allocation order as it matches the sorted render order which means lookups have a high cache hit rate. <img width="1116" alt="Screenshot 2023-09-27 at 07 52 31" src="https://github.com/bevyengine/bevy/assets/302146/bc8eab90-e375-4d31-b5cd-f55f6f59ab67"> +13.0% median frame time. #### Random This test spawns the sprites with random z. This makes the batching and transparent 2D pass lookups get a lot of cache misses. <img width="1109" alt="Screenshot 2023-09-27 at 07 53 01" src="https://github.com/bevyengine/bevy/assets/302146/22073f5d-99a7-49b0-9584-d3ac3eac3033"> +0.6% median frame time. ### UI **NOTE: On `main` UI is using `SparseSet<Entity, T>`!** `many_buttons` <img width="1111" alt="Screenshot 2023-09-27 at 07 53 26" src="https://github.com/bevyengine/bevy/assets/302146/66afd56d-cbe4-49e7-8b64-2f28f6043d85"> +15.1% median frame time. ## Alternatives - Cart originally suggested trying out `SparseSet<Entity, T>` and indeed that is slightly faster under ideal conditions. However, `PassHashMap<Entity, T>` has better worst case performance when data is randomly distributed, rather than in sorted render order, and does not have the worst case memory usage that `SparseSet`'s dense `Vec<usize>` that maps from the `Entity` index to sparse index into `Vec<T>`. This dense `Vec` has to be as large as the largest Entity index used with the `SparseSet`. - I also tested `PassHashMap<u32, T>`, intending to use `Entity.index()` as the key, but this proved to sometimes be slower and mostly no different. - The only outstanding approach that has not been implemented and tested is to _not_ clear the render world of its entities each frame. That has its own problems, though they could perhaps be solved. - Performance-wise, if the entities and their component data were not cleared, then they would incur table moves on spawn, and should not thereafter, rather just their component data would be overwritten. Ideally we would have a neat way of either updating data in-place via `&mut T` queries, or inserting components if not present. This would likely be quite cumbersome to have to remember to do everywhere, but perhaps it only needs to be done in the more performance-sensitive systems. - The main problem to solve however is that we want to both maintain a mapping between main world entities and render world entities, be able to run the render app and world in parallel with the main app and world for pipelined rendering, and at the same time be able to spawn entities in the render world in such a way that those Entity ids do not collide with those spawned in the main world. This is potentially quite solvable, but could well be a lot of ECS work to do it in a way that makes sense. --- ## Changelog - Changed: Component data for entities to be drawn are no longer stored on entities in the render world. Instead, data is stored in a `EntityHashMap<Entity, T>` in various resources. This brings significant performance benefits due to the way the render app clears entities every frame. Resources of most interest are `RenderMeshInstances` and `RenderMaterialInstances`, and their 2D counterparts. ## Migration Guide Previously the render app extracted mesh entities and their component data from the main world and stored them as entities and components in the render world. Now they are extracted into essentially `EntityHashMap<Entity, T>` where `T` are structs containing an appropriate group of data. This means that while extract set systems will continue to run extract queries against the main world they will store their data in hash maps. Also, systems in later sets will either need to look up entities in the available resources such as `RenderMeshInstances`, or maintain their own `EntityHashMap<Entity, T>` for their own data. Before: ```rust fn queue_custom( material_meshes: Query<(Entity, &MeshTransforms, &Handle<Mesh>), With<InstanceMaterialData>>, ) { ... for (entity, mesh_transforms, mesh_handle) in &material_meshes { ... } } ``` After: ```rust fn queue_custom( render_mesh_instances: Res<RenderMeshInstances>, instance_entities: Query<Entity, With<InstanceMaterialData>>, ) { ... for entity in &instance_entities { let Some(mesh_instance) = render_mesh_instances.get(&entity) else { continue; }; // The mesh handle in `AssetId<Mesh>` form, and the `MeshTransforms` can now // be found in `mesh_instance` which is a `RenderMeshInstance` ... } } ``` --------- Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
2023-09-27 08:28:28 +00:00
render_material_instances: Res<RenderMaterialInstances<M>>,
mut shadow_render_phases: ResMut<ViewBinnedRenderPhases<Shadow>>,
mut pipelines: ResMut<SpecializedMeshPipelines<PrepassPipeline<M>>>,
pipeline_cache: Res<PipelineCache>,
render_lightmaps: Res<RenderLightmaps>,
view_lights: Query<(Entity, &ViewLightEntities)>,
view_light_entities: Query<&LightEntity>,
Frustum culling (#2861) # Objective Implement frustum culling for much better performance on more complex scenes. With the Amazon Lumberyard Bistro scene, I was getting roughly 15fps without frustum culling and 60+fps with frustum culling on a MacBook Pro 16 with i9 9980HK 8c/16t CPU and Radeon Pro 5500M. macOS does weird things with vsync so even though vsync was off, it really looked like sometimes other applications or the desktop window compositor were interfering, but the difference could be even more as I even saw up to 90+fps sometimes. ## Solution - Until the https://github.com/bevyengine/rfcs/pull/12 RFC is completed, I wanted to implement at least some of the bounding volume functionality we needed to be able to unblock a bunch of rendering features and optimisations such as frustum culling, fitting the directional light orthographic projection to the relevant meshes in the view, clustered forward rendering, etc. - I have added `Aabb`, `Frustum`, and `Sphere` types with only the necessary intersection tests for the algorithms used. I also added `CubemapFrusta` which contains a `[Frustum; 6]` and can be used by cube maps such as environment maps, and point light shadow maps. - I did do a bit of benchmarking and optimisation on the intersection tests. I compared the [rafx parallel-comparison bitmask approach](https://github.com/aclysma/rafx/blob/c91bd5fcfdfa3f4d1b43507c32d84b94ffdf1b2e/rafx-visibility/src/geometry/frustum.rs#L64-L92) with a naïve loop that has an early-out in case of a bounding volume being outside of any one of the `Frustum` planes and found them to be very similar, so I chose the simpler and more readable option. I also compared using Vec3 and Vec3A and it turned out that promoting Vec3s to Vec3A improved performance of the culling significantly due to Vec3A operations using SIMD optimisations where Vec3 uses plain scalar operations. - When loading glTF models, the vertex attribute accessors generally store the minimum and maximum values, which allows for adding AABBs to meshes loaded from glTF for free. - For meshes without an AABB (`PbrBundle` deliberately does not have an AABB by default), a system is executed that scans over the vertex positions to find the minimum and maximum values along each axis. This is used to construct the AABB. - The `Frustum::intersects_obb` and `Sphere::insersects_obb` algorithm is from Foundations of Game Engine Development 2: Rendering by Eric Lengyel. There is no OBB type, yet, rather an AABB and the model matrix are passed in as arguments. This calculates a 'relative radius' of the AABB with respect to the plane normal (the plane normal in the Sphere case being something I came up with as the direction pointing from the centre of the sphere to the centre of the AABB) such that it can then do a sphere-sphere intersection test in practice. - `RenderLayers` were copied over from the current renderer. - `VisibleEntities` was copied over from the current renderer and a `CubemapVisibleEntities` was added to support `PointLight`s for now. `VisibleEntities` are added to views (cameras and lights) and contain a `Vec<Entity>` that is populated by culling/visibility systems that run in PostUpdate of the app world, and are iterated over in the render world for, for example, queuing up meshes to be drawn by lights for shadow maps and the main pass for cameras. - `Visibility` and `ComputedVisibility` components were added. The `Visibility` component is user-facing so that, for example, the entity can be marked as not visible in an editor. `ComputedVisibility` on the other hand is the result of the culling/visibility systems and takes `Visibility` into account. So if an entity is marked as not being visible in its `Visibility` component, that will skip culling/visibility intersection tests and just mark the `ComputedVisibility` as false. - The `ComputedVisibility` is used to decide which meshes to extract. - I had to add a way to get the far plane from the `CameraProjection` in order to define an explicit far frustum plane for culling. This should perhaps be optional as it is not always desired and in that case, testing 5 planes instead of 6 is a performance win. I think that's about all. I discussed some of the design with @cart on Discord already so hopefully it's not too far from being mergeable. It works well at least. 😄
2021-11-07 21:45:52 +00:00
point_light_entities: Query<&CubemapVisibleEntities, With<ExtractedPointLight>>,
directional_light_entities: Query<&CascadesVisibleEntities, With<ExtractedDirectionalLight>>,
spot_light_entities: Query<&VisibleMeshEntities, With<ExtractedPointLight>>,
) where
M::Data: PartialEq + Eq + Hash + Clone,
{
for (entity, view_lights) in &view_lights {
let draw_shadow_mesh = shadow_draw_functions.read().id::<DrawPrepass<M>>();
Modular Rendering (#2831) This changes how render logic is composed to make it much more modular. Previously, all extraction logic was centralized for a given "type" of rendered thing. For example, we extracted meshes into a vector of ExtractedMesh, which contained the mesh and material asset handles, the transform, etc. We looked up bindings for "drawn things" using their index in the `Vec<ExtractedMesh>`. This worked fine for built in rendering, but made it hard to reuse logic for "custom" rendering. It also prevented us from reusing things like "extracted transforms" across contexts. To make rendering more modular, I made a number of changes: * Entities now drive rendering: * We extract "render components" from "app components" and store them _on_ entities. No more centralized uber lists! We now have true "ECS-driven rendering" * To make this perform well, I implemented #2673 in upstream Bevy for fast batch insertions into specific entities. This was merged into the `pipelined-rendering` branch here: #2815 * Reworked the `Draw` abstraction: * Generic `PhaseItems`: each draw phase can define its own type of "rendered thing", which can define its own "sort key" * Ported the 2d, 3d, and shadow phases to the new PhaseItem impl (currently Transparent2d, Transparent3d, and Shadow PhaseItems) * `Draw` trait and and `DrawFunctions` are now generic on PhaseItem * Modular / Ergonomic `DrawFunctions` via `RenderCommands` * RenderCommand is a trait that runs an ECS query and produces one or more RenderPass calls. Types implementing this trait can be composed to create a final DrawFunction. For example the DrawPbr DrawFunction is created from the following DrawCommand tuple. Const generics are used to set specific bind group locations: ```rust pub type DrawPbr = ( SetPbrPipeline, SetMeshViewBindGroup<0>, SetStandardMaterialBindGroup<1>, SetTransformBindGroup<2>, DrawMesh, ); ``` * The new `custom_shader_pipelined` example illustrates how the commands above can be reused to create a custom draw function: ```rust type DrawCustom = ( SetCustomMaterialPipeline, SetMeshViewBindGroup<0>, SetTransformBindGroup<2>, DrawMesh, ); ``` * ExtractComponentPlugin and UniformComponentPlugin: * Simple, standardized ways to easily extract individual components and write them to GPU buffers * Ported PBR and Sprite rendering to the new primitives above. * Removed staging buffer from UniformVec in favor of direct Queue usage * Makes UniformVec much easier to use and more ergonomic. Completely removes the need for custom render graph nodes in these contexts (see the PbrNode and view Node removals and the much simpler call patterns in the relevant Prepare systems). * Added a many_cubes_pipelined example to benchmark baseline 3d rendering performance and ensure there were no major regressions during this port. Avoiding regressions was challenging given that the old approach of extracting into centralized vectors is basically the "optimal" approach. However thanks to a various ECS optimizations and render logic rephrasing, we pretty much break even on this benchmark! * Lifetimeless SystemParams: this will be a bit divisive, but as we continue to embrace "trait driven systems" (ex: ExtractComponentPlugin, UniformComponentPlugin, DrawCommand), the ergonomics of `(Query<'static, 'static, (&'static A, &'static B, &'static)>, Res<'static, C>)` were getting very hard to bear. As a compromise, I added "static type aliases" for the relevant SystemParams. The previous example can now be expressed like this: `(SQuery<(Read<A>, Read<B>)>, SRes<C>)`. If anyone has better ideas / conflicting opinions, please let me know! * RunSystem trait: a way to define Systems via a trait with a SystemParam associated type. This is used to implement the various plugins mentioned above. I also added SystemParamItem and QueryItem type aliases to make "trait stye" ecs interactions nicer on the eyes (and fingers). * RenderAsset retrying: ensures that render assets are only created when they are "ready" and allows us to create bind groups directly inside render assets (which significantly simplified the StandardMaterial code). I think ultimately we should swap this out on "asset dependency" events to wait for dependencies to load, but this will require significant asset system changes. * Updated some built in shaders to account for missing MeshUniform fields
2021-09-23 06:16:11 +00:00
for view_light_entity in view_lights.lights.iter().copied() {
let Ok(light_entity) = view_light_entities.get(view_light_entity) else {
continue;
};
let Some(shadow_phase) = shadow_render_phases.get_mut(&view_light_entity) else {
continue;
};
let is_directional_light = matches!(light_entity, LightEntity::Directional { .. });
Frustum culling (#2861) # Objective Implement frustum culling for much better performance on more complex scenes. With the Amazon Lumberyard Bistro scene, I was getting roughly 15fps without frustum culling and 60+fps with frustum culling on a MacBook Pro 16 with i9 9980HK 8c/16t CPU and Radeon Pro 5500M. macOS does weird things with vsync so even though vsync was off, it really looked like sometimes other applications or the desktop window compositor were interfering, but the difference could be even more as I even saw up to 90+fps sometimes. ## Solution - Until the https://github.com/bevyengine/rfcs/pull/12 RFC is completed, I wanted to implement at least some of the bounding volume functionality we needed to be able to unblock a bunch of rendering features and optimisations such as frustum culling, fitting the directional light orthographic projection to the relevant meshes in the view, clustered forward rendering, etc. - I have added `Aabb`, `Frustum`, and `Sphere` types with only the necessary intersection tests for the algorithms used. I also added `CubemapFrusta` which contains a `[Frustum; 6]` and can be used by cube maps such as environment maps, and point light shadow maps. - I did do a bit of benchmarking and optimisation on the intersection tests. I compared the [rafx parallel-comparison bitmask approach](https://github.com/aclysma/rafx/blob/c91bd5fcfdfa3f4d1b43507c32d84b94ffdf1b2e/rafx-visibility/src/geometry/frustum.rs#L64-L92) with a naïve loop that has an early-out in case of a bounding volume being outside of any one of the `Frustum` planes and found them to be very similar, so I chose the simpler and more readable option. I also compared using Vec3 and Vec3A and it turned out that promoting Vec3s to Vec3A improved performance of the culling significantly due to Vec3A operations using SIMD optimisations where Vec3 uses plain scalar operations. - When loading glTF models, the vertex attribute accessors generally store the minimum and maximum values, which allows for adding AABBs to meshes loaded from glTF for free. - For meshes without an AABB (`PbrBundle` deliberately does not have an AABB by default), a system is executed that scans over the vertex positions to find the minimum and maximum values along each axis. This is used to construct the AABB. - The `Frustum::intersects_obb` and `Sphere::insersects_obb` algorithm is from Foundations of Game Engine Development 2: Rendering by Eric Lengyel. There is no OBB type, yet, rather an AABB and the model matrix are passed in as arguments. This calculates a 'relative radius' of the AABB with respect to the plane normal (the plane normal in the Sphere case being something I came up with as the direction pointing from the centre of the sphere to the centre of the AABB) such that it can then do a sphere-sphere intersection test in practice. - `RenderLayers` were copied over from the current renderer. - `VisibleEntities` was copied over from the current renderer and a `CubemapVisibleEntities` was added to support `PointLight`s for now. `VisibleEntities` are added to views (cameras and lights) and contain a `Vec<Entity>` that is populated by culling/visibility systems that run in PostUpdate of the app world, and are iterated over in the render world for, for example, queuing up meshes to be drawn by lights for shadow maps and the main pass for cameras. - `Visibility` and `ComputedVisibility` components were added. The `Visibility` component is user-facing so that, for example, the entity can be marked as not visible in an editor. `ComputedVisibility` on the other hand is the result of the culling/visibility systems and takes `Visibility` into account. So if an entity is marked as not being visible in its `Visibility` component, that will skip culling/visibility intersection tests and just mark the `ComputedVisibility` as false. - The `ComputedVisibility` is used to decide which meshes to extract. - I had to add a way to get the far plane from the `CameraProjection` in order to define an explicit far frustum plane for culling. This should perhaps be optional as it is not always desired and in that case, testing 5 planes instead of 6 is a performance win. I think that's about all. I discussed some of the design with @cart on Discord already so hopefully it's not too far from being mergeable. It works well at least. 😄
2021-11-07 21:45:52 +00:00
let visible_entities = match light_entity {
LightEntity::Directional {
light_entity,
cascade_index,
} => directional_light_entities
Frustum culling (#2861) # Objective Implement frustum culling for much better performance on more complex scenes. With the Amazon Lumberyard Bistro scene, I was getting roughly 15fps without frustum culling and 60+fps with frustum culling on a MacBook Pro 16 with i9 9980HK 8c/16t CPU and Radeon Pro 5500M. macOS does weird things with vsync so even though vsync was off, it really looked like sometimes other applications or the desktop window compositor were interfering, but the difference could be even more as I even saw up to 90+fps sometimes. ## Solution - Until the https://github.com/bevyengine/rfcs/pull/12 RFC is completed, I wanted to implement at least some of the bounding volume functionality we needed to be able to unblock a bunch of rendering features and optimisations such as frustum culling, fitting the directional light orthographic projection to the relevant meshes in the view, clustered forward rendering, etc. - I have added `Aabb`, `Frustum`, and `Sphere` types with only the necessary intersection tests for the algorithms used. I also added `CubemapFrusta` which contains a `[Frustum; 6]` and can be used by cube maps such as environment maps, and point light shadow maps. - I did do a bit of benchmarking and optimisation on the intersection tests. I compared the [rafx parallel-comparison bitmask approach](https://github.com/aclysma/rafx/blob/c91bd5fcfdfa3f4d1b43507c32d84b94ffdf1b2e/rafx-visibility/src/geometry/frustum.rs#L64-L92) with a naïve loop that has an early-out in case of a bounding volume being outside of any one of the `Frustum` planes and found them to be very similar, so I chose the simpler and more readable option. I also compared using Vec3 and Vec3A and it turned out that promoting Vec3s to Vec3A improved performance of the culling significantly due to Vec3A operations using SIMD optimisations where Vec3 uses plain scalar operations. - When loading glTF models, the vertex attribute accessors generally store the minimum and maximum values, which allows for adding AABBs to meshes loaded from glTF for free. - For meshes without an AABB (`PbrBundle` deliberately does not have an AABB by default), a system is executed that scans over the vertex positions to find the minimum and maximum values along each axis. This is used to construct the AABB. - The `Frustum::intersects_obb` and `Sphere::insersects_obb` algorithm is from Foundations of Game Engine Development 2: Rendering by Eric Lengyel. There is no OBB type, yet, rather an AABB and the model matrix are passed in as arguments. This calculates a 'relative radius' of the AABB with respect to the plane normal (the plane normal in the Sphere case being something I came up with as the direction pointing from the centre of the sphere to the centre of the AABB) such that it can then do a sphere-sphere intersection test in practice. - `RenderLayers` were copied over from the current renderer. - `VisibleEntities` was copied over from the current renderer and a `CubemapVisibleEntities` was added to support `PointLight`s for now. `VisibleEntities` are added to views (cameras and lights) and contain a `Vec<Entity>` that is populated by culling/visibility systems that run in PostUpdate of the app world, and are iterated over in the render world for, for example, queuing up meshes to be drawn by lights for shadow maps and the main pass for cameras. - `Visibility` and `ComputedVisibility` components were added. The `Visibility` component is user-facing so that, for example, the entity can be marked as not visible in an editor. `ComputedVisibility` on the other hand is the result of the culling/visibility systems and takes `Visibility` into account. So if an entity is marked as not being visible in its `Visibility` component, that will skip culling/visibility intersection tests and just mark the `ComputedVisibility` as false. - The `ComputedVisibility` is used to decide which meshes to extract. - I had to add a way to get the far plane from the `CameraProjection` in order to define an explicit far frustum plane for culling. This should perhaps be optional as it is not always desired and in that case, testing 5 planes instead of 6 is a performance win. I think that's about all. I discussed some of the design with @cart on Discord already so hopefully it's not too far from being mergeable. It works well at least. 😄
2021-11-07 21:45:52 +00:00
.get(*light_entity)
.expect("Failed to get directional light visible entities")
.entities
.get(&entity)
.expect("Failed to get directional light visible entities for view")
.get(*cascade_index)
.expect("Failed to get directional light visible entities for cascade"),
Frustum culling (#2861) # Objective Implement frustum culling for much better performance on more complex scenes. With the Amazon Lumberyard Bistro scene, I was getting roughly 15fps without frustum culling and 60+fps with frustum culling on a MacBook Pro 16 with i9 9980HK 8c/16t CPU and Radeon Pro 5500M. macOS does weird things with vsync so even though vsync was off, it really looked like sometimes other applications or the desktop window compositor were interfering, but the difference could be even more as I even saw up to 90+fps sometimes. ## Solution - Until the https://github.com/bevyengine/rfcs/pull/12 RFC is completed, I wanted to implement at least some of the bounding volume functionality we needed to be able to unblock a bunch of rendering features and optimisations such as frustum culling, fitting the directional light orthographic projection to the relevant meshes in the view, clustered forward rendering, etc. - I have added `Aabb`, `Frustum`, and `Sphere` types with only the necessary intersection tests for the algorithms used. I also added `CubemapFrusta` which contains a `[Frustum; 6]` and can be used by cube maps such as environment maps, and point light shadow maps. - I did do a bit of benchmarking and optimisation on the intersection tests. I compared the [rafx parallel-comparison bitmask approach](https://github.com/aclysma/rafx/blob/c91bd5fcfdfa3f4d1b43507c32d84b94ffdf1b2e/rafx-visibility/src/geometry/frustum.rs#L64-L92) with a naïve loop that has an early-out in case of a bounding volume being outside of any one of the `Frustum` planes and found them to be very similar, so I chose the simpler and more readable option. I also compared using Vec3 and Vec3A and it turned out that promoting Vec3s to Vec3A improved performance of the culling significantly due to Vec3A operations using SIMD optimisations where Vec3 uses plain scalar operations. - When loading glTF models, the vertex attribute accessors generally store the minimum and maximum values, which allows for adding AABBs to meshes loaded from glTF for free. - For meshes without an AABB (`PbrBundle` deliberately does not have an AABB by default), a system is executed that scans over the vertex positions to find the minimum and maximum values along each axis. This is used to construct the AABB. - The `Frustum::intersects_obb` and `Sphere::insersects_obb` algorithm is from Foundations of Game Engine Development 2: Rendering by Eric Lengyel. There is no OBB type, yet, rather an AABB and the model matrix are passed in as arguments. This calculates a 'relative radius' of the AABB with respect to the plane normal (the plane normal in the Sphere case being something I came up with as the direction pointing from the centre of the sphere to the centre of the AABB) such that it can then do a sphere-sphere intersection test in practice. - `RenderLayers` were copied over from the current renderer. - `VisibleEntities` was copied over from the current renderer and a `CubemapVisibleEntities` was added to support `PointLight`s for now. `VisibleEntities` are added to views (cameras and lights) and contain a `Vec<Entity>` that is populated by culling/visibility systems that run in PostUpdate of the app world, and are iterated over in the render world for, for example, queuing up meshes to be drawn by lights for shadow maps and the main pass for cameras. - `Visibility` and `ComputedVisibility` components were added. The `Visibility` component is user-facing so that, for example, the entity can be marked as not visible in an editor. `ComputedVisibility` on the other hand is the result of the culling/visibility systems and takes `Visibility` into account. So if an entity is marked as not being visible in its `Visibility` component, that will skip culling/visibility intersection tests and just mark the `ComputedVisibility` as false. - The `ComputedVisibility` is used to decide which meshes to extract. - I had to add a way to get the far plane from the `CameraProjection` in order to define an explicit far frustum plane for culling. This should perhaps be optional as it is not always desired and in that case, testing 5 planes instead of 6 is a performance win. I think that's about all. I discussed some of the design with @cart on Discord already so hopefully it's not too far from being mergeable. It works well at least. 😄
2021-11-07 21:45:52 +00:00
LightEntity::Point {
light_entity,
face_index,
} => point_light_entities
.get(*light_entity)
.expect("Failed to get point light visible entities")
.get(*face_index),
Spotlights (#4715) # Objective add spotlight support ## Solution / Changelog - add spotlight angles (inner, outer) to ``PointLight`` struct. emitted light is linearly attenuated from 100% to 0% as angle tends from inner to outer. Direction is taken from the existing transform rotation. - add spotlight direction (vec3) and angles (f32,f32) to ``GpuPointLight`` struct (60 bytes -> 80 bytes) in ``pbr/render/lights.rs`` and ``mesh_view_bind_group.wgsl`` - reduce no-buffer-support max point light count to 204 due to above - use spotlight data to attenuate light in ``pbr.wgsl`` - do additional cluster culling on spotlights to minimise cost in ``assign_lights_to_clusters`` - changed one of the lights in the lighting demo to a spotlight - also added a ``spotlight`` demo - probably not justified but so reviewers can see it more easily ## notes increasing the size of the GpuPointLight struct on my machine reduces the FPS of ``many_lights -- sphere`` from ~150fps to 140fps. i thought this was a reasonable tradeoff, and felt better than handling spotlights separately which is possible but would mean introducing a new bind group, refactoring light-assignment code and adding new spotlight-specific code in pbr.wgsl. the FPS impact for smaller numbers of lights should be very small. the cluster culling strategy reintroduces the cluster aabb code which was recently removed... sorry. the aabb is used to get a cluster bounding sphere, which can then be tested fairly efficiently using the strategy described at the end of https://bartwronski.com/2017/04/13/cull-that-cone/. this works well with roughly cubic clusters (where the cluster z size is close to the same as x/y size), less well for other cases like single Z slice / tiled forward rendering. In the worst case we will end up just keeping the culling of the equivalent point light. Co-authored-by: François <mockersf@gmail.com>
2022-07-08 19:57:43 +00:00
LightEntity::Spot { light_entity } => spot_light_entities
.get(*light_entity)
.expect("Failed to get spot light visible entities"),
Frustum culling (#2861) # Objective Implement frustum culling for much better performance on more complex scenes. With the Amazon Lumberyard Bistro scene, I was getting roughly 15fps without frustum culling and 60+fps with frustum culling on a MacBook Pro 16 with i9 9980HK 8c/16t CPU and Radeon Pro 5500M. macOS does weird things with vsync so even though vsync was off, it really looked like sometimes other applications or the desktop window compositor were interfering, but the difference could be even more as I even saw up to 90+fps sometimes. ## Solution - Until the https://github.com/bevyengine/rfcs/pull/12 RFC is completed, I wanted to implement at least some of the bounding volume functionality we needed to be able to unblock a bunch of rendering features and optimisations such as frustum culling, fitting the directional light orthographic projection to the relevant meshes in the view, clustered forward rendering, etc. - I have added `Aabb`, `Frustum`, and `Sphere` types with only the necessary intersection tests for the algorithms used. I also added `CubemapFrusta` which contains a `[Frustum; 6]` and can be used by cube maps such as environment maps, and point light shadow maps. - I did do a bit of benchmarking and optimisation on the intersection tests. I compared the [rafx parallel-comparison bitmask approach](https://github.com/aclysma/rafx/blob/c91bd5fcfdfa3f4d1b43507c32d84b94ffdf1b2e/rafx-visibility/src/geometry/frustum.rs#L64-L92) with a naïve loop that has an early-out in case of a bounding volume being outside of any one of the `Frustum` planes and found them to be very similar, so I chose the simpler and more readable option. I also compared using Vec3 and Vec3A and it turned out that promoting Vec3s to Vec3A improved performance of the culling significantly due to Vec3A operations using SIMD optimisations where Vec3 uses plain scalar operations. - When loading glTF models, the vertex attribute accessors generally store the minimum and maximum values, which allows for adding AABBs to meshes loaded from glTF for free. - For meshes without an AABB (`PbrBundle` deliberately does not have an AABB by default), a system is executed that scans over the vertex positions to find the minimum and maximum values along each axis. This is used to construct the AABB. - The `Frustum::intersects_obb` and `Sphere::insersects_obb` algorithm is from Foundations of Game Engine Development 2: Rendering by Eric Lengyel. There is no OBB type, yet, rather an AABB and the model matrix are passed in as arguments. This calculates a 'relative radius' of the AABB with respect to the plane normal (the plane normal in the Sphere case being something I came up with as the direction pointing from the centre of the sphere to the centre of the AABB) such that it can then do a sphere-sphere intersection test in practice. - `RenderLayers` were copied over from the current renderer. - `VisibleEntities` was copied over from the current renderer and a `CubemapVisibleEntities` was added to support `PointLight`s for now. `VisibleEntities` are added to views (cameras and lights) and contain a `Vec<Entity>` that is populated by culling/visibility systems that run in PostUpdate of the app world, and are iterated over in the render world for, for example, queuing up meshes to be drawn by lights for shadow maps and the main pass for cameras. - `Visibility` and `ComputedVisibility` components were added. The `Visibility` component is user-facing so that, for example, the entity can be marked as not visible in an editor. `ComputedVisibility` on the other hand is the result of the culling/visibility systems and takes `Visibility` into account. So if an entity is marked as not being visible in its `Visibility` component, that will skip culling/visibility intersection tests and just mark the `ComputedVisibility` as false. - The `ComputedVisibility` is used to decide which meshes to extract. - I had to add a way to get the far plane from the `CameraProjection` in order to define an explicit far frustum plane for culling. This should perhaps be optional as it is not always desired and in that case, testing 5 planes instead of 6 is a performance win. I think that's about all. I discussed some of the design with @cart on Discord already so hopefully it's not too far from being mergeable. It works well at least. 😄
2021-11-07 21:45:52 +00:00
};
Micro-optimize `queue_material_meshes`, primarily to remove bit manipulation. (#12791) This commit makes the following optimizations: ## `MeshPipelineKey`/`BaseMeshPipelineKey` split `MeshPipelineKey` has been split into `BaseMeshPipelineKey`, which lives in `bevy_render` and `MeshPipelineKey`, which lives in `bevy_pbr`. Conceptually, `BaseMeshPipelineKey` is a superclass of `MeshPipelineKey`. For `BaseMeshPipelineKey`, the bits start at the highest (most significant) bit and grow downward toward the lowest bit; for `MeshPipelineKey`, the bits start at the lowest bit and grow upward toward the highest bit. This prevents them from colliding. The goal of this is to avoid having to reassemble bits of the pipeline key for every mesh every frame. Instead, we can just use a bitwise or operation to combine the pieces that make up a `MeshPipelineKey`. ## `specialize_slow` Previously, all of `specialize()` was marked as `#[inline]`. This bloated `queue_material_meshes` unnecessarily, as a large chunk of it ended up being a slow path that was rarely hit. This commit refactors the function to move the slow path to `specialize_slow()`. Together, these two changes shave about 5% off `queue_material_meshes`: ![Screenshot 2024-03-29 130002](https://github.com/bevyengine/bevy/assets/157897/a7e5a994-a807-4328-b314-9003429dcdd2) ## Migration Guide - The `primitive_topology` field on `GpuMesh` is now an accessor method: `GpuMesh::primitive_topology()`. - For performance reasons, `MeshPipelineKey` has been split into `BaseMeshPipelineKey`, which lives in `bevy_render`, and `MeshPipelineKey`, which lives in `bevy_pbr`. These two should be combined with bitwise-or to produce the final `MeshPipelineKey`.
2024-04-01 21:58:53 +00:00
let mut light_key = MeshPipelineKey::DEPTH_PREPASS;
light_key.set(MeshPipelineKey::DEPTH_CLAMP_ORTHO, is_directional_light);
Divide the single `VisibleEntities` list into separate lists for 2D meshes, 3D meshes, lights, and UI elements, for performance. (#12582) This commit splits `VisibleEntities::entities` into four separate lists: one for lights, one for 2D meshes, one for 3D meshes, and one for UI elements. This allows `queue_material_meshes` and similar methods to avoid examining entities that are obviously irrelevant. In particular, this separation helps scenes with many skinned meshes, as the individual bones are considered visible entities but have no rendered appearance. Internally, `VisibleEntities::entities` is a `HashMap` from the `TypeId` representing a `QueryFilter` to the appropriate `Entity` list. I had to do this because `VisibleEntities` is located within an upstream crate from the crates that provide lights (`bevy_pbr`) and 2D meshes (`bevy_sprite`). As an added benefit, this setup allows apps to provide their own types of renderable components, by simply adding a specialized `check_visibility` to the schedule. This provides a 16.23% end-to-end speedup on `many_foxes` with 10,000 foxes (24.06 ms/frame to 20.70 ms/frame). ## Migration guide * `check_visibility` and `VisibleEntities` now store the four types of renderable entities--2D meshes, 3D meshes, lights, and UI elements--separately. If your custom rendering code examines `VisibleEntities`, it will now need to specify which type of entity it's interested in using the `WithMesh2d`, `WithMesh`, `WithLight`, and `WithNode` types respectively. If your app introduces a new type of renderable entity, you'll need to add an explicit call to `check_visibility` to the schedule to accommodate your new component or components. ## Analysis `many_foxes`, 10,000 foxes: `main`: ![Screenshot 2024-03-31 114444](https://github.com/bevyengine/bevy/assets/157897/16ecb2ff-6e04-46c0-a4b0-b2fde2084bad) `many_foxes`, 10,000 foxes, this branch: ![Screenshot 2024-03-31 114256](https://github.com/bevyengine/bevy/assets/157897/94dedae4-bd00-45b2-9aaf-dfc237004ddb) `queue_material_meshes` (yellow = this branch, red = `main`): ![Screenshot 2024-03-31 114637](https://github.com/bevyengine/bevy/assets/157897/f90912bd-45bd-42c4-bd74-57d98a0f036e) `queue_shadows` (yellow = this branch, red = `main`): ![Screenshot 2024-03-31 114607](https://github.com/bevyengine/bevy/assets/157897/6ce693e3-20c0-4234-8ec9-a6f191299e2d)
2024-04-11 20:33:20 +00:00
// NOTE: Lights with shadow mapping disabled will have no visible entities
// so no meshes will be queued
for entity in visible_entities.iter().copied() {
Generate `MeshUniform`s on the GPU via compute shader where available. (#12773) Currently, `MeshUniform`s are rather large: 160 bytes. They're also somewhat expensive to compute, because they involve taking the inverse of a 3x4 matrix. Finally, if a mesh is present in multiple views, that mesh will have a separate `MeshUniform` for each and every view, which is wasteful. This commit fixes these issues by introducing the concept of a *mesh input uniform* and adding a *mesh uniform building* compute shader pass. The `MeshInputUniform` is simply the minimum amount of data needed for the GPU to compute the full `MeshUniform`. Most of this data is just the transform and is therefore only 64 bytes. `MeshInputUniform`s are computed during the *extraction* phase, much like skins are today, in order to avoid needlessly copying transforms around on CPU. (In fact, the render app has been changed to only store the translation of each mesh; it no longer cares about any other part of the transform, which is stored only on the GPU and the main world.) Before rendering, the `build_mesh_uniforms` pass runs to expand the `MeshInputUniform`s to the full `MeshUniform`. The mesh uniform building pass does the following, all on GPU: 1. Copy the appropriate fields of the `MeshInputUniform` to the `MeshUniform` slot. If a single mesh is present in multiple views, this effectively duplicates it into each view. 2. Compute the inverse transpose of the model transform, used for transforming normals. 3. If applicable, copy the mesh's transform from the previous frame for TAA. To support this, we double-buffer the `MeshInputUniform`s over two frames and swap the buffers each frame. The `MeshInputUniform`s for the current frame contain the index of that mesh's `MeshInputUniform` for the previous frame. This commit produces wins in virtually every CPU part of the pipeline: `extract_meshes`, `queue_material_meshes`, `batch_and_prepare_render_phase`, and especially `write_batched_instance_buffer` are all faster. Shrinking the amount of CPU data that has to be shuffled around speeds up the entire rendering process. | Benchmark | This branch | `main` | Speedup | |------------------------|-------------|---------|---------| | `many_cubes -nfc` | 17.259 | 24.529 | 42.12% | | `many_cubes -nfc -vpi` | 302.116 | 312.123 | 3.31% | | `many_foxes` | 3.227 | 3.515 | 8.92% | Because mesh uniform building requires compute shader, and WebGL 2 has no compute shader, the existing CPU mesh uniform building code has been left as-is. Many types now have both CPU mesh uniform building and GPU mesh uniform building modes. Developers can opt into the old CPU mesh uniform building by setting the `use_gpu_uniform_builder` option on `PbrPlugin` to `false`. Below are graphs of the CPU portions of `many-cubes --no-frustum-culling`. Yellow is this branch, red is `main`. `extract_meshes`: ![Screenshot 2024-04-02 124842](https://github.com/bevyengine/bevy/assets/157897/a6748ea4-dd05-47b6-9254-45d07d33cb10) It's notable that we get a small win even though we're now writing to a GPU buffer. `queue_material_meshes`: ![Screenshot 2024-04-02 124911](https://github.com/bevyengine/bevy/assets/157897/ecb44d78-65dc-448d-ba85-2de91aa2ad94) There's a bit of a regression here; not sure what's causing it. In any case it's very outweighed by the other gains. `batch_and_prepare_render_phase`: ![Screenshot 2024-04-02 125123](https://github.com/bevyengine/bevy/assets/157897/4e20fc86-f9dd-4e5c-8623-837e4258f435) There's a huge win here, enough to make batching basically drop off the profile. `write_batched_instance_buffer`: ![Screenshot 2024-04-02 125237](https://github.com/bevyengine/bevy/assets/157897/401a5c32-9dc1-4991-996d-eb1cac6014b2) There's a massive improvement here, as expected. Note that a lot of it simply comes from the fact that `MeshInputUniform` is `Pod`. (This isn't a maintainability problem in my view because `MeshInputUniform` is so simple: just 16 tightly-packed words.) ## Changelog ### Added * Per-mesh instance data is now generated on GPU with a compute shader instead of CPU, resulting in rendering performance improvements on platforms where compute shaders are supported. ## Migration guide * Custom render phases now need multiple systems beyond just `batch_and_prepare_render_phase`. Code that was previously creating custom render phases should now add a `BinnedRenderPhasePlugin` or `SortedRenderPhasePlugin` as appropriate instead of directly adding `batch_and_prepare_render_phase`.
2024-04-10 05:33:32 +00:00
let Some(mesh_instance) = render_mesh_instances.render_mesh_queue_data(entity)
else {
continue;
};
Generate `MeshUniform`s on the GPU via compute shader where available. (#12773) Currently, `MeshUniform`s are rather large: 160 bytes. They're also somewhat expensive to compute, because they involve taking the inverse of a 3x4 matrix. Finally, if a mesh is present in multiple views, that mesh will have a separate `MeshUniform` for each and every view, which is wasteful. This commit fixes these issues by introducing the concept of a *mesh input uniform* and adding a *mesh uniform building* compute shader pass. The `MeshInputUniform` is simply the minimum amount of data needed for the GPU to compute the full `MeshUniform`. Most of this data is just the transform and is therefore only 64 bytes. `MeshInputUniform`s are computed during the *extraction* phase, much like skins are today, in order to avoid needlessly copying transforms around on CPU. (In fact, the render app has been changed to only store the translation of each mesh; it no longer cares about any other part of the transform, which is stored only on the GPU and the main world.) Before rendering, the `build_mesh_uniforms` pass runs to expand the `MeshInputUniform`s to the full `MeshUniform`. The mesh uniform building pass does the following, all on GPU: 1. Copy the appropriate fields of the `MeshInputUniform` to the `MeshUniform` slot. If a single mesh is present in multiple views, this effectively duplicates it into each view. 2. Compute the inverse transpose of the model transform, used for transforming normals. 3. If applicable, copy the mesh's transform from the previous frame for TAA. To support this, we double-buffer the `MeshInputUniform`s over two frames and swap the buffers each frame. The `MeshInputUniform`s for the current frame contain the index of that mesh's `MeshInputUniform` for the previous frame. This commit produces wins in virtually every CPU part of the pipeline: `extract_meshes`, `queue_material_meshes`, `batch_and_prepare_render_phase`, and especially `write_batched_instance_buffer` are all faster. Shrinking the amount of CPU data that has to be shuffled around speeds up the entire rendering process. | Benchmark | This branch | `main` | Speedup | |------------------------|-------------|---------|---------| | `many_cubes -nfc` | 17.259 | 24.529 | 42.12% | | `many_cubes -nfc -vpi` | 302.116 | 312.123 | 3.31% | | `many_foxes` | 3.227 | 3.515 | 8.92% | Because mesh uniform building requires compute shader, and WebGL 2 has no compute shader, the existing CPU mesh uniform building code has been left as-is. Many types now have both CPU mesh uniform building and GPU mesh uniform building modes. Developers can opt into the old CPU mesh uniform building by setting the `use_gpu_uniform_builder` option on `PbrPlugin` to `false`. Below are graphs of the CPU portions of `many-cubes --no-frustum-culling`. Yellow is this branch, red is `main`. `extract_meshes`: ![Screenshot 2024-04-02 124842](https://github.com/bevyengine/bevy/assets/157897/a6748ea4-dd05-47b6-9254-45d07d33cb10) It's notable that we get a small win even though we're now writing to a GPU buffer. `queue_material_meshes`: ![Screenshot 2024-04-02 124911](https://github.com/bevyengine/bevy/assets/157897/ecb44d78-65dc-448d-ba85-2de91aa2ad94) There's a bit of a regression here; not sure what's causing it. In any case it's very outweighed by the other gains. `batch_and_prepare_render_phase`: ![Screenshot 2024-04-02 125123](https://github.com/bevyengine/bevy/assets/157897/4e20fc86-f9dd-4e5c-8623-837e4258f435) There's a huge win here, enough to make batching basically drop off the profile. `write_batched_instance_buffer`: ![Screenshot 2024-04-02 125237](https://github.com/bevyengine/bevy/assets/157897/401a5c32-9dc1-4991-996d-eb1cac6014b2) There's a massive improvement here, as expected. Note that a lot of it simply comes from the fact that `MeshInputUniform` is `Pod`. (This isn't a maintainability problem in my view because `MeshInputUniform` is so simple: just 16 tightly-packed words.) ## Changelog ### Added * Per-mesh instance data is now generated on GPU with a compute shader instead of CPU, resulting in rendering performance improvements on platforms where compute shaders are supported. ## Migration guide * Custom render phases now need multiple systems beyond just `batch_and_prepare_render_phase`. Code that was previously creating custom render phases should now add a `BinnedRenderPhasePlugin` or `SortedRenderPhasePlugin` as appropriate instead of directly adding `batch_and_prepare_render_phase`.
2024-04-10 05:33:32 +00:00
if !mesh_instance
.flags
.contains(RenderMeshInstanceFlags::SHADOW_CASTER)
{
Use EntityHashMap<Entity, T> for render world entity storage for better performance (#9903) # Objective - Improve rendering performance, particularly by avoiding the large system commands costs of using the ECS in the way that the render world does. ## Solution - Define `EntityHasher` that calculates a hash from the `Entity.to_bits()` by `i | (i.wrapping_mul(0x517cc1b727220a95) << 32)`. `0x517cc1b727220a95` is something like `u64::MAX / N` for N that gives a value close to π and that works well for hashing. Thanks for @SkiFire13 for the suggestion and to @nicopap for alternative suggestions and discussion. This approach comes from `rustc-hash` (a.k.a. `FxHasher`) with some tweaks for the case of hashing an `Entity`. `FxHasher` and `SeaHasher` were also tested but were significantly slower. - Define `EntityHashMap` type that uses the `EntityHashser` - Use `EntityHashMap<Entity, T>` for render world entity storage, including: - `RenderMaterialInstances` - contains the `AssetId<M>` of the material associated with the entity. Also for 2D. - `RenderMeshInstances` - contains mesh transforms, flags and properties about mesh entities. Also for 2D. - `SkinIndices` and `MorphIndices` - contains the skin and morph index for an entity, respectively - `ExtractedSprites` - `ExtractedUiNodes` ## Benchmarks All benchmarks have been conducted on an M1 Max connected to AC power. The tests are run for 1500 frames. The 1000th frame is captured for comparison to check for visual regressions. There were none. ### 2D Meshes `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d` #### `--ordered-z` This test spawns the 2D meshes with z incrementing back to front, which is the ideal arrangement allocation order as it matches the sorted render order which means lookups have a high cache hit rate. <img width="1112" alt="Screenshot 2023-09-27 at 07 50 45" src="https://github.com/bevyengine/bevy/assets/302146/e140bc98-7091-4a3b-8ae1-ab75d16d2ccb"> -39.1% median frame time. #### Random This test spawns the 2D meshes with random z. This not only makes the batching and transparent 2D pass lookups get a lot of cache misses, it also currently means that the meshes are almost certain to not be batchable. <img width="1108" alt="Screenshot 2023-09-27 at 07 51 28" src="https://github.com/bevyengine/bevy/assets/302146/29c2e813-645a-43ce-982a-55df4bf7d8c4"> -7.2% median frame time. ### 3D Meshes `many_cubes --benchmark` <img width="1112" alt="Screenshot 2023-09-27 at 07 51 57" src="https://github.com/bevyengine/bevy/assets/302146/1a729673-3254-4e2a-9072-55e27c69f0fc"> -7.7% median frame time. ### Sprites **NOTE: On `main` sprites are using `SparseSet<Entity, T>`!** `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite` #### `--ordered-z` This test spawns the sprites with z incrementing back to front, which is the ideal arrangement allocation order as it matches the sorted render order which means lookups have a high cache hit rate. <img width="1116" alt="Screenshot 2023-09-27 at 07 52 31" src="https://github.com/bevyengine/bevy/assets/302146/bc8eab90-e375-4d31-b5cd-f55f6f59ab67"> +13.0% median frame time. #### Random This test spawns the sprites with random z. This makes the batching and transparent 2D pass lookups get a lot of cache misses. <img width="1109" alt="Screenshot 2023-09-27 at 07 53 01" src="https://github.com/bevyengine/bevy/assets/302146/22073f5d-99a7-49b0-9584-d3ac3eac3033"> +0.6% median frame time. ### UI **NOTE: On `main` UI is using `SparseSet<Entity, T>`!** `many_buttons` <img width="1111" alt="Screenshot 2023-09-27 at 07 53 26" src="https://github.com/bevyengine/bevy/assets/302146/66afd56d-cbe4-49e7-8b64-2f28f6043d85"> +15.1% median frame time. ## Alternatives - Cart originally suggested trying out `SparseSet<Entity, T>` and indeed that is slightly faster under ideal conditions. However, `PassHashMap<Entity, T>` has better worst case performance when data is randomly distributed, rather than in sorted render order, and does not have the worst case memory usage that `SparseSet`'s dense `Vec<usize>` that maps from the `Entity` index to sparse index into `Vec<T>`. This dense `Vec` has to be as large as the largest Entity index used with the `SparseSet`. - I also tested `PassHashMap<u32, T>`, intending to use `Entity.index()` as the key, but this proved to sometimes be slower and mostly no different. - The only outstanding approach that has not been implemented and tested is to _not_ clear the render world of its entities each frame. That has its own problems, though they could perhaps be solved. - Performance-wise, if the entities and their component data were not cleared, then they would incur table moves on spawn, and should not thereafter, rather just their component data would be overwritten. Ideally we would have a neat way of either updating data in-place via `&mut T` queries, or inserting components if not present. This would likely be quite cumbersome to have to remember to do everywhere, but perhaps it only needs to be done in the more performance-sensitive systems. - The main problem to solve however is that we want to both maintain a mapping between main world entities and render world entities, be able to run the render app and world in parallel with the main app and world for pipelined rendering, and at the same time be able to spawn entities in the render world in such a way that those Entity ids do not collide with those spawned in the main world. This is potentially quite solvable, but could well be a lot of ECS work to do it in a way that makes sense. --- ## Changelog - Changed: Component data for entities to be drawn are no longer stored on entities in the render world. Instead, data is stored in a `EntityHashMap<Entity, T>` in various resources. This brings significant performance benefits due to the way the render app clears entities every frame. Resources of most interest are `RenderMeshInstances` and `RenderMaterialInstances`, and their 2D counterparts. ## Migration Guide Previously the render app extracted mesh entities and their component data from the main world and stored them as entities and components in the render world. Now they are extracted into essentially `EntityHashMap<Entity, T>` where `T` are structs containing an appropriate group of data. This means that while extract set systems will continue to run extract queries against the main world they will store their data in hash maps. Also, systems in later sets will either need to look up entities in the available resources such as `RenderMeshInstances`, or maintain their own `EntityHashMap<Entity, T>` for their own data. Before: ```rust fn queue_custom( material_meshes: Query<(Entity, &MeshTransforms, &Handle<Mesh>), With<InstanceMaterialData>>, ) { ... for (entity, mesh_transforms, mesh_handle) in &material_meshes { ... } } ``` After: ```rust fn queue_custom( render_mesh_instances: Res<RenderMeshInstances>, instance_entities: Query<Entity, With<InstanceMaterialData>>, ) { ... for entity in &instance_entities { let Some(mesh_instance) = render_mesh_instances.get(&entity) else { continue; }; // The mesh handle in `AssetId<Mesh>` form, and the `MeshTransforms` can now // be found in `mesh_instance` which is a `RenderMeshInstance` ... } } ``` --------- Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
2023-09-27 08:28:28 +00:00
continue;
}
let Some(material_asset_id) = render_material_instances.get(&entity) else {
continue;
};
let Some(material) = render_materials.get(*material_asset_id) else {
continue;
};
Use EntityHashMap<Entity, T> for render world entity storage for better performance (#9903) # Objective - Improve rendering performance, particularly by avoiding the large system commands costs of using the ECS in the way that the render world does. ## Solution - Define `EntityHasher` that calculates a hash from the `Entity.to_bits()` by `i | (i.wrapping_mul(0x517cc1b727220a95) << 32)`. `0x517cc1b727220a95` is something like `u64::MAX / N` for N that gives a value close to π and that works well for hashing. Thanks for @SkiFire13 for the suggestion and to @nicopap for alternative suggestions and discussion. This approach comes from `rustc-hash` (a.k.a. `FxHasher`) with some tweaks for the case of hashing an `Entity`. `FxHasher` and `SeaHasher` were also tested but were significantly slower. - Define `EntityHashMap` type that uses the `EntityHashser` - Use `EntityHashMap<Entity, T>` for render world entity storage, including: - `RenderMaterialInstances` - contains the `AssetId<M>` of the material associated with the entity. Also for 2D. - `RenderMeshInstances` - contains mesh transforms, flags and properties about mesh entities. Also for 2D. - `SkinIndices` and `MorphIndices` - contains the skin and morph index for an entity, respectively - `ExtractedSprites` - `ExtractedUiNodes` ## Benchmarks All benchmarks have been conducted on an M1 Max connected to AC power. The tests are run for 1500 frames. The 1000th frame is captured for comparison to check for visual regressions. There were none. ### 2D Meshes `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d` #### `--ordered-z` This test spawns the 2D meshes with z incrementing back to front, which is the ideal arrangement allocation order as it matches the sorted render order which means lookups have a high cache hit rate. <img width="1112" alt="Screenshot 2023-09-27 at 07 50 45" src="https://github.com/bevyengine/bevy/assets/302146/e140bc98-7091-4a3b-8ae1-ab75d16d2ccb"> -39.1% median frame time. #### Random This test spawns the 2D meshes with random z. This not only makes the batching and transparent 2D pass lookups get a lot of cache misses, it also currently means that the meshes are almost certain to not be batchable. <img width="1108" alt="Screenshot 2023-09-27 at 07 51 28" src="https://github.com/bevyengine/bevy/assets/302146/29c2e813-645a-43ce-982a-55df4bf7d8c4"> -7.2% median frame time. ### 3D Meshes `many_cubes --benchmark` <img width="1112" alt="Screenshot 2023-09-27 at 07 51 57" src="https://github.com/bevyengine/bevy/assets/302146/1a729673-3254-4e2a-9072-55e27c69f0fc"> -7.7% median frame time. ### Sprites **NOTE: On `main` sprites are using `SparseSet<Entity, T>`!** `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite` #### `--ordered-z` This test spawns the sprites with z incrementing back to front, which is the ideal arrangement allocation order as it matches the sorted render order which means lookups have a high cache hit rate. <img width="1116" alt="Screenshot 2023-09-27 at 07 52 31" src="https://github.com/bevyengine/bevy/assets/302146/bc8eab90-e375-4d31-b5cd-f55f6f59ab67"> +13.0% median frame time. #### Random This test spawns the sprites with random z. This makes the batching and transparent 2D pass lookups get a lot of cache misses. <img width="1109" alt="Screenshot 2023-09-27 at 07 53 01" src="https://github.com/bevyengine/bevy/assets/302146/22073f5d-99a7-49b0-9584-d3ac3eac3033"> +0.6% median frame time. ### UI **NOTE: On `main` UI is using `SparseSet<Entity, T>`!** `many_buttons` <img width="1111" alt="Screenshot 2023-09-27 at 07 53 26" src="https://github.com/bevyengine/bevy/assets/302146/66afd56d-cbe4-49e7-8b64-2f28f6043d85"> +15.1% median frame time. ## Alternatives - Cart originally suggested trying out `SparseSet<Entity, T>` and indeed that is slightly faster under ideal conditions. However, `PassHashMap<Entity, T>` has better worst case performance when data is randomly distributed, rather than in sorted render order, and does not have the worst case memory usage that `SparseSet`'s dense `Vec<usize>` that maps from the `Entity` index to sparse index into `Vec<T>`. This dense `Vec` has to be as large as the largest Entity index used with the `SparseSet`. - I also tested `PassHashMap<u32, T>`, intending to use `Entity.index()` as the key, but this proved to sometimes be slower and mostly no different. - The only outstanding approach that has not been implemented and tested is to _not_ clear the render world of its entities each frame. That has its own problems, though they could perhaps be solved. - Performance-wise, if the entities and their component data were not cleared, then they would incur table moves on spawn, and should not thereafter, rather just their component data would be overwritten. Ideally we would have a neat way of either updating data in-place via `&mut T` queries, or inserting components if not present. This would likely be quite cumbersome to have to remember to do everywhere, but perhaps it only needs to be done in the more performance-sensitive systems. - The main problem to solve however is that we want to both maintain a mapping between main world entities and render world entities, be able to run the render app and world in parallel with the main app and world for pipelined rendering, and at the same time be able to spawn entities in the render world in such a way that those Entity ids do not collide with those spawned in the main world. This is potentially quite solvable, but could well be a lot of ECS work to do it in a way that makes sense. --- ## Changelog - Changed: Component data for entities to be drawn are no longer stored on entities in the render world. Instead, data is stored in a `EntityHashMap<Entity, T>` in various resources. This brings significant performance benefits due to the way the render app clears entities every frame. Resources of most interest are `RenderMeshInstances` and `RenderMaterialInstances`, and their 2D counterparts. ## Migration Guide Previously the render app extracted mesh entities and their component data from the main world and stored them as entities and components in the render world. Now they are extracted into essentially `EntityHashMap<Entity, T>` where `T` are structs containing an appropriate group of data. This means that while extract set systems will continue to run extract queries against the main world they will store their data in hash maps. Also, systems in later sets will either need to look up entities in the available resources such as `RenderMeshInstances`, or maintain their own `EntityHashMap<Entity, T>` for their own data. Before: ```rust fn queue_custom( material_meshes: Query<(Entity, &MeshTransforms, &Handle<Mesh>), With<InstanceMaterialData>>, ) { ... for (entity, mesh_transforms, mesh_handle) in &material_meshes { ... } } ``` After: ```rust fn queue_custom( render_mesh_instances: Res<RenderMeshInstances>, instance_entities: Query<Entity, With<InstanceMaterialData>>, ) { ... for entity in &instance_entities { let Some(mesh_instance) = render_mesh_instances.get(&entity) else { continue; }; // The mesh handle in `AssetId<Mesh>` form, and the `MeshTransforms` can now // be found in `mesh_instance` which is a `RenderMeshInstance` ... } } ``` --------- Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
2023-09-27 08:28:28 +00:00
let Some(mesh) = render_meshes.get(mesh_instance.mesh_asset_id) else {
continue;
};
let mut mesh_key =
Micro-optimize `queue_material_meshes`, primarily to remove bit manipulation. (#12791) This commit makes the following optimizations: ## `MeshPipelineKey`/`BaseMeshPipelineKey` split `MeshPipelineKey` has been split into `BaseMeshPipelineKey`, which lives in `bevy_render` and `MeshPipelineKey`, which lives in `bevy_pbr`. Conceptually, `BaseMeshPipelineKey` is a superclass of `MeshPipelineKey`. For `BaseMeshPipelineKey`, the bits start at the highest (most significant) bit and grow downward toward the lowest bit; for `MeshPipelineKey`, the bits start at the lowest bit and grow upward toward the highest bit. This prevents them from colliding. The goal of this is to avoid having to reassemble bits of the pipeline key for every mesh every frame. Instead, we can just use a bitwise or operation to combine the pieces that make up a `MeshPipelineKey`. ## `specialize_slow` Previously, all of `specialize()` was marked as `#[inline]`. This bloated `queue_material_meshes` unnecessarily, as a large chunk of it ended up being a slow path that was rarely hit. This commit refactors the function to move the slow path to `specialize_slow()`. Together, these two changes shave about 5% off `queue_material_meshes`: ![Screenshot 2024-03-29 130002](https://github.com/bevyengine/bevy/assets/157897/a7e5a994-a807-4328-b314-9003429dcdd2) ## Migration Guide - The `primitive_topology` field on `GpuMesh` is now an accessor method: `GpuMesh::primitive_topology()`. - For performance reasons, `MeshPipelineKey` has been split into `BaseMeshPipelineKey`, which lives in `bevy_render`, and `MeshPipelineKey`, which lives in `bevy_pbr`. These two should be combined with bitwise-or to produce the final `MeshPipelineKey`.
2024-04-01 21:58:53 +00:00
light_key | MeshPipelineKey::from_bits_retain(mesh.key_bits.bits());
// Even though we don't use the lightmap in the shadow map, the
// `SetMeshBindGroup` render command will bind the data for it. So
// we need to include the appropriate flag in the mesh pipeline key
// to ensure that the necessary bind group layout entries are
// present.
if render_lightmaps.render_lightmaps.contains_key(&entity) {
mesh_key |= MeshPipelineKey::LIGHTMAPPED;
}
mesh_key |= match material.properties.alpha_mode {
AlphaMode::Mask(_)
| AlphaMode::Blend
| AlphaMode::Premultiplied
Implement alpha to coverage (A2C) support. (#12970) [Alpha to coverage] (A2C) replaces alpha blending with a hardware-specific multisample coverage mask when multisample antialiasing is in use. It's a simple form of [order-independent transparency] that relies on MSAA. ["Anti-aliased Alpha Test: The Esoteric Alpha To Coverage"] is a good summary of the motivation for and best practices relating to A2C. This commit implements alpha to coverage support as a new variant for `AlphaMode`. You can supply `AlphaMode::AlphaToCoverage` as the `alpha_mode` field in `StandardMaterial` to use it. When in use, the standard material shader automatically applies the texture filtering method from ["Anti-aliased Alpha Test: The Esoteric Alpha To Coverage"]. Objects with alpha-to-coverage materials are binned in the opaque pass, as they're fully order-independent. The `transparency_3d` example has been updated to feature an object with alpha to coverage. Happily, the example was already using MSAA. This is part of #2223, as far as I can tell. [Alpha to coverage]: https://en.wikipedia.org/wiki/Alpha_to_coverage [order-independent transparency]: https://en.wikipedia.org/wiki/Order-independent_transparency ["Anti-aliased Alpha Test: The Esoteric Alpha To Coverage"]: https://bgolus.medium.com/anti-aliased-alpha-test-the-esoteric-alpha-to-coverage-8b177335ae4f --- ## Changelog ### Added * The `AlphaMode` enum now supports `AlphaToCoverage`, to provide limited order-independent transparency when multisample antialiasing is in use.
2024-04-15 20:37:52 +00:00
| AlphaMode::Add
| AlphaMode::AlphaToCoverage => MeshPipelineKey::MAY_DISCARD,
_ => MeshPipelineKey::NONE,
};
let pipeline_id = pipelines.specialize(
&pipeline_cache,
&prepass_pipeline,
MaterialPipelineKey {
mesh_key,
bind_group_data: material.key.clone(),
},
&mesh.layout,
);
let pipeline_id = match pipeline_id {
Ok(id) => id,
Err(err) => {
error!("{}", err);
continue;
}
};
mesh_instance
.material_bind_group_id
.set(material.get_bind_group_id());
Improve performance by binning together opaque items instead of sorting them. (#12453) Today, we sort all entities added to all phases, even the phases that don't strictly need sorting, such as the opaque and shadow phases. This results in a performance loss because our `PhaseItem`s are rather large in memory, so sorting is slow. Additionally, determining the boundaries of batches is an O(n) process. This commit makes Bevy instead applicable place phase items into *bins* keyed by *bin keys*, which have the invariant that everything in the same bin is potentially batchable. This makes determining batch boundaries O(1), because everything in the same bin can be batched. Instead of sorting each entity, we now sort only the bin keys. This drops the sorting time to near-zero on workloads with few bins like `many_cubes --no-frustum-culling`. Memory usage is improved too, with batch boundaries and dynamic indices now implicit instead of explicit. The improved memory usage results in a significant win even on unbatchable workloads like `many_cubes --no-frustum-culling --vary-material-data-per-instance`, presumably due to cache effects. Not all phases can be binned; some, such as transparent and transmissive phases, must still be sorted. To handle this, this commit splits `PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the logic that today deals with `PhaseItem`s has been moved to `SortedPhaseItem`. `BinnedPhaseItem` has the new logic. Frame time results (in ms/frame) are as follows: | Benchmark | `binning` | `main` | Speedup | | ------------------------ | --------- | ------- | ------- | | `many_cubes -nfc -vpi` | 232.179 | 312.123 | 34.43% | | `many_cubes -nfc` | 25.874 | 30.117 | 16.40% | | `many_foxes` | 3.276 | 3.515 | 7.30% | (`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for `--vary-per-instance`.) --- ## Changelog ### Changed * Render phases have been split into binned and sorted phases. Binned phases, such as the common opaque phase, achieve improved CPU performance by avoiding the sorting step. ## Migration Guide - `PhaseItem` has been split into `BinnedPhaseItem` and `SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need to migrate them to one of these two types. `SortedPhaseItem` requires the fewest code changes, but you may want to pick `BinnedPhaseItem` if your phase doesn't require sorting, as that enables higher performance. ## Tracy graphs `many-cubes --no-frustum-culling`, `main` branch: <img width="1064" alt="Screenshot 2024-03-12 180037" src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55"> `many-cubes --no-frustum-culling`, this branch: <img width="1064" alt="Screenshot 2024-03-12 180011" src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c"> You can see that `batch_and_prepare_binned_render_phase` is a much smaller fraction of the time. Zooming in on that function, with yellow being this branch and red being `main`, we see: <img width="1064" alt="Screenshot 2024-03-12 175832" src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0"> The binning happens in `queue_material_meshes`. Again with yellow being this branch and red being `main`: <img width="1064" alt="Screenshot 2024-03-12 175755" src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96"> We can see that there is a small regression in `queue_material_meshes` performance, but it's not nearly enough to outweigh the large gains in `batch_and_prepare_binned_render_phase`. --------- Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
shadow_phase.add(
ShadowBinKey {
draw_function: draw_shadow_mesh,
pipeline: pipeline_id,
2024-06-27 16:13:03 +00:00
asset_id: mesh_instance.mesh_asset_id.into(),
Improve performance by binning together opaque items instead of sorting them. (#12453) Today, we sort all entities added to all phases, even the phases that don't strictly need sorting, such as the opaque and shadow phases. This results in a performance loss because our `PhaseItem`s are rather large in memory, so sorting is slow. Additionally, determining the boundaries of batches is an O(n) process. This commit makes Bevy instead applicable place phase items into *bins* keyed by *bin keys*, which have the invariant that everything in the same bin is potentially batchable. This makes determining batch boundaries O(1), because everything in the same bin can be batched. Instead of sorting each entity, we now sort only the bin keys. This drops the sorting time to near-zero on workloads with few bins like `many_cubes --no-frustum-culling`. Memory usage is improved too, with batch boundaries and dynamic indices now implicit instead of explicit. The improved memory usage results in a significant win even on unbatchable workloads like `many_cubes --no-frustum-culling --vary-material-data-per-instance`, presumably due to cache effects. Not all phases can be binned; some, such as transparent and transmissive phases, must still be sorted. To handle this, this commit splits `PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the logic that today deals with `PhaseItem`s has been moved to `SortedPhaseItem`. `BinnedPhaseItem` has the new logic. Frame time results (in ms/frame) are as follows: | Benchmark | `binning` | `main` | Speedup | | ------------------------ | --------- | ------- | ------- | | `many_cubes -nfc -vpi` | 232.179 | 312.123 | 34.43% | | `many_cubes -nfc` | 25.874 | 30.117 | 16.40% | | `many_foxes` | 3.276 | 3.515 | 7.30% | (`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for `--vary-per-instance`.) --- ## Changelog ### Changed * Render phases have been split into binned and sorted phases. Binned phases, such as the common opaque phase, achieve improved CPU performance by avoiding the sorting step. ## Migration Guide - `PhaseItem` has been split into `BinnedPhaseItem` and `SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need to migrate them to one of these two types. `SortedPhaseItem` requires the fewest code changes, but you may want to pick `BinnedPhaseItem` if your phase doesn't require sorting, as that enables higher performance. ## Tracy graphs `many-cubes --no-frustum-culling`, `main` branch: <img width="1064" alt="Screenshot 2024-03-12 180037" src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55"> `many-cubes --no-frustum-culling`, this branch: <img width="1064" alt="Screenshot 2024-03-12 180011" src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c"> You can see that `batch_and_prepare_binned_render_phase` is a much smaller fraction of the time. Zooming in on that function, with yellow being this branch and red being `main`, we see: <img width="1064" alt="Screenshot 2024-03-12 175832" src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0"> The binning happens in `queue_material_meshes`. Again with yellow being this branch and red being `main`: <img width="1064" alt="Screenshot 2024-03-12 175755" src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96"> We can see that there is a small regression in `queue_material_meshes` performance, but it's not nearly enough to outweigh the large gains in `batch_and_prepare_binned_render_phase`. --------- Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
},
entity,
2024-06-27 16:13:03 +00:00
BinnedRenderPhaseType::mesh(mesh_instance.should_batch()),
Improve performance by binning together opaque items instead of sorting them. (#12453) Today, we sort all entities added to all phases, even the phases that don't strictly need sorting, such as the opaque and shadow phases. This results in a performance loss because our `PhaseItem`s are rather large in memory, so sorting is slow. Additionally, determining the boundaries of batches is an O(n) process. This commit makes Bevy instead applicable place phase items into *bins* keyed by *bin keys*, which have the invariant that everything in the same bin is potentially batchable. This makes determining batch boundaries O(1), because everything in the same bin can be batched. Instead of sorting each entity, we now sort only the bin keys. This drops the sorting time to near-zero on workloads with few bins like `many_cubes --no-frustum-culling`. Memory usage is improved too, with batch boundaries and dynamic indices now implicit instead of explicit. The improved memory usage results in a significant win even on unbatchable workloads like `many_cubes --no-frustum-culling --vary-material-data-per-instance`, presumably due to cache effects. Not all phases can be binned; some, such as transparent and transmissive phases, must still be sorted. To handle this, this commit splits `PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the logic that today deals with `PhaseItem`s has been moved to `SortedPhaseItem`. `BinnedPhaseItem` has the new logic. Frame time results (in ms/frame) are as follows: | Benchmark | `binning` | `main` | Speedup | | ------------------------ | --------- | ------- | ------- | | `many_cubes -nfc -vpi` | 232.179 | 312.123 | 34.43% | | `many_cubes -nfc` | 25.874 | 30.117 | 16.40% | | `many_foxes` | 3.276 | 3.515 | 7.30% | (`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for `--vary-per-instance`.) --- ## Changelog ### Changed * Render phases have been split into binned and sorted phases. Binned phases, such as the common opaque phase, achieve improved CPU performance by avoiding the sorting step. ## Migration Guide - `PhaseItem` has been split into `BinnedPhaseItem` and `SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need to migrate them to one of these two types. `SortedPhaseItem` requires the fewest code changes, but you may want to pick `BinnedPhaseItem` if your phase doesn't require sorting, as that enables higher performance. ## Tracy graphs `many-cubes --no-frustum-culling`, `main` branch: <img width="1064" alt="Screenshot 2024-03-12 180037" src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55"> `many-cubes --no-frustum-culling`, this branch: <img width="1064" alt="Screenshot 2024-03-12 180011" src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c"> You can see that `batch_and_prepare_binned_render_phase` is a much smaller fraction of the time. Zooming in on that function, with yellow being this branch and red being `main`, we see: <img width="1064" alt="Screenshot 2024-03-12 175832" src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0"> The binning happens in `queue_material_meshes`. Again with yellow being this branch and red being `main`: <img width="1064" alt="Screenshot 2024-03-12 175755" src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96"> We can see that there is a small regression in `queue_material_meshes` performance, but it's not nearly enough to outweigh the large gains in `batch_and_prepare_binned_render_phase`. --------- Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
);
Modular Rendering (#2831) This changes how render logic is composed to make it much more modular. Previously, all extraction logic was centralized for a given "type" of rendered thing. For example, we extracted meshes into a vector of ExtractedMesh, which contained the mesh and material asset handles, the transform, etc. We looked up bindings for "drawn things" using their index in the `Vec<ExtractedMesh>`. This worked fine for built in rendering, but made it hard to reuse logic for "custom" rendering. It also prevented us from reusing things like "extracted transforms" across contexts. To make rendering more modular, I made a number of changes: * Entities now drive rendering: * We extract "render components" from "app components" and store them _on_ entities. No more centralized uber lists! We now have true "ECS-driven rendering" * To make this perform well, I implemented #2673 in upstream Bevy for fast batch insertions into specific entities. This was merged into the `pipelined-rendering` branch here: #2815 * Reworked the `Draw` abstraction: * Generic `PhaseItems`: each draw phase can define its own type of "rendered thing", which can define its own "sort key" * Ported the 2d, 3d, and shadow phases to the new PhaseItem impl (currently Transparent2d, Transparent3d, and Shadow PhaseItems) * `Draw` trait and and `DrawFunctions` are now generic on PhaseItem * Modular / Ergonomic `DrawFunctions` via `RenderCommands` * RenderCommand is a trait that runs an ECS query and produces one or more RenderPass calls. Types implementing this trait can be composed to create a final DrawFunction. For example the DrawPbr DrawFunction is created from the following DrawCommand tuple. Const generics are used to set specific bind group locations: ```rust pub type DrawPbr = ( SetPbrPipeline, SetMeshViewBindGroup<0>, SetStandardMaterialBindGroup<1>, SetTransformBindGroup<2>, DrawMesh, ); ``` * The new `custom_shader_pipelined` example illustrates how the commands above can be reused to create a custom draw function: ```rust type DrawCustom = ( SetCustomMaterialPipeline, SetMeshViewBindGroup<0>, SetTransformBindGroup<2>, DrawMesh, ); ``` * ExtractComponentPlugin and UniformComponentPlugin: * Simple, standardized ways to easily extract individual components and write them to GPU buffers * Ported PBR and Sprite rendering to the new primitives above. * Removed staging buffer from UniformVec in favor of direct Queue usage * Makes UniformVec much easier to use and more ergonomic. Completely removes the need for custom render graph nodes in these contexts (see the PbrNode and view Node removals and the much simpler call patterns in the relevant Prepare systems). * Added a many_cubes_pipelined example to benchmark baseline 3d rendering performance and ensure there were no major regressions during this port. Avoiding regressions was challenging given that the old approach of extracting into centralized vectors is basically the "optimal" approach. However thanks to a various ECS optimizations and render logic rephrasing, we pretty much break even on this benchmark! * Lifetimeless SystemParams: this will be a bit divisive, but as we continue to embrace "trait driven systems" (ex: ExtractComponentPlugin, UniformComponentPlugin, DrawCommand), the ergonomics of `(Query<'static, 'static, (&'static A, &'static B, &'static)>, Res<'static, C>)` were getting very hard to bear. As a compromise, I added "static type aliases" for the relevant SystemParams. The previous example can now be expressed like this: `(SQuery<(Read<A>, Read<B>)>, SRes<C>)`. If anyone has better ideas / conflicting opinions, please let me know! * RunSystem trait: a way to define Systems via a trait with a SystemParam associated type. This is used to implement the various plugins mentioned above. I also added SystemParamItem and QueryItem type aliases to make "trait stye" ecs interactions nicer on the eyes (and fingers). * RenderAsset retrying: ensures that render assets are only created when they are "ready" and allows us to create bind groups directly inside render assets (which significantly simplified the StandardMaterial code). I think ultimately we should swap this out on "asset dependency" events to wait for dependencies to load, but this will require significant asset system changes. * Updated some built in shaders to account for missing MeshUniform fields
2021-09-23 06:16:11 +00:00
}
}
}
}
pub struct Shadow {
Improve performance by binning together opaque items instead of sorting them. (#12453) Today, we sort all entities added to all phases, even the phases that don't strictly need sorting, such as the opaque and shadow phases. This results in a performance loss because our `PhaseItem`s are rather large in memory, so sorting is slow. Additionally, determining the boundaries of batches is an O(n) process. This commit makes Bevy instead applicable place phase items into *bins* keyed by *bin keys*, which have the invariant that everything in the same bin is potentially batchable. This makes determining batch boundaries O(1), because everything in the same bin can be batched. Instead of sorting each entity, we now sort only the bin keys. This drops the sorting time to near-zero on workloads with few bins like `many_cubes --no-frustum-culling`. Memory usage is improved too, with batch boundaries and dynamic indices now implicit instead of explicit. The improved memory usage results in a significant win even on unbatchable workloads like `many_cubes --no-frustum-culling --vary-material-data-per-instance`, presumably due to cache effects. Not all phases can be binned; some, such as transparent and transmissive phases, must still be sorted. To handle this, this commit splits `PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the logic that today deals with `PhaseItem`s has been moved to `SortedPhaseItem`. `BinnedPhaseItem` has the new logic. Frame time results (in ms/frame) are as follows: | Benchmark | `binning` | `main` | Speedup | | ------------------------ | --------- | ------- | ------- | | `many_cubes -nfc -vpi` | 232.179 | 312.123 | 34.43% | | `many_cubes -nfc` | 25.874 | 30.117 | 16.40% | | `many_foxes` | 3.276 | 3.515 | 7.30% | (`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for `--vary-per-instance`.) --- ## Changelog ### Changed * Render phases have been split into binned and sorted phases. Binned phases, such as the common opaque phase, achieve improved CPU performance by avoiding the sorting step. ## Migration Guide - `PhaseItem` has been split into `BinnedPhaseItem` and `SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need to migrate them to one of these two types. `SortedPhaseItem` requires the fewest code changes, but you may want to pick `BinnedPhaseItem` if your phase doesn't require sorting, as that enables higher performance. ## Tracy graphs `many-cubes --no-frustum-culling`, `main` branch: <img width="1064" alt="Screenshot 2024-03-12 180037" src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55"> `many-cubes --no-frustum-culling`, this branch: <img width="1064" alt="Screenshot 2024-03-12 180011" src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c"> You can see that `batch_and_prepare_binned_render_phase` is a much smaller fraction of the time. Zooming in on that function, with yellow being this branch and red being `main`, we see: <img width="1064" alt="Screenshot 2024-03-12 175832" src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0"> The binning happens in `queue_material_meshes`. Again with yellow being this branch and red being `main`: <img width="1064" alt="Screenshot 2024-03-12 175755" src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96"> We can see that there is a small regression in `queue_material_meshes` performance, but it's not nearly enough to outweigh the large gains in `batch_and_prepare_binned_render_phase`. --------- Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
pub key: ShadowBinKey,
pub representative_entity: Entity,
Automatic batching/instancing of draw commands (#9685) # Objective - Implement the foundations of automatic batching/instancing of draw commands as the next step from #89 - NOTE: More performance improvements will come when more data is managed and bound in ways that do not require rebinding such as mesh, material, and texture data. ## Solution - The core idea for batching of draw commands is to check whether any of the information that has to be passed when encoding a draw command changes between two things that are being drawn according to the sorted render phase order. These should be things like the pipeline, bind groups and their dynamic offsets, index/vertex buffers, and so on. - The following assumptions have been made: - Only entities with prepared assets (pipelines, materials, meshes) are queued to phases - View bindings are constant across a phase for a given draw function as phases are per-view - `batch_and_prepare_render_phase` is the only system that performs this batching and has sole responsibility for preparing the per-object data. As such the mesh binding and dynamic offsets are assumed to only vary as a result of the `batch_and_prepare_render_phase` system, e.g. due to having to split data across separate uniform bindings within the same buffer due to the maximum uniform buffer binding size. - Implement `GpuArrayBuffer` for `Mesh2dUniform` to store Mesh2dUniform in arrays in GPU buffers rather than each one being at a dynamic offset in a uniform buffer. This is the same optimisation that was made for 3D not long ago. - Change batch size for a range in `PhaseItem`, adding API for getting or mutating the range. This is more flexible than a size as the length of the range can be used in place of the size, but the start and end can be otherwise whatever is needed. - Add an optional mesh bind group dynamic offset to `PhaseItem`. This avoids having to do a massive table move just to insert `GpuArrayBufferIndex` components. ## Benchmarks All tests have been run on an M1 Max on AC power. `bevymark` and `many_cubes` were modified to use 1920x1080 with a scale factor of 1. I run a script that runs a separate Tracy capture process, and then runs the bevy example with `--features bevy_ci_testing,trace_tracy` and `CI_TESTING_CONFIG=../benchmark.ron` with the contents of `../benchmark.ron`: ```rust ( exit_after: Some(1500) ) ``` ...in order to run each test for 1500 frames. The recent changes to `many_cubes` and `bevymark` added reproducible random number generation so that with the same settings, the same rng will occur. They also added benchmark modes that use a fixed delta time for animations. Combined this means that the same frames should be rendered both on main and on the branch. The graphs compare main (yellow) to this PR (red). ### 3D Mesh `many_cubes --benchmark` <img width="1411" alt="Screenshot 2023-09-03 at 23 42 10" src="https://github.com/bevyengine/bevy/assets/302146/2088716a-c918-486c-8129-090b26fd2bc4"> The mesh and material are the same for all instances. This is basically the best case for the initial batching implementation as it results in 1 draw for the ~11.7k visible meshes. It gives a ~30% reduction in median frame time. The 1000th frame is identical using the flip tool: ![flip many_cubes-main-mesh3d many_cubes-batching-mesh3d 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/2511f37a-6df8-481a-932f-706ca4de7643) ``` Mean: 0.000000 Weighted median: 0.000000 1st weighted quartile: 0.000000 3rd weighted quartile: 0.000000 Min: 0.000000 Max: 0.000000 Evaluation time: 0.4615 seconds ``` ### 3D Mesh `many_cubes --benchmark --material-texture-count 10` <img width="1404" alt="Screenshot 2023-09-03 at 23 45 18" src="https://github.com/bevyengine/bevy/assets/302146/5ee9c447-5bd2-45c6-9706-ac5ff8916daf"> This run uses 10 different materials by varying their textures. The materials are randomly selected, and there is no sorting by material bind group for opaque 3D so any batching is 'random'. The PR produces a ~5% reduction in median frame time. If we were to sort the opaque phase by the material bind group, then this should be a lot faster. This produces about 10.5k draws for the 11.7k visible entities. This makes sense as randomly selecting from 10 materials gives a chance that two adjacent entities randomly select the same material and can be batched. The 1000th frame is identical in flip: ![flip many_cubes-main-mesh3d-mtc10 many_cubes-batching-mesh3d-mtc10 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/2b3a8614-9466-4ed8-b50c-d4aa71615dbb) ``` Mean: 0.000000 Weighted median: 0.000000 1st weighted quartile: 0.000000 3rd weighted quartile: 0.000000 Min: 0.000000 Max: 0.000000 Evaluation time: 0.4537 seconds ``` ### 3D Mesh `many_cubes --benchmark --vary-per-instance` <img width="1394" alt="Screenshot 2023-09-03 at 23 48 44" src="https://github.com/bevyengine/bevy/assets/302146/f02a816b-a444-4c18-a96a-63b5436f3b7f"> This run varies the material data per instance by randomly-generating its colour. This is the worst case for batching and that it performs about the same as `main` is a good thing as it demonstrates that the batching has minimal overhead when dealing with ~11k visible mesh entities. The 1000th frame is identical according to flip: ![flip many_cubes-main-mesh3d-vpi many_cubes-batching-mesh3d-vpi 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/ac5f5c14-9bda-4d1a-8219-7577d4aac68c) ``` Mean: 0.000000 Weighted median: 0.000000 1st weighted quartile: 0.000000 3rd weighted quartile: 0.000000 Min: 0.000000 Max: 0.000000 Evaluation time: 0.4568 seconds ``` ### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d` <img width="1412" alt="Screenshot 2023-09-03 at 23 59 56" src="https://github.com/bevyengine/bevy/assets/302146/cb02ae07-237b-4646-ae9f-fda4dafcbad4"> This spawns 160 waves of 1000 quad meshes that are shaded with ColorMaterial. Each wave has a different material so 160 waves currently should result in 160 batches. This results in a 50% reduction in median frame time. Capturing a screenshot of the 1000th frame main vs PR gives: ![flip bevymark-main-mesh2d bevymark-batching-mesh2d 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/80102728-1217-4059-87af-14d05044df40) ``` Mean: 0.001222 Weighted median: 0.750432 1st weighted quartile: 0.453494 3rd weighted quartile: 0.969758 Min: 0.000000 Max: 0.990296 Evaluation time: 0.4255 seconds ``` So they seem to produce the same results. I also double-checked the number of draws. `main` does 160000 draws, and the PR does 160, as expected. ### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d --material-texture-count 10` <img width="1392" alt="Screenshot 2023-09-04 at 00 09 22" src="https://github.com/bevyengine/bevy/assets/302146/4358da2e-ce32-4134-82df-3ab74c40849c"> This generates 10 textures and generates materials for each of those and then selects one material per wave. The median frame time is reduced by 50%. Similar to the plain run above, this produces 160 draws on the PR and 160000 on `main` and the 1000th frame is identical (ignoring the fps counter text overlay). ![flip bevymark-main-mesh2d-mtc10 bevymark-batching-mesh2d-mtc10 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/ebed2822-dce7-426a-858b-b77dc45b986f) ``` Mean: 0.002877 Weighted median: 0.964980 1st weighted quartile: 0.668871 3rd weighted quartile: 0.982749 Min: 0.000000 Max: 0.992377 Evaluation time: 0.4301 seconds ``` ### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d --vary-per-instance` <img width="1396" alt="Screenshot 2023-09-04 at 00 13 53" src="https://github.com/bevyengine/bevy/assets/302146/b2198b18-3439-47ad-919a-cdabe190facb"> This creates unique materials per instance by randomly-generating the material's colour. This is the worst case for 2D batching. Somehow, this PR manages a 7% reduction in median frame time. Both main and this PR issue 160000 draws. The 1000th frame is the same: ![flip bevymark-main-mesh2d-vpi bevymark-batching-mesh2d-vpi 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/a2ec471c-f576-4a36-a23b-b24b22578b97) ``` Mean: 0.001214 Weighted median: 0.937499 1st weighted quartile: 0.635467 3rd weighted quartile: 0.979085 Min: 0.000000 Max: 0.988971 Evaluation time: 0.4462 seconds ``` ### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite` <img width="1396" alt="Screenshot 2023-09-04 at 12 21 12" src="https://github.com/bevyengine/bevy/assets/302146/8b31e915-d6be-4cac-abf5-c6a4da9c3d43"> This just spawns 160 waves of 1000 sprites. There should be and is no notable difference between main and the PR. ### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite --material-texture-count 10` <img width="1389" alt="Screenshot 2023-09-04 at 12 36 08" src="https://github.com/bevyengine/bevy/assets/302146/45fe8d6d-c901-4062-a349-3693dd044413"> This spawns the sprites selecting a texture at random per instance from the 10 generated textures. This has no significant change vs main and shouldn't. ### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite --vary-per-instance` <img width="1401" alt="Screenshot 2023-09-04 at 12 29 52" src="https://github.com/bevyengine/bevy/assets/302146/762c5c60-352e-471f-8dbe-bbf10e24ebd6"> This sets the sprite colour as being unique per instance. This can still all be drawn using one batch. There should be no difference but the PR produces median frame times that are 4% higher. Investigation showed no clear sources of cost, rather a mix of give and take that should not happen. It seems like noise in the results. ### Summary | Benchmark | % change in median frame time | | ------------- | ------------- | | many_cubes | 🟩 -30% | | many_cubes 10 materials | 🟩 -5% | | many_cubes unique materials | 🟩 ~0% | | bevymark mesh2d | 🟩 -50% | | bevymark mesh2d 10 materials | 🟩 -50% | | bevymark mesh2d unique materials | 🟩 -7% | | bevymark sprite | 🟥 2% | | bevymark sprite 10 materials | 🟥 0.6% | | bevymark sprite unique materials | 🟥 4.1% | --- ## Changelog - Added: 2D and 3D mesh entities that share the same mesh and material (same textures, same data) are now batched into the same draw command for better performance. --------- Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com> Co-authored-by: Nicola Papale <nico@nicopap.ch>
2023-09-21 22:12:34 +00:00
pub batch_range: Range<u32>,
Implement GPU frustum culling. (#12889) This commit implements opt-in GPU frustum culling, built on top of the infrastructure in https://github.com/bevyengine/bevy/pull/12773. To enable it on a camera, add the `GpuCulling` component to it. To additionally disable CPU frustum culling, add the `NoCpuCulling` component. Note that adding `GpuCulling` without `NoCpuCulling` *currently* does nothing useful. The reason why `GpuCulling` doesn't automatically imply `NoCpuCulling` is that I intend to follow this patch up with GPU two-phase occlusion culling, and CPU frustum culling plus GPU occlusion culling seems like a very commonly-desired mode. Adding the `GpuCulling` component to a view puts that view into *indirect mode*. This mode makes all drawcalls indirect, relying on the mesh preprocessing shader to allocate instances dynamically. In indirect mode, the `PreprocessWorkItem` `output_index` points not to a `MeshUniform` instance slot but instead to a set of `wgpu` `IndirectParameters`, from which it allocates an instance slot dynamically if frustum culling succeeds. Batch building has been updated to allocate and track indirect parameter slots, and the AABBs are now supplied to the GPU as `MeshCullingData`. A small amount of code relating to the frustum culling has been borrowed from meshlets and moved into `maths.wgsl`. Note that standard Bevy frustum culling uses AABBs, while meshlets use bounding spheres; this means that not as much code can be shared as one might think. This patch doesn't provide any way to perform GPU culling on shadow maps, to avoid making this patch bigger than it already is. That can be a followup. ## Changelog ### Added * Frustum culling can now optionally be done on the GPU. To enable it, add the `GpuCulling` component to a camera. * To disable CPU frustum culling, add `NoCpuCulling` to a camera. Note that `GpuCulling` doesn't automatically imply `NoCpuCulling`.
2024-04-28 12:50:00 +00:00
pub extra_index: PhaseItemExtraIndex,
2021-06-02 02:59:17 +00:00
}
2024-06-27 16:13:03 +00:00
/// Data used to bin each object in the shadow map phase.
Improve performance by binning together opaque items instead of sorting them. (#12453) Today, we sort all entities added to all phases, even the phases that don't strictly need sorting, such as the opaque and shadow phases. This results in a performance loss because our `PhaseItem`s are rather large in memory, so sorting is slow. Additionally, determining the boundaries of batches is an O(n) process. This commit makes Bevy instead applicable place phase items into *bins* keyed by *bin keys*, which have the invariant that everything in the same bin is potentially batchable. This makes determining batch boundaries O(1), because everything in the same bin can be batched. Instead of sorting each entity, we now sort only the bin keys. This drops the sorting time to near-zero on workloads with few bins like `many_cubes --no-frustum-culling`. Memory usage is improved too, with batch boundaries and dynamic indices now implicit instead of explicit. The improved memory usage results in a significant win even on unbatchable workloads like `many_cubes --no-frustum-culling --vary-material-data-per-instance`, presumably due to cache effects. Not all phases can be binned; some, such as transparent and transmissive phases, must still be sorted. To handle this, this commit splits `PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the logic that today deals with `PhaseItem`s has been moved to `SortedPhaseItem`. `BinnedPhaseItem` has the new logic. Frame time results (in ms/frame) are as follows: | Benchmark | `binning` | `main` | Speedup | | ------------------------ | --------- | ------- | ------- | | `many_cubes -nfc -vpi` | 232.179 | 312.123 | 34.43% | | `many_cubes -nfc` | 25.874 | 30.117 | 16.40% | | `many_foxes` | 3.276 | 3.515 | 7.30% | (`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for `--vary-per-instance`.) --- ## Changelog ### Changed * Render phases have been split into binned and sorted phases. Binned phases, such as the common opaque phase, achieve improved CPU performance by avoiding the sorting step. ## Migration Guide - `PhaseItem` has been split into `BinnedPhaseItem` and `SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need to migrate them to one of these two types. `SortedPhaseItem` requires the fewest code changes, but you may want to pick `BinnedPhaseItem` if your phase doesn't require sorting, as that enables higher performance. ## Tracy graphs `many-cubes --no-frustum-culling`, `main` branch: <img width="1064" alt="Screenshot 2024-03-12 180037" src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55"> `many-cubes --no-frustum-culling`, this branch: <img width="1064" alt="Screenshot 2024-03-12 180011" src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c"> You can see that `batch_and_prepare_binned_render_phase` is a much smaller fraction of the time. Zooming in on that function, with yellow being this branch and red being `main`, we see: <img width="1064" alt="Screenshot 2024-03-12 175832" src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0"> The binning happens in `queue_material_meshes`. Again with yellow being this branch and red being `main`: <img width="1064" alt="Screenshot 2024-03-12 175755" src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96"> We can see that there is a small regression in `queue_material_meshes` performance, but it's not nearly enough to outweigh the large gains in `batch_and_prepare_binned_render_phase`. --------- Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct ShadowBinKey {
/// The identifier of the render pipeline.
pub pipeline: CachedRenderPipelineId,
Modular Rendering (#2831) This changes how render logic is composed to make it much more modular. Previously, all extraction logic was centralized for a given "type" of rendered thing. For example, we extracted meshes into a vector of ExtractedMesh, which contained the mesh and material asset handles, the transform, etc. We looked up bindings for "drawn things" using their index in the `Vec<ExtractedMesh>`. This worked fine for built in rendering, but made it hard to reuse logic for "custom" rendering. It also prevented us from reusing things like "extracted transforms" across contexts. To make rendering more modular, I made a number of changes: * Entities now drive rendering: * We extract "render components" from "app components" and store them _on_ entities. No more centralized uber lists! We now have true "ECS-driven rendering" * To make this perform well, I implemented #2673 in upstream Bevy for fast batch insertions into specific entities. This was merged into the `pipelined-rendering` branch here: #2815 * Reworked the `Draw` abstraction: * Generic `PhaseItems`: each draw phase can define its own type of "rendered thing", which can define its own "sort key" * Ported the 2d, 3d, and shadow phases to the new PhaseItem impl (currently Transparent2d, Transparent3d, and Shadow PhaseItems) * `Draw` trait and and `DrawFunctions` are now generic on PhaseItem * Modular / Ergonomic `DrawFunctions` via `RenderCommands` * RenderCommand is a trait that runs an ECS query and produces one or more RenderPass calls. Types implementing this trait can be composed to create a final DrawFunction. For example the DrawPbr DrawFunction is created from the following DrawCommand tuple. Const generics are used to set specific bind group locations: ```rust pub type DrawPbr = ( SetPbrPipeline, SetMeshViewBindGroup<0>, SetStandardMaterialBindGroup<1>, SetTransformBindGroup<2>, DrawMesh, ); ``` * The new `custom_shader_pipelined` example illustrates how the commands above can be reused to create a custom draw function: ```rust type DrawCustom = ( SetCustomMaterialPipeline, SetMeshViewBindGroup<0>, SetTransformBindGroup<2>, DrawMesh, ); ``` * ExtractComponentPlugin and UniformComponentPlugin: * Simple, standardized ways to easily extract individual components and write them to GPU buffers * Ported PBR and Sprite rendering to the new primitives above. * Removed staging buffer from UniformVec in favor of direct Queue usage * Makes UniformVec much easier to use and more ergonomic. Completely removes the need for custom render graph nodes in these contexts (see the PbrNode and view Node removals and the much simpler call patterns in the relevant Prepare systems). * Added a many_cubes_pipelined example to benchmark baseline 3d rendering performance and ensure there were no major regressions during this port. Avoiding regressions was challenging given that the old approach of extracting into centralized vectors is basically the "optimal" approach. However thanks to a various ECS optimizations and render logic rephrasing, we pretty much break even on this benchmark! * Lifetimeless SystemParams: this will be a bit divisive, but as we continue to embrace "trait driven systems" (ex: ExtractComponentPlugin, UniformComponentPlugin, DrawCommand), the ergonomics of `(Query<'static, 'static, (&'static A, &'static B, &'static)>, Res<'static, C>)` were getting very hard to bear. As a compromise, I added "static type aliases" for the relevant SystemParams. The previous example can now be expressed like this: `(SQuery<(Read<A>, Read<B>)>, SRes<C>)`. If anyone has better ideas / conflicting opinions, please let me know! * RunSystem trait: a way to define Systems via a trait with a SystemParam associated type. This is used to implement the various plugins mentioned above. I also added SystemParamItem and QueryItem type aliases to make "trait stye" ecs interactions nicer on the eyes (and fingers). * RenderAsset retrying: ensures that render assets are only created when they are "ready" and allows us to create bind groups directly inside render assets (which significantly simplified the StandardMaterial code). I think ultimately we should swap this out on "asset dependency" events to wait for dependencies to load, but this will require significant asset system changes. * Updated some built in shaders to account for missing MeshUniform fields
2021-09-23 06:16:11 +00:00
Improve performance by binning together opaque items instead of sorting them. (#12453) Today, we sort all entities added to all phases, even the phases that don't strictly need sorting, such as the opaque and shadow phases. This results in a performance loss because our `PhaseItem`s are rather large in memory, so sorting is slow. Additionally, determining the boundaries of batches is an O(n) process. This commit makes Bevy instead applicable place phase items into *bins* keyed by *bin keys*, which have the invariant that everything in the same bin is potentially batchable. This makes determining batch boundaries O(1), because everything in the same bin can be batched. Instead of sorting each entity, we now sort only the bin keys. This drops the sorting time to near-zero on workloads with few bins like `many_cubes --no-frustum-culling`. Memory usage is improved too, with batch boundaries and dynamic indices now implicit instead of explicit. The improved memory usage results in a significant win even on unbatchable workloads like `many_cubes --no-frustum-culling --vary-material-data-per-instance`, presumably due to cache effects. Not all phases can be binned; some, such as transparent and transmissive phases, must still be sorted. To handle this, this commit splits `PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the logic that today deals with `PhaseItem`s has been moved to `SortedPhaseItem`. `BinnedPhaseItem` has the new logic. Frame time results (in ms/frame) are as follows: | Benchmark | `binning` | `main` | Speedup | | ------------------------ | --------- | ------- | ------- | | `many_cubes -nfc -vpi` | 232.179 | 312.123 | 34.43% | | `many_cubes -nfc` | 25.874 | 30.117 | 16.40% | | `many_foxes` | 3.276 | 3.515 | 7.30% | (`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for `--vary-per-instance`.) --- ## Changelog ### Changed * Render phases have been split into binned and sorted phases. Binned phases, such as the common opaque phase, achieve improved CPU performance by avoiding the sorting step. ## Migration Guide - `PhaseItem` has been split into `BinnedPhaseItem` and `SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need to migrate them to one of these two types. `SortedPhaseItem` requires the fewest code changes, but you may want to pick `BinnedPhaseItem` if your phase doesn't require sorting, as that enables higher performance. ## Tracy graphs `many-cubes --no-frustum-culling`, `main` branch: <img width="1064" alt="Screenshot 2024-03-12 180037" src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55"> `many-cubes --no-frustum-culling`, this branch: <img width="1064" alt="Screenshot 2024-03-12 180011" src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c"> You can see that `batch_and_prepare_binned_render_phase` is a much smaller fraction of the time. Zooming in on that function, with yellow being this branch and red being `main`, we see: <img width="1064" alt="Screenshot 2024-03-12 175832" src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0"> The binning happens in `queue_material_meshes`. Again with yellow being this branch and red being `main`: <img width="1064" alt="Screenshot 2024-03-12 175755" src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96"> We can see that there is a small regression in `queue_material_meshes` performance, but it's not nearly enough to outweigh the large gains in `batch_and_prepare_binned_render_phase`. --------- Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
/// The function used to draw.
pub draw_function: DrawFunctionId,
2024-06-27 16:13:03 +00:00
/// The object.
pub asset_id: UntypedAssetId,
Improve performance by binning together opaque items instead of sorting them. (#12453) Today, we sort all entities added to all phases, even the phases that don't strictly need sorting, such as the opaque and shadow phases. This results in a performance loss because our `PhaseItem`s are rather large in memory, so sorting is slow. Additionally, determining the boundaries of batches is an O(n) process. This commit makes Bevy instead applicable place phase items into *bins* keyed by *bin keys*, which have the invariant that everything in the same bin is potentially batchable. This makes determining batch boundaries O(1), because everything in the same bin can be batched. Instead of sorting each entity, we now sort only the bin keys. This drops the sorting time to near-zero on workloads with few bins like `many_cubes --no-frustum-culling`. Memory usage is improved too, with batch boundaries and dynamic indices now implicit instead of explicit. The improved memory usage results in a significant win even on unbatchable workloads like `many_cubes --no-frustum-culling --vary-material-data-per-instance`, presumably due to cache effects. Not all phases can be binned; some, such as transparent and transmissive phases, must still be sorted. To handle this, this commit splits `PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the logic that today deals with `PhaseItem`s has been moved to `SortedPhaseItem`. `BinnedPhaseItem` has the new logic. Frame time results (in ms/frame) are as follows: | Benchmark | `binning` | `main` | Speedup | | ------------------------ | --------- | ------- | ------- | | `many_cubes -nfc -vpi` | 232.179 | 312.123 | 34.43% | | `many_cubes -nfc` | 25.874 | 30.117 | 16.40% | | `many_foxes` | 3.276 | 3.515 | 7.30% | (`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for `--vary-per-instance`.) --- ## Changelog ### Changed * Render phases have been split into binned and sorted phases. Binned phases, such as the common opaque phase, achieve improved CPU performance by avoiding the sorting step. ## Migration Guide - `PhaseItem` has been split into `BinnedPhaseItem` and `SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need to migrate them to one of these two types. `SortedPhaseItem` requires the fewest code changes, but you may want to pick `BinnedPhaseItem` if your phase doesn't require sorting, as that enables higher performance. ## Tracy graphs `many-cubes --no-frustum-culling`, `main` branch: <img width="1064" alt="Screenshot 2024-03-12 180037" src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55"> `many-cubes --no-frustum-culling`, this branch: <img width="1064" alt="Screenshot 2024-03-12 180011" src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c"> You can see that `batch_and_prepare_binned_render_phase` is a much smaller fraction of the time. Zooming in on that function, with yellow being this branch and red being `main`, we see: <img width="1064" alt="Screenshot 2024-03-12 175832" src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0"> The binning happens in `queue_material_meshes`. Again with yellow being this branch and red being `main`: <img width="1064" alt="Screenshot 2024-03-12 175755" src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96"> We can see that there is a small regression in `queue_material_meshes` performance, but it's not nearly enough to outweigh the large gains in `batch_and_prepare_binned_render_phase`. --------- Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
}
Modular Rendering (#2831) This changes how render logic is composed to make it much more modular. Previously, all extraction logic was centralized for a given "type" of rendered thing. For example, we extracted meshes into a vector of ExtractedMesh, which contained the mesh and material asset handles, the transform, etc. We looked up bindings for "drawn things" using their index in the `Vec<ExtractedMesh>`. This worked fine for built in rendering, but made it hard to reuse logic for "custom" rendering. It also prevented us from reusing things like "extracted transforms" across contexts. To make rendering more modular, I made a number of changes: * Entities now drive rendering: * We extract "render components" from "app components" and store them _on_ entities. No more centralized uber lists! We now have true "ECS-driven rendering" * To make this perform well, I implemented #2673 in upstream Bevy for fast batch insertions into specific entities. This was merged into the `pipelined-rendering` branch here: #2815 * Reworked the `Draw` abstraction: * Generic `PhaseItems`: each draw phase can define its own type of "rendered thing", which can define its own "sort key" * Ported the 2d, 3d, and shadow phases to the new PhaseItem impl (currently Transparent2d, Transparent3d, and Shadow PhaseItems) * `Draw` trait and and `DrawFunctions` are now generic on PhaseItem * Modular / Ergonomic `DrawFunctions` via `RenderCommands` * RenderCommand is a trait that runs an ECS query and produces one or more RenderPass calls. Types implementing this trait can be composed to create a final DrawFunction. For example the DrawPbr DrawFunction is created from the following DrawCommand tuple. Const generics are used to set specific bind group locations: ```rust pub type DrawPbr = ( SetPbrPipeline, SetMeshViewBindGroup<0>, SetStandardMaterialBindGroup<1>, SetTransformBindGroup<2>, DrawMesh, ); ``` * The new `custom_shader_pipelined` example illustrates how the commands above can be reused to create a custom draw function: ```rust type DrawCustom = ( SetCustomMaterialPipeline, SetMeshViewBindGroup<0>, SetTransformBindGroup<2>, DrawMesh, ); ``` * ExtractComponentPlugin and UniformComponentPlugin: * Simple, standardized ways to easily extract individual components and write them to GPU buffers * Ported PBR and Sprite rendering to the new primitives above. * Removed staging buffer from UniformVec in favor of direct Queue usage * Makes UniformVec much easier to use and more ergonomic. Completely removes the need for custom render graph nodes in these contexts (see the PbrNode and view Node removals and the much simpler call patterns in the relevant Prepare systems). * Added a many_cubes_pipelined example to benchmark baseline 3d rendering performance and ensure there were no major regressions during this port. Avoiding regressions was challenging given that the old approach of extracting into centralized vectors is basically the "optimal" approach. However thanks to a various ECS optimizations and render logic rephrasing, we pretty much break even on this benchmark! * Lifetimeless SystemParams: this will be a bit divisive, but as we continue to embrace "trait driven systems" (ex: ExtractComponentPlugin, UniformComponentPlugin, DrawCommand), the ergonomics of `(Query<'static, 'static, (&'static A, &'static B, &'static)>, Res<'static, C>)` were getting very hard to bear. As a compromise, I added "static type aliases" for the relevant SystemParams. The previous example can now be expressed like this: `(SQuery<(Read<A>, Read<B>)>, SRes<C>)`. If anyone has better ideas / conflicting opinions, please let me know! * RunSystem trait: a way to define Systems via a trait with a SystemParam associated type. This is used to implement the various plugins mentioned above. I also added SystemParamItem and QueryItem type aliases to make "trait stye" ecs interactions nicer on the eyes (and fingers). * RenderAsset retrying: ensures that render assets are only created when they are "ready" and allows us to create bind groups directly inside render assets (which significantly simplified the StandardMaterial code). I think ultimately we should swap this out on "asset dependency" events to wait for dependencies to load, but this will require significant asset system changes. * Updated some built in shaders to account for missing MeshUniform fields
2021-09-23 06:16:11 +00:00
Improve performance by binning together opaque items instead of sorting them. (#12453) Today, we sort all entities added to all phases, even the phases that don't strictly need sorting, such as the opaque and shadow phases. This results in a performance loss because our `PhaseItem`s are rather large in memory, so sorting is slow. Additionally, determining the boundaries of batches is an O(n) process. This commit makes Bevy instead applicable place phase items into *bins* keyed by *bin keys*, which have the invariant that everything in the same bin is potentially batchable. This makes determining batch boundaries O(1), because everything in the same bin can be batched. Instead of sorting each entity, we now sort only the bin keys. This drops the sorting time to near-zero on workloads with few bins like `many_cubes --no-frustum-culling`. Memory usage is improved too, with batch boundaries and dynamic indices now implicit instead of explicit. The improved memory usage results in a significant win even on unbatchable workloads like `many_cubes --no-frustum-culling --vary-material-data-per-instance`, presumably due to cache effects. Not all phases can be binned; some, such as transparent and transmissive phases, must still be sorted. To handle this, this commit splits `PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the logic that today deals with `PhaseItem`s has been moved to `SortedPhaseItem`. `BinnedPhaseItem` has the new logic. Frame time results (in ms/frame) are as follows: | Benchmark | `binning` | `main` | Speedup | | ------------------------ | --------- | ------- | ------- | | `many_cubes -nfc -vpi` | 232.179 | 312.123 | 34.43% | | `many_cubes -nfc` | 25.874 | 30.117 | 16.40% | | `many_foxes` | 3.276 | 3.515 | 7.30% | (`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for `--vary-per-instance`.) --- ## Changelog ### Changed * Render phases have been split into binned and sorted phases. Binned phases, such as the common opaque phase, achieve improved CPU performance by avoiding the sorting step. ## Migration Guide - `PhaseItem` has been split into `BinnedPhaseItem` and `SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need to migrate them to one of these two types. `SortedPhaseItem` requires the fewest code changes, but you may want to pick `BinnedPhaseItem` if your phase doesn't require sorting, as that enables higher performance. ## Tracy graphs `many-cubes --no-frustum-culling`, `main` branch: <img width="1064" alt="Screenshot 2024-03-12 180037" src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55"> `many-cubes --no-frustum-culling`, this branch: <img width="1064" alt="Screenshot 2024-03-12 180011" src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c"> You can see that `batch_and_prepare_binned_render_phase` is a much smaller fraction of the time. Zooming in on that function, with yellow being this branch and red being `main`, we see: <img width="1064" alt="Screenshot 2024-03-12 175832" src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0"> The binning happens in `queue_material_meshes`. Again with yellow being this branch and red being `main`: <img width="1064" alt="Screenshot 2024-03-12 175755" src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96"> We can see that there is a small regression in `queue_material_meshes` performance, but it's not nearly enough to outweigh the large gains in `batch_and_prepare_binned_render_phase`. --------- Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
impl PhaseItem for Shadow {
Modular Rendering (#2831) This changes how render logic is composed to make it much more modular. Previously, all extraction logic was centralized for a given "type" of rendered thing. For example, we extracted meshes into a vector of ExtractedMesh, which contained the mesh and material asset handles, the transform, etc. We looked up bindings for "drawn things" using their index in the `Vec<ExtractedMesh>`. This worked fine for built in rendering, but made it hard to reuse logic for "custom" rendering. It also prevented us from reusing things like "extracted transforms" across contexts. To make rendering more modular, I made a number of changes: * Entities now drive rendering: * We extract "render components" from "app components" and store them _on_ entities. No more centralized uber lists! We now have true "ECS-driven rendering" * To make this perform well, I implemented #2673 in upstream Bevy for fast batch insertions into specific entities. This was merged into the `pipelined-rendering` branch here: #2815 * Reworked the `Draw` abstraction: * Generic `PhaseItems`: each draw phase can define its own type of "rendered thing", which can define its own "sort key" * Ported the 2d, 3d, and shadow phases to the new PhaseItem impl (currently Transparent2d, Transparent3d, and Shadow PhaseItems) * `Draw` trait and and `DrawFunctions` are now generic on PhaseItem * Modular / Ergonomic `DrawFunctions` via `RenderCommands` * RenderCommand is a trait that runs an ECS query and produces one or more RenderPass calls. Types implementing this trait can be composed to create a final DrawFunction. For example the DrawPbr DrawFunction is created from the following DrawCommand tuple. Const generics are used to set specific bind group locations: ```rust pub type DrawPbr = ( SetPbrPipeline, SetMeshViewBindGroup<0>, SetStandardMaterialBindGroup<1>, SetTransformBindGroup<2>, DrawMesh, ); ``` * The new `custom_shader_pipelined` example illustrates how the commands above can be reused to create a custom draw function: ```rust type DrawCustom = ( SetCustomMaterialPipeline, SetMeshViewBindGroup<0>, SetTransformBindGroup<2>, DrawMesh, ); ``` * ExtractComponentPlugin and UniformComponentPlugin: * Simple, standardized ways to easily extract individual components and write them to GPU buffers * Ported PBR and Sprite rendering to the new primitives above. * Removed staging buffer from UniformVec in favor of direct Queue usage * Makes UniformVec much easier to use and more ergonomic. Completely removes the need for custom render graph nodes in these contexts (see the PbrNode and view Node removals and the much simpler call patterns in the relevant Prepare systems). * Added a many_cubes_pipelined example to benchmark baseline 3d rendering performance and ensure there were no major regressions during this port. Avoiding regressions was challenging given that the old approach of extracting into centralized vectors is basically the "optimal" approach. However thanks to a various ECS optimizations and render logic rephrasing, we pretty much break even on this benchmark! * Lifetimeless SystemParams: this will be a bit divisive, but as we continue to embrace "trait driven systems" (ex: ExtractComponentPlugin, UniformComponentPlugin, DrawCommand), the ergonomics of `(Query<'static, 'static, (&'static A, &'static B, &'static)>, Res<'static, C>)` were getting very hard to bear. As a compromise, I added "static type aliases" for the relevant SystemParams. The previous example can now be expressed like this: `(SQuery<(Read<A>, Read<B>)>, SRes<C>)`. If anyone has better ideas / conflicting opinions, please let me know! * RunSystem trait: a way to define Systems via a trait with a SystemParam associated type. This is used to implement the various plugins mentioned above. I also added SystemParamItem and QueryItem type aliases to make "trait stye" ecs interactions nicer on the eyes (and fingers). * RenderAsset retrying: ensures that render assets are only created when they are "ready" and allows us to create bind groups directly inside render assets (which significantly simplified the StandardMaterial code). I think ultimately we should swap this out on "asset dependency" events to wait for dependencies to load, but this will require significant asset system changes. * Updated some built in shaders to account for missing MeshUniform fields
2021-09-23 06:16:11 +00:00
#[inline]
Improve performance by binning together opaque items instead of sorting them. (#12453) Today, we sort all entities added to all phases, even the phases that don't strictly need sorting, such as the opaque and shadow phases. This results in a performance loss because our `PhaseItem`s are rather large in memory, so sorting is slow. Additionally, determining the boundaries of batches is an O(n) process. This commit makes Bevy instead applicable place phase items into *bins* keyed by *bin keys*, which have the invariant that everything in the same bin is potentially batchable. This makes determining batch boundaries O(1), because everything in the same bin can be batched. Instead of sorting each entity, we now sort only the bin keys. This drops the sorting time to near-zero on workloads with few bins like `many_cubes --no-frustum-culling`. Memory usage is improved too, with batch boundaries and dynamic indices now implicit instead of explicit. The improved memory usage results in a significant win even on unbatchable workloads like `many_cubes --no-frustum-culling --vary-material-data-per-instance`, presumably due to cache effects. Not all phases can be binned; some, such as transparent and transmissive phases, must still be sorted. To handle this, this commit splits `PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the logic that today deals with `PhaseItem`s has been moved to `SortedPhaseItem`. `BinnedPhaseItem` has the new logic. Frame time results (in ms/frame) are as follows: | Benchmark | `binning` | `main` | Speedup | | ------------------------ | --------- | ------- | ------- | | `many_cubes -nfc -vpi` | 232.179 | 312.123 | 34.43% | | `many_cubes -nfc` | 25.874 | 30.117 | 16.40% | | `many_foxes` | 3.276 | 3.515 | 7.30% | (`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for `--vary-per-instance`.) --- ## Changelog ### Changed * Render phases have been split into binned and sorted phases. Binned phases, such as the common opaque phase, achieve improved CPU performance by avoiding the sorting step. ## Migration Guide - `PhaseItem` has been split into `BinnedPhaseItem` and `SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need to migrate them to one of these two types. `SortedPhaseItem` requires the fewest code changes, but you may want to pick `BinnedPhaseItem` if your phase doesn't require sorting, as that enables higher performance. ## Tracy graphs `many-cubes --no-frustum-culling`, `main` branch: <img width="1064" alt="Screenshot 2024-03-12 180037" src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55"> `many-cubes --no-frustum-culling`, this branch: <img width="1064" alt="Screenshot 2024-03-12 180011" src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c"> You can see that `batch_and_prepare_binned_render_phase` is a much smaller fraction of the time. Zooming in on that function, with yellow being this branch and red being `main`, we see: <img width="1064" alt="Screenshot 2024-03-12 175832" src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0"> The binning happens in `queue_material_meshes`. Again with yellow being this branch and red being `main`: <img width="1064" alt="Screenshot 2024-03-12 175755" src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96"> We can see that there is a small regression in `queue_material_meshes` performance, but it's not nearly enough to outweigh the large gains in `batch_and_prepare_binned_render_phase`. --------- Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
fn entity(&self) -> Entity {
self.representative_entity
Modular Rendering (#2831) This changes how render logic is composed to make it much more modular. Previously, all extraction logic was centralized for a given "type" of rendered thing. For example, we extracted meshes into a vector of ExtractedMesh, which contained the mesh and material asset handles, the transform, etc. We looked up bindings for "drawn things" using their index in the `Vec<ExtractedMesh>`. This worked fine for built in rendering, but made it hard to reuse logic for "custom" rendering. It also prevented us from reusing things like "extracted transforms" across contexts. To make rendering more modular, I made a number of changes: * Entities now drive rendering: * We extract "render components" from "app components" and store them _on_ entities. No more centralized uber lists! We now have true "ECS-driven rendering" * To make this perform well, I implemented #2673 in upstream Bevy for fast batch insertions into specific entities. This was merged into the `pipelined-rendering` branch here: #2815 * Reworked the `Draw` abstraction: * Generic `PhaseItems`: each draw phase can define its own type of "rendered thing", which can define its own "sort key" * Ported the 2d, 3d, and shadow phases to the new PhaseItem impl (currently Transparent2d, Transparent3d, and Shadow PhaseItems) * `Draw` trait and and `DrawFunctions` are now generic on PhaseItem * Modular / Ergonomic `DrawFunctions` via `RenderCommands` * RenderCommand is a trait that runs an ECS query and produces one or more RenderPass calls. Types implementing this trait can be composed to create a final DrawFunction. For example the DrawPbr DrawFunction is created from the following DrawCommand tuple. Const generics are used to set specific bind group locations: ```rust pub type DrawPbr = ( SetPbrPipeline, SetMeshViewBindGroup<0>, SetStandardMaterialBindGroup<1>, SetTransformBindGroup<2>, DrawMesh, ); ``` * The new `custom_shader_pipelined` example illustrates how the commands above can be reused to create a custom draw function: ```rust type DrawCustom = ( SetCustomMaterialPipeline, SetMeshViewBindGroup<0>, SetTransformBindGroup<2>, DrawMesh, ); ``` * ExtractComponentPlugin and UniformComponentPlugin: * Simple, standardized ways to easily extract individual components and write them to GPU buffers * Ported PBR and Sprite rendering to the new primitives above. * Removed staging buffer from UniformVec in favor of direct Queue usage * Makes UniformVec much easier to use and more ergonomic. Completely removes the need for custom render graph nodes in these contexts (see the PbrNode and view Node removals and the much simpler call patterns in the relevant Prepare systems). * Added a many_cubes_pipelined example to benchmark baseline 3d rendering performance and ensure there were no major regressions during this port. Avoiding regressions was challenging given that the old approach of extracting into centralized vectors is basically the "optimal" approach. However thanks to a various ECS optimizations and render logic rephrasing, we pretty much break even on this benchmark! * Lifetimeless SystemParams: this will be a bit divisive, but as we continue to embrace "trait driven systems" (ex: ExtractComponentPlugin, UniformComponentPlugin, DrawCommand), the ergonomics of `(Query<'static, 'static, (&'static A, &'static B, &'static)>, Res<'static, C>)` were getting very hard to bear. As a compromise, I added "static type aliases" for the relevant SystemParams. The previous example can now be expressed like this: `(SQuery<(Read<A>, Read<B>)>, SRes<C>)`. If anyone has better ideas / conflicting opinions, please let me know! * RunSystem trait: a way to define Systems via a trait with a SystemParam associated type. This is used to implement the various plugins mentioned above. I also added SystemParamItem and QueryItem type aliases to make "trait stye" ecs interactions nicer on the eyes (and fingers). * RenderAsset retrying: ensures that render assets are only created when they are "ready" and allows us to create bind groups directly inside render assets (which significantly simplified the StandardMaterial code). I think ultimately we should swap this out on "asset dependency" events to wait for dependencies to load, but this will require significant asset system changes. * Updated some built in shaders to account for missing MeshUniform fields
2021-09-23 06:16:11 +00:00
}
Allow unbatched render phases to use unstable sorts (#5049) # Objective Partially addresses #4291. Speed up the sort phase for unbatched render phases. ## Solution Split out one of the optimizations in #4899 and allow implementors of `PhaseItem` to change what kind of sort is used when sorting the items in the phase. This currently includes Stable, Unstable, and Unsorted. Each of these corresponds to `Vec::sort_by_key`, `Vec::sort_unstable_by_key`, and no sorting at all. The default is `Unstable`. The last one can be used as a default if users introduce a preliminary depth prepass. ## Performance This will not impact the performance of any batched phases, as it is still using a stable sort. 2D's only phase is unchanged. All 3D phases are unbatched currently, and will benefit from this change. On `many_cubes`, where the primary phase is opaque, this change sees a speed up from 907.02us -> 477.62us, a 47.35% reduction. ![image](https://user-images.githubusercontent.com/3137680/174471253-22424874-30d5-4db5-b5b4-65fb2c612a9c.png) ## Future Work There were prior discussions to add support for faster radix sorts in #4291, which in theory should be a `O(n)` instead of a `O(nlog(n))` time. [`voracious`](https://crates.io/crates/voracious_radix_sort) has been proposed, but it seems to be optimize for use cases with more than 30,000 items, which may be atypical for most systems. Another optimization included in #4899 is to reduce the size of a few of the IDs commonly used in `PhaseItem` implementations to shrink the types to make swapping/sorting faster. Both `CachedPipelineId` and `DrawFunctionId` could be reduced to `u32` instead of `usize`. Ideally, this should automatically change to use stable sorts when `BatchedPhaseItem` is implemented on the same phase item type, but this requires specialization, which may not land in stable Rust for a short while. --- ## Changelog Added: `PhaseItem::sort` ## Migration Guide RenderPhases now default to a unstable sort (via `slice::sort_unstable_by_key`). This can typically improve sort phase performance, but may produce incorrect batching results when implementing `BatchedPhaseItem`. To revert to the older stable sort, manually implement `PhaseItem::sort` to implement a stable sort (i.e. via `slice::sort_by_key`). Co-authored-by: Federico Rinaldi <gisquerin@gmail.com> Co-authored-by: Robert Swain <robert.swain@gmail.com> Co-authored-by: colepoirier <colepoirier@gmail.com>
2022-06-23 10:52:49 +00:00
#[inline]
Improve performance by binning together opaque items instead of sorting them. (#12453) Today, we sort all entities added to all phases, even the phases that don't strictly need sorting, such as the opaque and shadow phases. This results in a performance loss because our `PhaseItem`s are rather large in memory, so sorting is slow. Additionally, determining the boundaries of batches is an O(n) process. This commit makes Bevy instead applicable place phase items into *bins* keyed by *bin keys*, which have the invariant that everything in the same bin is potentially batchable. This makes determining batch boundaries O(1), because everything in the same bin can be batched. Instead of sorting each entity, we now sort only the bin keys. This drops the sorting time to near-zero on workloads with few bins like `many_cubes --no-frustum-culling`. Memory usage is improved too, with batch boundaries and dynamic indices now implicit instead of explicit. The improved memory usage results in a significant win even on unbatchable workloads like `many_cubes --no-frustum-culling --vary-material-data-per-instance`, presumably due to cache effects. Not all phases can be binned; some, such as transparent and transmissive phases, must still be sorted. To handle this, this commit splits `PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the logic that today deals with `PhaseItem`s has been moved to `SortedPhaseItem`. `BinnedPhaseItem` has the new logic. Frame time results (in ms/frame) are as follows: | Benchmark | `binning` | `main` | Speedup | | ------------------------ | --------- | ------- | ------- | | `many_cubes -nfc -vpi` | 232.179 | 312.123 | 34.43% | | `many_cubes -nfc` | 25.874 | 30.117 | 16.40% | | `many_foxes` | 3.276 | 3.515 | 7.30% | (`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for `--vary-per-instance`.) --- ## Changelog ### Changed * Render phases have been split into binned and sorted phases. Binned phases, such as the common opaque phase, achieve improved CPU performance by avoiding the sorting step. ## Migration Guide - `PhaseItem` has been split into `BinnedPhaseItem` and `SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need to migrate them to one of these two types. `SortedPhaseItem` requires the fewest code changes, but you may want to pick `BinnedPhaseItem` if your phase doesn't require sorting, as that enables higher performance. ## Tracy graphs `many-cubes --no-frustum-culling`, `main` branch: <img width="1064" alt="Screenshot 2024-03-12 180037" src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55"> `many-cubes --no-frustum-culling`, this branch: <img width="1064" alt="Screenshot 2024-03-12 180011" src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c"> You can see that `batch_and_prepare_binned_render_phase` is a much smaller fraction of the time. Zooming in on that function, with yellow being this branch and red being `main`, we see: <img width="1064" alt="Screenshot 2024-03-12 175832" src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0"> The binning happens in `queue_material_meshes`. Again with yellow being this branch and red being `main`: <img width="1064" alt="Screenshot 2024-03-12 175755" src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96"> We can see that there is a small regression in `queue_material_meshes` performance, but it's not nearly enough to outweigh the large gains in `batch_and_prepare_binned_render_phase`. --------- Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
fn draw_function(&self) -> DrawFunctionId {
self.key.draw_function
Allow unbatched render phases to use unstable sorts (#5049) # Objective Partially addresses #4291. Speed up the sort phase for unbatched render phases. ## Solution Split out one of the optimizations in #4899 and allow implementors of `PhaseItem` to change what kind of sort is used when sorting the items in the phase. This currently includes Stable, Unstable, and Unsorted. Each of these corresponds to `Vec::sort_by_key`, `Vec::sort_unstable_by_key`, and no sorting at all. The default is `Unstable`. The last one can be used as a default if users introduce a preliminary depth prepass. ## Performance This will not impact the performance of any batched phases, as it is still using a stable sort. 2D's only phase is unchanged. All 3D phases are unbatched currently, and will benefit from this change. On `many_cubes`, where the primary phase is opaque, this change sees a speed up from 907.02us -> 477.62us, a 47.35% reduction. ![image](https://user-images.githubusercontent.com/3137680/174471253-22424874-30d5-4db5-b5b4-65fb2c612a9c.png) ## Future Work There were prior discussions to add support for faster radix sorts in #4291, which in theory should be a `O(n)` instead of a `O(nlog(n))` time. [`voracious`](https://crates.io/crates/voracious_radix_sort) has been proposed, but it seems to be optimize for use cases with more than 30,000 items, which may be atypical for most systems. Another optimization included in #4899 is to reduce the size of a few of the IDs commonly used in `PhaseItem` implementations to shrink the types to make swapping/sorting faster. Both `CachedPipelineId` and `DrawFunctionId` could be reduced to `u32` instead of `usize`. Ideally, this should automatically change to use stable sorts when `BatchedPhaseItem` is implemented on the same phase item type, but this requires specialization, which may not land in stable Rust for a short while. --- ## Changelog Added: `PhaseItem::sort` ## Migration Guide RenderPhases now default to a unstable sort (via `slice::sort_unstable_by_key`). This can typically improve sort phase performance, but may produce incorrect batching results when implementing `BatchedPhaseItem`. To revert to the older stable sort, manually implement `PhaseItem::sort` to implement a stable sort (i.e. via `slice::sort_by_key`). Co-authored-by: Federico Rinaldi <gisquerin@gmail.com> Co-authored-by: Robert Swain <robert.swain@gmail.com> Co-authored-by: colepoirier <colepoirier@gmail.com>
2022-06-23 10:52:49 +00:00
}
Automatic batching/instancing of draw commands (#9685) # Objective - Implement the foundations of automatic batching/instancing of draw commands as the next step from #89 - NOTE: More performance improvements will come when more data is managed and bound in ways that do not require rebinding such as mesh, material, and texture data. ## Solution - The core idea for batching of draw commands is to check whether any of the information that has to be passed when encoding a draw command changes between two things that are being drawn according to the sorted render phase order. These should be things like the pipeline, bind groups and their dynamic offsets, index/vertex buffers, and so on. - The following assumptions have been made: - Only entities with prepared assets (pipelines, materials, meshes) are queued to phases - View bindings are constant across a phase for a given draw function as phases are per-view - `batch_and_prepare_render_phase` is the only system that performs this batching and has sole responsibility for preparing the per-object data. As such the mesh binding and dynamic offsets are assumed to only vary as a result of the `batch_and_prepare_render_phase` system, e.g. due to having to split data across separate uniform bindings within the same buffer due to the maximum uniform buffer binding size. - Implement `GpuArrayBuffer` for `Mesh2dUniform` to store Mesh2dUniform in arrays in GPU buffers rather than each one being at a dynamic offset in a uniform buffer. This is the same optimisation that was made for 3D not long ago. - Change batch size for a range in `PhaseItem`, adding API for getting or mutating the range. This is more flexible than a size as the length of the range can be used in place of the size, but the start and end can be otherwise whatever is needed. - Add an optional mesh bind group dynamic offset to `PhaseItem`. This avoids having to do a massive table move just to insert `GpuArrayBufferIndex` components. ## Benchmarks All tests have been run on an M1 Max on AC power. `bevymark` and `many_cubes` were modified to use 1920x1080 with a scale factor of 1. I run a script that runs a separate Tracy capture process, and then runs the bevy example with `--features bevy_ci_testing,trace_tracy` and `CI_TESTING_CONFIG=../benchmark.ron` with the contents of `../benchmark.ron`: ```rust ( exit_after: Some(1500) ) ``` ...in order to run each test for 1500 frames. The recent changes to `many_cubes` and `bevymark` added reproducible random number generation so that with the same settings, the same rng will occur. They also added benchmark modes that use a fixed delta time for animations. Combined this means that the same frames should be rendered both on main and on the branch. The graphs compare main (yellow) to this PR (red). ### 3D Mesh `many_cubes --benchmark` <img width="1411" alt="Screenshot 2023-09-03 at 23 42 10" src="https://github.com/bevyengine/bevy/assets/302146/2088716a-c918-486c-8129-090b26fd2bc4"> The mesh and material are the same for all instances. This is basically the best case for the initial batching implementation as it results in 1 draw for the ~11.7k visible meshes. It gives a ~30% reduction in median frame time. The 1000th frame is identical using the flip tool: ![flip many_cubes-main-mesh3d many_cubes-batching-mesh3d 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/2511f37a-6df8-481a-932f-706ca4de7643) ``` Mean: 0.000000 Weighted median: 0.000000 1st weighted quartile: 0.000000 3rd weighted quartile: 0.000000 Min: 0.000000 Max: 0.000000 Evaluation time: 0.4615 seconds ``` ### 3D Mesh `many_cubes --benchmark --material-texture-count 10` <img width="1404" alt="Screenshot 2023-09-03 at 23 45 18" src="https://github.com/bevyengine/bevy/assets/302146/5ee9c447-5bd2-45c6-9706-ac5ff8916daf"> This run uses 10 different materials by varying their textures. The materials are randomly selected, and there is no sorting by material bind group for opaque 3D so any batching is 'random'. The PR produces a ~5% reduction in median frame time. If we were to sort the opaque phase by the material bind group, then this should be a lot faster. This produces about 10.5k draws for the 11.7k visible entities. This makes sense as randomly selecting from 10 materials gives a chance that two adjacent entities randomly select the same material and can be batched. The 1000th frame is identical in flip: ![flip many_cubes-main-mesh3d-mtc10 many_cubes-batching-mesh3d-mtc10 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/2b3a8614-9466-4ed8-b50c-d4aa71615dbb) ``` Mean: 0.000000 Weighted median: 0.000000 1st weighted quartile: 0.000000 3rd weighted quartile: 0.000000 Min: 0.000000 Max: 0.000000 Evaluation time: 0.4537 seconds ``` ### 3D Mesh `many_cubes --benchmark --vary-per-instance` <img width="1394" alt="Screenshot 2023-09-03 at 23 48 44" src="https://github.com/bevyengine/bevy/assets/302146/f02a816b-a444-4c18-a96a-63b5436f3b7f"> This run varies the material data per instance by randomly-generating its colour. This is the worst case for batching and that it performs about the same as `main` is a good thing as it demonstrates that the batching has minimal overhead when dealing with ~11k visible mesh entities. The 1000th frame is identical according to flip: ![flip many_cubes-main-mesh3d-vpi many_cubes-batching-mesh3d-vpi 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/ac5f5c14-9bda-4d1a-8219-7577d4aac68c) ``` Mean: 0.000000 Weighted median: 0.000000 1st weighted quartile: 0.000000 3rd weighted quartile: 0.000000 Min: 0.000000 Max: 0.000000 Evaluation time: 0.4568 seconds ``` ### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d` <img width="1412" alt="Screenshot 2023-09-03 at 23 59 56" src="https://github.com/bevyengine/bevy/assets/302146/cb02ae07-237b-4646-ae9f-fda4dafcbad4"> This spawns 160 waves of 1000 quad meshes that are shaded with ColorMaterial. Each wave has a different material so 160 waves currently should result in 160 batches. This results in a 50% reduction in median frame time. Capturing a screenshot of the 1000th frame main vs PR gives: ![flip bevymark-main-mesh2d bevymark-batching-mesh2d 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/80102728-1217-4059-87af-14d05044df40) ``` Mean: 0.001222 Weighted median: 0.750432 1st weighted quartile: 0.453494 3rd weighted quartile: 0.969758 Min: 0.000000 Max: 0.990296 Evaluation time: 0.4255 seconds ``` So they seem to produce the same results. I also double-checked the number of draws. `main` does 160000 draws, and the PR does 160, as expected. ### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d --material-texture-count 10` <img width="1392" alt="Screenshot 2023-09-04 at 00 09 22" src="https://github.com/bevyengine/bevy/assets/302146/4358da2e-ce32-4134-82df-3ab74c40849c"> This generates 10 textures and generates materials for each of those and then selects one material per wave. The median frame time is reduced by 50%. Similar to the plain run above, this produces 160 draws on the PR and 160000 on `main` and the 1000th frame is identical (ignoring the fps counter text overlay). ![flip bevymark-main-mesh2d-mtc10 bevymark-batching-mesh2d-mtc10 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/ebed2822-dce7-426a-858b-b77dc45b986f) ``` Mean: 0.002877 Weighted median: 0.964980 1st weighted quartile: 0.668871 3rd weighted quartile: 0.982749 Min: 0.000000 Max: 0.992377 Evaluation time: 0.4301 seconds ``` ### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d --vary-per-instance` <img width="1396" alt="Screenshot 2023-09-04 at 00 13 53" src="https://github.com/bevyengine/bevy/assets/302146/b2198b18-3439-47ad-919a-cdabe190facb"> This creates unique materials per instance by randomly-generating the material's colour. This is the worst case for 2D batching. Somehow, this PR manages a 7% reduction in median frame time. Both main and this PR issue 160000 draws. The 1000th frame is the same: ![flip bevymark-main-mesh2d-vpi bevymark-batching-mesh2d-vpi 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/a2ec471c-f576-4a36-a23b-b24b22578b97) ``` Mean: 0.001214 Weighted median: 0.937499 1st weighted quartile: 0.635467 3rd weighted quartile: 0.979085 Min: 0.000000 Max: 0.988971 Evaluation time: 0.4462 seconds ``` ### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite` <img width="1396" alt="Screenshot 2023-09-04 at 12 21 12" src="https://github.com/bevyengine/bevy/assets/302146/8b31e915-d6be-4cac-abf5-c6a4da9c3d43"> This just spawns 160 waves of 1000 sprites. There should be and is no notable difference between main and the PR. ### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite --material-texture-count 10` <img width="1389" alt="Screenshot 2023-09-04 at 12 36 08" src="https://github.com/bevyengine/bevy/assets/302146/45fe8d6d-c901-4062-a349-3693dd044413"> This spawns the sprites selecting a texture at random per instance from the 10 generated textures. This has no significant change vs main and shouldn't. ### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite --vary-per-instance` <img width="1401" alt="Screenshot 2023-09-04 at 12 29 52" src="https://github.com/bevyengine/bevy/assets/302146/762c5c60-352e-471f-8dbe-bbf10e24ebd6"> This sets the sprite colour as being unique per instance. This can still all be drawn using one batch. There should be no difference but the PR produces median frame times that are 4% higher. Investigation showed no clear sources of cost, rather a mix of give and take that should not happen. It seems like noise in the results. ### Summary | Benchmark | % change in median frame time | | ------------- | ------------- | | many_cubes | 🟩 -30% | | many_cubes 10 materials | 🟩 -5% | | many_cubes unique materials | 🟩 ~0% | | bevymark mesh2d | 🟩 -50% | | bevymark mesh2d 10 materials | 🟩 -50% | | bevymark mesh2d unique materials | 🟩 -7% | | bevymark sprite | 🟥 2% | | bevymark sprite 10 materials | 🟥 0.6% | | bevymark sprite unique materials | 🟥 4.1% | --- ## Changelog - Added: 2D and 3D mesh entities that share the same mesh and material (same textures, same data) are now batched into the same draw command for better performance. --------- Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com> Co-authored-by: Nicola Papale <nico@nicopap.ch>
2023-09-21 22:12:34 +00:00
#[inline]
fn batch_range(&self) -> &Range<u32> {
&self.batch_range
}
#[inline]
fn batch_range_mut(&mut self) -> &mut Range<u32> {
&mut self.batch_range
}
#[inline]
Implement GPU frustum culling. (#12889) This commit implements opt-in GPU frustum culling, built on top of the infrastructure in https://github.com/bevyengine/bevy/pull/12773. To enable it on a camera, add the `GpuCulling` component to it. To additionally disable CPU frustum culling, add the `NoCpuCulling` component. Note that adding `GpuCulling` without `NoCpuCulling` *currently* does nothing useful. The reason why `GpuCulling` doesn't automatically imply `NoCpuCulling` is that I intend to follow this patch up with GPU two-phase occlusion culling, and CPU frustum culling plus GPU occlusion culling seems like a very commonly-desired mode. Adding the `GpuCulling` component to a view puts that view into *indirect mode*. This mode makes all drawcalls indirect, relying on the mesh preprocessing shader to allocate instances dynamically. In indirect mode, the `PreprocessWorkItem` `output_index` points not to a `MeshUniform` instance slot but instead to a set of `wgpu` `IndirectParameters`, from which it allocates an instance slot dynamically if frustum culling succeeds. Batch building has been updated to allocate and track indirect parameter slots, and the AABBs are now supplied to the GPU as `MeshCullingData`. A small amount of code relating to the frustum culling has been borrowed from meshlets and moved into `maths.wgsl`. Note that standard Bevy frustum culling uses AABBs, while meshlets use bounding spheres; this means that not as much code can be shared as one might think. This patch doesn't provide any way to perform GPU culling on shadow maps, to avoid making this patch bigger than it already is. That can be a followup. ## Changelog ### Added * Frustum culling can now optionally be done on the GPU. To enable it, add the `GpuCulling` component to a camera. * To disable CPU frustum culling, add `NoCpuCulling` to a camera. Note that `GpuCulling` doesn't automatically imply `NoCpuCulling`.
2024-04-28 12:50:00 +00:00
fn extra_index(&self) -> PhaseItemExtraIndex {
self.extra_index
Automatic batching/instancing of draw commands (#9685) # Objective - Implement the foundations of automatic batching/instancing of draw commands as the next step from #89 - NOTE: More performance improvements will come when more data is managed and bound in ways that do not require rebinding such as mesh, material, and texture data. ## Solution - The core idea for batching of draw commands is to check whether any of the information that has to be passed when encoding a draw command changes between two things that are being drawn according to the sorted render phase order. These should be things like the pipeline, bind groups and their dynamic offsets, index/vertex buffers, and so on. - The following assumptions have been made: - Only entities with prepared assets (pipelines, materials, meshes) are queued to phases - View bindings are constant across a phase for a given draw function as phases are per-view - `batch_and_prepare_render_phase` is the only system that performs this batching and has sole responsibility for preparing the per-object data. As such the mesh binding and dynamic offsets are assumed to only vary as a result of the `batch_and_prepare_render_phase` system, e.g. due to having to split data across separate uniform bindings within the same buffer due to the maximum uniform buffer binding size. - Implement `GpuArrayBuffer` for `Mesh2dUniform` to store Mesh2dUniform in arrays in GPU buffers rather than each one being at a dynamic offset in a uniform buffer. This is the same optimisation that was made for 3D not long ago. - Change batch size for a range in `PhaseItem`, adding API for getting or mutating the range. This is more flexible than a size as the length of the range can be used in place of the size, but the start and end can be otherwise whatever is needed. - Add an optional mesh bind group dynamic offset to `PhaseItem`. This avoids having to do a massive table move just to insert `GpuArrayBufferIndex` components. ## Benchmarks All tests have been run on an M1 Max on AC power. `bevymark` and `many_cubes` were modified to use 1920x1080 with a scale factor of 1. I run a script that runs a separate Tracy capture process, and then runs the bevy example with `--features bevy_ci_testing,trace_tracy` and `CI_TESTING_CONFIG=../benchmark.ron` with the contents of `../benchmark.ron`: ```rust ( exit_after: Some(1500) ) ``` ...in order to run each test for 1500 frames. The recent changes to `many_cubes` and `bevymark` added reproducible random number generation so that with the same settings, the same rng will occur. They also added benchmark modes that use a fixed delta time for animations. Combined this means that the same frames should be rendered both on main and on the branch. The graphs compare main (yellow) to this PR (red). ### 3D Mesh `many_cubes --benchmark` <img width="1411" alt="Screenshot 2023-09-03 at 23 42 10" src="https://github.com/bevyengine/bevy/assets/302146/2088716a-c918-486c-8129-090b26fd2bc4"> The mesh and material are the same for all instances. This is basically the best case for the initial batching implementation as it results in 1 draw for the ~11.7k visible meshes. It gives a ~30% reduction in median frame time. The 1000th frame is identical using the flip tool: ![flip many_cubes-main-mesh3d many_cubes-batching-mesh3d 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/2511f37a-6df8-481a-932f-706ca4de7643) ``` Mean: 0.000000 Weighted median: 0.000000 1st weighted quartile: 0.000000 3rd weighted quartile: 0.000000 Min: 0.000000 Max: 0.000000 Evaluation time: 0.4615 seconds ``` ### 3D Mesh `many_cubes --benchmark --material-texture-count 10` <img width="1404" alt="Screenshot 2023-09-03 at 23 45 18" src="https://github.com/bevyengine/bevy/assets/302146/5ee9c447-5bd2-45c6-9706-ac5ff8916daf"> This run uses 10 different materials by varying their textures. The materials are randomly selected, and there is no sorting by material bind group for opaque 3D so any batching is 'random'. The PR produces a ~5% reduction in median frame time. If we were to sort the opaque phase by the material bind group, then this should be a lot faster. This produces about 10.5k draws for the 11.7k visible entities. This makes sense as randomly selecting from 10 materials gives a chance that two adjacent entities randomly select the same material and can be batched. The 1000th frame is identical in flip: ![flip many_cubes-main-mesh3d-mtc10 many_cubes-batching-mesh3d-mtc10 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/2b3a8614-9466-4ed8-b50c-d4aa71615dbb) ``` Mean: 0.000000 Weighted median: 0.000000 1st weighted quartile: 0.000000 3rd weighted quartile: 0.000000 Min: 0.000000 Max: 0.000000 Evaluation time: 0.4537 seconds ``` ### 3D Mesh `many_cubes --benchmark --vary-per-instance` <img width="1394" alt="Screenshot 2023-09-03 at 23 48 44" src="https://github.com/bevyengine/bevy/assets/302146/f02a816b-a444-4c18-a96a-63b5436f3b7f"> This run varies the material data per instance by randomly-generating its colour. This is the worst case for batching and that it performs about the same as `main` is a good thing as it demonstrates that the batching has minimal overhead when dealing with ~11k visible mesh entities. The 1000th frame is identical according to flip: ![flip many_cubes-main-mesh3d-vpi many_cubes-batching-mesh3d-vpi 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/ac5f5c14-9bda-4d1a-8219-7577d4aac68c) ``` Mean: 0.000000 Weighted median: 0.000000 1st weighted quartile: 0.000000 3rd weighted quartile: 0.000000 Min: 0.000000 Max: 0.000000 Evaluation time: 0.4568 seconds ``` ### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d` <img width="1412" alt="Screenshot 2023-09-03 at 23 59 56" src="https://github.com/bevyengine/bevy/assets/302146/cb02ae07-237b-4646-ae9f-fda4dafcbad4"> This spawns 160 waves of 1000 quad meshes that are shaded with ColorMaterial. Each wave has a different material so 160 waves currently should result in 160 batches. This results in a 50% reduction in median frame time. Capturing a screenshot of the 1000th frame main vs PR gives: ![flip bevymark-main-mesh2d bevymark-batching-mesh2d 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/80102728-1217-4059-87af-14d05044df40) ``` Mean: 0.001222 Weighted median: 0.750432 1st weighted quartile: 0.453494 3rd weighted quartile: 0.969758 Min: 0.000000 Max: 0.990296 Evaluation time: 0.4255 seconds ``` So they seem to produce the same results. I also double-checked the number of draws. `main` does 160000 draws, and the PR does 160, as expected. ### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d --material-texture-count 10` <img width="1392" alt="Screenshot 2023-09-04 at 00 09 22" src="https://github.com/bevyengine/bevy/assets/302146/4358da2e-ce32-4134-82df-3ab74c40849c"> This generates 10 textures and generates materials for each of those and then selects one material per wave. The median frame time is reduced by 50%. Similar to the plain run above, this produces 160 draws on the PR and 160000 on `main` and the 1000th frame is identical (ignoring the fps counter text overlay). ![flip bevymark-main-mesh2d-mtc10 bevymark-batching-mesh2d-mtc10 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/ebed2822-dce7-426a-858b-b77dc45b986f) ``` Mean: 0.002877 Weighted median: 0.964980 1st weighted quartile: 0.668871 3rd weighted quartile: 0.982749 Min: 0.000000 Max: 0.992377 Evaluation time: 0.4301 seconds ``` ### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d --vary-per-instance` <img width="1396" alt="Screenshot 2023-09-04 at 00 13 53" src="https://github.com/bevyengine/bevy/assets/302146/b2198b18-3439-47ad-919a-cdabe190facb"> This creates unique materials per instance by randomly-generating the material's colour. This is the worst case for 2D batching. Somehow, this PR manages a 7% reduction in median frame time. Both main and this PR issue 160000 draws. The 1000th frame is the same: ![flip bevymark-main-mesh2d-vpi bevymark-batching-mesh2d-vpi 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/a2ec471c-f576-4a36-a23b-b24b22578b97) ``` Mean: 0.001214 Weighted median: 0.937499 1st weighted quartile: 0.635467 3rd weighted quartile: 0.979085 Min: 0.000000 Max: 0.988971 Evaluation time: 0.4462 seconds ``` ### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite` <img width="1396" alt="Screenshot 2023-09-04 at 12 21 12" src="https://github.com/bevyengine/bevy/assets/302146/8b31e915-d6be-4cac-abf5-c6a4da9c3d43"> This just spawns 160 waves of 1000 sprites. There should be and is no notable difference between main and the PR. ### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite --material-texture-count 10` <img width="1389" alt="Screenshot 2023-09-04 at 12 36 08" src="https://github.com/bevyengine/bevy/assets/302146/45fe8d6d-c901-4062-a349-3693dd044413"> This spawns the sprites selecting a texture at random per instance from the 10 generated textures. This has no significant change vs main and shouldn't. ### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite --vary-per-instance` <img width="1401" alt="Screenshot 2023-09-04 at 12 29 52" src="https://github.com/bevyengine/bevy/assets/302146/762c5c60-352e-471f-8dbe-bbf10e24ebd6"> This sets the sprite colour as being unique per instance. This can still all be drawn using one batch. There should be no difference but the PR produces median frame times that are 4% higher. Investigation showed no clear sources of cost, rather a mix of give and take that should not happen. It seems like noise in the results. ### Summary | Benchmark | % change in median frame time | | ------------- | ------------- | | many_cubes | 🟩 -30% | | many_cubes 10 materials | 🟩 -5% | | many_cubes unique materials | 🟩 ~0% | | bevymark mesh2d | 🟩 -50% | | bevymark mesh2d 10 materials | 🟩 -50% | | bevymark mesh2d unique materials | 🟩 -7% | | bevymark sprite | 🟥 2% | | bevymark sprite 10 materials | 🟥 0.6% | | bevymark sprite unique materials | 🟥 4.1% | --- ## Changelog - Added: 2D and 3D mesh entities that share the same mesh and material (same textures, same data) are now batched into the same draw command for better performance. --------- Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com> Co-authored-by: Nicola Papale <nico@nicopap.ch>
2023-09-21 22:12:34 +00:00
}
#[inline]
Implement GPU frustum culling. (#12889) This commit implements opt-in GPU frustum culling, built on top of the infrastructure in https://github.com/bevyengine/bevy/pull/12773. To enable it on a camera, add the `GpuCulling` component to it. To additionally disable CPU frustum culling, add the `NoCpuCulling` component. Note that adding `GpuCulling` without `NoCpuCulling` *currently* does nothing useful. The reason why `GpuCulling` doesn't automatically imply `NoCpuCulling` is that I intend to follow this patch up with GPU two-phase occlusion culling, and CPU frustum culling plus GPU occlusion culling seems like a very commonly-desired mode. Adding the `GpuCulling` component to a view puts that view into *indirect mode*. This mode makes all drawcalls indirect, relying on the mesh preprocessing shader to allocate instances dynamically. In indirect mode, the `PreprocessWorkItem` `output_index` points not to a `MeshUniform` instance slot but instead to a set of `wgpu` `IndirectParameters`, from which it allocates an instance slot dynamically if frustum culling succeeds. Batch building has been updated to allocate and track indirect parameter slots, and the AABBs are now supplied to the GPU as `MeshCullingData`. A small amount of code relating to the frustum culling has been borrowed from meshlets and moved into `maths.wgsl`. Note that standard Bevy frustum culling uses AABBs, while meshlets use bounding spheres; this means that not as much code can be shared as one might think. This patch doesn't provide any way to perform GPU culling on shadow maps, to avoid making this patch bigger than it already is. That can be a followup. ## Changelog ### Added * Frustum culling can now optionally be done on the GPU. To enable it, add the `GpuCulling` component to a camera. * To disable CPU frustum culling, add `NoCpuCulling` to a camera. Note that `GpuCulling` doesn't automatically imply `NoCpuCulling`.
2024-04-28 12:50:00 +00:00
fn batch_range_and_extra_index_mut(&mut self) -> (&mut Range<u32>, &mut PhaseItemExtraIndex) {
(&mut self.batch_range, &mut self.extra_index)
Automatic batching/instancing of draw commands (#9685) # Objective - Implement the foundations of automatic batching/instancing of draw commands as the next step from #89 - NOTE: More performance improvements will come when more data is managed and bound in ways that do not require rebinding such as mesh, material, and texture data. ## Solution - The core idea for batching of draw commands is to check whether any of the information that has to be passed when encoding a draw command changes between two things that are being drawn according to the sorted render phase order. These should be things like the pipeline, bind groups and their dynamic offsets, index/vertex buffers, and so on. - The following assumptions have been made: - Only entities with prepared assets (pipelines, materials, meshes) are queued to phases - View bindings are constant across a phase for a given draw function as phases are per-view - `batch_and_prepare_render_phase` is the only system that performs this batching and has sole responsibility for preparing the per-object data. As such the mesh binding and dynamic offsets are assumed to only vary as a result of the `batch_and_prepare_render_phase` system, e.g. due to having to split data across separate uniform bindings within the same buffer due to the maximum uniform buffer binding size. - Implement `GpuArrayBuffer` for `Mesh2dUniform` to store Mesh2dUniform in arrays in GPU buffers rather than each one being at a dynamic offset in a uniform buffer. This is the same optimisation that was made for 3D not long ago. - Change batch size for a range in `PhaseItem`, adding API for getting or mutating the range. This is more flexible than a size as the length of the range can be used in place of the size, but the start and end can be otherwise whatever is needed. - Add an optional mesh bind group dynamic offset to `PhaseItem`. This avoids having to do a massive table move just to insert `GpuArrayBufferIndex` components. ## Benchmarks All tests have been run on an M1 Max on AC power. `bevymark` and `many_cubes` were modified to use 1920x1080 with a scale factor of 1. I run a script that runs a separate Tracy capture process, and then runs the bevy example with `--features bevy_ci_testing,trace_tracy` and `CI_TESTING_CONFIG=../benchmark.ron` with the contents of `../benchmark.ron`: ```rust ( exit_after: Some(1500) ) ``` ...in order to run each test for 1500 frames. The recent changes to `many_cubes` and `bevymark` added reproducible random number generation so that with the same settings, the same rng will occur. They also added benchmark modes that use a fixed delta time for animations. Combined this means that the same frames should be rendered both on main and on the branch. The graphs compare main (yellow) to this PR (red). ### 3D Mesh `many_cubes --benchmark` <img width="1411" alt="Screenshot 2023-09-03 at 23 42 10" src="https://github.com/bevyengine/bevy/assets/302146/2088716a-c918-486c-8129-090b26fd2bc4"> The mesh and material are the same for all instances. This is basically the best case for the initial batching implementation as it results in 1 draw for the ~11.7k visible meshes. It gives a ~30% reduction in median frame time. The 1000th frame is identical using the flip tool: ![flip many_cubes-main-mesh3d many_cubes-batching-mesh3d 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/2511f37a-6df8-481a-932f-706ca4de7643) ``` Mean: 0.000000 Weighted median: 0.000000 1st weighted quartile: 0.000000 3rd weighted quartile: 0.000000 Min: 0.000000 Max: 0.000000 Evaluation time: 0.4615 seconds ``` ### 3D Mesh `many_cubes --benchmark --material-texture-count 10` <img width="1404" alt="Screenshot 2023-09-03 at 23 45 18" src="https://github.com/bevyengine/bevy/assets/302146/5ee9c447-5bd2-45c6-9706-ac5ff8916daf"> This run uses 10 different materials by varying their textures. The materials are randomly selected, and there is no sorting by material bind group for opaque 3D so any batching is 'random'. The PR produces a ~5% reduction in median frame time. If we were to sort the opaque phase by the material bind group, then this should be a lot faster. This produces about 10.5k draws for the 11.7k visible entities. This makes sense as randomly selecting from 10 materials gives a chance that two adjacent entities randomly select the same material and can be batched. The 1000th frame is identical in flip: ![flip many_cubes-main-mesh3d-mtc10 many_cubes-batching-mesh3d-mtc10 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/2b3a8614-9466-4ed8-b50c-d4aa71615dbb) ``` Mean: 0.000000 Weighted median: 0.000000 1st weighted quartile: 0.000000 3rd weighted quartile: 0.000000 Min: 0.000000 Max: 0.000000 Evaluation time: 0.4537 seconds ``` ### 3D Mesh `many_cubes --benchmark --vary-per-instance` <img width="1394" alt="Screenshot 2023-09-03 at 23 48 44" src="https://github.com/bevyengine/bevy/assets/302146/f02a816b-a444-4c18-a96a-63b5436f3b7f"> This run varies the material data per instance by randomly-generating its colour. This is the worst case for batching and that it performs about the same as `main` is a good thing as it demonstrates that the batching has minimal overhead when dealing with ~11k visible mesh entities. The 1000th frame is identical according to flip: ![flip many_cubes-main-mesh3d-vpi many_cubes-batching-mesh3d-vpi 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/ac5f5c14-9bda-4d1a-8219-7577d4aac68c) ``` Mean: 0.000000 Weighted median: 0.000000 1st weighted quartile: 0.000000 3rd weighted quartile: 0.000000 Min: 0.000000 Max: 0.000000 Evaluation time: 0.4568 seconds ``` ### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d` <img width="1412" alt="Screenshot 2023-09-03 at 23 59 56" src="https://github.com/bevyengine/bevy/assets/302146/cb02ae07-237b-4646-ae9f-fda4dafcbad4"> This spawns 160 waves of 1000 quad meshes that are shaded with ColorMaterial. Each wave has a different material so 160 waves currently should result in 160 batches. This results in a 50% reduction in median frame time. Capturing a screenshot of the 1000th frame main vs PR gives: ![flip bevymark-main-mesh2d bevymark-batching-mesh2d 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/80102728-1217-4059-87af-14d05044df40) ``` Mean: 0.001222 Weighted median: 0.750432 1st weighted quartile: 0.453494 3rd weighted quartile: 0.969758 Min: 0.000000 Max: 0.990296 Evaluation time: 0.4255 seconds ``` So they seem to produce the same results. I also double-checked the number of draws. `main` does 160000 draws, and the PR does 160, as expected. ### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d --material-texture-count 10` <img width="1392" alt="Screenshot 2023-09-04 at 00 09 22" src="https://github.com/bevyengine/bevy/assets/302146/4358da2e-ce32-4134-82df-3ab74c40849c"> This generates 10 textures and generates materials for each of those and then selects one material per wave. The median frame time is reduced by 50%. Similar to the plain run above, this produces 160 draws on the PR and 160000 on `main` and the 1000th frame is identical (ignoring the fps counter text overlay). ![flip bevymark-main-mesh2d-mtc10 bevymark-batching-mesh2d-mtc10 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/ebed2822-dce7-426a-858b-b77dc45b986f) ``` Mean: 0.002877 Weighted median: 0.964980 1st weighted quartile: 0.668871 3rd weighted quartile: 0.982749 Min: 0.000000 Max: 0.992377 Evaluation time: 0.4301 seconds ``` ### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d --vary-per-instance` <img width="1396" alt="Screenshot 2023-09-04 at 00 13 53" src="https://github.com/bevyengine/bevy/assets/302146/b2198b18-3439-47ad-919a-cdabe190facb"> This creates unique materials per instance by randomly-generating the material's colour. This is the worst case for 2D batching. Somehow, this PR manages a 7% reduction in median frame time. Both main and this PR issue 160000 draws. The 1000th frame is the same: ![flip bevymark-main-mesh2d-vpi bevymark-batching-mesh2d-vpi 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/a2ec471c-f576-4a36-a23b-b24b22578b97) ``` Mean: 0.001214 Weighted median: 0.937499 1st weighted quartile: 0.635467 3rd weighted quartile: 0.979085 Min: 0.000000 Max: 0.988971 Evaluation time: 0.4462 seconds ``` ### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite` <img width="1396" alt="Screenshot 2023-09-04 at 12 21 12" src="https://github.com/bevyengine/bevy/assets/302146/8b31e915-d6be-4cac-abf5-c6a4da9c3d43"> This just spawns 160 waves of 1000 sprites. There should be and is no notable difference between main and the PR. ### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite --material-texture-count 10` <img width="1389" alt="Screenshot 2023-09-04 at 12 36 08" src="https://github.com/bevyengine/bevy/assets/302146/45fe8d6d-c901-4062-a349-3693dd044413"> This spawns the sprites selecting a texture at random per instance from the 10 generated textures. This has no significant change vs main and shouldn't. ### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite --vary-per-instance` <img width="1401" alt="Screenshot 2023-09-04 at 12 29 52" src="https://github.com/bevyengine/bevy/assets/302146/762c5c60-352e-471f-8dbe-bbf10e24ebd6"> This sets the sprite colour as being unique per instance. This can still all be drawn using one batch. There should be no difference but the PR produces median frame times that are 4% higher. Investigation showed no clear sources of cost, rather a mix of give and take that should not happen. It seems like noise in the results. ### Summary | Benchmark | % change in median frame time | | ------------- | ------------- | | many_cubes | 🟩 -30% | | many_cubes 10 materials | 🟩 -5% | | many_cubes unique materials | 🟩 ~0% | | bevymark mesh2d | 🟩 -50% | | bevymark mesh2d 10 materials | 🟩 -50% | | bevymark mesh2d unique materials | 🟩 -7% | | bevymark sprite | 🟥 2% | | bevymark sprite 10 materials | 🟥 0.6% | | bevymark sprite unique materials | 🟥 4.1% | --- ## Changelog - Added: 2D and 3D mesh entities that share the same mesh and material (same textures, same data) are now batched into the same draw command for better performance. --------- Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com> Co-authored-by: Nicola Papale <nico@nicopap.ch>
2023-09-21 22:12:34 +00:00
}
Modular Rendering (#2831) This changes how render logic is composed to make it much more modular. Previously, all extraction logic was centralized for a given "type" of rendered thing. For example, we extracted meshes into a vector of ExtractedMesh, which contained the mesh and material asset handles, the transform, etc. We looked up bindings for "drawn things" using their index in the `Vec<ExtractedMesh>`. This worked fine for built in rendering, but made it hard to reuse logic for "custom" rendering. It also prevented us from reusing things like "extracted transforms" across contexts. To make rendering more modular, I made a number of changes: * Entities now drive rendering: * We extract "render components" from "app components" and store them _on_ entities. No more centralized uber lists! We now have true "ECS-driven rendering" * To make this perform well, I implemented #2673 in upstream Bevy for fast batch insertions into specific entities. This was merged into the `pipelined-rendering` branch here: #2815 * Reworked the `Draw` abstraction: * Generic `PhaseItems`: each draw phase can define its own type of "rendered thing", which can define its own "sort key" * Ported the 2d, 3d, and shadow phases to the new PhaseItem impl (currently Transparent2d, Transparent3d, and Shadow PhaseItems) * `Draw` trait and and `DrawFunctions` are now generic on PhaseItem * Modular / Ergonomic `DrawFunctions` via `RenderCommands` * RenderCommand is a trait that runs an ECS query and produces one or more RenderPass calls. Types implementing this trait can be composed to create a final DrawFunction. For example the DrawPbr DrawFunction is created from the following DrawCommand tuple. Const generics are used to set specific bind group locations: ```rust pub type DrawPbr = ( SetPbrPipeline, SetMeshViewBindGroup<0>, SetStandardMaterialBindGroup<1>, SetTransformBindGroup<2>, DrawMesh, ); ``` * The new `custom_shader_pipelined` example illustrates how the commands above can be reused to create a custom draw function: ```rust type DrawCustom = ( SetCustomMaterialPipeline, SetMeshViewBindGroup<0>, SetTransformBindGroup<2>, DrawMesh, ); ``` * ExtractComponentPlugin and UniformComponentPlugin: * Simple, standardized ways to easily extract individual components and write them to GPU buffers * Ported PBR and Sprite rendering to the new primitives above. * Removed staging buffer from UniformVec in favor of direct Queue usage * Makes UniformVec much easier to use and more ergonomic. Completely removes the need for custom render graph nodes in these contexts (see the PbrNode and view Node removals and the much simpler call patterns in the relevant Prepare systems). * Added a many_cubes_pipelined example to benchmark baseline 3d rendering performance and ensure there were no major regressions during this port. Avoiding regressions was challenging given that the old approach of extracting into centralized vectors is basically the "optimal" approach. However thanks to a various ECS optimizations and render logic rephrasing, we pretty much break even on this benchmark! * Lifetimeless SystemParams: this will be a bit divisive, but as we continue to embrace "trait driven systems" (ex: ExtractComponentPlugin, UniformComponentPlugin, DrawCommand), the ergonomics of `(Query<'static, 'static, (&'static A, &'static B, &'static)>, Res<'static, C>)` were getting very hard to bear. As a compromise, I added "static type aliases" for the relevant SystemParams. The previous example can now be expressed like this: `(SQuery<(Read<A>, Read<B>)>, SRes<C>)`. If anyone has better ideas / conflicting opinions, please let me know! * RunSystem trait: a way to define Systems via a trait with a SystemParam associated type. This is used to implement the various plugins mentioned above. I also added SystemParamItem and QueryItem type aliases to make "trait stye" ecs interactions nicer on the eyes (and fingers). * RenderAsset retrying: ensures that render assets are only created when they are "ready" and allows us to create bind groups directly inside render assets (which significantly simplified the StandardMaterial code). I think ultimately we should swap this out on "asset dependency" events to wait for dependencies to load, but this will require significant asset system changes. * Updated some built in shaders to account for missing MeshUniform fields
2021-09-23 06:16:11 +00:00
}
2021-06-02 02:59:17 +00:00
Improve performance by binning together opaque items instead of sorting them. (#12453) Today, we sort all entities added to all phases, even the phases that don't strictly need sorting, such as the opaque and shadow phases. This results in a performance loss because our `PhaseItem`s are rather large in memory, so sorting is slow. Additionally, determining the boundaries of batches is an O(n) process. This commit makes Bevy instead applicable place phase items into *bins* keyed by *bin keys*, which have the invariant that everything in the same bin is potentially batchable. This makes determining batch boundaries O(1), because everything in the same bin can be batched. Instead of sorting each entity, we now sort only the bin keys. This drops the sorting time to near-zero on workloads with few bins like `many_cubes --no-frustum-culling`. Memory usage is improved too, with batch boundaries and dynamic indices now implicit instead of explicit. The improved memory usage results in a significant win even on unbatchable workloads like `many_cubes --no-frustum-culling --vary-material-data-per-instance`, presumably due to cache effects. Not all phases can be binned; some, such as transparent and transmissive phases, must still be sorted. To handle this, this commit splits `PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the logic that today deals with `PhaseItem`s has been moved to `SortedPhaseItem`. `BinnedPhaseItem` has the new logic. Frame time results (in ms/frame) are as follows: | Benchmark | `binning` | `main` | Speedup | | ------------------------ | --------- | ------- | ------- | | `many_cubes -nfc -vpi` | 232.179 | 312.123 | 34.43% | | `many_cubes -nfc` | 25.874 | 30.117 | 16.40% | | `many_foxes` | 3.276 | 3.515 | 7.30% | (`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for `--vary-per-instance`.) --- ## Changelog ### Changed * Render phases have been split into binned and sorted phases. Binned phases, such as the common opaque phase, achieve improved CPU performance by avoiding the sorting step. ## Migration Guide - `PhaseItem` has been split into `BinnedPhaseItem` and `SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need to migrate them to one of these two types. `SortedPhaseItem` requires the fewest code changes, but you may want to pick `BinnedPhaseItem` if your phase doesn't require sorting, as that enables higher performance. ## Tracy graphs `many-cubes --no-frustum-culling`, `main` branch: <img width="1064" alt="Screenshot 2024-03-12 180037" src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55"> `many-cubes --no-frustum-culling`, this branch: <img width="1064" alt="Screenshot 2024-03-12 180011" src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c"> You can see that `batch_and_prepare_binned_render_phase` is a much smaller fraction of the time. Zooming in on that function, with yellow being this branch and red being `main`, we see: <img width="1064" alt="Screenshot 2024-03-12 175832" src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0"> The binning happens in `queue_material_meshes`. Again with yellow being this branch and red being `main`: <img width="1064" alt="Screenshot 2024-03-12 175755" src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96"> We can see that there is a small regression in `queue_material_meshes` performance, but it's not nearly enough to outweigh the large gains in `batch_and_prepare_binned_render_phase`. --------- Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
impl BinnedPhaseItem for Shadow {
type BinKey = ShadowBinKey;
#[inline]
fn new(
key: Self::BinKey,
representative_entity: Entity,
batch_range: Range<u32>,
Implement GPU frustum culling. (#12889) This commit implements opt-in GPU frustum culling, built on top of the infrastructure in https://github.com/bevyengine/bevy/pull/12773. To enable it on a camera, add the `GpuCulling` component to it. To additionally disable CPU frustum culling, add the `NoCpuCulling` component. Note that adding `GpuCulling` without `NoCpuCulling` *currently* does nothing useful. The reason why `GpuCulling` doesn't automatically imply `NoCpuCulling` is that I intend to follow this patch up with GPU two-phase occlusion culling, and CPU frustum culling plus GPU occlusion culling seems like a very commonly-desired mode. Adding the `GpuCulling` component to a view puts that view into *indirect mode*. This mode makes all drawcalls indirect, relying on the mesh preprocessing shader to allocate instances dynamically. In indirect mode, the `PreprocessWorkItem` `output_index` points not to a `MeshUniform` instance slot but instead to a set of `wgpu` `IndirectParameters`, from which it allocates an instance slot dynamically if frustum culling succeeds. Batch building has been updated to allocate and track indirect parameter slots, and the AABBs are now supplied to the GPU as `MeshCullingData`. A small amount of code relating to the frustum culling has been borrowed from meshlets and moved into `maths.wgsl`. Note that standard Bevy frustum culling uses AABBs, while meshlets use bounding spheres; this means that not as much code can be shared as one might think. This patch doesn't provide any way to perform GPU culling on shadow maps, to avoid making this patch bigger than it already is. That can be a followup. ## Changelog ### Added * Frustum culling can now optionally be done on the GPU. To enable it, add the `GpuCulling` component to a camera. * To disable CPU frustum culling, add `NoCpuCulling` to a camera. Note that `GpuCulling` doesn't automatically imply `NoCpuCulling`.
2024-04-28 12:50:00 +00:00
extra_index: PhaseItemExtraIndex,
Improve performance by binning together opaque items instead of sorting them. (#12453) Today, we sort all entities added to all phases, even the phases that don't strictly need sorting, such as the opaque and shadow phases. This results in a performance loss because our `PhaseItem`s are rather large in memory, so sorting is slow. Additionally, determining the boundaries of batches is an O(n) process. This commit makes Bevy instead applicable place phase items into *bins* keyed by *bin keys*, which have the invariant that everything in the same bin is potentially batchable. This makes determining batch boundaries O(1), because everything in the same bin can be batched. Instead of sorting each entity, we now sort only the bin keys. This drops the sorting time to near-zero on workloads with few bins like `many_cubes --no-frustum-culling`. Memory usage is improved too, with batch boundaries and dynamic indices now implicit instead of explicit. The improved memory usage results in a significant win even on unbatchable workloads like `many_cubes --no-frustum-culling --vary-material-data-per-instance`, presumably due to cache effects. Not all phases can be binned; some, such as transparent and transmissive phases, must still be sorted. To handle this, this commit splits `PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the logic that today deals with `PhaseItem`s has been moved to `SortedPhaseItem`. `BinnedPhaseItem` has the new logic. Frame time results (in ms/frame) are as follows: | Benchmark | `binning` | `main` | Speedup | | ------------------------ | --------- | ------- | ------- | | `many_cubes -nfc -vpi` | 232.179 | 312.123 | 34.43% | | `many_cubes -nfc` | 25.874 | 30.117 | 16.40% | | `many_foxes` | 3.276 | 3.515 | 7.30% | (`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for `--vary-per-instance`.) --- ## Changelog ### Changed * Render phases have been split into binned and sorted phases. Binned phases, such as the common opaque phase, achieve improved CPU performance by avoiding the sorting step. ## Migration Guide - `PhaseItem` has been split into `BinnedPhaseItem` and `SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need to migrate them to one of these two types. `SortedPhaseItem` requires the fewest code changes, but you may want to pick `BinnedPhaseItem` if your phase doesn't require sorting, as that enables higher performance. ## Tracy graphs `many-cubes --no-frustum-culling`, `main` branch: <img width="1064" alt="Screenshot 2024-03-12 180037" src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55"> `many-cubes --no-frustum-culling`, this branch: <img width="1064" alt="Screenshot 2024-03-12 180011" src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c"> You can see that `batch_and_prepare_binned_render_phase` is a much smaller fraction of the time. Zooming in on that function, with yellow being this branch and red being `main`, we see: <img width="1064" alt="Screenshot 2024-03-12 175832" src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0"> The binning happens in `queue_material_meshes`. Again with yellow being this branch and red being `main`: <img width="1064" alt="Screenshot 2024-03-12 175755" src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96"> We can see that there is a small regression in `queue_material_meshes` performance, but it's not nearly enough to outweigh the large gains in `batch_and_prepare_binned_render_phase`. --------- Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
) -> Self {
Shadow {
key,
representative_entity,
batch_range,
Implement GPU frustum culling. (#12889) This commit implements opt-in GPU frustum culling, built on top of the infrastructure in https://github.com/bevyengine/bevy/pull/12773. To enable it on a camera, add the `GpuCulling` component to it. To additionally disable CPU frustum culling, add the `NoCpuCulling` component. Note that adding `GpuCulling` without `NoCpuCulling` *currently* does nothing useful. The reason why `GpuCulling` doesn't automatically imply `NoCpuCulling` is that I intend to follow this patch up with GPU two-phase occlusion culling, and CPU frustum culling plus GPU occlusion culling seems like a very commonly-desired mode. Adding the `GpuCulling` component to a view puts that view into *indirect mode*. This mode makes all drawcalls indirect, relying on the mesh preprocessing shader to allocate instances dynamically. In indirect mode, the `PreprocessWorkItem` `output_index` points not to a `MeshUniform` instance slot but instead to a set of `wgpu` `IndirectParameters`, from which it allocates an instance slot dynamically if frustum culling succeeds. Batch building has been updated to allocate and track indirect parameter slots, and the AABBs are now supplied to the GPU as `MeshCullingData`. A small amount of code relating to the frustum culling has been borrowed from meshlets and moved into `maths.wgsl`. Note that standard Bevy frustum culling uses AABBs, while meshlets use bounding spheres; this means that not as much code can be shared as one might think. This patch doesn't provide any way to perform GPU culling on shadow maps, to avoid making this patch bigger than it already is. That can be a followup. ## Changelog ### Added * Frustum culling can now optionally be done on the GPU. To enable it, add the `GpuCulling` component to a camera. * To disable CPU frustum culling, add `NoCpuCulling` to a camera. Note that `GpuCulling` doesn't automatically imply `NoCpuCulling`.
2024-04-28 12:50:00 +00:00
extra_index,
Improve performance by binning together opaque items instead of sorting them. (#12453) Today, we sort all entities added to all phases, even the phases that don't strictly need sorting, such as the opaque and shadow phases. This results in a performance loss because our `PhaseItem`s are rather large in memory, so sorting is slow. Additionally, determining the boundaries of batches is an O(n) process. This commit makes Bevy instead applicable place phase items into *bins* keyed by *bin keys*, which have the invariant that everything in the same bin is potentially batchable. This makes determining batch boundaries O(1), because everything in the same bin can be batched. Instead of sorting each entity, we now sort only the bin keys. This drops the sorting time to near-zero on workloads with few bins like `many_cubes --no-frustum-culling`. Memory usage is improved too, with batch boundaries and dynamic indices now implicit instead of explicit. The improved memory usage results in a significant win even on unbatchable workloads like `many_cubes --no-frustum-culling --vary-material-data-per-instance`, presumably due to cache effects. Not all phases can be binned; some, such as transparent and transmissive phases, must still be sorted. To handle this, this commit splits `PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the logic that today deals with `PhaseItem`s has been moved to `SortedPhaseItem`. `BinnedPhaseItem` has the new logic. Frame time results (in ms/frame) are as follows: | Benchmark | `binning` | `main` | Speedup | | ------------------------ | --------- | ------- | ------- | | `many_cubes -nfc -vpi` | 232.179 | 312.123 | 34.43% | | `many_cubes -nfc` | 25.874 | 30.117 | 16.40% | | `many_foxes` | 3.276 | 3.515 | 7.30% | (`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for `--vary-per-instance`.) --- ## Changelog ### Changed * Render phases have been split into binned and sorted phases. Binned phases, such as the common opaque phase, achieve improved CPU performance by avoiding the sorting step. ## Migration Guide - `PhaseItem` has been split into `BinnedPhaseItem` and `SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need to migrate them to one of these two types. `SortedPhaseItem` requires the fewest code changes, but you may want to pick `BinnedPhaseItem` if your phase doesn't require sorting, as that enables higher performance. ## Tracy graphs `many-cubes --no-frustum-culling`, `main` branch: <img width="1064" alt="Screenshot 2024-03-12 180037" src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55"> `many-cubes --no-frustum-culling`, this branch: <img width="1064" alt="Screenshot 2024-03-12 180011" src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c"> You can see that `batch_and_prepare_binned_render_phase` is a much smaller fraction of the time. Zooming in on that function, with yellow being this branch and red being `main`, we see: <img width="1064" alt="Screenshot 2024-03-12 175832" src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0"> The binning happens in `queue_material_meshes`. Again with yellow being this branch and red being `main`: <img width="1064" alt="Screenshot 2024-03-12 175755" src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96"> We can see that there is a small regression in `queue_material_meshes` performance, but it's not nearly enough to outweigh the large gains in `batch_and_prepare_binned_render_phase`. --------- Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
}
}
}
impl CachedRenderPipelinePhaseItem for Shadow {
#[inline]
fn cached_pipeline(&self) -> CachedRenderPipelineId {
Improve performance by binning together opaque items instead of sorting them. (#12453) Today, we sort all entities added to all phases, even the phases that don't strictly need sorting, such as the opaque and shadow phases. This results in a performance loss because our `PhaseItem`s are rather large in memory, so sorting is slow. Additionally, determining the boundaries of batches is an O(n) process. This commit makes Bevy instead applicable place phase items into *bins* keyed by *bin keys*, which have the invariant that everything in the same bin is potentially batchable. This makes determining batch boundaries O(1), because everything in the same bin can be batched. Instead of sorting each entity, we now sort only the bin keys. This drops the sorting time to near-zero on workloads with few bins like `many_cubes --no-frustum-culling`. Memory usage is improved too, with batch boundaries and dynamic indices now implicit instead of explicit. The improved memory usage results in a significant win even on unbatchable workloads like `many_cubes --no-frustum-culling --vary-material-data-per-instance`, presumably due to cache effects. Not all phases can be binned; some, such as transparent and transmissive phases, must still be sorted. To handle this, this commit splits `PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the logic that today deals with `PhaseItem`s has been moved to `SortedPhaseItem`. `BinnedPhaseItem` has the new logic. Frame time results (in ms/frame) are as follows: | Benchmark | `binning` | `main` | Speedup | | ------------------------ | --------- | ------- | ------- | | `many_cubes -nfc -vpi` | 232.179 | 312.123 | 34.43% | | `many_cubes -nfc` | 25.874 | 30.117 | 16.40% | | `many_foxes` | 3.276 | 3.515 | 7.30% | (`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for `--vary-per-instance`.) --- ## Changelog ### Changed * Render phases have been split into binned and sorted phases. Binned phases, such as the common opaque phase, achieve improved CPU performance by avoiding the sorting step. ## Migration Guide - `PhaseItem` has been split into `BinnedPhaseItem` and `SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need to migrate them to one of these two types. `SortedPhaseItem` requires the fewest code changes, but you may want to pick `BinnedPhaseItem` if your phase doesn't require sorting, as that enables higher performance. ## Tracy graphs `many-cubes --no-frustum-culling`, `main` branch: <img width="1064" alt="Screenshot 2024-03-12 180037" src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55"> `many-cubes --no-frustum-culling`, this branch: <img width="1064" alt="Screenshot 2024-03-12 180011" src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c"> You can see that `batch_and_prepare_binned_render_phase` is a much smaller fraction of the time. Zooming in on that function, with yellow being this branch and red being `main`, we see: <img width="1064" alt="Screenshot 2024-03-12 175832" src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0"> The binning happens in `queue_material_meshes`. Again with yellow being this branch and red being `main`: <img width="1064" alt="Screenshot 2024-03-12 175755" src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96"> We can see that there is a small regression in `queue_material_meshes` performance, but it's not nearly enough to outweigh the large gains in `batch_and_prepare_binned_render_phase`. --------- Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
self.key.pipeline
}
}
2021-06-02 02:59:17 +00:00
pub struct ShadowPassNode {
Improve performance by binning together opaque items instead of sorting them. (#12453) Today, we sort all entities added to all phases, even the phases that don't strictly need sorting, such as the opaque and shadow phases. This results in a performance loss because our `PhaseItem`s are rather large in memory, so sorting is slow. Additionally, determining the boundaries of batches is an O(n) process. This commit makes Bevy instead applicable place phase items into *bins* keyed by *bin keys*, which have the invariant that everything in the same bin is potentially batchable. This makes determining batch boundaries O(1), because everything in the same bin can be batched. Instead of sorting each entity, we now sort only the bin keys. This drops the sorting time to near-zero on workloads with few bins like `many_cubes --no-frustum-culling`. Memory usage is improved too, with batch boundaries and dynamic indices now implicit instead of explicit. The improved memory usage results in a significant win even on unbatchable workloads like `many_cubes --no-frustum-culling --vary-material-data-per-instance`, presumably due to cache effects. Not all phases can be binned; some, such as transparent and transmissive phases, must still be sorted. To handle this, this commit splits `PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the logic that today deals with `PhaseItem`s has been moved to `SortedPhaseItem`. `BinnedPhaseItem` has the new logic. Frame time results (in ms/frame) are as follows: | Benchmark | `binning` | `main` | Speedup | | ------------------------ | --------- | ------- | ------- | | `many_cubes -nfc -vpi` | 232.179 | 312.123 | 34.43% | | `many_cubes -nfc` | 25.874 | 30.117 | 16.40% | | `many_foxes` | 3.276 | 3.515 | 7.30% | (`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for `--vary-per-instance`.) --- ## Changelog ### Changed * Render phases have been split into binned and sorted phases. Binned phases, such as the common opaque phase, achieve improved CPU performance by avoiding the sorting step. ## Migration Guide - `PhaseItem` has been split into `BinnedPhaseItem` and `SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need to migrate them to one of these two types. `SortedPhaseItem` requires the fewest code changes, but you may want to pick `BinnedPhaseItem` if your phase doesn't require sorting, as that enables higher performance. ## Tracy graphs `many-cubes --no-frustum-culling`, `main` branch: <img width="1064" alt="Screenshot 2024-03-12 180037" src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55"> `many-cubes --no-frustum-culling`, this branch: <img width="1064" alt="Screenshot 2024-03-12 180011" src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c"> You can see that `batch_and_prepare_binned_render_phase` is a much smaller fraction of the time. Zooming in on that function, with yellow being this branch and red being `main`, we see: <img width="1064" alt="Screenshot 2024-03-12 175832" src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0"> The binning happens in `queue_material_meshes`. Again with yellow being this branch and red being `main`: <img width="1064" alt="Screenshot 2024-03-12 175755" src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96"> We can see that there is a small regression in `queue_material_meshes` performance, but it's not nearly enough to outweigh the large gains in `batch_and_prepare_binned_render_phase`. --------- Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
main_view_query: QueryState<Read<ViewLightEntities>>,
view_light_query: QueryState<Read<ShadowView>>,
2021-06-02 02:59:17 +00:00
}
impl ShadowPassNode {
pub fn new(world: &mut World) -> Self {
Self {
main_view_query: QueryState::new(world),
view_light_query: QueryState::new(world),
}
}
}
impl Node for ShadowPassNode {
fn update(&mut self, world: &mut World) {
self.main_view_query.update_archetypes(world);
self.view_light_query.update_archetypes(world);
}
Multithreaded render command encoding (#9172) # Objective - Encoding many GPU commands (such as in a renderpass with many draws, such as the main opaque pass) onto a `wgpu::CommandEncoder` is very expensive, and takes a long time. - To improve performance, we want to perform the command encoding for these heavy passes in parallel. ## Solution - `RenderContext` can now queue up "command buffer generation tasks" which are closures that will generate a command buffer when called. - When finalizing the render context to produce the final list of command buffers, these tasks are run in parallel on the `ComputeTaskPool` to produce their corresponding command buffers. - The general idea is that the node graph will run in serial, but in a node, instead of doing rendering work, you can add tasks to do render work in parallel with other node's tasks that get ran at the end of the graph execution. ## Nodes Parallelized - `MainOpaquePass3dNode` - `PrepassNode` - `DeferredGBufferPrepassNode` - `ShadowPassNode` (One task per view) ## Future Work - For large number of draws calls, might be worth further subdividing passes into 2+ tasks. - Extend this to UI, 2d, transparent, and transmissive nodes? - Needs testing - small command buffers are inefficient - it may be worth reverting to the serial command encoder usage for render phases with few items. - All "serial" (traditional) rendering work must finish before parallel rendering tasks (the new stuff) can start to run. - There is still only one submission to the graphics queue at the end of the graph execution. There is still no ability to submit work earlier. ## Performance Improvement Thanks to @Elabajaba for testing on Bistro. ![image](https://github.com/bevyengine/bevy/assets/47158642/be50dafa-85eb-4da5-a5cd-c0a044f1e76f) TLDR: Without shadow mapping, this PR has no impact. _With_ shadow mapping, this PR gives **~40 more fps** than main. --- ## Changelog - `MainOpaquePass3dNode`, `PrepassNode`, `DeferredGBufferPrepassNode`, and each shadow map within `ShadowPassNode` are now encoded in parallel, giving _greatly_ increased CPU performance, mainly when shadow mapping is enabled. - Does not work on WASM or AMD+Windows+Vulkan. - Added `RenderContext::add_command_buffer_generation_task()`. - `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. ## Migration Guide `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. --------- Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com> Co-authored-by: François <mockersf@gmail.com>
2024-02-09 07:35:35 +00:00
fn run<'w>(
2021-06-02 02:59:17 +00:00
&self,
graph: &mut RenderGraphContext,
Multithreaded render command encoding (#9172) # Objective - Encoding many GPU commands (such as in a renderpass with many draws, such as the main opaque pass) onto a `wgpu::CommandEncoder` is very expensive, and takes a long time. - To improve performance, we want to perform the command encoding for these heavy passes in parallel. ## Solution - `RenderContext` can now queue up "command buffer generation tasks" which are closures that will generate a command buffer when called. - When finalizing the render context to produce the final list of command buffers, these tasks are run in parallel on the `ComputeTaskPool` to produce their corresponding command buffers. - The general idea is that the node graph will run in serial, but in a node, instead of doing rendering work, you can add tasks to do render work in parallel with other node's tasks that get ran at the end of the graph execution. ## Nodes Parallelized - `MainOpaquePass3dNode` - `PrepassNode` - `DeferredGBufferPrepassNode` - `ShadowPassNode` (One task per view) ## Future Work - For large number of draws calls, might be worth further subdividing passes into 2+ tasks. - Extend this to UI, 2d, transparent, and transmissive nodes? - Needs testing - small command buffers are inefficient - it may be worth reverting to the serial command encoder usage for render phases with few items. - All "serial" (traditional) rendering work must finish before parallel rendering tasks (the new stuff) can start to run. - There is still only one submission to the graphics queue at the end of the graph execution. There is still no ability to submit work earlier. ## Performance Improvement Thanks to @Elabajaba for testing on Bistro. ![image](https://github.com/bevyengine/bevy/assets/47158642/be50dafa-85eb-4da5-a5cd-c0a044f1e76f) TLDR: Without shadow mapping, this PR has no impact. _With_ shadow mapping, this PR gives **~40 more fps** than main. --- ## Changelog - `MainOpaquePass3dNode`, `PrepassNode`, `DeferredGBufferPrepassNode`, and each shadow map within `ShadowPassNode` are now encoded in parallel, giving _greatly_ increased CPU performance, mainly when shadow mapping is enabled. - Does not work on WASM or AMD+Windows+Vulkan. - Added `RenderContext::add_command_buffer_generation_task()`. - `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. ## Migration Guide `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. --------- Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com> Co-authored-by: François <mockersf@gmail.com>
2024-02-09 07:35:35 +00:00
render_context: &mut RenderContext<'w>,
world: &'w World,
2021-06-02 02:59:17 +00:00
) -> Result<(), NodeRunError> {
let diagnostics = render_context.diagnostic_recorder();
Make render graph slots optional for most cases (#8109) # Objective - Currently, the render graph slots are only used to pass the view_entity around. This introduces significant boilerplate for very little value. Instead of using slots for this, make the view_entity part of the `RenderGraphContext`. This also means we won't need to have `IN_VIEW` on every node and and we'll be able to use the default impl of `Node::input()`. ## Solution - Add `view_entity: Option<Entity>` to the `RenderGraphContext` - Update all nodes to use this instead of entity slot input --- ## Changelog - Add optional `view_entity` to `RenderGraphContext` ## Migration Guide You can now get the view_entity directly from the `RenderGraphContext`. When implementing the Node: ```rust // 0.10 struct FooNode; impl FooNode { const IN_VIEW: &'static str = "view"; } impl Node for FooNode { fn input(&self) -> Vec<SlotInfo> { vec![SlotInfo::new(Self::IN_VIEW, SlotType::Entity)] } fn run( &self, graph: &mut RenderGraphContext, // ... ) -> Result<(), NodeRunError> { let view_entity = graph.get_input_entity(Self::IN_VIEW)?; // ... Ok(()) } } // 0.11 struct FooNode; impl Node for FooNode { fn run( &self, graph: &mut RenderGraphContext, // ... ) -> Result<(), NodeRunError> { let view_entity = graph.view_entity(); // ... Ok(()) } } ``` When adding the node to the graph, you don't need to specify a slot_edge for the view_entity. ```rust // 0.10 let mut graph = RenderGraph::default(); graph.add_node(FooNode::NAME, node); let input_node_id = draw_2d_graph.set_input(vec![SlotInfo::new( graph::input::VIEW_ENTITY, SlotType::Entity, )]); graph.add_slot_edge( input_node_id, graph::input::VIEW_ENTITY, FooNode::NAME, FooNode::IN_VIEW, ); // add_node_edge ... // 0.11 let mut graph = RenderGraph::default(); graph.add_node(FooNode::NAME, node); // add_node_edge ... ``` ## Notes This PR paired with #8007 will help reduce a lot of annoying boilerplate with the render nodes. Depending on which one gets merged first. It will require a bit of clean up work to make both compatible. I tagged this as a breaking change, because using the old system to get the view_entity will break things because it's not a node input slot anymore. ## Notes for reviewers A lot of the diffs are just removing the slots in every nodes and graph creation. The important part is mostly in the graph_runner/CameraDriverNode.
2023-03-21 20:11:13 +00:00
let view_entity = graph.view_entity();
let Some(shadow_render_phases) = world.get_resource::<ViewBinnedRenderPhases<Shadow>>()
else {
return Ok(());
};
let time_span = diagnostics.time_span(render_context.command_encoder(), "shadows");
2021-07-02 01:05:20 +00:00
if let Ok(view_lights) = self.main_view_query.get_manual(world, view_entity) {
for view_light_entity in view_lights.lights.iter().copied() {
let Some(shadow_phase) = shadow_render_phases.get(&view_light_entity) else {
continue;
};
let view_light = self
.view_light_query
.get_manual(world, view_light_entity)
.unwrap();
Multithreaded render command encoding (#9172) # Objective - Encoding many GPU commands (such as in a renderpass with many draws, such as the main opaque pass) onto a `wgpu::CommandEncoder` is very expensive, and takes a long time. - To improve performance, we want to perform the command encoding for these heavy passes in parallel. ## Solution - `RenderContext` can now queue up "command buffer generation tasks" which are closures that will generate a command buffer when called. - When finalizing the render context to produce the final list of command buffers, these tasks are run in parallel on the `ComputeTaskPool` to produce their corresponding command buffers. - The general idea is that the node graph will run in serial, but in a node, instead of doing rendering work, you can add tasks to do render work in parallel with other node's tasks that get ran at the end of the graph execution. ## Nodes Parallelized - `MainOpaquePass3dNode` - `PrepassNode` - `DeferredGBufferPrepassNode` - `ShadowPassNode` (One task per view) ## Future Work - For large number of draws calls, might be worth further subdividing passes into 2+ tasks. - Extend this to UI, 2d, transparent, and transmissive nodes? - Needs testing - small command buffers are inefficient - it may be worth reverting to the serial command encoder usage for render phases with few items. - All "serial" (traditional) rendering work must finish before parallel rendering tasks (the new stuff) can start to run. - There is still only one submission to the graphics queue at the end of the graph execution. There is still no ability to submit work earlier. ## Performance Improvement Thanks to @Elabajaba for testing on Bistro. ![image](https://github.com/bevyengine/bevy/assets/47158642/be50dafa-85eb-4da5-a5cd-c0a044f1e76f) TLDR: Without shadow mapping, this PR has no impact. _With_ shadow mapping, this PR gives **~40 more fps** than main. --- ## Changelog - `MainOpaquePass3dNode`, `PrepassNode`, `DeferredGBufferPrepassNode`, and each shadow map within `ShadowPassNode` are now encoded in parallel, giving _greatly_ increased CPU performance, mainly when shadow mapping is enabled. - Does not work on WASM or AMD+Windows+Vulkan. - Added `RenderContext::add_command_buffer_generation_task()`. - `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. ## Migration Guide `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. --------- Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com> Co-authored-by: François <mockersf@gmail.com>
2024-02-09 07:35:35 +00:00
let depth_stencil_attachment =
Some(view_light.depth_attachment.get_attachment(StoreOp::Store));
let diagnostics = render_context.diagnostic_recorder();
Multithreaded render command encoding (#9172) # Objective - Encoding many GPU commands (such as in a renderpass with many draws, such as the main opaque pass) onto a `wgpu::CommandEncoder` is very expensive, and takes a long time. - To improve performance, we want to perform the command encoding for these heavy passes in parallel. ## Solution - `RenderContext` can now queue up "command buffer generation tasks" which are closures that will generate a command buffer when called. - When finalizing the render context to produce the final list of command buffers, these tasks are run in parallel on the `ComputeTaskPool` to produce their corresponding command buffers. - The general idea is that the node graph will run in serial, but in a node, instead of doing rendering work, you can add tasks to do render work in parallel with other node's tasks that get ran at the end of the graph execution. ## Nodes Parallelized - `MainOpaquePass3dNode` - `PrepassNode` - `DeferredGBufferPrepassNode` - `ShadowPassNode` (One task per view) ## Future Work - For large number of draws calls, might be worth further subdividing passes into 2+ tasks. - Extend this to UI, 2d, transparent, and transmissive nodes? - Needs testing - small command buffers are inefficient - it may be worth reverting to the serial command encoder usage for render phases with few items. - All "serial" (traditional) rendering work must finish before parallel rendering tasks (the new stuff) can start to run. - There is still only one submission to the graphics queue at the end of the graph execution. There is still no ability to submit work earlier. ## Performance Improvement Thanks to @Elabajaba for testing on Bistro. ![image](https://github.com/bevyengine/bevy/assets/47158642/be50dafa-85eb-4da5-a5cd-c0a044f1e76f) TLDR: Without shadow mapping, this PR has no impact. _With_ shadow mapping, this PR gives **~40 more fps** than main. --- ## Changelog - `MainOpaquePass3dNode`, `PrepassNode`, `DeferredGBufferPrepassNode`, and each shadow map within `ShadowPassNode` are now encoded in parallel, giving _greatly_ increased CPU performance, mainly when shadow mapping is enabled. - Does not work on WASM or AMD+Windows+Vulkan. - Added `RenderContext::add_command_buffer_generation_task()`. - `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. ## Migration Guide `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. --------- Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com> Co-authored-by: François <mockersf@gmail.com>
2024-02-09 07:35:35 +00:00
render_context.add_command_buffer_generation_task(move |render_device| {
#[cfg(feature = "trace")]
let _shadow_pass_span = info_span!("", "{}", view_light.pass_name).entered();
Multithreaded render command encoding (#9172) # Objective - Encoding many GPU commands (such as in a renderpass with many draws, such as the main opaque pass) onto a `wgpu::CommandEncoder` is very expensive, and takes a long time. - To improve performance, we want to perform the command encoding for these heavy passes in parallel. ## Solution - `RenderContext` can now queue up "command buffer generation tasks" which are closures that will generate a command buffer when called. - When finalizing the render context to produce the final list of command buffers, these tasks are run in parallel on the `ComputeTaskPool` to produce their corresponding command buffers. - The general idea is that the node graph will run in serial, but in a node, instead of doing rendering work, you can add tasks to do render work in parallel with other node's tasks that get ran at the end of the graph execution. ## Nodes Parallelized - `MainOpaquePass3dNode` - `PrepassNode` - `DeferredGBufferPrepassNode` - `ShadowPassNode` (One task per view) ## Future Work - For large number of draws calls, might be worth further subdividing passes into 2+ tasks. - Extend this to UI, 2d, transparent, and transmissive nodes? - Needs testing - small command buffers are inefficient - it may be worth reverting to the serial command encoder usage for render phases with few items. - All "serial" (traditional) rendering work must finish before parallel rendering tasks (the new stuff) can start to run. - There is still only one submission to the graphics queue at the end of the graph execution. There is still no ability to submit work earlier. ## Performance Improvement Thanks to @Elabajaba for testing on Bistro. ![image](https://github.com/bevyengine/bevy/assets/47158642/be50dafa-85eb-4da5-a5cd-c0a044f1e76f) TLDR: Without shadow mapping, this PR has no impact. _With_ shadow mapping, this PR gives **~40 more fps** than main. --- ## Changelog - `MainOpaquePass3dNode`, `PrepassNode`, `DeferredGBufferPrepassNode`, and each shadow map within `ShadowPassNode` are now encoded in parallel, giving _greatly_ increased CPU performance, mainly when shadow mapping is enabled. - Does not work on WASM or AMD+Windows+Vulkan. - Added `RenderContext::add_command_buffer_generation_task()`. - `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. ## Migration Guide `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. --------- Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com> Co-authored-by: François <mockersf@gmail.com>
2024-02-09 07:35:35 +00:00
let mut command_encoder =
render_device.create_command_encoder(&CommandEncoderDescriptor {
label: Some("shadow_pass_command_encoder"),
});
Multithreaded render command encoding (#9172) # Objective - Encoding many GPU commands (such as in a renderpass with many draws, such as the main opaque pass) onto a `wgpu::CommandEncoder` is very expensive, and takes a long time. - To improve performance, we want to perform the command encoding for these heavy passes in parallel. ## Solution - `RenderContext` can now queue up "command buffer generation tasks" which are closures that will generate a command buffer when called. - When finalizing the render context to produce the final list of command buffers, these tasks are run in parallel on the `ComputeTaskPool` to produce their corresponding command buffers. - The general idea is that the node graph will run in serial, but in a node, instead of doing rendering work, you can add tasks to do render work in parallel with other node's tasks that get ran at the end of the graph execution. ## Nodes Parallelized - `MainOpaquePass3dNode` - `PrepassNode` - `DeferredGBufferPrepassNode` - `ShadowPassNode` (One task per view) ## Future Work - For large number of draws calls, might be worth further subdividing passes into 2+ tasks. - Extend this to UI, 2d, transparent, and transmissive nodes? - Needs testing - small command buffers are inefficient - it may be worth reverting to the serial command encoder usage for render phases with few items. - All "serial" (traditional) rendering work must finish before parallel rendering tasks (the new stuff) can start to run. - There is still only one submission to the graphics queue at the end of the graph execution. There is still no ability to submit work earlier. ## Performance Improvement Thanks to @Elabajaba for testing on Bistro. ![image](https://github.com/bevyengine/bevy/assets/47158642/be50dafa-85eb-4da5-a5cd-c0a044f1e76f) TLDR: Without shadow mapping, this PR has no impact. _With_ shadow mapping, this PR gives **~40 more fps** than main. --- ## Changelog - `MainOpaquePass3dNode`, `PrepassNode`, `DeferredGBufferPrepassNode`, and each shadow map within `ShadowPassNode` are now encoded in parallel, giving _greatly_ increased CPU performance, mainly when shadow mapping is enabled. - Does not work on WASM or AMD+Windows+Vulkan. - Added `RenderContext::add_command_buffer_generation_task()`. - `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. ## Migration Guide `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. --------- Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com> Co-authored-by: François <mockersf@gmail.com>
2024-02-09 07:35:35 +00:00
let render_pass = command_encoder.begin_render_pass(&RenderPassDescriptor {
label: Some(&view_light.pass_name),
color_attachments: &[],
Multithreaded render command encoding (#9172) # Objective - Encoding many GPU commands (such as in a renderpass with many draws, such as the main opaque pass) onto a `wgpu::CommandEncoder` is very expensive, and takes a long time. - To improve performance, we want to perform the command encoding for these heavy passes in parallel. ## Solution - `RenderContext` can now queue up "command buffer generation tasks" which are closures that will generate a command buffer when called. - When finalizing the render context to produce the final list of command buffers, these tasks are run in parallel on the `ComputeTaskPool` to produce their corresponding command buffers. - The general idea is that the node graph will run in serial, but in a node, instead of doing rendering work, you can add tasks to do render work in parallel with other node's tasks that get ran at the end of the graph execution. ## Nodes Parallelized - `MainOpaquePass3dNode` - `PrepassNode` - `DeferredGBufferPrepassNode` - `ShadowPassNode` (One task per view) ## Future Work - For large number of draws calls, might be worth further subdividing passes into 2+ tasks. - Extend this to UI, 2d, transparent, and transmissive nodes? - Needs testing - small command buffers are inefficient - it may be worth reverting to the serial command encoder usage for render phases with few items. - All "serial" (traditional) rendering work must finish before parallel rendering tasks (the new stuff) can start to run. - There is still only one submission to the graphics queue at the end of the graph execution. There is still no ability to submit work earlier. ## Performance Improvement Thanks to @Elabajaba for testing on Bistro. ![image](https://github.com/bevyengine/bevy/assets/47158642/be50dafa-85eb-4da5-a5cd-c0a044f1e76f) TLDR: Without shadow mapping, this PR has no impact. _With_ shadow mapping, this PR gives **~40 more fps** than main. --- ## Changelog - `MainOpaquePass3dNode`, `PrepassNode`, `DeferredGBufferPrepassNode`, and each shadow map within `ShadowPassNode` are now encoded in parallel, giving _greatly_ increased CPU performance, mainly when shadow mapping is enabled. - Does not work on WASM or AMD+Windows+Vulkan. - Added `RenderContext::add_command_buffer_generation_task()`. - `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. ## Migration Guide `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. --------- Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com> Co-authored-by: François <mockersf@gmail.com>
2024-02-09 07:35:35 +00:00
depth_stencil_attachment,
Update to wgpu 0.18 (#10266) # Objective Keep up to date with wgpu. ## Solution Update the wgpu version. Currently blocked on naga_oil updating to naga 0.14 and releasing a new version. 3d scenes (or maybe any scene with lighting?) currently don't render anything due to ``` error: naga_oil bug, please file a report: composer failed to build a valid header: Type [2] '' is invalid = Capability Capabilities(CUBE_ARRAY_TEXTURES) is required ``` I'm not sure what should be passed in for `wgpu::InstanceFlags`, or if we want to make the gles3minorversion configurable (might be useful for debugging?) Currently blocked on https://github.com/bevyengine/naga_oil/pull/63, and https://github.com/gfx-rs/wgpu/issues/4569 to be fixed upstream in wgpu first. ## Known issues Amd+windows+vulkan has issues with texture_binding_arrays (see the image [here](https://github.com/bevyengine/bevy/pull/10266#issuecomment-1819946278)), but that'll be fixed in the next wgpu/naga version, and you can just use dx12 as a workaround for now (Amd+linux mesa+vulkan texture_binding_arrays are fixed though). --- ## Changelog Updated wgpu to 0.18, naga to 0.14.2, and naga_oil to 0.11. - Windows desktop GL should now be less painful as it no longer requires Angle. - You can now toggle shader validation and debug information for debug and release builds using `WgpuSettings.instance_flags` and [InstanceFlags](https://docs.rs/wgpu/0.18.0/wgpu/struct.InstanceFlags.html) ## Migration Guide - `RenderPassDescriptor` `color_attachments` (as well as `RenderPassColorAttachment`, and `RenderPassDepthStencilAttachment`) now use `StoreOp::Store` or `StoreOp::Discard` instead of a `boolean` to declare whether or not they should be stored. - `RenderPassDescriptor` now have `timestamp_writes` and `occlusion_query_set` fields. These can safely be set to `None`. - `ComputePassDescriptor` now have a `timestamp_writes` field. This can be set to `None` for now. - See the [wgpu changelog](https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md#v0180-2023-10-25) for additional details
2023-12-14 02:45:47 +00:00
timestamp_writes: None,
occlusion_query_set: None,
});
Multithreaded render command encoding (#9172) # Objective - Encoding many GPU commands (such as in a renderpass with many draws, such as the main opaque pass) onto a `wgpu::CommandEncoder` is very expensive, and takes a long time. - To improve performance, we want to perform the command encoding for these heavy passes in parallel. ## Solution - `RenderContext` can now queue up "command buffer generation tasks" which are closures that will generate a command buffer when called. - When finalizing the render context to produce the final list of command buffers, these tasks are run in parallel on the `ComputeTaskPool` to produce their corresponding command buffers. - The general idea is that the node graph will run in serial, but in a node, instead of doing rendering work, you can add tasks to do render work in parallel with other node's tasks that get ran at the end of the graph execution. ## Nodes Parallelized - `MainOpaquePass3dNode` - `PrepassNode` - `DeferredGBufferPrepassNode` - `ShadowPassNode` (One task per view) ## Future Work - For large number of draws calls, might be worth further subdividing passes into 2+ tasks. - Extend this to UI, 2d, transparent, and transmissive nodes? - Needs testing - small command buffers are inefficient - it may be worth reverting to the serial command encoder usage for render phases with few items. - All "serial" (traditional) rendering work must finish before parallel rendering tasks (the new stuff) can start to run. - There is still only one submission to the graphics queue at the end of the graph execution. There is still no ability to submit work earlier. ## Performance Improvement Thanks to @Elabajaba for testing on Bistro. ![image](https://github.com/bevyengine/bevy/assets/47158642/be50dafa-85eb-4da5-a5cd-c0a044f1e76f) TLDR: Without shadow mapping, this PR has no impact. _With_ shadow mapping, this PR gives **~40 more fps** than main. --- ## Changelog - `MainOpaquePass3dNode`, `PrepassNode`, `DeferredGBufferPrepassNode`, and each shadow map within `ShadowPassNode` are now encoded in parallel, giving _greatly_ increased CPU performance, mainly when shadow mapping is enabled. - Does not work on WASM or AMD+Windows+Vulkan. - Added `RenderContext::add_command_buffer_generation_task()`. - `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. ## Migration Guide `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. --------- Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com> Co-authored-by: François <mockersf@gmail.com>
2024-02-09 07:35:35 +00:00
let mut render_pass = TrackedRenderPass::new(&render_device, render_pass);
let pass_span =
diagnostics.pass_span(&mut render_pass, view_light.pass_name.clone());
if let Err(err) =
shadow_phase.render(&mut render_pass, world, view_light_entity)
{
error!("Error encountered while rendering the shadow phase {err:?}");
}
Multithreaded render command encoding (#9172) # Objective - Encoding many GPU commands (such as in a renderpass with many draws, such as the main opaque pass) onto a `wgpu::CommandEncoder` is very expensive, and takes a long time. - To improve performance, we want to perform the command encoding for these heavy passes in parallel. ## Solution - `RenderContext` can now queue up "command buffer generation tasks" which are closures that will generate a command buffer when called. - When finalizing the render context to produce the final list of command buffers, these tasks are run in parallel on the `ComputeTaskPool` to produce their corresponding command buffers. - The general idea is that the node graph will run in serial, but in a node, instead of doing rendering work, you can add tasks to do render work in parallel with other node's tasks that get ran at the end of the graph execution. ## Nodes Parallelized - `MainOpaquePass3dNode` - `PrepassNode` - `DeferredGBufferPrepassNode` - `ShadowPassNode` (One task per view) ## Future Work - For large number of draws calls, might be worth further subdividing passes into 2+ tasks. - Extend this to UI, 2d, transparent, and transmissive nodes? - Needs testing - small command buffers are inefficient - it may be worth reverting to the serial command encoder usage for render phases with few items. - All "serial" (traditional) rendering work must finish before parallel rendering tasks (the new stuff) can start to run. - There is still only one submission to the graphics queue at the end of the graph execution. There is still no ability to submit work earlier. ## Performance Improvement Thanks to @Elabajaba for testing on Bistro. ![image](https://github.com/bevyengine/bevy/assets/47158642/be50dafa-85eb-4da5-a5cd-c0a044f1e76f) TLDR: Without shadow mapping, this PR has no impact. _With_ shadow mapping, this PR gives **~40 more fps** than main. --- ## Changelog - `MainOpaquePass3dNode`, `PrepassNode`, `DeferredGBufferPrepassNode`, and each shadow map within `ShadowPassNode` are now encoded in parallel, giving _greatly_ increased CPU performance, mainly when shadow mapping is enabled. - Does not work on WASM or AMD+Windows+Vulkan. - Added `RenderContext::add_command_buffer_generation_task()`. - `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. ## Migration Guide `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. --------- Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com> Co-authored-by: François <mockersf@gmail.com>
2024-02-09 07:35:35 +00:00
pass_span.end(&mut render_pass);
Multithreaded render command encoding (#9172) # Objective - Encoding many GPU commands (such as in a renderpass with many draws, such as the main opaque pass) onto a `wgpu::CommandEncoder` is very expensive, and takes a long time. - To improve performance, we want to perform the command encoding for these heavy passes in parallel. ## Solution - `RenderContext` can now queue up "command buffer generation tasks" which are closures that will generate a command buffer when called. - When finalizing the render context to produce the final list of command buffers, these tasks are run in parallel on the `ComputeTaskPool` to produce their corresponding command buffers. - The general idea is that the node graph will run in serial, but in a node, instead of doing rendering work, you can add tasks to do render work in parallel with other node's tasks that get ran at the end of the graph execution. ## Nodes Parallelized - `MainOpaquePass3dNode` - `PrepassNode` - `DeferredGBufferPrepassNode` - `ShadowPassNode` (One task per view) ## Future Work - For large number of draws calls, might be worth further subdividing passes into 2+ tasks. - Extend this to UI, 2d, transparent, and transmissive nodes? - Needs testing - small command buffers are inefficient - it may be worth reverting to the serial command encoder usage for render phases with few items. - All "serial" (traditional) rendering work must finish before parallel rendering tasks (the new stuff) can start to run. - There is still only one submission to the graphics queue at the end of the graph execution. There is still no ability to submit work earlier. ## Performance Improvement Thanks to @Elabajaba for testing on Bistro. ![image](https://github.com/bevyengine/bevy/assets/47158642/be50dafa-85eb-4da5-a5cd-c0a044f1e76f) TLDR: Without shadow mapping, this PR has no impact. _With_ shadow mapping, this PR gives **~40 more fps** than main. --- ## Changelog - `MainOpaquePass3dNode`, `PrepassNode`, `DeferredGBufferPrepassNode`, and each shadow map within `ShadowPassNode` are now encoded in parallel, giving _greatly_ increased CPU performance, mainly when shadow mapping is enabled. - Does not work on WASM or AMD+Windows+Vulkan. - Added `RenderContext::add_command_buffer_generation_task()`. - `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. ## Migration Guide `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. --------- Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com> Co-authored-by: François <mockersf@gmail.com>
2024-02-09 07:35:35 +00:00
drop(render_pass);
command_encoder.finish()
});
}
2021-06-02 02:59:17 +00:00
}
time_span.end(render_context.command_encoder());
2021-06-02 02:59:17 +00:00
Ok(())
}
}