Split mesh shader files (#4867)
# Objective
- Split PBR and 2D mesh shaders into types and bindings to prepare the shaders to be more reusable.
- See #3969 for details. I'm doing this in multiple steps to make review easier.
---
## Changelog
- Changed: 2D and PBR mesh shaders are now split into types and bindings, the following shader imports are available: `bevy_pbr::mesh_view_types`, `bevy_pbr::mesh_view_bindings`, `bevy_pbr::mesh_types`, `bevy_pbr::mesh_bindings`, `bevy_sprite::mesh2d_view_types`, `bevy_sprite::mesh2d_view_bindings`, `bevy_sprite::mesh2d_types`, `bevy_sprite::mesh2d_bindings`
## Migration Guide
- In shaders for 3D meshes:
- `#import bevy_pbr::mesh_view_bind_group` -> `#import bevy_pbr::mesh_view_bindings`
- `#import bevy_pbr::mesh_struct` -> `#import bevy_pbr::mesh_types`
- NOTE: If you are using the mesh bind group at bind group index 2, you can remove those binding statements in your shader and just use `#import bevy_pbr::mesh_bindings` which itself imports the mesh types needed for the bindings.
- In shaders for 2D meshes:
- `#import bevy_sprite::mesh2d_view_bind_group` -> `#import bevy_sprite::mesh2d_view_bindings`
- `#import bevy_sprite::mesh2d_struct` -> `#import bevy_sprite::mesh2d_types`
- NOTE: If you are using the mesh2d bind group at bind group index 2, you can remove those binding statements in your shader and just use `#import bevy_sprite::mesh2d_bindings` which itself imports the mesh2d types needed for the bindings.
2022-05-31 23:23:25 +00:00
#import bevy_pbr::mesh_view_bindings
#import bevy_pbr::pbr_bindings
#import bevy_pbr::mesh_bindings
2021-06-28 22:36:50 +00:00

Separate out PBR lighting, shadows, clustered forward, and utils from pbr.wgsl (#4938)
# Objective
- Builds on top of #4901
- Separate out PBR lighting, shadows, clustered forward, and utils from `pbr.wgsl` as part of making the PBR code more reusable and extensible.
- See #3969 for details.
## Solution
- Add `bevy_pbr::utils`, `bevy_pbr::clustered_forward`, `bevy_pbr::lighting`, `bevy_pbr::shadows` shader imports exposing many shader functions for external use
- Split `PI`, `saturate()`, `hsv2rgb()`, and `random1D()` into `bevy_pbr::utils`
- Split clustered-forward-specific functions into `bevy_pbr::clustered_forward`, including moving the debug visualization code into a `cluster_debug_visualization()` function in that import
- Split PBR lighting functions into `bevy_pbr::lighting`
- Split shadow functions into `bevy_pbr::shadows`
---
## Changelog
- Added: `bevy_pbr::utils`, `bevy_pbr::clustered_forward`, `bevy_pbr::lighting`, `bevy_pbr::shadows` shader imports exposing many shader functions for external use
- Split `PI`, `saturate()`, `hsv2rgb()`, and `random1D()` into `bevy_pbr::utils`
- Split clustered-forward-specific functions into `bevy_pbr::clustered_forward`, including moving the debug visualization code into a `cluster_debug_visualization()` function in that import
- Split PBR lighting functions into `bevy_pbr::lighting`
- Split shadow functions into `bevy_pbr::shadows`
2022-06-14 00:58:30 +00:00
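To give a feel for the kind of helpers that now live in `bevy_pbr::utils`, here is a CPU-side Rust sketch of `saturate()` and an HSV-to-RGB conversion. It is an illustration only: the function bodies follow common shader idioms and are assumptions, not a transcription of the WGSL moved by this commit, and the Rust signatures are hypothetical.

```rust
/// Clamp a value to the [0, 1] range, like a shader-side `saturate()`.
fn saturate(x: f32) -> f32 {
    x.clamp(0.0, 1.0)
}

/// Convert HSV (all components in [0, 1]) to RGB using the branch-free
/// formulation often seen in shaders. Assumed form, not copied from Bevy.
fn hsv2rgb(hue: f32, saturation: f32, value: f32) -> [f32; 3] {
    // Per-channel hue offsets for red, green, and blue.
    let offsets = [1.0_f32, 2.0 / 3.0, 1.0 / 3.0];
    let mut rgb = [0.0_f32; 3];
    for (channel, offset) in rgb.iter_mut().zip(offsets.iter()) {
        // Triangle-shaped ramp that peaks where the hue matches the channel.
        let ramp = saturate(((hue + *offset).fract() * 6.0 - 3.0).abs() - 1.0);
        // Blend between white (zero saturation) and the pure hue, then scale.
        *channel = value * (1.0 + saturation * (ramp - 1.0));
    }
    rgb
}

fn main() {
    println!("{:?}", hsv2rgb(0.0, 1.0, 1.0));
}
```

A hue of 0 with full saturation and value yields pure red, and zero saturation yields a grey whose channels all equal the value.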
#import bevy_pbr::utils
#import bevy_pbr::clustered_forward
#import bevy_pbr::lighting
#import bevy_pbr::shadows
2021-12-14 23:42:35 +00:00

2021-06-28 22:36:50 +00:00
struct FragmentInput {
    [[builtin(front_facing)]] is_front: bool;
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that cluster are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead of uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
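The fragment-to-cluster mapping and the bit packing described in this commit message can be sketched on the CPU in Rust. This is a minimal sketch under stated assumptions, not Bevy's implementation: `ClusterGrid`, its fields, the flat index layout, and the four-indices-per-`u32` packing are all hypothetical, and `depth` is taken to be a positive distance along the view direction. The slice formula is the commonly cited exponential form, `floor(ln(depth / near) / ln(far / near) * slices_z)`.

```rust
/// Hypothetical cluster grid description; the field names are illustrative.
struct ClusterGrid {
    tiles_x: u32,
    tiles_y: u32,
    slices_z: u32,
    near: f32,
    far: f32,
}

impl ClusterGrid {
    /// Map a positive view-space depth to a depth slice with an exponential distribution.
    fn depth_slice(&self, depth: f32) -> u32 {
        let scale = self.slices_z as f32 / (self.far / self.near).ln();
        let slice = ((depth / self.near).ln() * scale).floor();
        // Clamp so that depths outside [near, far) still land in a valid slice.
        slice.clamp(0.0, (self.slices_z - 1) as f32) as u32
    }

    /// Map a fragment (pixel position plus depth) to a flat cluster index.
    /// The xy divisions are regular in screen space; the index layout is an assumption.
    fn cluster_index(&self, frag_xy: [f32; 2], viewport: [f32; 2], depth: f32) -> u32 {
        let tile_x = ((frag_xy[0] / viewport[0] * self.tiles_x as f32) as u32).min(self.tiles_x - 1);
        let tile_y = ((frag_xy[1] / viewport[1] * self.tiles_y as f32) as u32).min(self.tiles_y - 1);
        (tile_y * self.tiles_x + tile_x) * self.slices_z + self.depth_slice(depth)
    }
}

/// Pack four 8-bit light indices into one u32, lowest byte first (hypothetical layout).
fn pack_light_indices(indices: [u8; 4]) -> u32 {
    indices
        .iter()
        .enumerate()
        .fold(0u32, |packed, (i, &index)| packed | (u32::from(index) << (8 * i as u32)))
}

/// Recover the `i`-th (0..=3) light index with a shift and a mask.
fn unpack_light_index(packed: u32, i: u32) -> u8 {
    ((packed >> (8 * i)) & 0xFF) as u8
}

fn main() {
    let grid = ClusterGrid { tiles_x: 16, tiles_y: 9, slices_z: 24, near: 0.1, far: 1000.0 };
    println!("cluster = {}", grid.cluster_index([800.0, 450.0], [1600.0, 900.0], 5.0));
    println!("packed = {:#010x}", pack_light_indices([1, 2, 3, 255]));
}
```

An 8-bit index is enough to address the 256 lights mentioned above, which is why four of them fit in a single 32-bit word in this sketch. Because the slice boundaries grow exponentially with depth, many thin slices sit close to the camera, which matches the note above about the large number of clusters near the camera.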
    [[builtin(position)]] frag_coord: vec4<f32>;
2021-06-29 23:56:45 +00:00
    [[location(0)]] world_position: vec4<f32>;
    [[location(1)]] world_normal: vec3<f32>;
    [[location(2)]] uv: vec2<f32>;
2021-11-04 21:47:57 +00:00
#ifdef VERTEX_TANGENTS
    [[location(3)]] world_tangent: vec4<f32>;
#endif
2022-05-05 00:46:32 +00:00
#ifdef VERTEX_COLORS
    [[location(4)]] color: vec4<f32>;
2022-05-31 23:23:25 +00:00
#endif
2021-06-28 22:36:50 +00:00
};

[[stage(fragment)]]
fn fragment(in: FragmentInput) -> [[location(0)]] vec4<f32> {
    var output_color: vec4<f32> = material.base_color;
2022-05-05 00:46:32 +00:00
#ifdef VERTEX_COLORS
    output_color = output_color * in.color;
#endif
2021-08-25 19:44:20 +00:00
    if ((material.flags & STANDARD_MATERIAL_FLAGS_BASE_COLOR_TEXTURE_BIT) != 0u) {
2021-06-28 22:36:50 +00:00
        output_color = output_color * textureSample(base_color_texture, base_color_sampler, in.uv);
    }

    // NOTE: Unlit bit not set means == 0 is true, so the true case is if lit
2021-08-25 19:44:20 +00:00
    if ((material.flags & STANDARD_MATERIAL_FLAGS_UNLIT_BIT) == 0u) {
2021-06-28 22:36:50 +00:00
        // TODO use .a for exposure compensation in HDR
        var emissive: vec4<f32> = material.emissive;
2021-08-25 19:44:20 +00:00
        if ((material.flags & STANDARD_MATERIAL_FLAGS_EMISSIVE_TEXTURE_BIT) != 0u) {
2021-06-28 22:36:50 +00:00
            emissive = vec4<f32>(emissive.rgb * textureSample(emissive_texture, emissive_sampler, in.uv).rgb, 1.0);
        }

        // calculate non-linear roughness from linear perceptualRoughness
        var metallic: f32 = material.metallic;
        var perceptual_roughness: f32 = material.perceptual_roughness;
2021-08-25 19:44:20 +00:00
        if ((material.flags & STANDARD_MATERIAL_FLAGS_METALLIC_ROUGHNESS_TEXTURE_BIT) != 0u) {
2021-06-28 22:36:50 +00:00
            let metallic_roughness = textureSample(metallic_roughness_texture, metallic_roughness_sampler, in.uv);
            // Sampling from GLTF standard channels for now
            metallic = metallic * metallic_roughness.b;
            perceptual_roughness = perceptual_roughness * metallic_roughness.g;
        }
        let roughness = perceptualRoughnessToRoughness(perceptual_roughness);

        var occlusion: f32 = 1.0;
2021-08-25 19:44:20 +00:00
        if ((material.flags & STANDARD_MATERIAL_FLAGS_OCCLUSION_TEXTURE_BIT) != 0u) {
2021-06-28 22:36:50 +00:00
            occlusion = textureSample(occlusion_texture, occlusion_sampler, in.uv).r;
        }

        var N: vec3<f32> = normalize(in.world_normal);

2021-11-04 21:47:57 +00:00
#ifdef VERTEX_TANGENTS
#ifdef STANDARDMATERIAL_NORMAL_MAP
2022-05-31 22:53:54 +00:00
        // NOTE: The mikktspace method of normal mapping explicitly requires that these NOT be
        // normalized nor any Gram-Schmidt applied to ensure the vertex normal is orthogonal to the
        // vertex tangent! Do not change this code unless you really know what you are doing.
        // http://www.mikktspace.com/
        var T: vec3<f32> = in.world_tangent.xyz;
        var B: vec3<f32> = in.world_tangent.w * cross(N, T);
2021-11-04 21:47:57 +00:00
#endif
#endif
2021-06-28 22:36:50 +00:00

2021-08-25 19:44:20 +00:00
        if ((material.flags & STANDARD_MATERIAL_FLAGS_DOUBLE_SIDED_BIT) != 0u) {
2021-06-28 22:36:50 +00:00
            if (!in.is_front) {
                N = -N;
2021-11-04 21:47:57 +00:00
#ifdef VERTEX_TANGENTS
#ifdef STANDARDMATERIAL_NORMAL_MAP
                T = -T;
                B = -B;
#endif
#endif
2021-06-28 22:36:50 +00:00
            }
        }

2021-11-04 21:47:57 +00:00
#ifdef VERTEX_TANGENTS
#ifdef STANDARDMATERIAL_NORMAL_MAP
        let TBN = mat3x3<f32>(T, B, N);
2022-03-15 22:26:46 +00:00
        // Nt is the tangent-space normal.
        var Nt: vec3<f32>;
        if ((material.flags & STANDARD_MATERIAL_FLAGS_TWO_COMPONENT_NORMAL_MAP) != 0u) {
            // Only use the xy components and derive z for 2-component normal maps.
            Nt = vec3<f32>(textureSample(normal_map_texture, normal_map_sampler, in.uv).rg * 2.0 - 1.0, 0.0);
            Nt.z = sqrt(1.0 - Nt.x * Nt.x - Nt.y * Nt.y);
        } else {
            Nt = textureSample(normal_map_texture, normal_map_sampler, in.uv).rgb * 2.0 - 1.0;
        }
2022-04-07 15:50:14 +00:00
        // Normal maps authored for DirectX require flipping the y component
        if ((material.flags & STANDARD_MATERIAL_FLAGS_FLIP_NORMAL_MAP_Y) != 0u) {
            Nt.y = -Nt.y;
        }
2022-05-31 22:53:54 +00:00
        // NOTE: The mikktspace method of normal mapping maps the tangent-space normal from
        // the normal map texture in this way to be an EXACT inverse of how the normal map baker
        // calculates the normal maps so there is no error introduced. Do not change this code
        // unless you really know what you are doing.
        // http://www.mikktspace.com/
        N = normalize(Nt.x * T + Nt.y * B + Nt.z * N);
2021-11-04 21:47:57 +00:00
#endif
#endif
2021-06-28 22:36:50 +00:00

Add support for opaque, alpha mask, and alpha blend modes (#3072)
# Objective
Add depth prepass and support for opaque, alpha mask, and alpha blend modes for the 3D PBR target.
## Solution
NOTE: This is based on top of #2861 frustum culling. Just lining it up to keep @cart loaded with the review train. 🚂
There are a lot of important details here. Big thanks to @cwfitzgerald of wgpu, naga, and rend3 fame for explaining how to do it properly!
* An `AlphaMode` component is added that defines whether a material should be considered opaque, an alpha mask (with a cutoff value that defaults to 0.5, the same as glTF), or transparent and should be alpha blended
* Two depth prepasses are added:
* Opaque does a plain vertex stage
* Alpha mask does the vertex stage but also a fragment stage that samples the colour for the fragment and discards if its alpha value is below the cutoff value
* Both are sorted front to back, not that it matters for these passes. (Maybe there should be a way to skip sorting?)
* Three main passes are added:
* Opaque and alpha mask passes use a depth comparison function of Equal such that only the geometry that was closest is processed further, due to early-z testing
* The transparent pass uses the Greater depth comparison function so that only transparent objects that are closer than anything opaque are rendered
* The opaque fragment shading is as before except that alpha is explicitly set to 1.0
* Alpha mask fragment shading sets the alpha value to 1.0 if it is equal to or above the cutoff, as defined by glTF
* Opaque and alpha mask are sorted front to back (again not that it matters as we will skip anything that is not equal... maybe sorting is no longer needed here?)
* Transparent is sorted back to front. Transparent fragment shading uses the alpha blending over operator
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2021-11-16 03:03:27 +00:00
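The per-fragment alpha handling described in this commit message can be summarised with a small Rust sketch. It is a CPU-side illustration under assumptions: the `AlphaMode` enum below is a stand-in modelled on the description (opaque, mask with a cutoff, blend), and `resolve_alpha` is a hypothetical helper in which `None` stands for a shader `discard`.

```rust
/// Stand-in for the alpha mode described above; `Mask` carries the cutoff value.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AlphaMode {
    Opaque,
    Mask(f32),
    Blend,
}

/// Return the alpha to write for a fragment, or `None` if it should be discarded.
fn resolve_alpha(mode: AlphaMode, alpha: f32) -> Option<f32> {
    match mode {
        // Opaque ignores the sampled alpha entirely.
        AlphaMode::Opaque => Some(1.0),
        // At or above the cutoff, a masked fragment is rendered fully opaque.
        AlphaMode::Mask(cutoff) if alpha >= cutoff => Some(1.0),
        // Below the cutoff, a masked fragment is not rendered at all.
        AlphaMode::Mask(_) => None,
        // Blended fragments keep their alpha for the over operator.
        AlphaMode::Blend => Some(alpha),
    }
}

fn main() {
    for mode in [AlphaMode::Opaque, AlphaMode::Mask(0.5), AlphaMode::Blend] {
        println!("{:?} with alpha 0.3 -> {:?}", mode, resolve_alpha(mode, 0.3));
    }
}
```

Modelling the discard as `None` keeps the three cases in one exhaustive `match`, which mirrors the `if` / `else if` chain on the material flags in the fragment shader.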
        if ((material.flags & STANDARD_MATERIAL_FLAGS_ALPHA_MODE_OPAQUE) != 0u) {
            // NOTE: If rendering as opaque, alpha should be ignored so set to 1.0
            output_color.a = 1.0;
2021-12-19 03:03:06 +00:00
        } else if ((material.flags & STANDARD_MATERIAL_FLAGS_ALPHA_MODE_MASK) != 0u) {
2021-11-16 03:03:27 +00:00
            if (output_color.a >= material.alpha_cutoff) {
                // NOTE: If rendering as masked alpha and >= the cutoff, render as fully opaque
                output_color.a = 1.0;
            } else {
                // NOTE: output_color.a < material.alpha_cutoff is not rendered
                // NOTE: This and any other discards mean that early-z testing cannot be done!
                discard;
            }
        }

2021-06-28 22:36:50 +00:00
        var V: vec3<f32>;
2021-12-14 23:42:35 +00:00
        // Determine whether the projection is orthographic
        let is_orthographic = view.projection[3].w == 1.0;
        if (is_orthographic) {
            // Orthographic view vector
            V = normalize(vec3<f32>(view.view_proj[0].z, view.view_proj[1].z, view.view_proj[2].z));
        } else {
2021-06-28 22:36:50 +00:00
            // Only valid for a perspective projection
            V = normalize(view.world_position.xyz - in.world_position.xyz);
        }

        // Neubelt and Pettineo 2013, "Crafting a Next-gen Material Pipeline for The Order: 1886"
        let NdotV = max(dot(N, V), 0.0001);

        // Remapping [0,1] reflectance to F0
        // See https://google.github.io/filament/Filament.html#materialsystem/parameterization/remapping
        let reflectance = material.reflectance;
        let F0 = 0.16 * reflectance * reflectance * (1.0 - metallic) + output_color.rgb * metallic;

        // Diffuse strength inversely related to metallicity
        let diffuse_color = output_color.rgb * (1.0 - metallic);

        let R = reflect(-V, N);

        // accumulate color
        var light_accum: vec3<f32> = vec3<f32>(0.0);
2021-12-09 03:08:54 +00:00
        let view_z = dot(vec4<f32>(
            view.inverse_view[0].z,
            view.inverse_view[1].z,
            view.inverse_view[2].z,
            view.inverse_view[3].z
        ), in.world_position);
2021-12-14 23:42:35 +00:00
        let cluster_index = fragment_cluster_index(in.frag_coord.xy, view_z, is_orthographic);
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take HDR tone mapping into account because, as far as I understand the details, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
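To make the bit shifting and masking mentioned above more concrete, here is a minimal CPU-side sketch in Rust. The bit layout (24-bit offset, 8-bit count, four 8-bit light ids per `u32`) and the Rust helper names are assumptions for illustration and are not necessarily what this PR uses. The 64-byte light size is only inferred from 16384 / 256.

```rust
/// Maximum uniform buffer binding size stated in the description above.
const MAX_UNIFORM_BINDING_BYTES: u32 = 16384;

/// Assumed layout: upper 24 bits hold the offset, lower 8 bits hold the count.
const COUNT_BITS: u32 = 8;
const COUNT_MASK: u32 = (1 << COUNT_BITS) - 1;

/// How many lights of a given size fit into one uniform buffer binding.
fn max_lights_per_binding(light_size_bytes: u32) -> u32 {
    MAX_UNIFORM_BINDING_BYTES / light_size_bytes
}

/// Packs a cluster's offset into the light index list and its light count
/// into a single u32 (hypothetical layout, see the constants above).
fn pack_offset_and_count(offset: u32, count: u32) -> u32 {
    debug_assert!(offset < (1 << (32 - COUNT_BITS)));
    debug_assert!(count <= COUNT_MASK);
    (offset << COUNT_BITS) | (count & COUNT_MASK)
}

/// Inverse of `pack_offset_and_count`: returns `(offset, count)`.
fn unpack_offset_and_count(packed: u32) -> (u32, u32) {
    (packed >> COUNT_BITS, packed & COUNT_MASK)
}

/// Packs 8-bit light ids four to a u32, lowest byte first.
fn pack_light_ids(ids: &[u8]) -> Vec<u32> {
    ids.chunks(4)
        .map(|chunk| {
            chunk
                .iter()
                .enumerate()
                .fold(0u32, |word, (i, &id)| word | (u32::from(id) << (8 * i as u32)))
        })
        .collect()
}

/// Reads the light id stored at `index` in the packed list.
fn get_light_id(packed: &[u32], index: usize) -> u8 {
    let word = packed[index / 4];
    ((word >> (8 * (index % 4) as u32)) & 0xFF) as u8
}
```

With this assumed layout, a 64-byte light struct gives `max_lights_per_binding(64) == 256`, which matches the limit quoted above, and every light id fits in one byte.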
2021-12-09 03:08:54 +00:00
let offset_and_count = unpack_offset_and_count(cluster_index);
2022-04-07 16:16:35 +00:00
for (var i: u32 = offset_and_count[0]; i < offset_and_count[0] + offset_and_count[1]; i = i + 1u) {
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters. Point lights are tested against each of the clusters to see if they would affect that volume within the scene and, if so, are added to a list of lights affecting that cluster. Then, when shading a fragment, which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that cluster are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer. At the moment the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck, though I don’t know if there are performance implications to reading from textures instead of uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take HDR tone mapping into account because, as far as I understand the details, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
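As a rough illustration of the exponential z-slicing described above, the following Rust sketch uses the commonly cited form of the Doom 2016 mapping, `slice = floor(ln(z) * n / ln(far / near) - n * ln(near) / ln(far / near))`. The function names, the use of a positive view-space depth, and the clamping are assumptions made for this sketch; the shader in this PR may differ in detail.

```rust
/// Depth of the near plane of `slice`, assuming slice boundaries grow
/// exponentially from `near` to `far`.
fn slice_near_depth(slice: u32, num_slices: u32, near: f32, far: f32) -> f32 {
    near * (far / near).powf(slice as f32 / num_slices as f32)
}

/// Maps a positive view-space depth to a slice index.
/// `scale` and `bias` depend only on the camera, so they can be computed
/// once per view, leaving one `ln` and one multiply-add per fragment.
fn depth_to_slice(depth: f32, num_slices: u32, near: f32, far: f32) -> u32 {
    let log_ratio = (far / near).ln();
    let scale = num_slices as f32 / log_ratio;
    let bias = -(num_slices as f32) * near.ln() / log_ratio;
    let slice = (depth.ln() * scale + bias).floor();
    // Clamp depths outside [near, far] into the first or last slice.
    slice.clamp(0.0, (num_slices - 1) as f32) as u32
}
```

For example, with `near = 1.0`, `far = 16.0` and 4 slices the boundaries fall at depths 1, 2, 4, 8 and 16, so a depth of 3.0 lands in slice 1. The doubling of slice thickness also shows why many thin clusters end up close to the camera.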
2021-12-09 03:08:54 +00:00
    let light_id = get_light_id(i);
    let light = point_lights.data[light_id];
    var shadow: f32 = 1.0;
2021-11-19 21:16:58 +00:00
    if ((mesh.flags & MESH_FLAGS_SHADOW_RECEIVER_BIT) != 0u
2021-11-26 13:16:11 +00:00
            && (light.flags & POINT_LIGHT_FLAGS_SHADOWS_ENABLED_BIT) != 0u) {
2021-12-09 03:08:54 +00:00
        shadow = fetch_point_shadow(light_id, in.world_position, in.world_normal);
2021-08-25 19:44:20 +00:00
    }
2021-07-16 22:41:56 +00:00
    let light_contrib = point_light(in.world_position.xyz, light, roughness, NdotV, N, V, R, F0, diffuse_color);
2021-07-08 02:49:33 +00:00
    light_accum = light_accum + light_contrib * shadow;
}
2021-12-09 03:08:54 +00:00

let n_directional_lights = lights.n_directional_lights;
for (var i: u32 = 0u; i < n_directional_lights; i = i + 1u) {
2021-07-08 02:49:33 +00:00
    let light = lights.directional_lights[i];
2021-12-09 03:08:54 +00:00
    var shadow: f32 = 1.0;
2021-11-19 21:16:58 +00:00
    if ((mesh.flags & MESH_FLAGS_SHADOW_RECEIVER_BIT) != 0u
2021-11-26 13:16:11 +00:00
            && (light.flags & DIRECTIONAL_LIGHT_FLAGS_SHADOWS_ENABLED_BIT) != 0u) {
2021-08-25 19:44:20 +00:00
        shadow = fetch_directional_shadow(i, in.world_position, in.world_normal);
    }
2021-07-08 02:49:33 +00:00
    let light_contrib = directional_light(light, roughness, NdotV, N, V, R, F0, diffuse_color);
2021-06-28 22:36:50 +00:00
    light_accum = light_accum + light_contrib * shadow;
}

let diffuse_ambient = EnvBRDFApprox(diffuse_color, 1.0, NdotV);
let specular_ambient = EnvBRDFApprox(F0, perceptual_roughness, NdotV);

output_color = vec4<f32>(
    light_accum +
        (diffuse_ambient + specular_ambient) * lights.ambient_color.rgb * occlusion +
        emissive.rgb * output_color.a,
    output_color.a);

2022-06-14 00:58:30 +00:00
output_color = cluster_debug_visualization(
    output_color,
    view_z,
    is_orthographic,
    offset_and_count,
    cluster_index,
2021-12-14 23:42:35 +00:00
);
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that cluster are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer, and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck, though I don’t know if there are performance implications to reading from textures instead of uniform buffers.
* Because of the uniform buffer binding size limitations, we can support a maximum of 256 lights with the current size of the `PointLight` struct.
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward green for few lights or red for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
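The z-slicing bullet above can be sketched on the CPU side as follows. This is a minimal illustration, not the code from this PR: `depth_to_slice` and its parameters are hypothetical names, and the formula is the logarithmic depth mapping commonly attributed to Doom (2016), assuming a positive view-space depth.

```rust
/// Hypothetical helper: maps a positive view-space depth to a depth-slice
/// index using a logarithmic mapping:
///   slice = floor(ln(z) * N / ln(far / near) - N * ln(near) / ln(far / near))
fn depth_to_slice(view_depth: f32, near: f32, far: f32, slice_count: u32) -> u32 {
    let log_ratio = (far / near).ln();
    // Both terms depend only on the camera, so they can be computed once
    // per view, leaving one log and one multiply-add per fragment.
    let scale = slice_count as f32 / log_ratio;
    let bias = -(slice_count as f32) * near.ln() / log_ratio;
    let slice = (view_depth.ln() * scale + bias).floor();
    // Clamp so depths outside [near, far) still land in a valid slice.
    slice.clamp(0.0, (slice_count - 1) as f32) as u32
}
```

With illustrative values of `near = 0.1`, `far = 1000.0` and 24 slices, the first slice spans only roughly 0.10 to 0.15 units, which is one way to see why so many clusters end up close to the camera.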
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take HDR tone mapping into account because, as far as I understand it, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
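As a rough sketch of the first improvement in the list above (the effective range), assuming an isotropic point light with inverse-square falloff and a fixed illuminance cut-off. The function names, the lumen-based input and the fixed threshold are all assumptions for illustration; a real threshold would depend on exposure and tone mapping.

```rust
use std::f32::consts::PI;

/// Hypothetical helper: distance beyond which an isotropic point light
/// contributes less than `min_illuminance` under inverse-square falloff.
fn effective_range(luminous_power: f32, min_illuminance: f32) -> f32 {
    // Isotropic source: intensity = power / (4 * pi);
    // illuminance at distance d = intensity / d^2.
    let luminous_intensity = luminous_power / (4.0 * PI);
    (luminous_intensity / min_illuminance).sqrt()
}

/// Range used when assigning the light to clusters: never larger than
/// the range the user supplied.
fn clustering_range(user_range: f32, luminous_power: f32, min_illuminance: f32) -> f32 {
    user_range.min(effective_range(luminous_power, min_illuminance))
}
```

Taking the minimum keeps the user-supplied range as an upper bound while letting dim lights be tested against fewer clusters.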
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
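To make the uniform buffer budget concrete, here is a small sketch of the kind of bit packing described earlier. The two limits come from the description above; the exact bit layout (24-bit offset plus 8-bit count, four 8-bit light indices per `u32`) and all function names are assumptions for illustration, not necessarily the layout used in this PR.

```rust
/// Limits quoted in the description above.
const MAX_UNIFORM_BINDING_BYTES: usize = 16384;
const MAX_LIGHTS: usize = 256;
/// Per-light size implied by those two limits (64 bytes).
const BYTES_PER_LIGHT: usize = MAX_UNIFORM_BINDING_BYTES / MAX_LIGHTS;

/// Assumed layout: upper 24 bits hold the offset into the light index
/// list, lower 8 bits hold the number of lights in the cluster.
fn pack_offset_and_count(offset: u32, count: u32) -> u32 {
    debug_assert!(offset < (1 << 24) && count < (1 << 8));
    ((offset & 0x00FF_FFFF) << 8) | (count & 0xFF)
}

fn unpack_offset_and_count(packed: u32) -> (u32, u32) {
    (packed >> 8, packed & 0xFF)
}

/// Four 8-bit light indices per u32; 8 bits are enough for 256 lights.
fn pack_light_indices(indices: [u8; 4]) -> u32 {
    indices
        .iter()
        .enumerate()
        .fold(0, |acc, (slot, &index)| acc | (u32::from(index) << (8 * slot)))
}

/// `slot` must be in 0..4.
fn unpack_light_index(packed: u32, slot: usize) -> u8 {
    ((packed >> (8 * slot)) & 0xFF) as u8
}
```

Under this assumed layout, one 16384-byte binding holds at most 16384 light indices shared across all clusters, so a light whose range covers many clusters uses up the budget quickly.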
2021-12-09 03:08:54 +00:00
2021-06-28 22:36:50 +00:00
        // tone_mapping
        output_color = vec4<f32>(reinhard_luminance(output_color.rgb), output_color.a);
        // Gamma correction.
        // Not needed with sRGB buffer
        // output_color.rgb = pow(output_color.rgb, vec3(1.0 / 2.2));
    }

    return output_color;
2021-07-01 23:48:55 +00:00
}