2021-06-28 22:36:50 +00:00
|
|
|
|
// From the Filament design doc
|
|
|
|
|
// https://google.github.io/filament/Filament.html#table_symbols
|
|
|
|
|
// Symbol Definition
|
|
|
|
|
// v View unit vector
|
|
|
|
|
// l Incident light unit vector
|
|
|
|
|
// n Surface normal unit vector
|
|
|
|
|
// h Half unit vector between l and v
|
|
|
|
|
// f BRDF
|
|
|
|
|
// f_d Diffuse component of a BRDF
|
|
|
|
|
// f_r Specular component of a BRDF
|
|
|
|
|
// α Roughness, remapped from using input perceptualRoughness
|
|
|
|
|
// σ Diffuse reflectance
|
|
|
|
|
// Ω Spherical domain
|
|
|
|
|
// f0 Reflectance at normal incidence
|
|
|
|
|
// f90 Reflectance at grazing angle
|
|
|
|
|
// χ+(a) Heaviside function (1 if a>0 and 0 otherwise)
|
|
|
|
|
// nior Index of refraction (IOR) of an interface
|
|
|
|
|
// ⟨n⋅l⟩ Dot product clamped to [0..1]
|
|
|
|
|
// ⟨a⟩ Saturated value (clamped to [0..1])
|
|
|
|
|
|
|
|
|
|
// The Bidirectional Reflectance Distribution Function (BRDF) describes the surface response of a standard material
|
|
|
|
|
// and consists of two components, the diffuse component (f_d) and the specular component (f_r):
|
|
|
|
|
// f(v,l) = f_d(v,l) + f_r(v,l)
|
|
|
|
|
//
|
|
|
|
|
// The form of the microfacet model is the same for diffuse and specular
|
|
|
|
|
// f_r(v,l) = f_d(v,l) = 1 / { |n⋅v||n⋅l| } ∫_Ω D(m,α) G(v,l,m) f_m(v,l,m) (v⋅m) (l⋅m) dm
|
|
|
|
|
//
|
|
|
|
|
// In which:
|
|
|
|
|
// D, also called the Normal Distribution Function (NDF) models the distribution of the microfacets
|
|
|
|
|
// G models the visibility (or occlusion or shadow-masking) of the microfacets
|
|
|
|
|
// f_m is the microfacet BRDF and differs between specular and diffuse components
|
|
|
|
|
//
|
|
|
|
|
// The above integration needs to be approximated.
|
|
|
|
|
|
Shader Imports. Decouple Mesh logic from PBR (#3137)
## Shader Imports
This adds "whole file" shader imports. These come in two flavors:
### Asset Path Imports
```rust
// /assets/shaders/custom.wgsl
#import "shaders/custom_material.wgsl"
[[stage(fragment)]]
fn fragment() -> [[location(0)]] vec4<f32> {
return get_color();
}
```
```rust
// /assets/shaders/custom_material.wgsl
[[block]]
struct CustomMaterial {
color: vec4<f32>;
};
[[group(1), binding(0)]]
var<uniform> material: CustomMaterial;
```
### Custom Path Imports
Enables defining custom import paths. These are intended to be used by crates to export shader functionality:
```rust
// bevy_pbr2/src/render/pbr.wgsl
#import bevy_pbr::mesh_view_bind_group
#import bevy_pbr::mesh_bind_group
[[block]]
struct StandardMaterial {
base_color: vec4<f32>;
emissive: vec4<f32>;
perceptual_roughness: f32;
metallic: f32;
reflectance: f32;
flags: u32;
};
/* rest of PBR fragment shader here */
```
```rust
impl Plugin for MeshRenderPlugin {
fn build(&self, app: &mut bevy_app::App) {
let mut shaders = app.world.get_resource_mut::<Assets<Shader>>().unwrap();
shaders.set_untracked(
MESH_BIND_GROUP_HANDLE,
Shader::from_wgsl(include_str!("mesh_bind_group.wgsl"))
.with_import_path("bevy_pbr::mesh_bind_group"),
);
shaders.set_untracked(
MESH_VIEW_BIND_GROUP_HANDLE,
Shader::from_wgsl(include_str!("mesh_view_bind_group.wgsl"))
.with_import_path("bevy_pbr::mesh_view_bind_group"),
);
```
By convention these should use rust-style module paths that start with the crate name. Ultimately we might enforce this convention.
Note that this feature implements _run time_ import resolution. Ultimately we should move the import logic into an asset preprocessor once Bevy gets support for that.
## Decouple Mesh Logic from PBR Logic via MeshRenderPlugin
This breaks out mesh rendering code from PBR material code, which improves the legibility of the code, decouples mesh logic from PBR logic, and opens the door for a future `MaterialPlugin<T: Material>` that handles all of the pipeline setup for arbitrary shader materials.
## Removed `RenderAsset<Shader>` in favor of extracting shaders into RenderPipelineCache
This simplifies the shader import implementation and removes the need to pass around `RenderAssets<Shader>`.
## RenderCommands are now fallible
This allows us to cleanly handle pipelines+shaders not being ready yet. We can abort a render command early in these cases, preventing bevy from trying to bind group / do draw calls for pipelines that couldn't be bound. This could also be used in the future for things like "components not existing on entities yet".
# Next Steps
* Investigate using Naga for "partial typed imports" (ex: `#import bevy_pbr::material::StandardMaterial`, which would import only the StandardMaterial struct)
* Implement `MaterialPlugin<T: Material>` for low-boilerplate custom material shaders
* Move shader import logic into the asset preprocessor once bevy gets support for that.
Fixes #3132
2021-11-18 03:45:02 +00:00
|
|
|
|
#import bevy_pbr::mesh_view_bind_group
|
|
|
|
|
#import bevy_pbr::mesh_struct
|
|
|
|
|
|
|
|
|
|
[[group(2), binding(0)]]
|
|
|
|
|
var<uniform> mesh: Mesh;
|
|
|
|
|
|
2021-06-28 22:36:50 +00:00
|
|
|
|
struct StandardMaterial {
|
|
|
|
|
base_color: vec4<f32>;
|
|
|
|
|
emissive: vec4<f32>;
|
|
|
|
|
perceptual_roughness: f32;
|
|
|
|
|
metallic: f32;
|
|
|
|
|
reflectance: f32;
|
2021-08-25 19:44:20 +00:00
|
|
|
|
// 'flags' is a bit field indicating various options. u32 is 32 bits so we have up to 32 options.
|
2021-06-28 22:36:50 +00:00
|
|
|
|
flags: u32;
|
Add support for opaque, alpha mask, and alpha blend modes (#3072)
# Objective
Add depth prepass and support for opaque, alpha mask, and alpha blend modes for the 3D PBR target.
## Solution
NOTE: This is based on top of #2861 frustum culling. Just lining it up to keep @cart loaded with the review train. 🚂
There are a lot of important details here. Big thanks to @cwfitzgerald of wgpu, naga, and rend3 fame for explaining how to do it properly!
* An `AlphaMode` component is added that defines whether a material should be considered opaque, an alpha mask (with a cutoff value that defaults to 0.5, the same as glTF), or transparent and should be alpha blended
* Two depth prepasses are added:
* Opaque does a plain vertex stage
* Alpha mask does the vertex stage but also a fragment stage that samples the colour for the fragment and discards if its alpha value is below the cutoff value
* Both are sorted front to back, not that it matters for these passes. (Maybe there should be a way to skip sorting?)
* Three main passes are added:
* Opaque and alpha mask passes use a depth comparison function of Equal such that only the geometry that was closest is processed further, due to early-z testing
* The transparent pass uses the Greater depth comparison function so that only transparent objects that are closer than anything opaque are rendered
* The opaque fragment shading is as before except that alpha is explicitly set to 1.0
* Alpha mask fragment shading sets the alpha value to 1.0 if it is equal to or above the cutoff, as defined by glTF
* Opaque and alpha mask are sorted front to back (again not that it matters as we will skip anything that is not equal... maybe sorting is no longer needed here?)
* Transparent is sorted back to front. Transparent fragment shading uses the alpha blending over operator
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2021-11-16 03:03:27 +00:00
|
|
|
|
alpha_cutoff: f32;
|
2021-06-28 22:36:50 +00:00
|
|
|
|
};
|
|
|
|
|
|
2021-08-25 19:44:20 +00:00
|
|
|
|
let STANDARD_MATERIAL_FLAGS_BASE_COLOR_TEXTURE_BIT: u32 = 1u;
|
|
|
|
|
let STANDARD_MATERIAL_FLAGS_EMISSIVE_TEXTURE_BIT: u32 = 2u;
|
|
|
|
|
let STANDARD_MATERIAL_FLAGS_METALLIC_ROUGHNESS_TEXTURE_BIT: u32 = 4u;
|
|
|
|
|
let STANDARD_MATERIAL_FLAGS_OCCLUSION_TEXTURE_BIT: u32 = 8u;
|
|
|
|
|
let STANDARD_MATERIAL_FLAGS_DOUBLE_SIDED_BIT: u32 = 16u;
|
|
|
|
|
let STANDARD_MATERIAL_FLAGS_UNLIT_BIT: u32 = 32u;
|
Add support for opaque, alpha mask, and alpha blend modes (#3072)
# Objective
Add depth prepass and support for opaque, alpha mask, and alpha blend modes for the 3D PBR target.
## Solution
NOTE: This is based on top of #2861 frustum culling. Just lining it up to keep @cart loaded with the review train. 🚂
There are a lot of important details here. Big thanks to @cwfitzgerald of wgpu, naga, and rend3 fame for explaining how to do it properly!
* An `AlphaMode` component is added that defines whether a material should be considered opaque, an alpha mask (with a cutoff value that defaults to 0.5, the same as glTF), or transparent and should be alpha blended
* Two depth prepasses are added:
* Opaque does a plain vertex stage
* Alpha mask does the vertex stage but also a fragment stage that samples the colour for the fragment and discards if its alpha value is below the cutoff value
* Both are sorted front to back, not that it matters for these passes. (Maybe there should be a way to skip sorting?)
* Three main passes are added:
* Opaque and alpha mask passes use a depth comparison function of Equal such that only the geometry that was closest is processed further, due to early-z testing
* The transparent pass uses the Greater depth comparison function so that only transparent objects that are closer than anything opaque are rendered
* The opaque fragment shading is as before except that alpha is explicitly set to 1.0
* Alpha mask fragment shading sets the alpha value to 1.0 if it is equal to or above the cutoff, as defined by glTF
* Opaque and alpha mask are sorted front to back (again not that it matters as we will skip anything that is not equal... maybe sorting is no longer needed here?)
* Transparent is sorted back to front. Transparent fragment shading uses the alpha blending over operator
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2021-11-16 03:03:27 +00:00
|
|
|
|
let STANDARD_MATERIAL_FLAGS_ALPHA_MODE_OPAQUE: u32 = 64u;
|
|
|
|
|
let STANDARD_MATERIAL_FLAGS_ALPHA_MODE_MASK: u32 = 128u;
|
|
|
|
|
let STANDARD_MATERIAL_FLAGS_ALPHA_MODE_BLEND: u32 = 256u;
|
2022-03-15 22:26:46 +00:00
|
|
|
|
let STANDARD_MATERIAL_FLAGS_TWO_COMPONENT_NORMAL_MAP: u32 = 512u;
|
2022-04-07 15:50:14 +00:00
|
|
|
|
let STANDARD_MATERIAL_FLAGS_FLIP_NORMAL_MAP_Y: u32 = 1024u;
|
2021-08-25 19:44:20 +00:00
|
|
|
|
|
2021-08-25 20:10:43 +00:00
|
|
|
|
[[group(1), binding(0)]]
|
2021-10-08 19:55:24 +00:00
|
|
|
|
var<uniform> material: StandardMaterial;
|
2021-08-25 20:10:43 +00:00
|
|
|
|
[[group(1), binding(1)]]
|
2021-06-28 22:36:50 +00:00
|
|
|
|
var base_color_texture: texture_2d<f32>;
|
2021-08-25 20:10:43 +00:00
|
|
|
|
[[group(1), binding(2)]]
|
2021-06-28 22:36:50 +00:00
|
|
|
|
var base_color_sampler: sampler;
|
2021-08-25 20:10:43 +00:00
|
|
|
|
[[group(1), binding(3)]]
|
2021-06-28 22:36:50 +00:00
|
|
|
|
var emissive_texture: texture_2d<f32>;
|
2021-08-25 20:10:43 +00:00
|
|
|
|
[[group(1), binding(4)]]
|
2021-06-28 22:36:50 +00:00
|
|
|
|
var emissive_sampler: sampler;
|
2021-08-25 20:10:43 +00:00
|
|
|
|
[[group(1), binding(5)]]
|
2021-06-28 22:36:50 +00:00
|
|
|
|
var metallic_roughness_texture: texture_2d<f32>;
|
2021-08-25 20:10:43 +00:00
|
|
|
|
[[group(1), binding(6)]]
|
2021-06-28 22:36:50 +00:00
|
|
|
|
var metallic_roughness_sampler: sampler;
|
2021-08-25 20:10:43 +00:00
|
|
|
|
[[group(1), binding(7)]]
|
2021-06-28 22:36:50 +00:00
|
|
|
|
var occlusion_texture: texture_2d<f32>;
|
2021-08-25 20:10:43 +00:00
|
|
|
|
[[group(1), binding(8)]]
|
2021-06-28 22:36:50 +00:00
|
|
|
|
var occlusion_sampler: sampler;
|
2021-11-04 21:47:57 +00:00
|
|
|
|
[[group(1), binding(9)]]
|
|
|
|
|
var normal_map_texture: texture_2d<f32>;
|
|
|
|
|
[[group(1), binding(10)]]
|
|
|
|
|
var normal_map_sampler: sampler;
|
2021-06-28 22:36:50 +00:00
|
|
|
|
|
|
|
|
|
let PI: f32 = 3.141592653589793;
|
|
|
|
|
|
|
|
|
|
fn saturate(value: f32) -> f32 {
|
|
|
|
|
return clamp(value, 0.0, 1.0);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// distanceAttenuation is simply the square falloff of light intensity
|
|
|
|
|
// combined with a smooth attenuation at the edge of the light radius
|
|
|
|
|
//
|
|
|
|
|
// light radius is a non-physical construct for efficiency purposes,
|
|
|
|
|
// because otherwise every light affects every fragment in the scene
|
|
|
|
|
fn getDistanceAttenuation(distanceSquare: f32, inverseRangeSquared: f32) -> f32 {
|
|
|
|
|
let factor = distanceSquare * inverseRangeSquared;
|
|
|
|
|
let smoothFactor = saturate(1.0 - factor * factor);
|
|
|
|
|
let attenuation = smoothFactor * smoothFactor;
|
|
|
|
|
return attenuation * 1.0 / max(distanceSquare, 0.0001);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Normal distribution function (specular D)
|
|
|
|
|
// Based on https://google.github.io/filament/Filament.html#citation-walter07
|
|
|
|
|
|
|
|
|
|
// D_GGX(h,α) = α^2 / { π ((n⋅h)^2 (α2−1) + 1)^2 }
|
|
|
|
|
|
|
|
|
|
// Simple implementation, has precision problems when using fp16 instead of fp32
|
|
|
|
|
// see https://google.github.io/filament/Filament.html#listing_speculardfp16
|
|
|
|
|
fn D_GGX(roughness: f32, NoH: f32, h: vec3<f32>) -> f32 {
|
|
|
|
|
let oneMinusNoHSquared = 1.0 - NoH * NoH;
|
|
|
|
|
let a = NoH * roughness;
|
|
|
|
|
let k = roughness / (oneMinusNoHSquared + a * a);
|
|
|
|
|
let d = k * k * (1.0 / PI);
|
|
|
|
|
return d;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Visibility function (Specular G)
|
|
|
|
|
// V(v,l,a) = G(v,l,α) / { 4 (n⋅v) (n⋅l) }
|
|
|
|
|
// such that f_r becomes
|
|
|
|
|
// f_r(v,l) = D(h,α) V(v,l,α) F(v,h,f0)
|
|
|
|
|
// where
|
|
|
|
|
// V(v,l,α) = 0.5 / { n⋅l sqrt((n⋅v)^2 (1−α2) + α2) + n⋅v sqrt((n⋅l)^2 (1−α2) + α2) }
|
|
|
|
|
// Note the two sqrt's, that may be slow on mobile, see https://google.github.io/filament/Filament.html#listing_approximatedspecularv
|
|
|
|
|
fn V_SmithGGXCorrelated(roughness: f32, NoV: f32, NoL: f32) -> f32 {
|
|
|
|
|
let a2 = roughness * roughness;
|
|
|
|
|
let lambdaV = NoL * sqrt((NoV - a2 * NoV) * NoV + a2);
|
|
|
|
|
let lambdaL = NoV * sqrt((NoL - a2 * NoL) * NoL + a2);
|
|
|
|
|
let v = 0.5 / (lambdaV + lambdaL);
|
|
|
|
|
return v;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Fresnel function
|
|
|
|
|
// see https://google.github.io/filament/Filament.html#citation-schlick94
|
|
|
|
|
// F_Schlick(v,h,f_0,f_90) = f_0 + (f_90 − f_0) (1 − v⋅h)^5
|
|
|
|
|
fn F_Schlick_vec(f0: vec3<f32>, f90: f32, VoH: f32) -> vec3<f32> {
|
|
|
|
|
// not using mix to keep the vec3 and float versions identical
|
|
|
|
|
return f0 + (f90 - f0) * pow(1.0 - VoH, 5.0);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
fn F_Schlick(f0: f32, f90: f32, VoH: f32) -> f32 {
|
|
|
|
|
// not using mix to keep the vec3 and float versions identical
|
|
|
|
|
return f0 + (f90 - f0) * pow(1.0 - VoH, 5.0);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
fn fresnel(f0: vec3<f32>, LoH: f32) -> vec3<f32> {
|
|
|
|
|
// f_90 suitable for ambient occlusion
|
|
|
|
|
// see https://google.github.io/filament/Filament.html#lighting/occlusion
|
|
|
|
|
let f90 = saturate(dot(f0, vec3<f32>(50.0 * 0.33)));
|
|
|
|
|
return F_Schlick_vec(f0, f90, LoH);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Specular BRDF
|
|
|
|
|
// https://google.github.io/filament/Filament.html#materialsystem/specularbrdf
|
|
|
|
|
|
|
|
|
|
// Cook-Torrance approximation of the microfacet model integration using Fresnel law F to model f_m
|
|
|
|
|
// f_r(v,l) = { D(h,α) G(v,l,α) F(v,h,f0) } / { 4 (n⋅v) (n⋅l) }
|
|
|
|
|
fn specular(f0: vec3<f32>, roughness: f32, h: vec3<f32>, NoV: f32, NoL: f32,
|
|
|
|
|
NoH: f32, LoH: f32, specularIntensity: f32) -> vec3<f32> {
|
|
|
|
|
let D = D_GGX(roughness, NoH, h);
|
|
|
|
|
let V = V_SmithGGXCorrelated(roughness, NoV, NoL);
|
|
|
|
|
let F = fresnel(f0, LoH);
|
|
|
|
|
|
|
|
|
|
return (specularIntensity * D * V) * F;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Diffuse BRDF
|
|
|
|
|
// https://google.github.io/filament/Filament.html#materialsystem/diffusebrdf
|
|
|
|
|
// fd(v,l) = σ/π * 1 / { |n⋅v||n⋅l| } ∫Ω D(m,α) G(v,l,m) (v⋅m) (l⋅m) dm
|
|
|
|
|
//
|
|
|
|
|
// simplest approximation
|
|
|
|
|
// float Fd_Lambert() {
|
|
|
|
|
// return 1.0 / PI;
|
|
|
|
|
// }
|
|
|
|
|
//
|
|
|
|
|
// vec3 Fd = diffuseColor * Fd_Lambert();
|
|
|
|
|
//
|
|
|
|
|
// Disney approximation
|
|
|
|
|
// See https://google.github.io/filament/Filament.html#citation-burley12
|
|
|
|
|
// minimal quality difference
|
|
|
|
|
fn Fd_Burley(roughness: f32, NoV: f32, NoL: f32, LoH: f32) -> f32 {
|
|
|
|
|
let f90 = 0.5 + 2.0 * roughness * LoH * LoH;
|
|
|
|
|
let lightScatter = F_Schlick(1.0, f90, NoL);
|
|
|
|
|
let viewScatter = F_Schlick(1.0, f90, NoV);
|
|
|
|
|
return lightScatter * viewScatter * (1.0 / PI);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// From https://www.unrealengine.com/en-US/blog/physically-based-shading-on-mobile
|
|
|
|
|
fn EnvBRDFApprox(f0: vec3<f32>, perceptual_roughness: f32, NoV: f32) -> vec3<f32> {
|
|
|
|
|
let c0 = vec4<f32>(-1.0, -0.0275, -0.572, 0.022);
|
|
|
|
|
let c1 = vec4<f32>(1.0, 0.0425, 1.04, -0.04);
|
|
|
|
|
let r = perceptual_roughness * c0 + c1;
|
|
|
|
|
let a004 = min(r.x * r.x, exp2(-9.28 * NoV)) * r.x + r.y;
|
|
|
|
|
let AB = vec2<f32>(-1.04, 1.04) * a004 + r.zw;
|
|
|
|
|
return f0 * AB.x + AB.y;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
fn perceptualRoughnessToRoughness(perceptualRoughness: f32) -> f32 {
|
|
|
|
|
// clamp perceptual roughness to prevent precision problems
|
|
|
|
|
// According to Filament design 0.089 is recommended for mobile
|
|
|
|
|
// Filament uses 0.045 for non-mobile
|
|
|
|
|
let clampedPerceptualRoughness = clamp(perceptualRoughness, 0.089, 1.0);
|
|
|
|
|
return clampedPerceptualRoughness * clampedPerceptualRoughness;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// from https://64.github.io/tonemapping/
|
|
|
|
|
// reinhard on RGB oversaturates colors
|
|
|
|
|
fn reinhard(color: vec3<f32>) -> vec3<f32> {
|
|
|
|
|
return color / (1.0 + color);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
fn reinhard_extended(color: vec3<f32>, max_white: f32) -> vec3<f32> {
|
2021-10-08 19:55:24 +00:00
|
|
|
|
let numerator = color * (1.0 + (color / vec3<f32>(max_white * max_white)));
|
2021-06-28 22:36:50 +00:00
|
|
|
|
return numerator / (1.0 + color);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// luminance coefficients from Rec. 709.
|
|
|
|
|
// https://en.wikipedia.org/wiki/Rec._709
|
|
|
|
|
fn luminance(v: vec3<f32>) -> f32 {
|
|
|
|
|
return dot(v, vec3<f32>(0.2126, 0.7152, 0.0722));
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
fn change_luminance(c_in: vec3<f32>, l_out: f32) -> vec3<f32> {
|
|
|
|
|
let l_in = luminance(c_in);
|
|
|
|
|
return c_in * (l_out / l_in);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
fn reinhard_luminance(color: vec3<f32>) -> vec3<f32> {
|
|
|
|
|
let l_old = luminance(color);
|
2021-10-08 19:55:24 +00:00
|
|
|
|
let l_new = l_old / (1.0 + l_old);
|
2021-06-28 22:36:50 +00:00
|
|
|
|
return change_luminance(color, l_new);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
fn reinhard_extended_luminance(color: vec3<f32>, max_white_l: f32) -> vec3<f32> {
|
|
|
|
|
let l_old = luminance(color);
|
2021-10-08 19:55:24 +00:00
|
|
|
|
let numerator = l_old * (1.0 + (l_old / (max_white_l * max_white_l)));
|
|
|
|
|
let l_new = numerator / (1.0 + l_old);
|
2021-06-28 22:36:50 +00:00
|
|
|
|
return change_luminance(color, l_new);
|
|
|
|
|
}
|
|
|
|
|
|
2022-04-15 02:53:20 +00:00
|
|
|
|
// NOTE: Keep in sync with bevy_pbr/src/light.rs
|
2021-12-14 23:42:35 +00:00
|
|
|
|
fn view_z_to_z_slice(view_z: f32, is_orthographic: bool) -> u32 {
|
2022-04-15 02:53:20 +00:00
|
|
|
|
var z_slice: u32 = 0u;
|
2021-12-14 23:42:35 +00:00
|
|
|
|
if (is_orthographic) {
|
|
|
|
|
// NOTE: view_z is correct in the orthographic case
|
2022-04-15 02:53:20 +00:00
|
|
|
|
z_slice = u32(floor((view_z - lights.cluster_factors.z) * lights.cluster_factors.w));
|
2021-12-14 23:42:35 +00:00
|
|
|
|
} else {
|
|
|
|
|
// NOTE: had to use -view_z to make it positive else log(negative) is nan
|
2022-04-15 02:53:20 +00:00
|
|
|
|
z_slice = u32(log(-view_z) * lights.cluster_factors.z - lights.cluster_factors.w + 1.0);
|
2021-12-14 23:42:35 +00:00
|
|
|
|
}
|
2022-04-15 02:53:20 +00:00
|
|
|
|
// NOTE: We use min as we may limit the far z plane used for clustering to be closeer than
|
|
|
|
|
// the furthest thing being drawn. This means that we need to limit to the maximum cluster.
|
|
|
|
|
return min(z_slice, lights.cluster_dimensions.z - 1u);
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
}
|
|
|
|
|
|
2021-12-14 23:42:35 +00:00
|
|
|
|
fn fragment_cluster_index(frag_coord: vec2<f32>, view_z: f32, is_orthographic: bool) -> u32 {
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
let xy = vec2<u32>(floor(frag_coord * lights.cluster_factors.xy));
|
2021-12-14 23:42:35 +00:00
|
|
|
|
let z_slice = view_z_to_z_slice(view_z, is_orthographic);
|
2022-01-07 21:25:59 +00:00
|
|
|
|
// NOTE: Restricting cluster index to avoid undefined behavior when accessing uniform buffer
|
|
|
|
|
// arrays based on the cluster index.
|
|
|
|
|
return min(
|
|
|
|
|
(xy.y * lights.cluster_dimensions.x + xy.x) * lights.cluster_dimensions.z + z_slice,
|
|
|
|
|
lights.cluster_dimensions.w - 1u
|
|
|
|
|
);
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
}
|
|
|
|
|
|
2022-03-01 10:17:41 +00:00
|
|
|
|
// this must match CLUSTER_COUNT_SIZE in light.rs
|
|
|
|
|
let CLUSTER_COUNT_SIZE = 13u;
|
2022-04-07 16:16:35 +00:00
|
|
|
|
fn unpack_offset_and_count(cluster_index: u32) -> vec2<u32> {
|
|
|
|
|
#ifdef NO_STORAGE_BUFFERS_SUPPORT
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
let offset_and_count = cluster_offsets_and_counts.data[cluster_index >> 2u][cluster_index & ((1u << 2u) - 1u)];
|
2022-04-07 16:16:35 +00:00
|
|
|
|
return vec2<u32>(
|
|
|
|
|
// The offset is stored in the upper 32 - CLUSTER_COUNT_SIZE = 19 bits
|
|
|
|
|
(offset_and_count >> CLUSTER_COUNT_SIZE) & ((1u << 32u - CLUSTER_COUNT_SIZE) - 1u),
|
|
|
|
|
// The count is stored in the lower CLUSTER_COUNT_SIZE = 13 bits
|
|
|
|
|
offset_and_count & ((1u << CLUSTER_COUNT_SIZE) - 1u)
|
|
|
|
|
);
|
|
|
|
|
#else
|
|
|
|
|
return cluster_offsets_and_counts.data[cluster_index];
|
|
|
|
|
#endif
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
fn get_light_id(index: u32) -> u32 {
|
2022-04-07 16:16:35 +00:00
|
|
|
|
#ifdef NO_STORAGE_BUFFERS_SUPPORT
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
// The index is correct but in cluster_light_index_lists we pack 4 u8s into a u32
|
|
|
|
|
// This means the index into cluster_light_index_lists is index / 4
|
|
|
|
|
let indices = cluster_light_index_lists.data[index >> 4u][(index >> 2u) & ((1u << 2u) - 1u)];
|
|
|
|
|
// And index % 4 gives the sub-index of the u8 within the u32 so we shift by 8 * sub-index
|
|
|
|
|
return (indices >> (8u * (index & ((1u << 2u) - 1u)))) & ((1u << 8u) - 1u);
|
2022-04-07 16:16:35 +00:00
|
|
|
|
#else
|
|
|
|
|
return cluster_light_index_lists.data[index];
|
|
|
|
|
#endif
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
}
|
|
|
|
|
|
2021-07-01 23:54:58 +00:00
|
|
|
|
fn point_light(
|
|
|
|
|
world_position: vec3<f32>, light: PointLight, roughness: f32, NdotV: f32, N: vec3<f32>, V: vec3<f32>,
|
2021-06-28 22:36:50 +00:00
|
|
|
|
R: vec3<f32>, F0: vec3<f32>, diffuseColor: vec3<f32>
|
|
|
|
|
) -> vec3<f32> {
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
let light_to_frag = light.position_radius.xyz - world_position.xyz;
|
2021-06-28 22:36:50 +00:00
|
|
|
|
let distance_square = dot(light_to_frag, light_to_frag);
|
|
|
|
|
let rangeAttenuation =
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
getDistanceAttenuation(distance_square, light.color_inverse_square_range.w);
|
2021-06-28 22:36:50 +00:00
|
|
|
|
|
|
|
|
|
// Specular.
|
|
|
|
|
// Representative Point Area Lights.
|
|
|
|
|
// see http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf p14-16
|
|
|
|
|
let a = roughness;
|
|
|
|
|
let centerToRay = dot(light_to_frag, R) * R - light_to_frag;
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
let closestPoint = light_to_frag + centerToRay * saturate(light.position_radius.w * inverseSqrt(dot(centerToRay, centerToRay)));
|
2021-06-28 22:36:50 +00:00
|
|
|
|
let LspecLengthInverse = inverseSqrt(dot(closestPoint, closestPoint));
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
let normalizationFactor = a / saturate(a + (light.position_radius.w * 0.5 * LspecLengthInverse));
|
2021-06-28 22:36:50 +00:00
|
|
|
|
let specularIntensity = normalizationFactor * normalizationFactor;
|
|
|
|
|
|
|
|
|
|
var L: vec3<f32> = closestPoint * LspecLengthInverse; // normalize() equivalent?
|
|
|
|
|
var H: vec3<f32> = normalize(L + V);
|
|
|
|
|
var NoL: f32 = saturate(dot(N, L));
|
|
|
|
|
var NoH: f32 = saturate(dot(N, H));
|
|
|
|
|
var LoH: f32 = saturate(dot(L, H));
|
|
|
|
|
|
|
|
|
|
let specular_light = specular(F0, roughness, H, NdotV, NoL, NoH, LoH, specularIntensity);
|
|
|
|
|
|
|
|
|
|
// Diffuse.
|
|
|
|
|
// Comes after specular since its NoL is used in the lighting equation.
|
|
|
|
|
L = normalize(light_to_frag);
|
|
|
|
|
H = normalize(L + V);
|
|
|
|
|
NoL = saturate(dot(N, L));
|
|
|
|
|
NoH = saturate(dot(N, H));
|
|
|
|
|
LoH = saturate(dot(L, H));
|
|
|
|
|
|
|
|
|
|
let diffuse = diffuseColor * Fd_Burley(roughness, NdotV, NoL, LoH);
|
|
|
|
|
|
bevy_pbr2: Improve lighting units and documentation (#2704)
# Objective
A question was raised on Discord about the units of the `PointLight` `intensity` member.
After digging around in the bevy_pbr2 source code and [Google Filament documentation](https://google.github.io/filament/Filament.html#mjx-eqn-pointLightLuminousPower) I discovered that the intention by Filament was that the 'intensity' value for point lights would be in lumens. This makes a lot of sense as these are quite relatable units given basically all light bulbs I've seen sold over the past years are rated in lumens as people move away from thinking about how bright a bulb is relative to a non-halogen incandescent bulb.
However, it seems that the derivation of the conversion between luminous power (lumens, denoted `Φ` in the Filament formulae) and luminous intensity (lumens per steradian, `I` in the Filament formulae) was missed and I can see why as it is tucked right under equation 58 at the link above. As such, while the formula states that for a point light, `I = Φ / 4 π` we have been using `intensity` as if it were luminous intensity `I`.
Before this PR, the intensity field is luminous intensity in lumens per steradian. After this PR, the intensity field is luminous power in lumens, [as suggested by Filament](https://google.github.io/filament/Filament.html#table_lighttypesunits) (unfortunately the link jumps to the table's caption so scroll up to see the actual table).
I appreciate that it may be confusing to call this an intensity, but I think this is intended as more of a non-scientific, human-relatable general term with a bit of hand waving so that most light types can just have an intensity field and for most of them it works in the same way or at least with some relatable value. I'm inclined to think this is reasonable rather than throwing terms like luminous power, luminous intensity, blah at users.
## Solution
- Documented the `PointLight` `intensity` member as 'luminous power' in units of lumens.
- Added a table of examples relating from various types of household lighting to lumen values.
- Added in the mapping from luminous power to luminous intensity when premultiplying the intensity into the colour before it is made into a graphics uniform.
- Updated the documentation in `pbr.wgsl` to clarify the earlier confusion about the missing `/ 4 π`.
- Bumped the intensity of the point lights in `3d_scene_pipelined` to 1600 lumens.
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2021-08-23 23:48:11 +00:00
|
|
|
|
// See https://google.github.io/filament/Filament.html#mjx-eqn-pointLightLuminanceEquation
|
2021-06-28 22:36:50 +00:00
|
|
|
|
// Lout = f(v,l) Φ / { 4 π d^2 }⟨n⋅l⟩
|
|
|
|
|
// where
|
|
|
|
|
// f(v,l) = (f_d(v,l) + f_r(v,l)) * light_color
|
bevy_pbr2: Improve lighting units and documentation (#2704)
# Objective
A question was raised on Discord about the units of the `PointLight` `intensity` member.
After digging around in the bevy_pbr2 source code and [Google Filament documentation](https://google.github.io/filament/Filament.html#mjx-eqn-pointLightLuminousPower) I discovered that the intention by Filament was that the 'intensity' value for point lights would be in lumens. This makes a lot of sense as these are quite relatable units given basically all light bulbs I've seen sold over the past years are rated in lumens as people move away from thinking about how bright a bulb is relative to a non-halogen incandescent bulb.
However, it seems that the derivation of the conversion between luminous power (lumens, denoted `Φ` in the Filament formulae) and luminous intensity (lumens per steradian, `I` in the Filament formulae) was missed and I can see why as it is tucked right under equation 58 at the link above. As such, while the formula states that for a point light, `I = Φ / 4 π` we have been using `intensity` as if it were luminous intensity `I`.
Before this PR, the intensity field is luminous intensity in lumens per steradian. After this PR, the intensity field is luminous power in lumens, [as suggested by Filament](https://google.github.io/filament/Filament.html#table_lighttypesunits) (unfortunately the link jumps to the table's caption so scroll up to see the actual table).
I appreciate that it may be confusing to call this an intensity, but I think this is intended as more of a non-scientific, human-relatable general term with a bit of hand waving so that most light types can just have an intensity field and for most of them it works in the same way or at least with some relatable value. I'm inclined to think this is reasonable rather than throwing terms like luminous power, luminous intensity, blah at users.
## Solution
- Documented the `PointLight` `intensity` member as 'luminous power' in units of lumens.
- Added a table of examples relating from various types of household lighting to lumen values.
- Added in the mapping from luminous power to luminous intensity when premultiplying the intensity into the colour before it is made into a graphics uniform.
- Updated the documentation in `pbr.wgsl` to clarify the earlier confusion about the missing `/ 4 π`.
- Bumped the intensity of the point lights in `3d_scene_pipelined` to 1600 lumens.
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2021-08-23 23:48:11 +00:00
|
|
|
|
// Φ is luminous power in lumens
|
2021-06-28 22:36:50 +00:00
|
|
|
|
// our rangeAttentuation = 1 / d^2 multiplied with an attenuation factor for smoothing at the edge of the non-physical maximum light radius
|
|
|
|
|
|
bevy_pbr2: Improve lighting units and documentation (#2704)
# Objective
A question was raised on Discord about the units of the `PointLight` `intensity` member.
After digging around in the bevy_pbr2 source code and [Google Filament documentation](https://google.github.io/filament/Filament.html#mjx-eqn-pointLightLuminousPower) I discovered that the intention by Filament was that the 'intensity' value for point lights would be in lumens. This makes a lot of sense as these are quite relatable units given basically all light bulbs I've seen sold over the past years are rated in lumens as people move away from thinking about how bright a bulb is relative to a non-halogen incandescent bulb.
However, it seems that the derivation of the conversion between luminous power (lumens, denoted `Φ` in the Filament formulae) and luminous intensity (lumens per steradian, `I` in the Filament formulae) was missed and I can see why as it is tucked right under equation 58 at the link above. As such, while the formula states that for a point light, `I = Φ / 4 π` we have been using `intensity` as if it were luminous intensity `I`.
Before this PR, the intensity field is luminous intensity in lumens per steradian. After this PR, the intensity field is luminous power in lumens, [as suggested by Filament](https://google.github.io/filament/Filament.html#table_lighttypesunits) (unfortunately the link jumps to the table's caption so scroll up to see the actual table).
I appreciate that it may be confusing to call this an intensity, but I think this is intended as more of a non-scientific, human-relatable general term with a bit of hand waving so that most light types can just have an intensity field and for most of them it works in the same way or at least with some relatable value. I'm inclined to think this is reasonable rather than throwing terms like luminous power, luminous intensity, blah at users.
## Solution
- Documented the `PointLight` `intensity` member as 'luminous power' in units of lumens.
- Added a table of examples relating from various types of household lighting to lumen values.
- Added in the mapping from luminous power to luminous intensity when premultiplying the intensity into the colour before it is made into a graphics uniform.
- Updated the documentation in `pbr.wgsl` to clarify the earlier confusion about the missing `/ 4 π`.
- Bumped the intensity of the point lights in `3d_scene_pipelined` to 1600 lumens.
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2021-08-23 23:48:11 +00:00
|
|
|
|
// For a point light, luminous intensity, I, in lumens per steradian is given by:
|
|
|
|
|
// I = Φ / 4 π
|
|
|
|
|
// The derivation of this can be seen here: https://google.github.io/filament/Filament.html#mjx-eqn-pointLightLuminousPower
|
|
|
|
|
|
|
|
|
|
// NOTE: light.color.rgb is premultiplied with light.intensity / 4 π (which would be the luminous intensity) on the CPU
|
|
|
|
|
|
2021-06-28 22:36:50 +00:00
|
|
|
|
// TODO compensate for energy loss https://google.github.io/filament/Filament.html#materialsystem/improvingthebrdfs/energylossinspecularreflectance
|
|
|
|
|
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
return ((diffuse + specular_light) * light.color_inverse_square_range.rgb) * (rangeAttenuation * NoL);
|
2021-06-28 22:36:50 +00:00
|
|
|
|
}
|
|
|
|
|
|
2021-07-08 02:49:33 +00:00
|
|
|
|
fn directional_light(light: DirectionalLight, roughness: f32, NdotV: f32, normal: vec3<f32>, view: vec3<f32>, R: vec3<f32>, F0: vec3<f32>, diffuseColor: vec3<f32>) -> vec3<f32> {
|
|
|
|
|
let incident_light = light.direction_to_light.xyz;
|
|
|
|
|
|
|
|
|
|
let half_vector = normalize(incident_light + view);
|
|
|
|
|
let NoL = saturate(dot(normal, incident_light));
|
|
|
|
|
let NoH = saturate(dot(normal, half_vector));
|
|
|
|
|
let LoH = saturate(dot(incident_light, half_vector));
|
|
|
|
|
|
|
|
|
|
let diffuse = diffuseColor * Fd_Burley(roughness, NdotV, NoL, LoH);
|
|
|
|
|
let specularIntensity = 1.0;
|
|
|
|
|
let specular_light = specular(F0, roughness, half_vector, NdotV, NoL, NoH, LoH, specularIntensity);
|
|
|
|
|
|
|
|
|
|
return (specular_light + diffuse) * light.color.rgb * NoL;
|
|
|
|
|
}
|
|
|
|
|
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
fn fetch_point_shadow(light_id: u32, frag_position: vec4<f32>, surface_normal: vec3<f32>) -> f32 {
|
|
|
|
|
let light = point_lights.data[light_id];
|
2021-07-01 23:48:55 +00:00
|
|
|
|
|
|
|
|
|
// because the shadow maps align with the axes and the frustum planes are at 45 degrees
|
|
|
|
|
// we can get the worldspace depth by taking the largest absolute axis
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
let surface_to_light = light.position_radius.xyz - frag_position.xyz;
|
2021-07-19 19:20:59 +00:00
|
|
|
|
let surface_to_light_abs = abs(surface_to_light);
|
|
|
|
|
let distance_to_light = max(surface_to_light_abs.x, max(surface_to_light_abs.y, surface_to_light_abs.z));
|
|
|
|
|
|
|
|
|
|
// The normal bias here is already scaled by the texel size at 1 world unit from the light.
|
|
|
|
|
// The texel size increases proportionally with distance from the light so multiplying by
|
|
|
|
|
// distance to light scales the normal bias to the texel size at the fragment distance.
|
|
|
|
|
let normal_offset = light.shadow_normal_bias * distance_to_light * surface_normal.xyz;
|
|
|
|
|
let depth_offset = light.shadow_depth_bias * normalize(surface_to_light.xyz);
|
|
|
|
|
let offset_position = frag_position.xyz + normal_offset + depth_offset;
|
|
|
|
|
|
|
|
|
|
// similar largest-absolute-axis trick as above, but now with the offset fragment position
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
let frag_ls = light.position_radius.xyz - offset_position.xyz;
|
2021-07-01 23:48:55 +00:00
|
|
|
|
let abs_position_ls = abs(frag_ls);
|
|
|
|
|
let major_axis_magnitude = max(abs_position_ls.x, max(abs_position_ls.y, abs_position_ls.z));
|
|
|
|
|
|
Use the infinite reverse right-handed perspective projection (#2543)
# Objective
Forward perspective projections have poor floating point precision distribution over the depth range. Reverse projections fair much better, and instead of having to have a far plane, with the reverse projection, using an infinite far plane is not a problem. The infinite reverse perspective projection has become the industry standard. The renderer rework is a great time to migrate to it.
## Solution
All perspective projections, including point lights, have been moved to using `glam::Mat4::perspective_infinite_reverse_rh()` and so have no far plane. As various depth textures are shared between orthographic and perspective projections, a quirk of this PR is that the near and far planes of the orthographic projection are swapped when the Mat4 is computed. This has no impact on 2D/3D orthographic projection usage, and provides consistency in shaders, texture clear values, etc. throughout the codebase.
## Known issues
For some reason, when looking along -Z, all geometry is black. The camera can be translated up/down / strafed left/right and geometry will still be black. Moving forward/backward or rotating the camera away from looking exactly along -Z causes everything to work as expected.
I have tried to debug this issue but both in macOS and Windows I get crashes when doing pixel debugging. If anyone could reproduce this and debug it I would be very grateful. Otherwise I will have to try to debug it further without pixel debugging, though the projections and such all looked fine to me.
2021-08-27 20:15:09 +00:00
|
|
|
|
// NOTE: These simplifications come from multiplying:
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
// projection * vec4(0, 0, -major_axis_magnitude, 1.0)
|
|
|
|
|
// and keeping only the terms that have any impact on the depth.
|
Use the infinite reverse right-handed perspective projection (#2543)
# Objective
Forward perspective projections have poor floating point precision distribution over the depth range. Reverse projections fair much better, and instead of having to have a far plane, with the reverse projection, using an infinite far plane is not a problem. The infinite reverse perspective projection has become the industry standard. The renderer rework is a great time to migrate to it.
## Solution
All perspective projections, including point lights, have been moved to using `glam::Mat4::perspective_infinite_reverse_rh()` and so have no far plane. As various depth textures are shared between orthographic and perspective projections, a quirk of this PR is that the near and far planes of the orthographic projection are swapped when the Mat4 is computed. This has no impact on 2D/3D orthographic projection usage, and provides consistency in shaders, texture clear values, etc. throughout the codebase.
## Known issues
For some reason, when looking along -Z, all geometry is black. The camera can be translated up/down / strafed left/right and geometry will still be black. Moving forward/backward or rotating the camera away from looking exactly along -Z causes everything to work as expected.
I have tried to debug this issue but both in macOS and Windows I get crashes when doing pixel debugging. If anyone could reproduce this and debug it I would be very grateful. Otherwise I will have to try to debug it further without pixel debugging, though the projections and such all looked fine to me.
2021-08-27 20:15:09 +00:00
|
|
|
|
// Projection-agnostic approach:
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
let zw = -major_axis_magnitude * light.projection_lr.xy + light.projection_lr.zw;
|
|
|
|
|
let depth = zw.x / zw.y;
|
2021-07-01 23:48:55 +00:00
|
|
|
|
|
2021-06-28 22:36:50 +00:00
|
|
|
|
// do the lookup, using HW PCF and comparison
|
2021-07-01 19:46:11 +00:00
|
|
|
|
// NOTE: Due to the non-uniform control flow above, we must use the Level variant of
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
// textureSampleCompare to avoid undefined behaviour due to some of the fragments in
|
|
|
|
|
// a quad (2x2 fragments) being processed not being sampled, and this messing with
|
|
|
|
|
// mip-mapping functionality. The shadow maps have no mipmaps so Level just samples
|
|
|
|
|
// from LOD 0.
|
2021-12-22 20:59:48 +00:00
|
|
|
|
#ifdef NO_ARRAY_TEXTURES_SUPPORT
|
|
|
|
|
return textureSampleCompare(point_shadow_textures, point_shadow_textures_sampler, frag_ls, depth);
|
|
|
|
|
#else
|
2021-07-16 22:41:56 +00:00
|
|
|
|
return textureSampleCompareLevel(point_shadow_textures, point_shadow_textures_sampler, frag_ls, i32(light_id), depth);
|
2021-12-22 20:59:48 +00:00
|
|
|
|
#endif
|
2021-07-08 02:49:33 +00:00
|
|
|
|
}
|
|
|
|
|
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
fn fetch_directional_shadow(light_id: u32, frag_position: vec4<f32>, surface_normal: vec3<f32>) -> f32 {
|
2021-07-16 22:41:56 +00:00
|
|
|
|
let light = lights.directional_lights[light_id];
|
2021-07-19 19:20:59 +00:00
|
|
|
|
|
|
|
|
|
// The normal bias is scaled to the texel size.
|
|
|
|
|
let normal_offset = light.shadow_normal_bias * surface_normal.xyz;
|
|
|
|
|
let depth_offset = light.shadow_depth_bias * light.direction_to_light.xyz;
|
|
|
|
|
let offset_position = vec4<f32>(frag_position.xyz + normal_offset + depth_offset, frag_position.w);
|
|
|
|
|
|
|
|
|
|
let offset_position_clip = light.view_projection * offset_position;
|
|
|
|
|
if (offset_position_clip.w <= 0.0) {
|
|
|
|
|
return 1.0;
|
|
|
|
|
}
|
|
|
|
|
let offset_position_ndc = offset_position_clip.xyz / offset_position_clip.w;
|
|
|
|
|
// No shadow outside the orthographic projection volume
|
|
|
|
|
if (any(offset_position_ndc.xy < vec2<f32>(-1.0)) || offset_position_ndc.z < 0.0
|
|
|
|
|
|| any(offset_position_ndc > vec3<f32>(1.0))) {
|
2021-07-08 02:49:33 +00:00
|
|
|
|
return 1.0;
|
|
|
|
|
}
|
2021-07-19 19:20:59 +00:00
|
|
|
|
|
|
|
|
|
// compute texture coordinates for shadow lookup, compensating for the Y-flip difference
|
|
|
|
|
// between the NDC and texture coordinates
|
2021-07-08 02:49:33 +00:00
|
|
|
|
let flip_correction = vec2<f32>(0.5, -0.5);
|
2021-07-19 19:20:59 +00:00
|
|
|
|
let light_local = offset_position_ndc.xy * flip_correction + vec2<f32>(0.5, 0.5);
|
|
|
|
|
|
|
|
|
|
let depth = offset_position_ndc.z;
|
2021-07-08 02:49:33 +00:00
|
|
|
|
// do the lookup, using HW PCF and comparison
|
|
|
|
|
// NOTE: Due to non-uniform control flow above, we must use the level variant of the texture
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
// sampler to avoid use of implicit derivatives causing possible undefined behavior.
|
2021-12-22 20:59:48 +00:00
|
|
|
|
#ifdef NO_ARRAY_TEXTURES_SUPPORT
|
|
|
|
|
return textureSampleCompareLevel(directional_shadow_textures, directional_shadow_textures_sampler, light_local, depth);
|
|
|
|
|
#else
|
2021-07-19 19:20:59 +00:00
|
|
|
|
return textureSampleCompareLevel(directional_shadow_textures, directional_shadow_textures_sampler, light_local, i32(light_id), depth);
|
2021-12-22 20:59:48 +00:00
|
|
|
|
#endif
|
2021-06-28 22:36:50 +00:00
|
|
|
|
}
|
|
|
|
|
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
fn hsv2rgb(hue: f32, saturation: f32, value: f32) -> vec3<f32> {
|
|
|
|
|
let rgb = clamp(
|
|
|
|
|
abs(
|
|
|
|
|
((hue * 6.0 + vec3<f32>(0.0, 4.0, 2.0)) % 6.0) - 3.0
|
|
|
|
|
) - 1.0,
|
|
|
|
|
vec3<f32>(0.0),
|
|
|
|
|
vec3<f32>(1.0)
|
|
|
|
|
);
|
|
|
|
|
|
|
|
|
|
return value * mix( vec3<f32>(1.0), rgb, vec3<f32>(saturation));
|
|
|
|
|
}
|
|
|
|
|
|
2021-12-14 23:42:35 +00:00
|
|
|
|
fn random1D(s: f32) -> f32 {
|
|
|
|
|
return fract(sin(s * 12.9898) * 43758.5453123);
|
|
|
|
|
}
|
|
|
|
|
|
2021-06-28 22:36:50 +00:00
|
|
|
|
struct FragmentInput {
|
|
|
|
|
[[builtin(front_facing)]] is_front: bool;
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
[[builtin(position)]] frag_coord: vec4<f32>;
|
2021-06-29 23:56:45 +00:00
|
|
|
|
[[location(0)]] world_position: vec4<f32>;
|
|
|
|
|
[[location(1)]] world_normal: vec3<f32>;
|
|
|
|
|
[[location(2)]] uv: vec2<f32>;
|
2021-11-04 21:47:57 +00:00
|
|
|
|
#ifdef VERTEX_TANGENTS
|
|
|
|
|
[[location(3)]] world_tangent: vec4<f32>;
|
|
|
|
|
#endif
|
2021-06-28 22:36:50 +00:00
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
[[stage(fragment)]]
|
|
|
|
|
fn fragment(in: FragmentInput) -> [[location(0)]] vec4<f32> {
|
|
|
|
|
var output_color: vec4<f32> = material.base_color;
|
2021-08-25 19:44:20 +00:00
|
|
|
|
if ((material.flags & STANDARD_MATERIAL_FLAGS_BASE_COLOR_TEXTURE_BIT) != 0u) {
|
2021-06-28 22:36:50 +00:00
|
|
|
|
output_color = output_color * textureSample(base_color_texture, base_color_sampler, in.uv);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// // NOTE: Unlit bit not set means == 0 is true, so the true case is if lit
|
2021-08-25 19:44:20 +00:00
|
|
|
|
if ((material.flags & STANDARD_MATERIAL_FLAGS_UNLIT_BIT) == 0u) {
|
2021-06-28 22:36:50 +00:00
|
|
|
|
// TODO use .a for exposure compensation in HDR
|
|
|
|
|
var emissive: vec4<f32> = material.emissive;
|
2021-08-25 19:44:20 +00:00
|
|
|
|
if ((material.flags & STANDARD_MATERIAL_FLAGS_EMISSIVE_TEXTURE_BIT) != 0u) {
|
2021-06-28 22:36:50 +00:00
|
|
|
|
emissive = vec4<f32>(emissive.rgb * textureSample(emissive_texture, emissive_sampler, in.uv).rgb, 1.0);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// calculate non-linear roughness from linear perceptualRoughness
|
|
|
|
|
var metallic: f32 = material.metallic;
|
|
|
|
|
var perceptual_roughness: f32 = material.perceptual_roughness;
|
2021-08-25 19:44:20 +00:00
|
|
|
|
if ((material.flags & STANDARD_MATERIAL_FLAGS_METALLIC_ROUGHNESS_TEXTURE_BIT) != 0u) {
|
2021-06-28 22:36:50 +00:00
|
|
|
|
let metallic_roughness = textureSample(metallic_roughness_texture, metallic_roughness_sampler, in.uv);
|
|
|
|
|
// Sampling from GLTF standard channels for now
|
|
|
|
|
metallic = metallic * metallic_roughness.b;
|
|
|
|
|
perceptual_roughness = perceptual_roughness * metallic_roughness.g;
|
|
|
|
|
}
|
|
|
|
|
let roughness = perceptualRoughnessToRoughness(perceptual_roughness);
|
|
|
|
|
|
|
|
|
|
var occlusion: f32 = 1.0;
|
2021-08-25 19:44:20 +00:00
|
|
|
|
if ((material.flags & STANDARD_MATERIAL_FLAGS_OCCLUSION_TEXTURE_BIT) != 0u) {
|
2021-06-28 22:36:50 +00:00
|
|
|
|
occlusion = textureSample(occlusion_texture, occlusion_sampler, in.uv).r;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
var N: vec3<f32> = normalize(in.world_normal);
|
|
|
|
|
|
2021-11-04 21:47:57 +00:00
|
|
|
|
#ifdef VERTEX_TANGENTS
|
|
|
|
|
#ifdef STANDARDMATERIAL_NORMAL_MAP
|
|
|
|
|
var T: vec3<f32> = normalize(in.world_tangent.xyz - N * dot(in.world_tangent.xyz, N));
|
|
|
|
|
var B: vec3<f32> = cross(N, T) * in.world_tangent.w;
|
|
|
|
|
#endif
|
|
|
|
|
#endif
|
2021-06-28 22:36:50 +00:00
|
|
|
|
|
2021-08-25 19:44:20 +00:00
|
|
|
|
if ((material.flags & STANDARD_MATERIAL_FLAGS_DOUBLE_SIDED_BIT) != 0u) {
|
2021-06-28 22:36:50 +00:00
|
|
|
|
if (!in.is_front) {
|
|
|
|
|
N = -N;
|
2021-11-04 21:47:57 +00:00
|
|
|
|
#ifdef VERTEX_TANGENTS
|
|
|
|
|
#ifdef STANDARDMATERIAL_NORMAL_MAP
|
|
|
|
|
T = -T;
|
|
|
|
|
B = -B;
|
|
|
|
|
#endif
|
|
|
|
|
#endif
|
2021-06-28 22:36:50 +00:00
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2021-11-04 21:47:57 +00:00
|
|
|
|
#ifdef VERTEX_TANGENTS
|
|
|
|
|
#ifdef STANDARDMATERIAL_NORMAL_MAP
|
|
|
|
|
let TBN = mat3x3<f32>(T, B, N);
|
2022-03-15 22:26:46 +00:00
|
|
|
|
// Nt is the tangent-space normal.
|
|
|
|
|
var Nt: vec3<f32>;
|
|
|
|
|
if ((material.flags & STANDARD_MATERIAL_FLAGS_TWO_COMPONENT_NORMAL_MAP) != 0u) {
|
|
|
|
|
// Only use the xy components and derive z for 2-component normal maps.
|
|
|
|
|
Nt = vec3<f32>(textureSample(normal_map_texture, normal_map_sampler, in.uv).rg * 2.0 - 1.0, 0.0);
|
|
|
|
|
Nt.z = sqrt(1.0 - Nt.x * Nt.x - Nt.y * Nt.y);
|
|
|
|
|
} else {
|
|
|
|
|
Nt = textureSample(normal_map_texture, normal_map_sampler, in.uv).rgb * 2.0 - 1.0;
|
|
|
|
|
}
|
2022-04-07 15:50:14 +00:00
|
|
|
|
// Normal maps authored for DirectX require flipping the y component
|
|
|
|
|
if ((material.flags & STANDARD_MATERIAL_FLAGS_FLIP_NORMAL_MAP_Y) != 0u) {
|
|
|
|
|
Nt.y = -Nt.y;
|
|
|
|
|
}
|
2022-03-15 22:26:46 +00:00
|
|
|
|
N = normalize(TBN * Nt);
|
2021-11-04 21:47:57 +00:00
|
|
|
|
#endif
|
|
|
|
|
#endif
|
2021-06-28 22:36:50 +00:00
|
|
|
|
|
Add support for opaque, alpha mask, and alpha blend modes (#3072)
# Objective
Add depth prepass and support for opaque, alpha mask, and alpha blend modes for the 3D PBR target.
## Solution
NOTE: This is based on top of #2861 frustum culling. Just lining it up to keep @cart loaded with the review train. 🚂
There are a lot of important details here. Big thanks to @cwfitzgerald of wgpu, naga, and rend3 fame for explaining how to do it properly!
* An `AlphaMode` component is added that defines whether a material should be considered opaque, an alpha mask (with a cutoff value that defaults to 0.5, the same as glTF), or transparent and should be alpha blended
* Two depth prepasses are added:
* Opaque does a plain vertex stage
* Alpha mask does the vertex stage but also a fragment stage that samples the colour for the fragment and discards if its alpha value is below the cutoff value
* Both are sorted front to back, not that it matters for these passes. (Maybe there should be a way to skip sorting?)
* Three main passes are added:
* Opaque and alpha mask passes use a depth comparison function of Equal such that only the geometry that was closest is processed further, due to early-z testing
* The transparent pass uses the Greater depth comparison function so that only transparent objects that are closer than anything opaque are rendered
* The opaque fragment shading is as before except that alpha is explicitly set to 1.0
* Alpha mask fragment shading sets the alpha value to 1.0 if it is equal to or above the cutoff, as defined by glTF
* Opaque and alpha mask are sorted front to back (again not that it matters as we will skip anything that is not equal... maybe sorting is no longer needed here?)
* Transparent is sorted back to front. Transparent fragment shading uses the alpha blending over operator
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2021-11-16 03:03:27 +00:00
|
|
|
|
if ((material.flags & STANDARD_MATERIAL_FLAGS_ALPHA_MODE_OPAQUE) != 0u) {
|
|
|
|
|
// NOTE: If rendering as opaque, alpha should be ignored so set to 1.0
|
|
|
|
|
output_color.a = 1.0;
|
2021-12-19 03:03:06 +00:00
|
|
|
|
} else if ((material.flags & STANDARD_MATERIAL_FLAGS_ALPHA_MODE_MASK) != 0u) {
|
Add support for opaque, alpha mask, and alpha blend modes (#3072)
# Objective
Add depth prepass and support for opaque, alpha mask, and alpha blend modes for the 3D PBR target.
## Solution
NOTE: This is based on top of #2861 frustum culling. Just lining it up to keep @cart loaded with the review train. 🚂
There are a lot of important details here. Big thanks to @cwfitzgerald of wgpu, naga, and rend3 fame for explaining how to do it properly!
* An `AlphaMode` component is added that defines whether a material should be considered opaque, an alpha mask (with a cutoff value that defaults to 0.5, the same as glTF), or transparent and should be alpha blended
* Two depth prepasses are added:
* Opaque does a plain vertex stage
* Alpha mask does the vertex stage but also a fragment stage that samples the colour for the fragment and discards if its alpha value is below the cutoff value
* Both are sorted front to back, not that it matters for these passes. (Maybe there should be a way to skip sorting?)
* Three main passes are added:
* Opaque and alpha mask passes use a depth comparison function of Equal such that only the geometry that was closest is processed further, due to early-z testing
* The transparent pass uses the Greater depth comparison function so that only transparent objects that are closer than anything opaque are rendered
* The opaque fragment shading is as before except that alpha is explicitly set to 1.0
* Alpha mask fragment shading sets the alpha value to 1.0 if it is equal to or above the cutoff, as defined by glTF
* Opaque and alpha mask are sorted front to back (again not that it matters as we will skip anything that is not equal... maybe sorting is no longer needed here?)
* Transparent is sorted back to front. Transparent fragment shading uses the alpha blending over operator
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2021-11-16 03:03:27 +00:00
|
|
|
|
if (output_color.a >= material.alpha_cutoff) {
|
|
|
|
|
// NOTE: If rendering as masked alpha and >= the cutoff, render as fully opaque
|
|
|
|
|
output_color.a = 1.0;
|
|
|
|
|
} else {
|
|
|
|
|
// NOTE: output_color.a < material.alpha_cutoff should not is not rendered
|
|
|
|
|
// NOTE: This and any other discards mean that early-z testing cannot be done!
|
|
|
|
|
discard;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2021-06-28 22:36:50 +00:00
|
|
|
|
var V: vec3<f32>;
|
2021-12-14 23:42:35 +00:00
|
|
|
|
// If the projection is not orthographic
|
|
|
|
|
let is_orthographic = view.projection[3].w == 1.0;
|
|
|
|
|
if (is_orthographic) {
|
|
|
|
|
// Orthographic view vector
|
|
|
|
|
V = normalize(vec3<f32>(view.view_proj[0].z, view.view_proj[1].z, view.view_proj[2].z));
|
|
|
|
|
} else {
|
2021-06-28 22:36:50 +00:00
|
|
|
|
// Only valid for a perpective projection
|
|
|
|
|
V = normalize(view.world_position.xyz - in.world_position.xyz);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Neubelt and Pettineo 2013, "Crafting a Next-gen Material Pipeline for The Order: 1886"
|
|
|
|
|
let NdotV = max(dot(N, V), 0.0001);
|
|
|
|
|
|
|
|
|
|
// Remapping [0,1] reflectance to F0
|
|
|
|
|
// See https://google.github.io/filament/Filament.html#materialsystem/parameterization/remapping
|
|
|
|
|
let reflectance = material.reflectance;
|
|
|
|
|
let F0 = 0.16 * reflectance * reflectance * (1.0 - metallic) + output_color.rgb * metallic;
|
|
|
|
|
|
|
|
|
|
// Diffuse strength inversely related to metallicity
|
|
|
|
|
let diffuse_color = output_color.rgb * (1.0 - metallic);
|
|
|
|
|
|
|
|
|
|
let R = reflect(-V, N);
|
|
|
|
|
|
|
|
|
|
// accumulate color
|
|
|
|
|
var light_accum: vec3<f32> = vec3<f32>(0.0);
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
|
|
|
|
|
let view_z = dot(vec4<f32>(
|
|
|
|
|
view.inverse_view[0].z,
|
|
|
|
|
view.inverse_view[1].z,
|
|
|
|
|
view.inverse_view[2].z,
|
|
|
|
|
view.inverse_view[3].z
|
|
|
|
|
), in.world_position);
|
2021-12-14 23:42:35 +00:00
|
|
|
|
let cluster_index = fragment_cluster_index(in.frag_coord.xy, view_z, is_orthographic);
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
let offset_and_count = unpack_offset_and_count(cluster_index);
|
2022-04-07 16:16:35 +00:00
|
|
|
|
for (var i: u32 = offset_and_count[0]; i < offset_and_count[0] + offset_and_count[1]; i = i + 1u) {
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
let light_id = get_light_id(i);
|
|
|
|
|
let light = point_lights.data[light_id];
|
|
|
|
|
var shadow: f32 = 1.0;
|
2021-11-19 21:16:58 +00:00
|
|
|
|
if ((mesh.flags & MESH_FLAGS_SHADOW_RECEIVER_BIT) != 0u
|
2021-11-26 13:16:11 +00:00
|
|
|
|
&& (light.flags & POINT_LIGHT_FLAGS_SHADOWS_ENABLED_BIT) != 0u) {
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
shadow = fetch_point_shadow(light_id, in.world_position, in.world_normal);
|
2021-08-25 19:44:20 +00:00
|
|
|
|
}
|
2021-07-16 22:41:56 +00:00
|
|
|
|
let light_contrib = point_light(in.world_position.xyz, light, roughness, NdotV, N, V, R, F0, diffuse_color);
|
2021-07-08 02:49:33 +00:00
|
|
|
|
light_accum = light_accum + light_contrib * shadow;
|
|
|
|
|
}
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
|
|
|
|
|
let n_directional_lights = lights.n_directional_lights;
|
|
|
|
|
for (var i: u32 = 0u; i < n_directional_lights; i = i + 1u) {
|
2021-07-08 02:49:33 +00:00
|
|
|
|
let light = lights.directional_lights[i];
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
var shadow: f32 = 1.0;
|
2021-11-19 21:16:58 +00:00
|
|
|
|
if ((mesh.flags & MESH_FLAGS_SHADOW_RECEIVER_BIT) != 0u
|
2021-11-26 13:16:11 +00:00
|
|
|
|
&& (light.flags & DIRECTIONAL_LIGHT_FLAGS_SHADOWS_ENABLED_BIT) != 0u) {
|
2021-08-25 19:44:20 +00:00
|
|
|
|
shadow = fetch_directional_shadow(i, in.world_position, in.world_normal);
|
|
|
|
|
}
|
2021-07-08 02:49:33 +00:00
|
|
|
|
let light_contrib = directional_light(light, roughness, NdotV, N, V, R, F0, diffuse_color);
|
2021-06-28 22:36:50 +00:00
|
|
|
|
light_accum = light_accum + light_contrib * shadow;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
let diffuse_ambient = EnvBRDFApprox(diffuse_color, 1.0, NdotV);
|
|
|
|
|
let specular_ambient = EnvBRDFApprox(F0, perceptual_roughness, NdotV);
|
|
|
|
|
|
|
|
|
|
output_color = vec4<f32>(
|
|
|
|
|
light_accum +
|
|
|
|
|
(diffuse_ambient + specular_ambient) * lights.ambient_color.rgb * occlusion +
|
|
|
|
|
emissive.rgb * output_color.a,
|
|
|
|
|
output_color.a);
|
|
|
|
|
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
// Cluster allocation debug (using 'over' alpha blending)
|
2021-12-14 23:42:35 +00:00
|
|
|
|
#ifdef CLUSTERED_FORWARD_DEBUG_Z_SLICES
|
|
|
|
|
// NOTE: This debug mode visualises the z-slices
|
|
|
|
|
let cluster_overlay_alpha = 0.1;
|
|
|
|
|
var z_slice: u32 = view_z_to_z_slice(view_z, is_orthographic);
|
|
|
|
|
// A hack to make the colors alternate a bit more
|
|
|
|
|
if ((z_slice & 1u) == 1u) {
|
|
|
|
|
z_slice = z_slice + lights.cluster_dimensions.z / 2u;
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
}
|
2021-12-14 23:42:35 +00:00
|
|
|
|
let slice_color = hsv2rgb(f32(z_slice) / f32(lights.cluster_dimensions.z + 1u), 1.0, 0.5);
|
|
|
|
|
output_color = vec4<f32>(
|
|
|
|
|
(1.0 - cluster_overlay_alpha) * output_color.rgb + cluster_overlay_alpha * slice_color,
|
|
|
|
|
output_color.a
|
|
|
|
|
);
|
|
|
|
|
#endif // CLUSTERED_FORWARD_DEBUG_Z_SLICES
|
|
|
|
|
#ifdef CLUSTERED_FORWARD_DEBUG_CLUSTER_LIGHT_COMPLEXITY
|
|
|
|
|
// NOTE: This debug mode visualises the number of lights within the cluster that contains
|
|
|
|
|
// the fragment. It shows a sort of lighting complexity measure.
|
|
|
|
|
let cluster_overlay_alpha = 0.1;
|
|
|
|
|
let max_light_complexity_per_cluster = 64.0;
|
|
|
|
|
output_color.r = (1.0 - cluster_overlay_alpha) * output_color.r
|
2022-04-07 16:16:35 +00:00
|
|
|
|
+ cluster_overlay_alpha * smoothStep(0.0, max_light_complexity_per_cluster, f32(offset_and_count[1]));
|
2021-12-14 23:42:35 +00:00
|
|
|
|
output_color.g = (1.0 - cluster_overlay_alpha) * output_color.g
|
2022-04-07 16:16:35 +00:00
|
|
|
|
+ cluster_overlay_alpha * (1.0 - smoothStep(0.0, max_light_complexity_per_cluster, f32(offset_and_count[1])));
|
2021-12-14 23:42:35 +00:00
|
|
|
|
#endif // CLUSTERED_FORWARD_DEBUG_CLUSTER_LIGHT_COMPLEXITY
|
|
|
|
|
#ifdef CLUSTERED_FORWARD_DEBUG_CLUSTER_COHERENCY
|
|
|
|
|
// NOTE: Visualizes the cluster to which the fragment belongs
|
|
|
|
|
let cluster_overlay_alpha = 0.1;
|
|
|
|
|
let cluster_color = hsv2rgb(random1D(f32(cluster_index)), 1.0, 0.5);
|
|
|
|
|
output_color = vec4<f32>(
|
|
|
|
|
(1.0 - cluster_overlay_alpha) * output_color.rgb + cluster_overlay_alpha * cluster_color,
|
|
|
|
|
output_color.a
|
|
|
|
|
);
|
|
|
|
|
#endif // CLUSTERED_FORWARD_DEBUG_CLUSTER_COHERENCY
|
Clustered forward rendering (#3153)
# Objective
Implement clustered-forward rendering.
## Solution
~~FIXME - in the interest of keeping the merge train moving, I'm submitting this PR now before the description is ready. I want to add in some comments into the code with references for the various bits and pieces and I want to describe some of the key decisions I made here. I'll do that as soon as I can.~~ Anyone reviewing is welcome to add review comments where you want to know more about how something or other works.
* The summary of the technique is that the view frustum is divided into a grid of sub-volumes called clusters, point lights are tested against each of the clusters to see if they would affect that volume within the scene and if so, added to a list of lights affecting that cluster. Then when shading a fragment which is a point on the surface of a mesh within the scene, the point is mapped to a cluster and only the lights affecting that clusters are used in lighting calculations. This brings huge performance and scalability benefits as most of the time lights are placed so that there are not that many that overlap each other in terms of their sphere of influence, but there may be many distinct point lights visible in the scene. Doing all the lighting calculations for all visible lights in the scene for every pixel on the screen quickly becomes a performance limitation. Clustered forward rendering allows us to make an approximate list of lights that affect each pixel, indeed each surface in the scene (as it works along the view z axis too, unlike tiled/forward+).
* WebGL2 is a platform we want to support and it does not support storage buffers. Uniform buffer bindings are limited to a maximum of 16384 bytes per binding. I used bit shifting and masking to pack the cluster light lists and various indices into a uniform buffer and the 16kB limit is very likely the first bottleneck in scaling the number of lights in a scene at the moment if the lights can affect many clusters due to their range or proximity to the camera (there are a lot of clusters close to the camera, which is an area for improvement). We could store the information in textures instead of uniform buffers to remove this bottleneck though I don’t know if there are performance implications to reading from textures instead if uniform buffers.
* Because of the uniform buffer binding size limitations we can support a maximum of 256 lights with the current size of the PointLight struct
* The z-slicing method (i.e. the mapping from view space z to a depth slice which defines the near and far planes of a cluster) is using the Doom 2016 method. I need to add comments with references to this. It’s an exponential function that simplifies well for the purposes of optimising the fragment shader. xy grid divisions are regular in screen space.
* Some optimisation work was done on the allocation of lights to clusters, which involves intersection tests, and for this number of clusters and lights the system has insignificant cost using a fairly naïve algorithm. I think for more lights / finer-grained clusters we could use a BVH, but at some point it would be just much better to use compute shaders and storage buffers.
* Something else to note is that it is absolutely infeasible to use plain cube map point light shadow mapping for many lights. It does not scale in terms of performance nor memory usage. There are some interesting methods I saw discussed in reference material that I will add a link to which render and update shadow maps piece-wise, but they also need compute shaders to work well. Basically for now you need to sacrifice point light shadows for all but a handful of point lights if you don’t want to kill performance. I set the limit to 10 but that’s just what we had from before where 10 was the maximum number of point lights before this PR.
* I added a couple of debug visualisations behind a shader def that were useful for seeing performance impact of light distribution - I should make the debug mode configurable without modifying the shader code. One mode shows the number of lights affecting each cluster by tinting toward red for few lights or green for many lights (maxes out at 16, but not sure that’s a reasonable max). The other shows which cluster the surface at a fragment belongs to by tinting it with a randomish colour. This can help to understand deeper performance issues due to screen space tiles spanning multiple clusters in depth with divergent shader execution times.
Also, there are more things that could be done as improvements, and I will document those somewhere (I'm not sure where will be the best place... in a todo alongside the code, a GitHub issue, somewhere else?) but I think it works well enough and brings significant performance and scalability benefits that it's worth integrating already now and then iterating on.
* Calculate the light’s effective range based on its intensity and physical falloff and either just use this, or take the minimum of the user-supplied range and this. This would avoid unnecessary lighting calculations for clusters that cannot be affected. This would need to take into account HDR tone mapping as in my not-fully-understanding-the-details understanding, the threshold is relative to how bright the scene is.
* Improve the z-slicing to use a larger first slice.
* More gracefully handle the cluster light list uniform buffer binding size limitations by prioritising which lights are included (some heuristic for most significant like closest to the camera, brightest, affecting the most pixels, …)
* Switch to using a texture instead of uniform buffer
* Figure out the / a better story for shadows
I will also probably add an example that demonstrates some of the issues:
* What situations exhaust the space available in the uniform buffers
* Light range too large making lights affect many clusters and so exhausting the space for the lists of lights that affect clusters
* Light range set to be too small producing visible artifacts where clusters the light would physically affect are not affected by the light
* Perhaps some performance issues
* How many lights can be closely packed or affect large portions of the view before performance drops?
2021-12-09 03:08:54 +00:00
|
|
|
|
|
2021-06-28 22:36:50 +00:00
|
|
|
|
// tone_mapping
|
|
|
|
|
output_color = vec4<f32>(reinhard_luminance(output_color.rgb), output_color.a);
|
|
|
|
|
// Gamma correction.
|
|
|
|
|
// Not needed with sRGB buffer
|
|
|
|
|
// output_color.rgb = pow(output_color.rgb, vec3(1.0 / 2.2));
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return output_color;
|
2021-07-01 23:48:55 +00:00
|
|
|
|
}
|