bevy/crates/bevy_render/src/maths.wgsl

#define_import_path bevy_render::maths
const PI: f32 = 3.141592653589793; // π
const PI_2: f32 = 6.283185307179586; // 2π
const HALF_PI: f32 = 1.57079632679; // π/2
const FRAC_PI_3: f32 = 1.0471975512; // π/3
const E: f32 = 2.718281828459045; // exp(1)
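
// Converts a 2D affine transform, stored as three `vec2` columns (two basis
// vectors followed by the translation), into a square 3x3 homogeneous matrix.
fn affine2_to_square(affine: mat3x2<f32>) -> mat3x3<f32> {
    return mat3x3<f32>(
        vec3<f32>(affine[0].xy, 0.0),
        vec3<f32>(affine[1].xy, 0.0),
        vec3<f32>(affine[2].xy, 1.0),
    );
}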
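
// Usage sketch (hypothetical helper, not part of the upstream module): one way
// to apply a 2D affine transform to a UV coordinate is to promote the
// coordinate to homogeneous form with w = 1.0, so that the translation column
// takes effect. The name `example_transform_uv` and its parameters are
// assumptions made for illustration only.
fn example_transform_uv(uv_transform: mat3x2<f32>, uv: vec2<f32>) -> vec2<f32> {
    return (affine2_to_square(uv_transform) * vec3<f32>(uv, 1.0)).xy;
}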
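
// Converts a 3D affine transform into a square 4x4 matrix. The input holds the
// transform transposed: each `vec4` is one row of the 3x4 affine matrix, so
// the constant bottom row (0, 0, 0, 1) is appended and the result transposed.
fn affine3_to_square(affine: mat3x4<f32>) -> mat4x4<f32> {
    return transpose(mat4x4<f32>(
        affine[0],
        affine[1],
        affine[2],
        vec4<f32>(0.0, 0.0, 0.0, 1.0),
    ));
}

// Unpacks a 3x3 matrix from nine floats stored column by column: the first
// eight in `a` and the last in `b`.
fn mat2x4_f32_to_mat3x3_unpack(
    a: mat2x4<f32>,
    b: f32,
) -> mat3x3<f32> {
    return mat3x3<f32>(
        a[0].xyz,
        vec3<f32>(a[0].w, a[1].xy),
        vec3<f32>(a[1].zw, b),
    );
}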

// Extracts the square portion of an affine matrix, i.e. discards the
// translation.
fn affine3_to_mat3x3(affine: mat4x3<f32>) -> mat3x3<f32> {
    return mat3x3<f32>(affine[0].xyz, affine[1].xyz, affine[2].xyz);
}

// Returns the inverse of a 3x3 matrix.
//
// The rows of the inverse are the pairwise cross products of the input's
// columns, divided by the determinant. The matrix is assumed to be invertible;
// no check is made for a zero determinant.
fn inverse_mat3x3(matrix: mat3x3<f32>) -> mat3x3<f32> {
    let tmp0 = cross(matrix[1], matrix[2]);
    let tmp1 = cross(matrix[2], matrix[0]);
    let tmp2 = cross(matrix[0], matrix[1]);
    let inv_det = 1.0 / dot(matrix[2], tmp2);
    return transpose(mat3x3<f32>(tmp0 * inv_det, tmp1 * inv_det, tmp2 * inv_det));
}

// Returns the inverse of an affine matrix.
//
// For a transform `p -> A * p + t`, the inverse is
// `p -> inv(A) * p - inv(A) * t`.
//
// https://en.wikipedia.org/wiki/Affine_transformation#Groups
fn inverse_affine3(affine: mat4x3<f32>) -> mat4x3<f32> {
    let matrix3 = affine3_to_mat3x3(affine);
    let inv_matrix3 = inverse_mat3x3(matrix3);
    return mat4x3<f32>(inv_matrix3[0], inv_matrix3[1], inv_matrix3[2], -(inv_matrix3 * affine[3]));
}
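
// Usage sketch (hypothetical helper, not part of the upstream module): a
// round trip through an affine transform and its inverse, which is one way to
// sanity-check `inverse_affine3`. The name `example_affine3_round_trip` is an
// assumption made for illustration, and `affine` is assumed to be invertible.
fn example_affine3_round_trip(affine: mat4x3<f32>, point: vec3<f32>) -> vec3<f32> {
    // Forward transform: w = 1.0 makes the translation column apply.
    let transformed = affine * vec4<f32>(point, 1.0);
    let inverse = inverse_affine3(affine);
    // Expected to match `point` up to floating-point rounding.
    return inverse * vec4<f32>(transformed, 1.0);
}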

// Extracts the upper 3x3 portion of a 4x4 matrix.
fn mat4x4_to_mat3x3(m: mat4x4<f32>) -> mat3x3<f32> {
    return mat3x3<f32>(m[0].xyz, m[1].xyz, m[2].xyz);
}

// Creates an orthonormal basis given a Z vector and an up vector (which
// determines the Y axis after orthonormalization).
//
// The results are equivalent to the Gram-Schmidt process [1], up to the sign
// of Y: with the cross products in this order, Y points opposite to the
// component of `up` perpendicular to Z, and (X, Y, Z) is right-handed.
//
// `up` must not be parallel to `z_unnormalized`; otherwise the cross product
// is zero and cannot be normalized.
//
// [1]: https://math.stackexchange.com/a/1849294
fn orthonormalize(z_unnormalized: vec3<f32>, up: vec3<f32>) -> mat3x3<f32> {
    let z_basis = normalize(z_unnormalized);
    let x_basis = normalize(cross(z_basis, up));
    let y_basis = cross(z_basis, x_basis);
    return mat3x3(x_basis, y_basis, z_basis);
}

// Returns true if any part of a sphere is on the positive side of a plane.
//
// `plane` is `(normal, d)`, and `plane.xyz` is assumed to be normalized so
// that the dot product below is a signed distance.
//
// `sphere_center.w` should be 1.0.
//
// This is used for frustum culling.
fn sphere_intersects_plane_half_space(
    plane: vec4<f32>,
    sphere_center: vec4<f32>,
    sphere_radius: f32
) -> bool {
    return dot(plane, sphere_center) + sphere_radius > 0.0;
}
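
// Usage sketch (hypothetical helper, not part of the upstream module): a
// sphere-versus-frustum test built from the half-space check above. It assumes
// the frustum is supplied as six inward-facing planes with normalized normals;
// the name `example_sphere_in_frustum` and its parameters are illustrative.
// The test is conservative: a sphere just outside a frustum corner can still
// pass all six plane checks.
fn example_sphere_in_frustum(
    frustum_planes: array<vec4<f32>, 6>,
    sphere_center: vec3<f32>,
    sphere_radius: f32,
) -> bool {
    // Copy into a local variable so the array can be indexed inside the loop.
    var planes = frustum_planes;
    let center = vec4<f32>(sphere_center, 1.0);
    for (var i = 0u; i < 6u; i += 1u) {
        // Entirely on the negative side of any one plane means outside.
        if !sphere_intersects_plane_half_space(planes[i], center, sphere_radius) {
            return false;
        }
    }
    return true;
}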
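
// Like pow(), but safe for negative inputs: raises the magnitude of each
// component to `power` and then restores its sign, avoiding the NaNs that
// pow() can produce for a negative base.
fn powsafe(color: vec3<f32>, power: f32) -> vec3<f32> {
    return pow(abs(color), vec3(power)) * sign(color);
}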