2022-06-14 00:32:33 +00:00
|
|
|
#define_import_path bevy_pbr::mesh_functions
|
|
|
|
|
2023-10-21 11:51:58 +00:00
|
|
|
#import bevy_pbr::{
|
|
|
|
mesh_view_bindings::view,
|
|
|
|
mesh_bindings::mesh,
|
|
|
|
mesh_types::MESH_FLAGS_SIGN_DETERMINANT_MODEL_3X3_BIT,
|
2023-10-24 21:26:19 +00:00
|
|
|
view_transformations::position_world_to_clip,
|
2023-10-21 11:51:58 +00:00
|
|
|
}
|
|
|
|
#import bevy_render::{
|
|
|
|
instance_index::get_instance_index,
|
|
|
|
maths::{affine_to_square, mat2x4_f32_to_mat3x3_unpack},
|
|
|
|
}
|
Reduce the size of MeshUniform to improve performance (#9416)
# Objective
- Significantly reduce the size of MeshUniform by only including
necessary data.
## Solution
Local to world, model transforms are affine. This means they only need a
4x3 matrix to represent them.
`MeshUniform` stores the current, and previous model transforms, and the
inverse transpose of the current model transform, all as 4x4 matrices.
Instead we can store the current, and previous model transforms as 4x3
matrices, and we only need the upper-left 3x3 part of the inverse
transpose of the current model transform. This change allows us to
reduce the serialized MeshUniform size from 208 bytes to 144 bytes,
which is over a 30% saving in data to serialize, and VRAM bandwidth and
space.
## Benchmarks
On an M1 Max, running `many_cubes -- sphere`, main is in yellow, this PR
is in red:
<img width="1484" alt="Screenshot 2023-08-11 at 02 36 43"
src="https://github.com/bevyengine/bevy/assets/302146/7d99c7b3-f2bb-4004-a8d0-4c00f755cb0d">
A reduction in frame time of ~14%.
---
## Changelog
- Changed: Redefined `MeshUniform` to improve performance by using 4x3
affine transforms and reconstructing 4x4 matrices in the shader. Helper
functions were added to `bevy_pbr::mesh_functions` to unpack the data.
`affine_to_square` converts the packed 4x3 in 3x4 matrix data to a 4x4
matrix. `mat2x4_f32_to_mat3x3` converts the 3x3 in mat2x4 + f32 matrix
data back into a 3x3.
## Migration Guide
Shader code before:
```
var model = mesh[instance_index].model;
```
Shader code after:
```
#import bevy_pbr::mesh_functions affine_to_square
var model = affine_to_square(mesh[instance_index].model);
```
2023-08-15 06:00:23 +00:00
|
|
|
|
|
|
|
fn get_model_matrix(instance_index: u32) -> mat4x4<f32> {
|
|
|
|
return affine_to_square(mesh[get_instance_index(instance_index)].model);
|
|
|
|
}
|
|
|
|
|
|
|
|
fn get_previous_model_matrix(instance_index: u32) -> mat4x4<f32> {
|
|
|
|
return affine_to_square(mesh[get_instance_index(instance_index)].previous_model);
|
|
|
|
}
|
improve shader import model (#5703)
# Objective
operate on naga IR directly to improve handling of shader modules.
- give codespan reporting into imported modules
- allow glsl to be used from wgsl and vice-versa
the ultimate objective is to make it possible to
- provide user hooks for core shader functions (to modify light
behaviour within the standard pbr pipeline, for example)
- make automatic binding slot allocation possible
but ... since this is already big, adds some value and (i think) is at
feature parity with the existing code, i wanted to push this now.
## Solution
i made a crate called naga_oil (https://github.com/robtfm/naga_oil -
unpublished for now, could be part of bevy) which manages modules by
- building each module independantly to naga IR
- creating "header" files for each supported language, which are used to
build dependent modules/shaders
- make final shaders by combining the shader IR with the IR for imported
modules
then integrated this into bevy, replacing some of the existing shader
processing stuff. also reworked examples to reflect this.
## Migration Guide
shaders that don't use `#import` directives should work without changes.
the most notable user-facing difference is that imported
functions/variables/etc need to be qualified at point of use, and
there's no "leakage" of visible stuff into your shader scope from the
imports of your imports, so if you used things imported by your imports,
you now need to import them directly and qualify them.
the current strategy of including/'spreading' `mesh_vertex_output`
directly into a struct doesn't work any more, so these need to be
modified as per the examples (e.g. color_material.wgsl, or many others).
mesh data is assumed to be in bindgroup 2 by default, if mesh data is
bound into bindgroup 1 instead then the shader def `MESH_BINDGROUP_1`
needs to be added to the pipeline shader_defs.
2023-06-27 00:29:22 +00:00
|
|
|
|
2022-06-14 00:32:33 +00:00
|
|
|
fn mesh_position_local_to_world(model: mat4x4<f32>, vertex_position: vec4<f32>) -> vec4<f32> {
|
|
|
|
return model * vertex_position;
|
|
|
|
}
|
|
|
|
|
|
|
|
// NOTE: The intermediate world_position assignment is important
|
|
|
|
// for precision purposes when using the 'equals' depth comparison
|
|
|
|
// function.
|
|
|
|
fn mesh_position_local_to_clip(model: mat4x4<f32>, vertex_position: vec4<f32>) -> vec4<f32> {
|
|
|
|
let world_position = mesh_position_local_to_world(model, vertex_position);
|
2023-10-24 21:26:19 +00:00
|
|
|
return position_world_to_clip(world_position.xyz);
|
2022-06-14 00:32:33 +00:00
|
|
|
}
|
|
|
|
|
Use GpuArrayBuffer for MeshUniform (#9254)
# Objective
- Reduce the number of rebindings to enable batching of draw commands
## Solution
- Use the new `GpuArrayBuffer` for `MeshUniform` data to store all
`MeshUniform` data in arrays within fewer bindings
- Sort opaque/alpha mask prepass, opaque/alpha mask main, and shadow
phases also by the batch per-object data binding dynamic offset to
improve performance on WebGL2.
---
## Changelog
- Changed: Per-object `MeshUniform` data is now managed by
`GpuArrayBuffer` as arrays in buffers that need to be indexed into.
## Migration Guide
Accessing the `model` member of an individual mesh object's shader
`Mesh` struct the old way where each `MeshUniform` was stored at its own
dynamic offset:
```rust
struct Vertex {
@location(0) position: vec3<f32>,
};
fn vertex(vertex: Vertex) -> VertexOutput {
var out: VertexOutput;
out.clip_position = mesh_position_local_to_clip(
mesh.model,
vec4<f32>(vertex.position, 1.0)
);
return out;
}
```
The new way where one needs to index into the array of `Mesh`es for the
batch:
```rust
struct Vertex {
@builtin(instance_index) instance_index: u32,
@location(0) position: vec3<f32>,
};
fn vertex(vertex: Vertex) -> VertexOutput {
var out: VertexOutput;
out.clip_position = mesh_position_local_to_clip(
mesh[vertex.instance_index].model,
vec4<f32>(vertex.position, 1.0)
);
return out;
}
```
Note that using the instance_index is the default way to pass the
per-object index into the shader, but if you wish to do custom rendering
approaches you can pass it in however you like.
---------
Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com>
2023-07-30 13:17:08 +00:00
|
|
|
fn mesh_normal_local_to_world(vertex_normal: vec3<f32>, instance_index: u32) -> vec3<f32> {
|
2022-08-18 21:54:40 +00:00
|
|
|
// NOTE: The mikktspace method of normal mapping requires that the world normal is
|
|
|
|
// re-normalized in the vertex shader to match the way mikktspace bakes vertex tangents
|
|
|
|
// and normal maps so that the exact inverse process is applied when shading. Blender, Unity,
|
|
|
|
// Unreal Engine, Godot, and more all use the mikktspace method. Do not change this code
|
|
|
|
// unless you really know what you are doing.
|
|
|
|
// http://www.mikktspace.com/
|
|
|
|
return normalize(
|
Reduce the size of MeshUniform to improve performance (#9416)
# Objective
- Significantly reduce the size of MeshUniform by only including
necessary data.
## Solution
Local to world, model transforms are affine. This means they only need a
4x3 matrix to represent them.
`MeshUniform` stores the current, and previous model transforms, and the
inverse transpose of the current model transform, all as 4x4 matrices.
Instead we can store the current, and previous model transforms as 4x3
matrices, and we only need the upper-left 3x3 part of the inverse
transpose of the current model transform. This change allows us to
reduce the serialized MeshUniform size from 208 bytes to 144 bytes,
which is over a 30% saving in data to serialize, and VRAM bandwidth and
space.
## Benchmarks
On an M1 Max, running `many_cubes -- sphere`, main is in yellow, this PR
is in red:
<img width="1484" alt="Screenshot 2023-08-11 at 02 36 43"
src="https://github.com/bevyengine/bevy/assets/302146/7d99c7b3-f2bb-4004-a8d0-4c00f755cb0d">
A reduction in frame time of ~14%.
---
## Changelog
- Changed: Redefined `MeshUniform` to improve performance by using 4x3
affine transforms and reconstructing 4x4 matrices in the shader. Helper
functions were added to `bevy_pbr::mesh_functions` to unpack the data.
`affine_to_square` converts the packed 4x3 in 3x4 matrix data to a 4x4
matrix. `mat2x4_f32_to_mat3x3` converts the 3x3 in mat2x4 + f32 matrix
data back into a 3x3.
## Migration Guide
Shader code before:
```
var model = mesh[instance_index].model;
```
Shader code after:
```
#import bevy_pbr::mesh_functions affine_to_square
var model = affine_to_square(mesh[instance_index].model);
```
2023-08-15 06:00:23 +00:00
|
|
|
mat2x4_f32_to_mat3x3_unpack(
|
|
|
|
mesh[instance_index].inverse_transpose_model_a,
|
|
|
|
mesh[instance_index].inverse_transpose_model_b,
|
2022-08-18 21:54:40 +00:00
|
|
|
) * vertex_normal
|
|
|
|
);
|
|
|
|
}
|
|
|
|
|
|
|
|
// Calculates the sign of the determinant of the 3x3 model matrix based on a
|
|
|
|
// mesh flag
|
Use GpuArrayBuffer for MeshUniform (#9254)
# Objective
- Reduce the number of rebindings to enable batching of draw commands
## Solution
- Use the new `GpuArrayBuffer` for `MeshUniform` data to store all
`MeshUniform` data in arrays within fewer bindings
- Sort opaque/alpha mask prepass, opaque/alpha mask main, and shadow
phases also by the batch per-object data binding dynamic offset to
improve performance on WebGL2.
---
## Changelog
- Changed: Per-object `MeshUniform` data is now managed by
`GpuArrayBuffer` as arrays in buffers that need to be indexed into.
## Migration Guide
Accessing the `model` member of an individual mesh object's shader
`Mesh` struct the old way where each `MeshUniform` was stored at its own
dynamic offset:
```rust
struct Vertex {
@location(0) position: vec3<f32>,
};
fn vertex(vertex: Vertex) -> VertexOutput {
var out: VertexOutput;
out.clip_position = mesh_position_local_to_clip(
mesh.model,
vec4<f32>(vertex.position, 1.0)
);
return out;
}
```
The new way where one needs to index into the array of `Mesh`es for the
batch:
```rust
struct Vertex {
@builtin(instance_index) instance_index: u32,
@location(0) position: vec3<f32>,
};
fn vertex(vertex: Vertex) -> VertexOutput {
var out: VertexOutput;
out.clip_position = mesh_position_local_to_clip(
mesh[vertex.instance_index].model,
vec4<f32>(vertex.position, 1.0)
);
return out;
}
```
Note that using the instance_index is the default way to pass the
per-object index into the shader, but if you wish to do custom rendering
approaches you can pass it in however you like.
---------
Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com>
2023-07-30 13:17:08 +00:00
|
|
|
fn sign_determinant_model_3x3m(instance_index: u32) -> f32 {
|
2022-08-18 21:54:40 +00:00
|
|
|
// bool(u32) is false if 0u else true
|
|
|
|
// f32(bool) is 1.0 if true else 0.0
|
|
|
|
// * 2.0 - 1.0 remaps 0.0 or 1.0 to -1.0 or 1.0 respectively
|
Use GpuArrayBuffer for MeshUniform (#9254)
# Objective
- Reduce the number of rebindings to enable batching of draw commands
## Solution
- Use the new `GpuArrayBuffer` for `MeshUniform` data to store all
`MeshUniform` data in arrays within fewer bindings
- Sort opaque/alpha mask prepass, opaque/alpha mask main, and shadow
phases also by the batch per-object data binding dynamic offset to
improve performance on WebGL2.
---
## Changelog
- Changed: Per-object `MeshUniform` data is now managed by
`GpuArrayBuffer` as arrays in buffers that need to be indexed into.
## Migration Guide
Accessing the `model` member of an individual mesh object's shader
`Mesh` struct the old way where each `MeshUniform` was stored at its own
dynamic offset:
```rust
struct Vertex {
@location(0) position: vec3<f32>,
};
fn vertex(vertex: Vertex) -> VertexOutput {
var out: VertexOutput;
out.clip_position = mesh_position_local_to_clip(
mesh.model,
vec4<f32>(vertex.position, 1.0)
);
return out;
}
```
The new way where one needs to index into the array of `Mesh`es for the
batch:
```rust
struct Vertex {
@builtin(instance_index) instance_index: u32,
@location(0) position: vec3<f32>,
};
fn vertex(vertex: Vertex) -> VertexOutput {
var out: VertexOutput;
out.clip_position = mesh_position_local_to_clip(
mesh[vertex.instance_index].model,
vec4<f32>(vertex.position, 1.0)
);
return out;
}
```
Note that using the instance_index is the default way to pass the
per-object index into the shader, but if you wish to do custom rendering
approaches you can pass it in however you like.
---------
Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com>
2023-07-30 13:17:08 +00:00
|
|
|
return f32(bool(mesh[instance_index].flags & MESH_FLAGS_SIGN_DETERMINANT_MODEL_3X3_BIT)) * 2.0 - 1.0;
|
2022-06-14 00:32:33 +00:00
|
|
|
}
|
|
|
|
|
Use GpuArrayBuffer for MeshUniform (#9254)
# Objective
- Reduce the number of rebindings to enable batching of draw commands
## Solution
- Use the new `GpuArrayBuffer` for `MeshUniform` data to store all
`MeshUniform` data in arrays within fewer bindings
- Sort opaque/alpha mask prepass, opaque/alpha mask main, and shadow
phases also by the batch per-object data binding dynamic offset to
improve performance on WebGL2.
---
## Changelog
- Changed: Per-object `MeshUniform` data is now managed by
`GpuArrayBuffer` as arrays in buffers that need to be indexed into.
## Migration Guide
Accessing the `model` member of an individual mesh object's shader
`Mesh` struct the old way where each `MeshUniform` was stored at its own
dynamic offset:
```rust
struct Vertex {
@location(0) position: vec3<f32>,
};
fn vertex(vertex: Vertex) -> VertexOutput {
var out: VertexOutput;
out.clip_position = mesh_position_local_to_clip(
mesh.model,
vec4<f32>(vertex.position, 1.0)
);
return out;
}
```
The new way where one needs to index into the array of `Mesh`es for the
batch:
```rust
struct Vertex {
@builtin(instance_index) instance_index: u32,
@location(0) position: vec3<f32>,
};
fn vertex(vertex: Vertex) -> VertexOutput {
var out: VertexOutput;
out.clip_position = mesh_position_local_to_clip(
mesh[vertex.instance_index].model,
vec4<f32>(vertex.position, 1.0)
);
return out;
}
```
Note that using the instance_index is the default way to pass the
per-object index into the shader, but if you wish to do custom rendering
approaches you can pass it in however you like.
---------
Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com>
2023-07-30 13:17:08 +00:00
|
|
|
fn mesh_tangent_local_to_world(model: mat4x4<f32>, vertex_tangent: vec4<f32>, instance_index: u32) -> vec4<f32> {
|
2022-08-18 21:54:40 +00:00
|
|
|
// NOTE: The mikktspace method of normal mapping requires that the world tangent is
|
|
|
|
// re-normalized in the vertex shader to match the way mikktspace bakes vertex tangents
|
|
|
|
// and normal maps so that the exact inverse process is applied when shading. Blender, Unity,
|
|
|
|
// Unreal Engine, Godot, and more all use the mikktspace method. Do not change this code
|
|
|
|
// unless you really know what you are doing.
|
|
|
|
// http://www.mikktspace.com/
|
2022-06-14 00:32:33 +00:00
|
|
|
return vec4<f32>(
|
2022-08-18 21:54:40 +00:00
|
|
|
normalize(
|
|
|
|
mat3x3<f32>(
|
|
|
|
model[0].xyz,
|
|
|
|
model[1].xyz,
|
|
|
|
model[2].xyz
|
|
|
|
) * vertex_tangent.xyz
|
|
|
|
),
|
|
|
|
// NOTE: Multiplying by the sign of the determinant of the 3x3 model matrix accounts for
|
|
|
|
// situations such as negative scaling.
|
Use GpuArrayBuffer for MeshUniform (#9254)
# Objective
- Reduce the number of rebindings to enable batching of draw commands
## Solution
- Use the new `GpuArrayBuffer` for `MeshUniform` data to store all
`MeshUniform` data in arrays within fewer bindings
- Sort opaque/alpha mask prepass, opaque/alpha mask main, and shadow
phases also by the batch per-object data binding dynamic offset to
improve performance on WebGL2.
---
## Changelog
- Changed: Per-object `MeshUniform` data is now managed by
`GpuArrayBuffer` as arrays in buffers that need to be indexed into.
## Migration Guide
Accessing the `model` member of an individual mesh object's shader
`Mesh` struct the old way where each `MeshUniform` was stored at its own
dynamic offset:
```rust
struct Vertex {
@location(0) position: vec3<f32>,
};
fn vertex(vertex: Vertex) -> VertexOutput {
var out: VertexOutput;
out.clip_position = mesh_position_local_to_clip(
mesh.model,
vec4<f32>(vertex.position, 1.0)
);
return out;
}
```
The new way where one needs to index into the array of `Mesh`es for the
batch:
```rust
struct Vertex {
@builtin(instance_index) instance_index: u32,
@location(0) position: vec3<f32>,
};
fn vertex(vertex: Vertex) -> VertexOutput {
var out: VertexOutput;
out.clip_position = mesh_position_local_to_clip(
mesh[vertex.instance_index].model,
vec4<f32>(vertex.position, 1.0)
);
return out;
}
```
Note that using the instance_index is the default way to pass the
per-object index into the shader, but if you wish to do custom rendering
approaches you can pass it in however you like.
---------
Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com>
2023-07-30 13:17:08 +00:00
|
|
|
vertex_tangent.w * sign_determinant_model_3x3m(instance_index)
|
2022-06-14 00:32:33 +00:00
|
|
|
);
|
|
|
|
}
|