bevy/examples/2d/mesh2d_manual.rs
Robert Swain 5c884c5a15
Automatic batching/instancing of draw commands (#9685)
# Objective

- Implement the foundations of automatic batching/instancing of draw
commands as the next step from #89
- NOTE: More performance improvements will come when more data is
managed and bound in ways that do not require rebinding such as mesh,
material, and texture data.

## Solution

- The core idea for batching of draw commands is to check whether any of
the information that has to be passed when encoding a draw command
changes between two things that are being drawn according to the sorted
render phase order. These should be things like the pipeline, bind
groups and their dynamic offsets, index/vertex buffers, and so on.
  - The following assumptions have been made:
- Only entities with prepared assets (pipelines, materials, meshes) are
queued to phases
- View bindings are constant across a phase for a given draw function as
phases are per-view
- `batch_and_prepare_render_phase` is the only system that performs this
batching and has sole responsibility for preparing the per-object data.
As such the mesh binding and dynamic offsets are assumed to only vary as
a result of the `batch_and_prepare_render_phase` system, e.g. due to
having to split data across separate uniform bindings within the same
buffer due to the maximum uniform buffer binding size.
- Implement `GpuArrayBuffer` for `Mesh2dUniform` to store Mesh2dUniform
in arrays in GPU buffers rather than each one being at a dynamic offset
in a uniform buffer. This is the same optimisation that was made for 3D
not long ago.
- Change batch size for a range in `PhaseItem`, adding API for getting
or mutating the range. This is more flexible than a size as the length
of the range can be used in place of the size, but the start and end can
be otherwise whatever is needed.
- Add an optional mesh bind group dynamic offset to `PhaseItem`. This
avoids having to do a massive table move just to insert
`GpuArrayBufferIndex` components.

## Benchmarks

All tests have been run on an M1 Max on AC power. `bevymark` and
`many_cubes` were modified to use 1920x1080 with a scale factor of 1. I
run a script that runs a separate Tracy capture process, and then runs
the bevy example with `--features bevy_ci_testing,trace_tracy` and
`CI_TESTING_CONFIG=../benchmark.ron` with the contents of
`../benchmark.ron`:
```rust
(
    exit_after: Some(1500)
)
```
...in order to run each test for 1500 frames.

The recent changes to `many_cubes` and `bevymark` added reproducible
random number generation so that with the same settings, the same rng
will occur. They also added benchmark modes that use a fixed delta time
for animations. Combined this means that the same frames should be
rendered both on main and on the branch.

The graphs compare main (yellow) to this PR (red).

### 3D Mesh `many_cubes --benchmark`

<img width="1411" alt="Screenshot 2023-09-03 at 23 42 10"
src="https://github.com/bevyengine/bevy/assets/302146/2088716a-c918-486c-8129-090b26fd2bc4">
The mesh and material are the same for all instances. This is basically
the best case for the initial batching implementation as it results in 1
draw for the ~11.7k visible meshes. It gives a ~30% reduction in median
frame time.

The 1000th frame is identical using the flip tool:

![flip many_cubes-main-mesh3d many_cubes-batching-mesh3d 67ppd
ldr](https://github.com/bevyengine/bevy/assets/302146/2511f37a-6df8-481a-932f-706ca4de7643)

```
     Mean: 0.000000
     Weighted median: 0.000000
     1st weighted quartile: 0.000000
     3rd weighted quartile: 0.000000
     Min: 0.000000
     Max: 0.000000
     Evaluation time: 0.4615 seconds
```

### 3D Mesh `many_cubes --benchmark --material-texture-count 10`

<img width="1404" alt="Screenshot 2023-09-03 at 23 45 18"
src="https://github.com/bevyengine/bevy/assets/302146/5ee9c447-5bd2-45c6-9706-ac5ff8916daf">
This run uses 10 different materials by varying their textures. The
materials are randomly selected, and there is no sorting by material
bind group for opaque 3D so any batching is 'random'. The PR produces a
~5% reduction in median frame time. If we were to sort the opaque phase
by the material bind group, then this should be a lot faster. This
produces about 10.5k draws for the 11.7k visible entities. This makes
sense as randomly selecting from 10 materials gives a chance that two
adjacent entities randomly select the same material and can be batched.

The 1000th frame is identical in flip:

![flip many_cubes-main-mesh3d-mtc10 many_cubes-batching-mesh3d-mtc10
67ppd
ldr](https://github.com/bevyengine/bevy/assets/302146/2b3a8614-9466-4ed8-b50c-d4aa71615dbb)

```
     Mean: 0.000000
     Weighted median: 0.000000
     1st weighted quartile: 0.000000
     3rd weighted quartile: 0.000000
     Min: 0.000000
     Max: 0.000000
     Evaluation time: 0.4537 seconds
```

### 3D Mesh `many_cubes --benchmark --vary-per-instance`

<img width="1394" alt="Screenshot 2023-09-03 at 23 48 44"
src="https://github.com/bevyengine/bevy/assets/302146/f02a816b-a444-4c18-a96a-63b5436f3b7f">
This run varies the material data per instance by randomly-generating
its colour. This is the worst case for batching and that it performs
about the same as `main` is a good thing as it demonstrates that the
batching has minimal overhead when dealing with ~11k visible mesh
entities.

The 1000th frame is identical according to flip:

![flip many_cubes-main-mesh3d-vpi many_cubes-batching-mesh3d-vpi 67ppd
ldr](https://github.com/bevyengine/bevy/assets/302146/ac5f5c14-9bda-4d1a-8219-7577d4aac68c)

```
     Mean: 0.000000
     Weighted median: 0.000000
     1st weighted quartile: 0.000000
     3rd weighted quartile: 0.000000
     Min: 0.000000
     Max: 0.000000
     Evaluation time: 0.4568 seconds
```

### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode
mesh2d`

<img width="1412" alt="Screenshot 2023-09-03 at 23 59 56"
src="https://github.com/bevyengine/bevy/assets/302146/cb02ae07-237b-4646-ae9f-fda4dafcbad4">
This spawns 160 waves of 1000 quad meshes that are shaded with
ColorMaterial. Each wave has a different material so 160 waves currently
should result in 160 batches. This results in a 50% reduction in median
frame time.

Capturing a screenshot of the 1000th frame main vs PR gives:

![flip bevymark-main-mesh2d bevymark-batching-mesh2d 67ppd
ldr](https://github.com/bevyengine/bevy/assets/302146/80102728-1217-4059-87af-14d05044df40)

```
     Mean: 0.001222
     Weighted median: 0.750432
     1st weighted quartile: 0.453494
     3rd weighted quartile: 0.969758
     Min: 0.000000
     Max: 0.990296
     Evaluation time: 0.4255 seconds
```

So they seem to produce the same results. I also double-checked the
number of draws. `main` does 160000 draws, and the PR does 160, as
expected.

### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode
mesh2d --material-texture-count 10`

<img width="1392" alt="Screenshot 2023-09-04 at 00 09 22"
src="https://github.com/bevyengine/bevy/assets/302146/4358da2e-ce32-4134-82df-3ab74c40849c">
This generates 10 textures and generates materials for each of those and
then selects one material per wave. The median frame time is reduced by
50%. Similar to the plain run above, this produces 160 draws on the PR
and 160000 on `main` and the 1000th frame is identical (ignoring the fps
counter text overlay).

![flip bevymark-main-mesh2d-mtc10 bevymark-batching-mesh2d-mtc10 67ppd
ldr](https://github.com/bevyengine/bevy/assets/302146/ebed2822-dce7-426a-858b-b77dc45b986f)

```
     Mean: 0.002877
     Weighted median: 0.964980
     1st weighted quartile: 0.668871
     3rd weighted quartile: 0.982749
     Min: 0.000000
     Max: 0.992377
     Evaluation time: 0.4301 seconds
```

### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode
mesh2d --vary-per-instance`

<img width="1396" alt="Screenshot 2023-09-04 at 00 13 53"
src="https://github.com/bevyengine/bevy/assets/302146/b2198b18-3439-47ad-919a-cdabe190facb">
This creates unique materials per instance by randomly-generating the
material's colour. This is the worst case for 2D batching. Somehow, this
PR manages a 7% reduction in median frame time. Both main and this PR
issue 160000 draws.

The 1000th frame is the same:

![flip bevymark-main-mesh2d-vpi bevymark-batching-mesh2d-vpi 67ppd
ldr](https://github.com/bevyengine/bevy/assets/302146/a2ec471c-f576-4a36-a23b-b24b22578b97)

```
     Mean: 0.001214
     Weighted median: 0.937499
     1st weighted quartile: 0.635467
     3rd weighted quartile: 0.979085
     Min: 0.000000
     Max: 0.988971
     Evaluation time: 0.4462 seconds
```

### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode
sprite`

<img width="1396" alt="Screenshot 2023-09-04 at 12 21 12"
src="https://github.com/bevyengine/bevy/assets/302146/8b31e915-d6be-4cac-abf5-c6a4da9c3d43">
This just spawns 160 waves of 1000 sprites. There should be and is no
notable difference between main and the PR.

### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode
sprite --material-texture-count 10`

<img width="1389" alt="Screenshot 2023-09-04 at 12 36 08"
src="https://github.com/bevyengine/bevy/assets/302146/45fe8d6d-c901-4062-a349-3693dd044413">
This spawns the sprites selecting a texture at random per instance from
the 10 generated textures. This has no significant change vs main and
shouldn't.

### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode
sprite --vary-per-instance`

<img width="1401" alt="Screenshot 2023-09-04 at 12 29 52"
src="https://github.com/bevyengine/bevy/assets/302146/762c5c60-352e-471f-8dbe-bbf10e24ebd6">
This sets the sprite colour as being unique per instance. This can still
all be drawn using one batch. There should be no difference but the PR
produces median frame times that are 4% higher. Investigation showed no
clear sources of cost, rather a mix of give and take that should not
happen. It seems like noise in the results.

### Summary

| Benchmark  | % change in median frame time |
| ------------- | ------------- |
| many_cubes  | 🟩 -30%  |
| many_cubes 10 materials  | 🟩 -5%  |
| many_cubes unique materials  | 🟩 ~0%  |
| bevymark mesh2d  | 🟩 -50%  |
| bevymark mesh2d 10 materials  | 🟩 -50%  |
| bevymark mesh2d unique materials  | 🟩 -7%  |
| bevymark sprite  | 🟥 2%  |
| bevymark sprite 10 materials  | 🟥 0.6%  |
| bevymark sprite unique materials  | 🟥 4.1%  |

---

## Changelog

- Added: 2D and 3D mesh entities that share the same mesh and material
(same textures, same data) are now batched into the same draw command
for better performance.

---------

Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
Co-authored-by: Nicola Papale <nico@nicopap.ch>
2023-09-21 22:12:34 +00:00

368 lines
14 KiB
Rust

//! This example shows how to manually render 2d items using "mid level render apis" with a custom
//! pipeline for 2d meshes.
//! It doesn't use the [`Material2d`] abstraction, but changes the vertex buffer to include vertex color.
//! Check out the "mesh2d" example for simpler / higher level 2d meshes.
use bevy::{
core_pipeline::core_2d::Transparent2d,
prelude::*,
render::{
mesh::{Indices, MeshVertexAttribute},
render_asset::RenderAssets,
render_phase::{AddRenderCommand, DrawFunctions, RenderPhase, SetItemPipeline},
render_resource::{
BlendState, ColorTargetState, ColorWrites, Face, FragmentState, FrontFace,
MultisampleState, PipelineCache, PolygonMode, PrimitiveState, PrimitiveTopology,
RenderPipelineDescriptor, SpecializedRenderPipeline, SpecializedRenderPipelines,
TextureFormat, VertexBufferLayout, VertexFormat, VertexState, VertexStepMode,
},
texture::BevyDefault,
view::{ExtractedView, ViewTarget, VisibleEntities},
Extract, Render, RenderApp, RenderSet,
},
sprite::{
DrawMesh2d, Mesh2dHandle, Mesh2dPipeline, Mesh2dPipelineKey, Mesh2dTransforms,
SetMesh2dBindGroup, SetMesh2dViewBindGroup,
},
utils::FloatOrd,
};
use std::f32::consts::PI;
fn main() {
App::new()
.add_plugins((DefaultPlugins, ColoredMesh2dPlugin))
.add_systems(Startup, star)
.run();
}
fn star(
mut commands: Commands,
// We will add a new Mesh for the star being created
mut meshes: ResMut<Assets<Mesh>>,
) {
// Let's define the mesh for the object we want to draw: a nice star.
// We will specify here what kind of topology is used to define the mesh,
// that is, how triangles are built from the vertices. We will use a
// triangle list, meaning that each vertex of the triangle has to be
// specified.
let mut star = Mesh::new(PrimitiveTopology::TriangleList);
// Vertices need to have a position attribute. We will use the following
// vertices (I hope you can spot the star in the schema).
//
// 1
//
// 10 2
// 9 0 3
// 8 4
// 6
// 7 5
//
// These vertices are specified in 3D space.
let mut v_pos = vec![[0.0, 0.0, 0.0]];
for i in 0..10 {
// The angle between each vertex is 1/10 of a full rotation.
let a = i as f32 * PI / 5.0;
// The radius of inner vertices (even indices) is 100. For outer vertices (odd indices) it's 200.
let r = (1 - i % 2) as f32 * 100.0 + 100.0;
// Add the vertex position.
v_pos.push([r * a.sin(), r * a.cos(), 0.0]);
}
// Set the position attribute
star.insert_attribute(Mesh::ATTRIBUTE_POSITION, v_pos);
// And a RGB color attribute as well
let mut v_color: Vec<u32> = vec![Color::BLACK.as_linear_rgba_u32()];
v_color.extend_from_slice(&[Color::YELLOW.as_linear_rgba_u32(); 10]);
star.insert_attribute(
MeshVertexAttribute::new("Vertex_Color", 1, VertexFormat::Uint32),
v_color,
);
// Now, we specify the indices of the vertex that are going to compose the
// triangles in our star. Vertices in triangles have to be specified in CCW
// winding (that will be the front face, colored). Since we are using
// triangle list, we will specify each triangle as 3 vertices
// First triangle: 0, 2, 1
// Second triangle: 0, 3, 2
// Third triangle: 0, 4, 3
// etc
// Last triangle: 0, 1, 10
let mut indices = vec![0, 1, 10];
for i in 2..=10 {
indices.extend_from_slice(&[0, i, i - 1]);
}
star.set_indices(Some(Indices::U32(indices)));
// We can now spawn the entities for the star and the camera
commands.spawn((
// We use a marker component to identify the custom colored meshes
ColoredMesh2d,
// The `Handle<Mesh>` needs to be wrapped in a `Mesh2dHandle` to use 2d rendering instead of 3d
Mesh2dHandle(meshes.add(star)),
// This bundle's components are needed for something to be rendered
SpatialBundle::INHERITED_IDENTITY,
));
// Spawn the camera
commands.spawn(Camera2dBundle::default());
}
/// A marker component for colored 2d meshes
#[derive(Component, Default)]
pub struct ColoredMesh2d;
/// Custom pipeline for 2d meshes with vertex colors
#[derive(Resource)]
pub struct ColoredMesh2dPipeline {
/// this pipeline wraps the standard [`Mesh2dPipeline`]
mesh2d_pipeline: Mesh2dPipeline,
}
impl FromWorld for ColoredMesh2dPipeline {
fn from_world(world: &mut World) -> Self {
Self {
mesh2d_pipeline: Mesh2dPipeline::from_world(world),
}
}
}
// We implement `SpecializedPipeline` to customize the default rendering from `Mesh2dPipeline`
impl SpecializedRenderPipeline for ColoredMesh2dPipeline {
type Key = Mesh2dPipelineKey;
fn specialize(&self, key: Self::Key) -> RenderPipelineDescriptor {
// Customize how to store the meshes' vertex attributes in the vertex buffer
// Our meshes only have position and color
let formats = vec![
// Position
VertexFormat::Float32x3,
// Color
VertexFormat::Uint32,
];
let vertex_layout =
VertexBufferLayout::from_vertex_formats(VertexStepMode::Vertex, formats);
let format = match key.contains(Mesh2dPipelineKey::HDR) {
true => ViewTarget::TEXTURE_FORMAT_HDR,
false => TextureFormat::bevy_default(),
};
// Meshes typically live in bind group 2. Because we are using bind group 1
// we need to add the MESH_BINDGROUP_1 shader def so that the bindings are correctly
// linked in the shader.
let shader_defs = vec!["MESH_BINDGROUP_1".into()];
RenderPipelineDescriptor {
vertex: VertexState {
// Use our custom shader
shader: COLORED_MESH2D_SHADER_HANDLE,
entry_point: "vertex".into(),
shader_defs: shader_defs.clone(),
// Use our custom vertex buffer
buffers: vec![vertex_layout],
},
fragment: Some(FragmentState {
// Use our custom shader
shader: COLORED_MESH2D_SHADER_HANDLE,
shader_defs,
entry_point: "fragment".into(),
targets: vec![Some(ColorTargetState {
format,
blend: Some(BlendState::ALPHA_BLENDING),
write_mask: ColorWrites::ALL,
})],
}),
// Use the two standard uniforms for 2d meshes
layout: vec![
// Bind group 0 is the view uniform
self.mesh2d_pipeline.view_layout.clone(),
// Bind group 1 is the mesh uniform
self.mesh2d_pipeline.mesh_layout.clone(),
],
push_constant_ranges: Vec::new(),
primitive: PrimitiveState {
front_face: FrontFace::Ccw,
cull_mode: Some(Face::Back),
unclipped_depth: false,
polygon_mode: PolygonMode::Fill,
conservative: false,
topology: key.primitive_topology(),
strip_index_format: None,
},
depth_stencil: None,
multisample: MultisampleState {
count: key.msaa_samples(),
mask: !0,
alpha_to_coverage_enabled: false,
},
label: Some("colored_mesh2d_pipeline".into()),
}
}
}
// This specifies how to render a colored 2d mesh
type DrawColoredMesh2d = (
// Set the pipeline
SetItemPipeline,
// Set the view uniform as bind group 0
SetMesh2dViewBindGroup<0>,
// Set the mesh uniform as bind group 1
SetMesh2dBindGroup<1>,
// Draw the mesh
DrawMesh2d,
);
// The custom shader can be inline like here, included from another file at build time
// using `include_str!()`, or loaded like any other asset with `asset_server.load()`.
const COLORED_MESH2D_SHADER: &str = r"
// Import the standard 2d mesh uniforms and set their bind groups
#import bevy_sprite::mesh2d_bindings mesh
#import bevy_sprite::mesh2d_functions as MeshFunctions
// The structure of the vertex buffer is as specified in `specialize()`
struct Vertex {
@builtin(instance_index) instance_index: u32,
@location(0) position: vec3<f32>,
@location(1) color: u32,
};
struct VertexOutput {
// The vertex shader must set the on-screen position of the vertex
@builtin(position) clip_position: vec4<f32>,
// We pass the vertex color to the fragment shader in location 0
@location(0) color: vec4<f32>,
};
/// Entry point for the vertex shader
@vertex
fn vertex(vertex: Vertex) -> VertexOutput {
var out: VertexOutput;
// Project the world position of the mesh into screen position
let model = MeshFunctions::get_model_matrix(vertex.instance_index);
out.clip_position = MeshFunctions::mesh2d_position_local_to_clip(model, vec4<f32>(vertex.position, 1.0));
// Unpack the `u32` from the vertex buffer into the `vec4<f32>` used by the fragment shader
out.color = vec4<f32>((vec4<u32>(vertex.color) >> vec4<u32>(0u, 8u, 16u, 24u)) & vec4<u32>(255u)) / 255.0;
return out;
}
// The input of the fragment shader must correspond to the output of the vertex shader for all `location`s
struct FragmentInput {
// The color is interpolated between vertices by default
@location(0) color: vec4<f32>,
};
/// Entry point for the fragment shader
@fragment
fn fragment(in: FragmentInput) -> @location(0) vec4<f32> {
return in.color;
}
";
/// Plugin that renders [`ColoredMesh2d`]s
pub struct ColoredMesh2dPlugin;
/// Handle to the custom shader with a unique random ID
pub const COLORED_MESH2D_SHADER_HANDLE: Handle<Shader> =
Handle::weak_from_u128(13828845428412094821);
impl Plugin for ColoredMesh2dPlugin {
fn build(&self, app: &mut App) {
// Load our custom shader
let mut shaders = app.world.resource_mut::<Assets<Shader>>();
shaders.insert(
COLORED_MESH2D_SHADER_HANDLE,
Shader::from_wgsl(COLORED_MESH2D_SHADER, file!()),
);
// Register our custom draw function, and add our render systems
app.get_sub_app_mut(RenderApp)
.unwrap()
.add_render_command::<Transparent2d, DrawColoredMesh2d>()
.init_resource::<SpecializedRenderPipelines<ColoredMesh2dPipeline>>()
.add_systems(ExtractSchedule, extract_colored_mesh2d)
.add_systems(Render, queue_colored_mesh2d.in_set(RenderSet::QueueMeshes));
}
fn finish(&self, app: &mut App) {
// Register our custom pipeline
app.get_sub_app_mut(RenderApp)
.unwrap()
.init_resource::<ColoredMesh2dPipeline>();
}
}
/// Extract the [`ColoredMesh2d`] marker component into the render app
pub fn extract_colored_mesh2d(
mut commands: Commands,
mut previous_len: Local<usize>,
// When extracting, you must use `Extract` to mark the `SystemParam`s
// which should be taken from the main world.
query: Extract<Query<(Entity, &ViewVisibility), With<ColoredMesh2d>>>,
) {
let mut values = Vec::with_capacity(*previous_len);
for (entity, view_visibility) in &query {
if !view_visibility.get() {
continue;
}
values.push((entity, ColoredMesh2d));
}
*previous_len = values.len();
commands.insert_or_spawn_batch(values);
}
/// Queue the 2d meshes marked with [`ColoredMesh2d`] using our custom pipeline and draw function
#[allow(clippy::too_many_arguments)]
pub fn queue_colored_mesh2d(
transparent_draw_functions: Res<DrawFunctions<Transparent2d>>,
colored_mesh2d_pipeline: Res<ColoredMesh2dPipeline>,
mut pipelines: ResMut<SpecializedRenderPipelines<ColoredMesh2dPipeline>>,
pipeline_cache: Res<PipelineCache>,
msaa: Res<Msaa>,
render_meshes: Res<RenderAssets<Mesh>>,
colored_mesh2d: Query<(&Mesh2dHandle, &Mesh2dTransforms), With<ColoredMesh2d>>,
mut views: Query<(
&VisibleEntities,
&mut RenderPhase<Transparent2d>,
&ExtractedView,
)>,
) {
if colored_mesh2d.is_empty() {
return;
}
// Iterate each view (a camera is a view)
for (visible_entities, mut transparent_phase, view) in &mut views {
let draw_colored_mesh2d = transparent_draw_functions.read().id::<DrawColoredMesh2d>();
let mesh_key = Mesh2dPipelineKey::from_msaa_samples(msaa.samples())
| Mesh2dPipelineKey::from_hdr(view.hdr);
// Queue all entities visible to that view
for visible_entity in &visible_entities.entities {
if let Ok((mesh2d_handle, mesh2d_transforms)) = colored_mesh2d.get(*visible_entity) {
// Get our specialized pipeline
let mut mesh2d_key = mesh_key;
if let Some(mesh) = render_meshes.get(&mesh2d_handle.0) {
mesh2d_key |=
Mesh2dPipelineKey::from_primitive_topology(mesh.primitive_topology);
}
let pipeline_id =
pipelines.specialize(&pipeline_cache, &colored_mesh2d_pipeline, mesh2d_key);
let mesh_z = mesh2d_transforms.transform.translation.z;
transparent_phase.add(Transparent2d {
entity: *visible_entity,
draw_function: draw_colored_mesh2d,
pipeline: pipeline_id,
// The 2d render items are sorted according to their z value before rendering,
// in order to get correct transparency
sort_key: FloatOrd(mesh_z),
// This material is not batched
batch_range: 0..1,
dynamic_offset: None,
});
}
}
}
}