bevy/crates/bevy_pbr/src/meshlet/asset.rs
JMS55 6cc96f4c1f
Meshlet software raster + start of cleanup (#14623)
# Objective
- Faster meshlet rasterization path for small triangles
- Avoid having to allocate and write out a triangle buffer
- Refactor gpu_scene.rs

## Solution
- Replace the 32bit visbuffer texture with a 64bit visbuffer buffer,
where the left 32 bits encode depth, and the right 32 bits encode the
existing cluster + triangle IDs. Can't use 64bit textures, wgpu/naga
doesn't support atomic ops on textures yet.
- Instead of writing out a buffer of packed cluster + triangle IDs (per
triangle) to raster, the culling pass now writes out a buffer of just
cluster IDs (per cluster, so less memory allocated, cheaper to write
out).
  - Clusters for software raster are allocated from the left side
- Clusters for hardware raster are allocated in the same buffer, from
the right side
- The buffer size is fixed at MeshletPlugin build time, and should be
set to a reasonable value for your scene (no warning on overflow, and no
good way to determine what value you need outside of renderdoc - I plan
to fix this in a future PR adding a meshlet stats overlay)
- Currently I don't have a heuristic for software vs hardware raster
selection for each cluster. The existing code is just a placeholder. I
need to profile on a release scene and come up with a heuristic,
probably in a future PR.
- The culling shader is getting pretty hard to follow at this point, but
I don't want to spend time improving it as the entire shader/pass is
getting rewritten/replaced in the near future.
- Software raster is a compute workgroup per-cluster. Each workgroup
loads and transforms the <=64 vertices of the cluster, and then
rasterizes the <=64 triangles of the cluster.
- Two variants are implemented: Scanline for clusters with any larger
triangles (still smaller than hardware is good at), and brute-force for
very very tiny triangles
- Once the shader determines that a pixel should be filled in, it does
an atomicMax() on the visbuffer to store the results, copying how Nanite
works
- On devices with a low max workgroups per dispatch limit, an extra
compute pass is inserted before software raster to convert from a 1d to
2d dispatch (I don't think 3d would ever be necessary).
- I haven't implemented the top-left rule or subpixel precision yet, I'm
leaving that for a future PR since I get usable results without it for
now
- Resources used:
https://kristoffer-dyrkorn.github.io/triangle-rasterizer and chapters
6-8 of
https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index
- Hardware raster now spawns 64*3 vertex invocations per meshlet,
instead of the actual meshlet vertex count. Extra invocations just
early-exit.
- While this is slower than the existing system, hardware draws should
be rare now that software raster is usable, and it saves a ton of memory
using the unified cluster ID buffer. This would be fixed if wgpu had
support for mesh shaders.
- Instead of writing to a color+depth attachment, the hardware raster
pass also does the same atomic visbuffer writes that software raster
uses.
- We have to bind a dummy render target anyways, as wgpu doesn't
currently support render passes without any attachments
- Material IDs are no longer written out during the main rasterization
passes.
- If we had async compute queues, we could overlap the software and
hardware raster passes.
- New material and depth resolve passes run at the end of the visbuffer
node, and write out view depth and material ID depth textures

### Misc changes
- Fixed cluster culling importing, but never actually using the previous
view uniforms when doing occlusion culling
- Fixed incorrectly adding the LOD error twice when building the meshlet
mesh
- Splitup gpu_scene module into meshlet_mesh_manager, instance_manager,
and resource_manager
- resource_manager is still too complex and inefficient (extract and
prepare are way too expensive). I plan on improving this in a future PR,
but for now ResourceManager is mostly a 1:1 port of the leftover
MeshletGpuScene bits.
- Material draw passes have been renamed to the more accurate material
shade pass, as well as some other misc renaming (in the future, these
will be compute shaders even, and not actual draw calls)

---

## Migration Guide
- TBD (ask me at the end of the release for meshlet changes as a whole)

---------

Co-authored-by: vero <email@atlasdostal.com>
2024-08-26 17:54:34 +00:00

230 lines
8 KiB
Rust

use bevy_asset::{
io::{Reader, Writer},
saver::{AssetSaver, SavedAsset},
Asset, AssetLoader, AsyncReadExt, AsyncWriteExt, LoadContext,
};
use bevy_math::Vec3;
use bevy_reflect::TypePath;
use bevy_tasks::block_on;
use bytemuck::{Pod, Zeroable};
use lz4_flex::frame::{FrameDecoder, FrameEncoder};
use std::{
io::{Read, Write},
sync::Arc,
};
/// Unique identifier for the [`MeshletMesh`] asset format.
const MESHLET_MESH_ASSET_MAGIC: u64 = 1717551717668;
/// The current version of the [`MeshletMesh`] asset format.
pub const MESHLET_MESH_ASSET_VERSION: u64 = 1;
/// A mesh that has been pre-processed into multiple small clusters of triangles called meshlets.
///
/// A [`bevy_render::mesh::Mesh`] can be converted to a [`MeshletMesh`] using `MeshletMesh::from_mesh` when the `meshlet_processor` cargo feature is enabled.
/// The conversion step is very slow, and is meant to be ran once ahead of time, and not during runtime. This type of mesh is not suitable for
/// dynamically generated geometry.
///
/// There are restrictions on the [`crate::Material`] functionality that can be used with this type of mesh.
/// * Materials have no control over the vertex shader or vertex attributes.
/// * Materials must be opaque. Transparent, alpha masked, and transmissive materials are not supported.
/// * Materials must use the [`crate::Material::meshlet_mesh_fragment_shader`] method (and similar variants for prepass/deferred shaders)
/// which requires certain shader patterns that differ from the regular material shaders.
/// * Limited control over [`bevy_render::render_resource::RenderPipelineDescriptor`] attributes.
///
/// See also [`super::MaterialMeshletMeshBundle`] and [`super::MeshletPlugin`].
#[derive(Asset, TypePath, Clone)]
pub struct MeshletMesh {
/// Raw vertex data bytes for the overall mesh.
pub(crate) vertex_data: Arc<[u8]>,
/// Indices into `vertex_data`.
pub(crate) vertex_ids: Arc<[u32]>,
/// Indices into `vertex_ids`.
pub(crate) indices: Arc<[u8]>,
/// The list of meshlets making up this mesh.
pub(crate) meshlets: Arc<[Meshlet]>,
/// Spherical bounding volumes.
pub(crate) bounding_spheres: Arc<[MeshletBoundingSpheres]>,
}
/// A single meshlet within a [`MeshletMesh`].
#[derive(Copy, Clone, Pod, Zeroable)]
#[repr(C)]
pub struct Meshlet {
/// The offset within the parent mesh's [`MeshletMesh::vertex_ids`] buffer where the indices for this meshlet begin.
pub start_vertex_id: u32,
/// The offset within the parent mesh's [`MeshletMesh::indices`] buffer where the indices for this meshlet begin.
pub start_index_id: u32,
/// The amount of vertices in this meshlet.
pub vertex_count: u32,
/// The amount of triangles in this meshlet.
pub triangle_count: u32,
}
/// Bounding spheres used for culling and choosing level of detail for a [`Meshlet`].
#[derive(Copy, Clone, Pod, Zeroable)]
#[repr(C)]
pub struct MeshletBoundingSpheres {
/// The bounding sphere used for frustum and occlusion culling for this meshlet.
pub self_culling: MeshletBoundingSphere,
/// The bounding sphere used for determining if this meshlet is at the correct level of detail for a given view.
pub self_lod: MeshletBoundingSphere,
/// The bounding sphere used for determining if this meshlet's parent is at the correct level of detail for a given view.
pub parent_lod: MeshletBoundingSphere,
}
/// A spherical bounding volume used for a [`Meshlet`].
#[derive(Copy, Clone, Pod, Zeroable)]
#[repr(C)]
pub struct MeshletBoundingSphere {
pub center: Vec3,
pub radius: f32,
}
/// An [`AssetLoader`] and [`AssetSaver`] for `.meshlet_mesh` [`MeshletMesh`] assets.
pub struct MeshletMeshSaverLoader;
impl AssetSaver for MeshletMeshSaverLoader {
type Asset = MeshletMesh;
type Settings = ();
type OutputLoader = Self;
type Error = MeshletMeshSaveOrLoadError;
async fn save<'a>(
&'a self,
writer: &'a mut Writer,
asset: SavedAsset<'a, MeshletMesh>,
_settings: &'a (),
) -> Result<(), MeshletMeshSaveOrLoadError> {
// Write asset magic number
writer
.write_all(&MESHLET_MESH_ASSET_MAGIC.to_le_bytes())
.await?;
// Write asset version
writer
.write_all(&MESHLET_MESH_ASSET_VERSION.to_le_bytes())
.await?;
// Compress and write asset data
let mut writer = FrameEncoder::new(AsyncWriteSyncAdapter(writer));
write_slice(&asset.vertex_data, &mut writer)?;
write_slice(&asset.vertex_ids, &mut writer)?;
write_slice(&asset.indices, &mut writer)?;
write_slice(&asset.meshlets, &mut writer)?;
write_slice(&asset.bounding_spheres, &mut writer)?;
writer.finish()?;
Ok(())
}
}
impl AssetLoader for MeshletMeshSaverLoader {
type Asset = MeshletMesh;
type Settings = ();
type Error = MeshletMeshSaveOrLoadError;
async fn load<'a>(
&'a self,
reader: &'a mut dyn Reader,
_settings: &'a (),
_load_context: &'a mut LoadContext<'_>,
) -> Result<MeshletMesh, MeshletMeshSaveOrLoadError> {
// Load and check magic number
let magic = async_read_u64(reader).await?;
if magic != MESHLET_MESH_ASSET_MAGIC {
return Err(MeshletMeshSaveOrLoadError::WrongFileType);
}
// Load and check asset version
let version = async_read_u64(reader).await?;
if version != MESHLET_MESH_ASSET_VERSION {
return Err(MeshletMeshSaveOrLoadError::WrongVersion { found: version });
}
// Load and decompress asset data
let reader = &mut FrameDecoder::new(AsyncReadSyncAdapter(reader));
let vertex_data = read_slice(reader)?;
let vertex_ids = read_slice(reader)?;
let indices = read_slice(reader)?;
let meshlets = read_slice(reader)?;
let bounding_spheres = read_slice(reader)?;
Ok(MeshletMesh {
vertex_data,
vertex_ids,
indices,
meshlets,
bounding_spheres,
})
}
fn extensions(&self) -> &[&str] {
&["meshlet_mesh"]
}
}
#[derive(thiserror::Error, Debug)]
pub enum MeshletMeshSaveOrLoadError {
#[error("file was not a MeshletMesh asset")]
WrongFileType,
#[error("expected asset version {MESHLET_MESH_ASSET_VERSION} but found version {found}")]
WrongVersion { found: u64 },
#[error("failed to compress or decompress asset data")]
CompressionOrDecompression(#[from] lz4_flex::frame::Error),
#[error("failed to read or write asset data")]
Io(#[from] std::io::Error),
}
async fn async_read_u64(reader: &mut dyn Reader) -> Result<u64, std::io::Error> {
let mut bytes = [0u8; 8];
reader.read_exact(&mut bytes).await?;
Ok(u64::from_le_bytes(bytes))
}
fn read_u64(reader: &mut dyn Read) -> Result<u64, std::io::Error> {
let mut bytes = [0u8; 8];
reader.read_exact(&mut bytes)?;
Ok(u64::from_le_bytes(bytes))
}
fn write_slice<T: Pod>(
field: &[T],
writer: &mut dyn Write,
) -> Result<(), MeshletMeshSaveOrLoadError> {
writer.write_all(&(field.len() as u64).to_le_bytes())?;
writer.write_all(bytemuck::cast_slice(field))?;
Ok(())
}
fn read_slice<T: Pod>(reader: &mut dyn Read) -> Result<Arc<[T]>, std::io::Error> {
let len = read_u64(reader)? as usize;
let mut data: Arc<[T]> = std::iter::repeat_with(T::zeroed).take(len).collect();
let slice = Arc::get_mut(&mut data).unwrap();
reader.read_exact(bytemuck::cast_slice_mut(slice))?;
Ok(data)
}
// TODO: Use async for everything and get rid of this adapter
struct AsyncWriteSyncAdapter<'a>(&'a mut Writer);
impl Write for AsyncWriteSyncAdapter<'_> {
fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
block_on(self.0.write(buf))
}
fn flush(&mut self) -> std::io::Result<()> {
block_on(self.0.flush())
}
}
// TODO: Use async for everything and get rid of this adapter
struct AsyncReadSyncAdapter<'a>(&'a mut dyn Reader);
impl Read for AsyncReadSyncAdapter<'_> {
fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
block_on(self.0.read(buf))
}
}