bevy/pipelined/bevy_render2/src/lib.rs
Robert Swain bc5916cce7 Frustum culling (#2861)
# Objective

Implement frustum culling for much better performance on more complex scenes. With the Amazon Lumberyard Bistro scene, I was getting roughly 15fps without frustum culling and 60+fps with frustum culling on a MacBook Pro 16 with i9 9980HK 8c/16t CPU and Radeon Pro 5500M.

macOS does weird things with vsync so even though vsync was off, it really looked like sometimes other applications or the desktop window compositor were interfering, but the difference could be even more as I even saw up to 90+fps sometimes.

## Solution

- Until the https://github.com/bevyengine/rfcs/pull/12 RFC is completed, I wanted to implement at least some of the bounding volume functionality we needed to be able to unblock a bunch of rendering features and optimisations such as frustum culling, fitting the directional light orthographic projection to the relevant meshes in the view, clustered forward rendering, etc.
- I have added `Aabb`, `Frustum`, and `Sphere` types with only the necessary intersection tests for the algorithms used. I also added `CubemapFrusta` which contains a `[Frustum; 6]` and can be used by cube maps such as environment maps, and point light shadow maps.
  - I did do a bit of benchmarking and optimisation on the intersection tests. I compared the [rafx parallel-comparison bitmask approach](c91bd5fcfd/rafx-visibility/src/geometry/frustum.rs (L64-L92)) with a naïve loop that has an early-out in case of a bounding volume being outside of any one of the `Frustum` planes and found them to be very similar, so I chose the simpler and more readable option. I also compared using Vec3 and Vec3A and it turned out that promoting Vec3s to Vec3A improved performance of the culling significantly due to Vec3A operations using SIMD optimisations where Vec3 uses plain scalar operations.
- When loading glTF models, the vertex attribute accessors generally store the minimum and maximum values, which allows for adding AABBs to meshes loaded from glTF for free.
- For meshes without an AABB (`PbrBundle` deliberately does not have an AABB by default), a system is executed that scans over the vertex positions to find the minimum and maximum values along each axis. This is used to construct the AABB.
- The `Frustum::intersects_obb` and `Sphere::insersects_obb` algorithm is from Foundations of Game Engine Development 2: Rendering by Eric Lengyel. There is no OBB type, yet, rather an AABB and the model matrix are passed in as arguments. This calculates a 'relative radius' of the AABB with respect to the plane normal (the plane normal in the Sphere case being something I came up with as the direction pointing from the centre of the sphere to the centre of the AABB) such that it can then do a sphere-sphere intersection test in practice.
- `RenderLayers` were copied over from the current renderer.
- `VisibleEntities` was copied over from the current renderer and a `CubemapVisibleEntities` was added to support `PointLight`s for now. `VisibleEntities` are added to views (cameras and lights) and contain a `Vec<Entity>` that is populated by culling/visibility systems that run in PostUpdate of the app world, and are iterated over in the render world for, for example, queuing up meshes to be drawn by lights for shadow maps and the main pass for cameras.
- `Visibility` and `ComputedVisibility` components were added. The `Visibility` component is user-facing so that, for example, the entity can be marked as not visible in an editor. `ComputedVisibility` on the other hand is the result of the culling/visibility systems and takes `Visibility` into account. So if an entity is marked as not being visible in its `Visibility` component, that will skip culling/visibility intersection tests and just mark the `ComputedVisibility` as false.
- The `ComputedVisibility` is used to decide which meshes to extract.
- I had to add a way to get the far plane from the `CameraProjection` in order to define an explicit far frustum plane for culling. This should perhaps be optional as it is not always desired and in that case, testing 5 planes instead of 6 is a performance win.

I think that's about all. I discussed some of the design with @cart on Discord already so hopefully it's not too far from being mergeable. It works well at least. 😄
2021-11-07 21:45:52 +00:00

293 lines
11 KiB
Rust

pub mod camera;
pub mod color;
pub mod mesh;
pub mod primitives;
pub mod render_asset;
pub mod render_component;
pub mod render_graph;
pub mod render_phase;
pub mod render_resource;
pub mod renderer;
pub mod texture;
pub mod view;
pub use once_cell;
use crate::{
camera::CameraPlugin,
mesh::MeshPlugin,
render_asset::RenderAssetPlugin,
render_graph::RenderGraph,
render_resource::{RenderPipelineCache, Shader, ShaderLoader},
renderer::render_system,
texture::ImagePlugin,
view::{ViewPlugin, WindowRenderPlugin},
};
use bevy_app::{App, AppLabel, Plugin};
use bevy_asset::{AddAsset, AssetServer};
use bevy_ecs::prelude::*;
use std::ops::{Deref, DerefMut};
use wgpu::Backends;
#[derive(Default)]
pub struct RenderPlugin;
/// The names of the default App stages
#[derive(Debug, Hash, PartialEq, Eq, Clone, StageLabel)]
pub enum RenderStage {
/// Extract data from "app world" and insert it into "render world". This step should be kept
/// as short as possible to increase the "pipelining potential" for running the next frame
/// while rendering the current frame.
Extract,
/// Prepare render resources from extracted data.
Prepare,
/// Create Bind Groups that depend on Prepare data and queue up draw calls to run during the Render stage.
Queue,
// TODO: This could probably be moved in favor of a system ordering abstraction in Render or Queue
/// Sort RenderPhases here
PhaseSort,
/// Actual rendering happens here. In most cases, only the render backend should insert resources here
Render,
/// Cleanup render resources here.
Cleanup,
}
/// The Render App World. This is only available as a resource during the Extract step.
#[derive(Default)]
pub struct RenderWorld(World);
impl Deref for RenderWorld {
type Target = World;
fn deref(&self) -> &Self::Target {
&self.0
}
}
impl DerefMut for RenderWorld {
fn deref_mut(&mut self) -> &mut Self::Target {
&mut self.0
}
}
/// Label for the rendering sub-app
#[derive(Debug, Clone, Copy, Hash, PartialEq, Eq, AppLabel)]
pub struct RenderApp;
/// A "scratch" world used to avoid allocating new worlds every frame when
// swapping out the Render World.
#[derive(Default)]
struct ScratchRenderWorld(World);
impl Plugin for RenderPlugin {
fn build(&self, app: &mut App) {
let default_backend = if cfg!(not(target_arch = "wasm32")) {
Backends::PRIMARY
} else {
Backends::GL
};
let backends = wgpu::util::backend_bits_from_env().unwrap_or(default_backend);
let instance = wgpu::Instance::new(backends);
let surface = {
let world = app.world.cell();
let windows = world.get_resource_mut::<bevy_window::Windows>().unwrap();
let raw_handle = windows.get_primary().map(|window| unsafe {
let handle = window.raw_window_handle().get_handle();
instance.create_surface(&handle)
});
raw_handle
};
let (device, queue) = futures_lite::future::block_on(renderer::initialize_renderer(
&instance,
&wgpu::RequestAdapterOptions {
power_preference: wgpu::PowerPreference::HighPerformance,
compatible_surface: surface.as_ref(),
..Default::default()
},
&wgpu::DeviceDescriptor {
features: wgpu::Features::TEXTURE_ADAPTER_SPECIFIC_FORMAT_FEATURES,
#[cfg(not(target_arch = "wasm32"))]
limits: wgpu::Limits::default(),
#[cfg(target_arch = "wasm32")]
limits: wgpu::Limits {
..wgpu::Limits::downlevel_webgl2_defaults()
},
..Default::default()
},
));
app.insert_resource(device.clone())
.insert_resource(queue.clone())
.add_asset::<Shader>()
.init_asset_loader::<ShaderLoader>()
.init_resource::<ScratchRenderWorld>();
let render_pipeline_cache = RenderPipelineCache::new(device.clone());
let asset_server = app.world.get_resource::<AssetServer>().unwrap().clone();
let mut render_app = App::empty();
let mut extract_stage =
SystemStage::parallel().with_system(RenderPipelineCache::extract_dirty_shaders);
// don't apply buffers when the stage finishes running
// extract stage runs on the app world, but the buffers are applied to the render world
extract_stage.set_apply_buffers(false);
render_app
.add_stage(RenderStage::Extract, extract_stage)
.add_stage(RenderStage::Prepare, SystemStage::parallel())
.add_stage(RenderStage::Queue, SystemStage::parallel())
.add_stage(RenderStage::PhaseSort, SystemStage::parallel())
.add_stage(
RenderStage::Render,
SystemStage::parallel()
.with_system(RenderPipelineCache::process_pipeline_queue_system)
.with_system(render_system.exclusive_system().at_end()),
)
.add_stage(RenderStage::Cleanup, SystemStage::parallel())
.insert_resource(instance)
.insert_resource(device)
.insert_resource(queue)
.insert_resource(render_pipeline_cache)
.insert_resource(asset_server)
.init_resource::<RenderGraph>();
app.add_sub_app(RenderApp, render_app, move |app_world, render_app| {
#[cfg(feature = "trace")]
let render_span = bevy_utils::tracing::info_span!("renderer subapp");
#[cfg(feature = "trace")]
let _render_guard = render_span.enter();
{
#[cfg(feature = "trace")]
let stage_span =
bevy_utils::tracing::info_span!("stage", name = "reserve_and_flush");
#[cfg(feature = "trace")]
let _stage_guard = stage_span.enter();
// reserve all existing app entities for use in render_app
// they can only be spawned using `get_or_spawn()`
let meta_len = app_world.entities().meta.len();
render_app
.world
.entities()
.reserve_entities(meta_len as u32);
// flushing as "invalid" ensures that app world entities aren't added as "empty archetype" entities by default
// these entities cannot be accessed without spawning directly onto them
// this _only_ works as expected because clear_entities() is called at the end of every frame.
render_app.world.entities_mut().flush_as_invalid();
}
{
#[cfg(feature = "trace")]
let stage_span = bevy_utils::tracing::info_span!("stage", name = "extract");
#[cfg(feature = "trace")]
let _stage_guard = stage_span.enter();
// extract
extract(app_world, render_app);
}
{
#[cfg(feature = "trace")]
let stage_span = bevy_utils::tracing::info_span!("stage", name = "prepare");
#[cfg(feature = "trace")]
let _stage_guard = stage_span.enter();
// prepare
let prepare = render_app
.schedule
.get_stage_mut::<SystemStage>(&RenderStage::Prepare)
.unwrap();
prepare.run(&mut render_app.world);
}
{
#[cfg(feature = "trace")]
let stage_span = bevy_utils::tracing::info_span!("stage", name = "queue");
#[cfg(feature = "trace")]
let _stage_guard = stage_span.enter();
// queue
let queue = render_app
.schedule
.get_stage_mut::<SystemStage>(&RenderStage::Queue)
.unwrap();
queue.run(&mut render_app.world);
}
{
#[cfg(feature = "trace")]
let stage_span = bevy_utils::tracing::info_span!("stage", name = "sort");
#[cfg(feature = "trace")]
let _stage_guard = stage_span.enter();
// phase sort
let phase_sort = render_app
.schedule
.get_stage_mut::<SystemStage>(&RenderStage::PhaseSort)
.unwrap();
phase_sort.run(&mut render_app.world);
}
{
#[cfg(feature = "trace")]
let stage_span = bevy_utils::tracing::info_span!("stage", name = "render");
#[cfg(feature = "trace")]
let _stage_guard = stage_span.enter();
// render
let render = render_app
.schedule
.get_stage_mut::<SystemStage>(&RenderStage::Render)
.unwrap();
render.run(&mut render_app.world);
}
{
#[cfg(feature = "trace")]
let stage_span = bevy_utils::tracing::info_span!("stage", name = "cleanup");
#[cfg(feature = "trace")]
let _stage_guard = stage_span.enter();
// cleanup
let cleanup = render_app
.schedule
.get_stage_mut::<SystemStage>(&RenderStage::Cleanup)
.unwrap();
cleanup.run(&mut render_app.world);
render_app.world.clear_entities();
}
});
app.add_plugin(WindowRenderPlugin)
.add_plugin(CameraPlugin)
.add_plugin(ViewPlugin)
.add_plugin(MeshPlugin)
.add_plugin(ImagePlugin)
.add_plugin(RenderAssetPlugin::<Shader>::default());
}
}
fn extract(app_world: &mut World, render_app: &mut App) {
let extract = render_app
.schedule
.get_stage_mut::<SystemStage>(&RenderStage::Extract)
.unwrap();
// temporarily add the render world to the app world as a resource
let scratch_world = app_world.remove_resource::<ScratchRenderWorld>().unwrap();
let render_world = std::mem::replace(&mut render_app.world, scratch_world.0);
app_world.insert_resource(RenderWorld(render_world));
extract.run(app_world);
// add the render world back to the render app
let render_world = app_world.remove_resource::<RenderWorld>().unwrap();
let scratch_world = std::mem::replace(&mut render_app.world, render_world.0);
app_world.insert_resource(ScratchRenderWorld(scratch_world));
extract.apply_buffers(&mut render_app.world);
}