bevy/crates/bevy_core_pipeline at 3af8526786d4b9596d3b425b0f8310d4b994267f - Mirrors/bevy

mirror of https://github.com/bevyengine/bevy synced 2024-11-22 12:43:34 +00:00

History

JMS55 f4dab8a4e8 Multithreaded render command encoding (#9172 ) # Objective - Encoding many GPU commands (such as in a renderpass with many draws, such as the main opaque pass) onto a `wgpu::CommandEncoder` is very expensive, and takes a long time. - To improve performance, we want to perform the command encoding for these heavy passes in parallel. ## Solution - `RenderContext` can now queue up "command buffer generation tasks" which are closures that will generate a command buffer when called. - When finalizing the render context to produce the final list of command buffers, these tasks are run in parallel on the `ComputeTaskPool` to produce their corresponding command buffers. - The general idea is that the node graph will run in serial, but in a node, instead of doing rendering work, you can add tasks to do render work in parallel with other node's tasks that get ran at the end of the graph execution. ## Nodes Parallelized - `MainOpaquePass3dNode` - `PrepassNode` - `DeferredGBufferPrepassNode` - `ShadowPassNode` (One task per view) ## Future Work - For large number of draws calls, might be worth further subdividing passes into 2+ tasks. - Extend this to UI, 2d, transparent, and transmissive nodes? - Needs testing - small command buffers are inefficient - it may be worth reverting to the serial command encoder usage for render phases with few items. - All "serial" (traditional) rendering work must finish before parallel rendering tasks (the new stuff) can start to run. - There is still only one submission to the graphics queue at the end of the graph execution. There is still no ability to submit work earlier. ## Performance Improvement Thanks to @Elabajaba for testing on Bistro. ![image](https://github.com/bevyengine/bevy/assets/47158642/be50dafa-85eb-4da5-a5cd-c0a044f1e76f) TLDR: Without shadow mapping, this PR has no impact. _With_ shadow mapping, this PR gives ~40 more fps than main. --- ## Changelog - `MainOpaquePass3dNode`, `PrepassNode`, `DeferredGBufferPrepassNode`, and each shadow map within `ShadowPassNode` are now encoded in parallel, giving _greatly_ increased CPU performance, mainly when shadow mapping is enabled. - Does not work on WASM or AMD+Windows+Vulkan. - Added `RenderContext::add_command_buffer_generation_task()`. - `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. ## Migration Guide `RenderContext::new()` now takes adapter info - Some render graph and Node related types and methods now have additional lifetime constraints. --------- Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com> Co-authored-by: François <mockersf@gmail.com>	2024-02-09 07:35:35 +00:00
..
src	Multithreaded render command encoding (#9172 )	2024-02-09 07:35:35 +00:00
Cargo.toml	Update to wgpu 0.19 and raw-window-handle 0.6 (#11280 )	2024-01-26 18:14:21 +00:00

Multithreaded render command encoding (#9172 )

# Objective
- Encoding many GPU commands (such as in a renderpass with many draws,
such as the main opaque pass) onto a `wgpu::CommandEncoder` is very
expensive, and takes a long time.
- To improve performance, we want to perform the command encoding for
these heavy passes in parallel.

## Solution
- `RenderContext` can now queue up "command buffer generation tasks"
which are closures that will generate a command buffer when called.
- When finalizing the render context to produce the final list of
command buffers, these tasks are run in parallel on the
`ComputeTaskPool` to produce their corresponding command buffers.
- The general idea is that the node graph will run in serial, but in a
node, instead of doing rendering work, you can add tasks to do render
work in parallel with other node's tasks that get ran at the end of the
graph execution.

## Nodes Parallelized
- `MainOpaquePass3dNode`
- `PrepassNode`
- `DeferredGBufferPrepassNode`
- `ShadowPassNode` (One task per view)

## Future Work
- For large number of draws calls, might be worth further subdividing
passes into 2+ tasks.
- Extend this to UI, 2d, transparent, and transmissive nodes?
- Needs testing - small command buffers are inefficient - it may be
worth reverting to the serial command encoder usage for render phases
with few items.
- All "serial" (traditional) rendering work must finish before parallel
rendering tasks (the new stuff) can start to run.
- There is still only one submission to the graphics queue at the end of
the graph execution. There is still no ability to submit work earlier.

## Performance Improvement
Thanks to @Elabajaba for testing on Bistro.

![image](https://github.com/bevyengine/bevy/assets/47158642/be50dafa-85eb-4da5-a5cd-c0a044f1e76f)

TLDR: Without shadow mapping, this PR has no impact. _With_ shadow
mapping, this PR gives **~40 more fps** than main.

---

## Changelog
- `MainOpaquePass3dNode`, `PrepassNode`, `DeferredGBufferPrepassNode`,
and each shadow map within `ShadowPassNode` are now encoded in parallel,
giving _greatly_ increased CPU performance, mainly when shadow mapping
is enabled.
- Does not work on WASM or AMD+Windows+Vulkan.
- Added `RenderContext::add_command_buffer_generation_task()`.
- `RenderContext::new()` now takes adapter info
- Some render graph and Node related types and methods now have
additional lifetime constraints.

## Migration Guide
`RenderContext::new()` now takes adapter info
- Some render graph and Node related types and methods now have
additional lifetime constraints.

---------

Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com>
Co-authored-by: François <mockersf@gmail.com>

2024-02-09 07:35:35 +00:00

src

Multithreaded render command encoding (#9172 )

2024-02-09 07:35:35 +00:00

Cargo.toml

Update to wgpu 0.19 and raw-window-handle 0.6 (#11280 )

2024-01-26 18:14:21 +00:00