Commit graph

154 commits

Author SHA1 Message Date
Robert Swain
c2a9d5843d Faster assign lights to clusters (#4345)
# Objective

- Fixes #4234
- Fixes #4473 
- Built on top of #3989
- Improve performance of `assign_lights_to_clusters`

## Solution

- Remove the OBB-based cluster light assignment algorithm and calculation of view space AABBs
- Implement the 'iterative sphere refinement' algorithm used in Just Cause 3 by Emil Persson as documented in the Siggraph 2015 Practical Clustered Shading talk by Persson, on pages 42-44 http://newq.net/dl/pub/s2015_practical.pdf
- Adapt to also support orthographic projections
- Add `many_lights -- orthographic` for testing many lights using an orthographic projection

## Results

- `assign_lights_to_clusters` in `many_lights` before this PR on an M1 Max over 1500 frames had a median execution time of 1.71ms. With this PR it is 1.51ms, a reduction of 0.2ms or 11.7% for this system.

---

## Changelog

- Changed: Improved cluster light assignment performance

Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-04-15 02:53:20 +00:00
Alice Cecile
c747cc526b Group stress test examples (#4289)
# Objective

- Several examples are useful for qualitative tests of Bevy's performance
- By contrast, these are less useful for learning material: they are often relatively complex and have large amounts of setup and are performance optimized.

## Solution

- Move bevymark, many_sprites and many_cubes into the new stress_tests example folder
- Move contributors into the games folder: unlike the remaining examples in the 2d folder, it is not focused on demonstrating a clear feature.
2022-04-10 02:05:21 +00:00
Robert Swain
c5963b4fd5 Use storage buffers for clustered forward point lights (#3989)
# Objective

- Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene
- Fixes #3605 
- Based on top of #4079 

This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR.

## Solution

- Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability
- Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time.
- Appropriate shader defs in the shader code to handle the two cases

## Context on some decisions / open questions

- I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately.
- Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants?


Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2022-04-07 16:16:35 +00:00
Alex
54fbaf4b4c Add transform hierarchy stress test (#4170)
## Objective

There recently was a discussion on Discord about a possible test case for stress-testing transform hierarchies.

## Solution

Create a test case for stress testing transform propagation.

*Edit:* I have scrapped my previous example and built something more functional and less focused on visuals.

There are three test setups:

- `TestCase::Tree` recursively creates a tree with a specified depth and branch width
- `TestCase::NonUniformTree` is the same as `Tree` but omits nodes in a way that makes the tree "lean" towards one side, like this:
  <details>
  <summary></summary>

  ![image](https://user-images.githubusercontent.com/3957610/158069737-2ddf4e4a-7d5c-4ee5-8566-424a54a06723.png)
  </details>
- `TestCase::Humanoids` creates one or more separate hierarchies based on the structure of common humanoid rigs
  - this can both insert `active` and `inactive` instances of the human rig

It's possible to parameterize which parts of the hierarchy get updated (transform change) and which remain unchanged. This is based on @james7132 suggestion:
There's a probability to decide which entities should remain static. On top of that these changes can be limited to a certain range in the hierarchy (min_depth..max_depth).
2022-03-21 20:36:46 +00:00