Commit graph

19 commits

Author SHA1 Message Date
dataphract
1076a8f2b5 Document the new pipelined renderer (#3094)
This is a squash-and-rebase of @Ku95's documentation of the new renderer onto the latest `pipelined-rendering` branch.

Original PR is #2884.

Co-authored-by: dataphract <dataphract@gmail.com>
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2021-11-16 03:37:48 +00:00
Robert Swain
213839f503 Add support for opaque, alpha mask, and alpha blend modes (#3072)
# Objective

Add depth prepass and support for opaque, alpha mask, and alpha blend modes for the 3D PBR target.

## Solution

NOTE: This is based on top of #2861 frustum culling. Just lining it up to keep @cart loaded with the review train. 🚂 

There are a lot of important details here. Big thanks to @cwfitzgerald of wgpu, naga, and rend3 fame for explaining how to do it properly!

* An `AlphaMode` component is added that defines whether a material should be considered opaque, an alpha mask (with a cutoff value that defaults to 0.5, the same as glTF), or transparent and should be alpha blended
* Two depth prepasses are added:
  * Opaque does a plain vertex stage
  * Alpha mask does the vertex stage but also a fragment stage that samples the colour for the fragment and discards if its alpha value is below the cutoff value
  * Both are sorted front to back, not that it matters for these passes. (Maybe there should be a way to skip sorting?)
* Three main passes are added:
  * Opaque and alpha mask passes use a depth comparison function of Equal such that only the geometry that was closest is processed further, due to early-z testing
  * The transparent pass uses the Greater depth comparison function so that only transparent objects that are closer than anything opaque are rendered
  * The opaque fragment shading is as before except that alpha is explicitly set to 1.0
  * Alpha mask fragment shading sets the alpha value to 1.0 if it is equal to or above the cutoff, as defined by glTF
  * Opaque and alpha mask are sorted front to back (again not that it matters as we will skip anything that is not equal... maybe sorting is no longer needed here?)
  * Transparent is sorted back to front. Transparent fragment shading uses the alpha blending over operator

Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2021-11-16 03:03:27 +00:00
Robert Swain
bc5916cce7 Frustum culling (#2861)
# Objective

Implement frustum culling for much better performance on more complex scenes. With the Amazon Lumberyard Bistro scene, I was getting roughly 15fps without frustum culling and 60+fps with frustum culling on a MacBook Pro 16 with i9 9980HK 8c/16t CPU and Radeon Pro 5500M.

macOS does weird things with vsync so even though vsync was off, it really looked like sometimes other applications or the desktop window compositor were interfering, but the difference could be even more as I even saw up to 90+fps sometimes.

## Solution

- Until the https://github.com/bevyengine/rfcs/pull/12 RFC is completed, I wanted to implement at least some of the bounding volume functionality we needed to be able to unblock a bunch of rendering features and optimisations such as frustum culling, fitting the directional light orthographic projection to the relevant meshes in the view, clustered forward rendering, etc.
- I have added `Aabb`, `Frustum`, and `Sphere` types with only the necessary intersection tests for the algorithms used. I also added `CubemapFrusta` which contains a `[Frustum; 6]` and can be used by cube maps such as environment maps, and point light shadow maps.
  - I did do a bit of benchmarking and optimisation on the intersection tests. I compared the [rafx parallel-comparison bitmask approach](c91bd5fcfd/rafx-visibility/src/geometry/frustum.rs (L64-L92)) with a naïve loop that has an early-out in case of a bounding volume being outside of any one of the `Frustum` planes and found them to be very similar, so I chose the simpler and more readable option. I also compared using Vec3 and Vec3A and it turned out that promoting Vec3s to Vec3A improved performance of the culling significantly due to Vec3A operations using SIMD optimisations where Vec3 uses plain scalar operations.
- When loading glTF models, the vertex attribute accessors generally store the minimum and maximum values, which allows for adding AABBs to meshes loaded from glTF for free.
- For meshes without an AABB (`PbrBundle` deliberately does not have an AABB by default), a system is executed that scans over the vertex positions to find the minimum and maximum values along each axis. This is used to construct the AABB.
- The `Frustum::intersects_obb` and `Sphere::insersects_obb` algorithm is from Foundations of Game Engine Development 2: Rendering by Eric Lengyel. There is no OBB type, yet, rather an AABB and the model matrix are passed in as arguments. This calculates a 'relative radius' of the AABB with respect to the plane normal (the plane normal in the Sphere case being something I came up with as the direction pointing from the centre of the sphere to the centre of the AABB) such that it can then do a sphere-sphere intersection test in practice.
- `RenderLayers` were copied over from the current renderer.
- `VisibleEntities` was copied over from the current renderer and a `CubemapVisibleEntities` was added to support `PointLight`s for now. `VisibleEntities` are added to views (cameras and lights) and contain a `Vec<Entity>` that is populated by culling/visibility systems that run in PostUpdate of the app world, and are iterated over in the render world for, for example, queuing up meshes to be drawn by lights for shadow maps and the main pass for cameras.
- `Visibility` and `ComputedVisibility` components were added. The `Visibility` component is user-facing so that, for example, the entity can be marked as not visible in an editor. `ComputedVisibility` on the other hand is the result of the culling/visibility systems and takes `Visibility` into account. So if an entity is marked as not being visible in its `Visibility` component, that will skip culling/visibility intersection tests and just mark the `ComputedVisibility` as false.
- The `ComputedVisibility` is used to decide which meshes to extract.
- I had to add a way to get the far plane from the `CameraProjection` in order to define an explicit far frustum plane for culling. This should perhaps be optional as it is not always desired and in that case, testing 5 planes instead of 6 is a performance win.

I think that's about all. I discussed some of the design with @cart on Discord already so hopefully it's not too far from being mergeable. It works well at least. 😄
2021-11-07 21:45:52 +00:00
Carter Anderson
85487707ef Sprite Batching (#3060)
This implements the following:

* **Sprite Batching**: Collects sprites in a vertex buffer to draw many sprites with a single draw call. Sprites are batched by their `Handle<Image>` within a specific z-level. When possible, sprites are opportunistically batched _across_ z-levels (when no sprites with a different texture exist between two sprites with the same texture on different z levels). With these changes, I can now get ~130,000 sprites at 60fps on the `bevymark_pipelined` example.
* **Sprite Color Tints**: The `Sprite` type now has a `color` field. Non-white color tints result in a specialized render pipeline that passes the color in as a vertex attribute. I chose to specialize this because passing vertex colors has a measurable price (without colors I get ~130,000 sprites on bevymark, with colors I get ~100,000 sprites). "Colored" sprites cannot be batched with "uncolored" sprites, but I think this is fine because the chance of a "colored" sprite needing to batch with other "colored" sprites is generally probably way higher than an "uncolored" sprite needing to batch with a "colored" sprite.
* **Sprite Flipping**: Sprites can be flipped on their x or y axis using `Sprite::flip_x` and `Sprite::flip_y`. This is also true for `TextureAtlasSprite`.
* **Simpler BufferVec/UniformVec/DynamicUniformVec Clearing**:  improved the clearing interface by removing the need to know the size of the final buffer at the initial clear.

![image](https://user-images.githubusercontent.com/2694663/140001821-99be0d96-025d-489e-9bfa-ba19c1dc9548.png)


Note that this moves sprites away from entity-driven rendering and back to extracted lists. We _could_ use entities here, but it necessitates that an intermediate list is allocated / populated to collect and sort extracted sprites. This redundant copy, combined with the normal overhead of spawning extracted sprite entities, brings bevymark down to ~80,000 sprites at 60fps. I think making sprites a bit more fixed (by default) is worth it. I view this as acceptable because batching makes normal entity-driven rendering pretty useless anyway (and we would want to batch most custom materials too). We can still support custom shaders with custom bindings, we'll just need to define a specific interface for it.
2021-11-04 20:28:53 +00:00
Carter Anderson
c5af1335eb Add MSAA to new renderer (#3042)
Adds support for MSAA to the new renderer. This is done using the new [pipeline specialization](#3031) support to specialize on sample count. This is an alternative implementation to #2541 that cuts out the need for complicated render graph edge management by moving the relevant target information into View entities. This reuses @superdump's clever MSAA bitflag range code from #2541.

Note that wgpu currently only supports 1 or 4 samples due to those being the values supported by WebGPU. However they do plan on exposing ways to [enable/query for natively supported sample counts](https://github.com/gfx-rs/wgpu/issues/1832). When this happens we should integrate
2021-10-29 05:37:43 +00:00
Jakob Hellermann
432ce72faf fix window resize after wgpu 0.11 upgrade (#2953)
The fix originally got introduced in [#2858](https://github.com/bevyengine/bevy/pull/2858/files#diff-0f34eeda7ac2fe1f9e9b27de92d9290e0b360ffa6f032770aff22b5fef4eaa63R137-R143) but got lost in the upgrade to wgpu 0.11 at https://github.com/bevyengine/bevy/pull/2933
2021-10-15 23:27:58 +00:00
Carter Anderson
43e8a156fb Upgrade to wgpu 0.11 (#2933)
Upgrades both the old and new renderer to wgpu 0.11 (and naga 0.7). This builds on @zicklag's work here #2556.

Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2021-10-08 19:55:24 +00:00
Robert Swain
fb33d591df Fix window size change panic (#2858)
# Objective

Using fullscreen or trying to resize a window caused a panic. Fix that.

## Solution

- Don't wholesale overwrite the ExtractedWindows resource when extracting windows
  - This could cause an accumulation of unused windows that are holding onto swap chain frames?
- Check the if width and/or height changed since the last frame
- If the size changed, recreate the swap chain
- Ensure dimensions are >= 1 to avoid panics due to any dimension being 0
2021-09-23 20:55:18 +00:00
Carter Anderson
08969a24b8 Modular Rendering (#2831)
This changes how render logic is composed to make it much more modular. Previously, all extraction logic was centralized for a given "type" of rendered thing. For example, we extracted meshes into a vector of ExtractedMesh, which contained the mesh and material asset handles, the transform, etc. We looked up bindings for "drawn things" using their index in the `Vec<ExtractedMesh>`. This worked fine for built in rendering, but made it hard to reuse logic for "custom" rendering. It also prevented us from reusing things like "extracted transforms" across contexts.

To make rendering more modular, I made a number of changes:

* Entities now drive rendering:
  * We extract "render components" from "app components" and store them _on_ entities. No more centralized uber lists! We now have true "ECS-driven rendering"
  * To make this perform well, I implemented #2673 in upstream Bevy for fast batch insertions into specific entities. This was merged into the `pipelined-rendering` branch here: #2815
* Reworked the `Draw` abstraction:
  * Generic `PhaseItems`: each draw phase can define its own type of "rendered thing", which can define its own "sort key"
  * Ported the 2d, 3d, and shadow phases to the new PhaseItem impl (currently Transparent2d, Transparent3d, and Shadow PhaseItems)
  * `Draw` trait and and `DrawFunctions` are now generic on PhaseItem
  * Modular / Ergonomic `DrawFunctions` via `RenderCommands`
    * RenderCommand is a trait that runs an ECS query and produces one or more RenderPass calls. Types implementing this trait can be composed to create a final DrawFunction. For example the DrawPbr DrawFunction is created from the following DrawCommand tuple. Const generics are used to set specific bind group locations:
        ```rust
         pub type DrawPbr = (
            SetPbrPipeline,
            SetMeshViewBindGroup<0>,
            SetStandardMaterialBindGroup<1>,
            SetTransformBindGroup<2>,
            DrawMesh,
        );
        ```
    * The new `custom_shader_pipelined` example illustrates how the commands above can be reused to create a custom draw function:
       ```rust
       type DrawCustom = (
           SetCustomMaterialPipeline,
           SetMeshViewBindGroup<0>,
           SetTransformBindGroup<2>,
           DrawMesh,
       );
       ``` 
* ExtractComponentPlugin and UniformComponentPlugin:
  * Simple, standardized ways to easily extract individual components and write them to GPU buffers
* Ported PBR and Sprite rendering to the new primitives above.
* Removed staging buffer from UniformVec in favor of direct Queue usage
  * Makes UniformVec much easier to use and more ergonomic. Completely removes the need for custom render graph nodes in these contexts (see the PbrNode and view Node removals and the much simpler call patterns in the relevant Prepare systems).
* Added a many_cubes_pipelined example to benchmark baseline 3d rendering performance and ensure there were no major regressions during this port. Avoiding regressions was challenging given that the old approach of extracting into centralized vectors is basically the "optimal" approach. However thanks to a various ECS optimizations and render logic rephrasing, we pretty much break even on this benchmark!
* Lifetimeless SystemParams: this will be a bit divisive, but as we continue to embrace "trait driven systems" (ex: ExtractComponentPlugin, UniformComponentPlugin, DrawCommand), the ergonomics of `(Query<'static, 'static, (&'static A, &'static B, &'static)>, Res<'static, C>)` were getting very hard to bear. As a compromise, I added "static type aliases" for the relevant SystemParams. The previous example can now be expressed like this: `(SQuery<(Read<A>, Read<B>)>, SRes<C>)`. If anyone has better ideas / conflicting opinions, please let me know!
* RunSystem trait: a way to define Systems via a trait with a SystemParam associated type. This is used to implement the various plugins mentioned above. I also added SystemParamItem and QueryItem type aliases to make "trait stye" ecs interactions nicer on the eyes (and fingers).
* RenderAsset retrying: ensures that render assets are only created when they are "ready" and allows us to create bind groups directly inside render assets (which significantly simplified the StandardMaterial code). I think ultimately we should swap this out on "asset dependency" events to wait for dependencies to load, but this will require significant asset system changes.
* Updated some built in shaders to account for missing MeshUniform fields
2021-09-23 06:16:11 +00:00
Robert Swain
045f324e97 Use the infinite reverse right-handed perspective projection (#2543)
# Objective

Forward perspective projections have poor floating point precision distribution over the depth range. Reverse projections fair much better, and instead of having to have a far plane, with the reverse projection, using an infinite far plane is not a problem. The infinite reverse perspective projection has become the industry standard. The renderer rework is a great time to migrate to it.

## Solution

All perspective projections, including point lights, have been moved to using `glam::Mat4::perspective_infinite_reverse_rh()` and so have no far plane. As various depth textures are shared between orthographic and perspective projections, a quirk of this PR is that the near and far planes of the orthographic projection are swapped when the Mat4 is computed. This has no impact on 2D/3D orthographic projection usage, and provides consistency in shaders, texture clear values, etc. throughout the codebase.

## Known issues

For some reason, when looking along -Z, all geometry is black. The camera can be translated up/down / strafed left/right and geometry will still be black. Moving forward/backward or rotating the camera away from looking exactly along -Z causes everything to work as expected.

I have tried to debug this issue but both in macOS and Windows I get crashes when doing pixel debugging. If anyone could reproduce this and debug it I would be very grateful. Otherwise I will have to try to debug it further without pixel debugging, though the projections and such all looked fine to me.
2021-08-27 20:15:09 +00:00
Carter Anderson
9898469e9e Sub app label changes (#2717)
Makes some tweaks to the SubApp labeling introduced in #2695:

* Ergonomics improvements
* Removes unnecessary allocation when retrieving subapp label
* Removes the newly added "app macros" crate in favor of bevy_derive
* renamed RenderSubApp to RenderApp

@zicklag (for reference)
2021-08-24 06:37:28 +00:00
Zicklag
e290a7e29c Implement Sub-App Labels (#2695)
This is a rather simple but wide change, and it involves adding a new `bevy_app_macros` crate. Let me know if there is a better way to do any of this!

---

# Objective

- Allow adding and accessing sub-apps by using a label instead of an index

## Solution

- Migrate the bevy label implementation and derive code to the `bevy_utils` and `bevy_macro_utils` crates and then add a new `SubAppLabel` trait to the `bevy_app` crate that is used when adding or getting a sub-app from an app.
2021-08-24 00:31:21 +00:00
Carter Anderson
2e99d84cdc remove .system from pipelined code (#2538)
Now that we have main features, lets use them!
2021-07-26 23:44:23 +00:00
Carter Anderson
bc769d9641 omni light -> point light 2021-07-24 16:43:37 -07:00
bilsen
e1e4055a51 Allows resizing of windows (#8) 2021-07-24 16:43:37 -07:00
TheRawMeatball
7792b29aa4 Force main thread for prepare_windows system (#11)
Force main thread for prepare_windows system
2021-07-24 16:43:37 -07:00
Carter Anderson
25de2d1819 Port Mesh to RenderAsset, add Slab and FrameSlabMap garbage collection for Bind Groups 2021-07-24 16:43:37 -07:00
Carter Anderson
13ca00178a bevy_render now uses wgpu directly 2021-07-24 16:43:37 -07:00
Carter Anderson
3400fb4e61 SubGraphs, Views, Shadows, and more 2021-07-24 16:43:37 -07:00