Commit graph

1226 commits

Author SHA1 Message Date
Patrick Walton
16531fb3e3
Implement GPU frustum culling. (#12889)
This commit implements opt-in GPU frustum culling, built on top of the
infrastructure in https://github.com/bevyengine/bevy/pull/12773. To
enable it on a camera, add the `GpuCulling` component to it. To
additionally disable CPU frustum culling, add the `NoCpuCulling`
component. Note that adding `GpuCulling` without `NoCpuCulling`
*currently* does nothing useful. The reason why `GpuCulling` doesn't
automatically imply `NoCpuCulling` is that I intend to follow this patch
up with GPU two-phase occlusion culling, and CPU frustum culling plus
GPU occlusion culling seems like a very commonly-desired mode.

Adding the `GpuCulling` component to a view puts that view into
*indirect mode*. This mode makes all drawcalls indirect, relying on the
mesh preprocessing shader to allocate instances dynamically. In indirect
mode, the `PreprocessWorkItem` `output_index` points not to a
`MeshUniform` instance slot but instead to a set of `wgpu`
`IndirectParameters`, from which it allocates an instance slot
dynamically if frustum culling succeeds. Batch building has been updated
to allocate and track indirect parameter slots, and the AABBs are now
supplied to the GPU as `MeshCullingData`.

A small amount of code relating to the frustum culling has been borrowed
from meshlets and moved into `maths.wgsl`. Note that standard Bevy
frustum culling uses AABBs, while meshlets use bounding spheres; this
means that not as much code can be shared as one might think.

This patch doesn't provide any way to perform GPU culling on shadow
maps, to avoid making this patch bigger than it already is. That can be
a followup.

## Changelog

### Added

* Frustum culling can now optionally be done on the GPU. To enable it,
add the `GpuCulling` component to a camera.
* To disable CPU frustum culling, add `NoCpuCulling` to a camera. Note
that `GpuCulling` doesn't automatically imply `NoCpuCulling`.
2024-04-28 12:50:00 +00:00
Aevyrie
4b446c020e
Add error when extract resource build fails (#4964)
# Objective

- Provide feedback when an extraction plugin fails to add its system.

I had some troubleshooting pain when this happened to me, as the panic
only tells you a resource is missing. This PR adds an error when the
ExtractResource plugin is added before the render world exists, instead
of silently failing.


![image](https://user-images.githubusercontent.com/2632925/172491993-673d9351-215a-4f30-96f7-af239c44686a.png)
2024-04-28 05:20:59 +00:00
François Mockers
22d605c8df
asset throttling: don't be exhausted if there is no limit (#13112)
# Objective

- Since #12622 example `compute_shader_game_of_life` crashes
```
thread 'Compute Task Pool (2)' panicked at examples/shader/compute_shader_game_of_life.rs:137:65:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Encountered a panic in system `compute_shader_game_of_life::prepare_bind_group`!
thread '<unnamed>' panicked at examples/shader/compute_shader_game_of_life.rs:254:34:
Requested resource compute_shader_game_of_life::GameOfLifeImageBindGroups does not exist in the `World`.
                Did you forget to add it using `app.insert_resource` / `app.init_resource`?
                Resources are also implicitly added via `app.add_event`,
                and can be added by plugins.
Encountered a panic in system `bevy_render::renderer::render_system`!
```

## Solution

- `exhausted()` now checks that there is a limit
2024-04-27 09:00:10 +00:00
Doonv
de9dc9c204
Fix CameraProjection panic and improve CameraProjectionPlugin (#11808)
# Objective

Fix https://github.com/bevyengine/bevy/issues/11799 and improve
`CameraProjectionPlugin`

## Solution

`CameraProjectionPlugin` is now an all-in-one plugin for adding a custom
`CameraProjection`. I also added `PbrProjectionPlugin` which is like
`CameraProjectionPlugin` but for PBR.

P.S. I'd like to get this merged after
https://github.com/bevyengine/bevy/pull/11766.

---

## Changelog

- Changed `CameraProjectionPlugin` to be an all-in-one plugin for adding
a `CameraProjection`
- Removed `VisibilitySystems::{UpdateOrthographicFrusta,
UpdatePerspectiveFrusta, UpdateProjectionFrusta}`, now replaced with
`VisibilitySystems::UpdateFrusta`
- Added `PbrProjectionPlugin` for projection-specific PBR functionality.

## Migration Guide

`VisibilitySystems`'s `UpdateOrthographicFrusta`,
`UpdatePerspectiveFrusta`, and `UpdateProjectionFrusta` variants were
removed, they were replaced with `VisibilitySystems::UpdateFrusta`
2024-04-26 23:52:09 +00:00
robtfm
91a393a9e2
Throttle render assets (#12622)
# Objective

allow throttling of gpu uploads to prevent choppy framerate when many
textures/meshes are loaded in.

## Solution

- `RenderAsset`s can implement `byte_len()` which reports their size.
implemented this for `Mesh` and `Image`
- users can add a `RenderAssetBytesPerFrame` which specifies max bytes
to attempt to upload in a frame
- `render_assets::<A>` checks how many bytes have been written before
attempting to upload assets. the limit is a soft cap: assets will be
written until the total has exceeded the cap, to ensure some forward
progress every frame

notes:
- this is a stopgap until we have multiple wgpu queues for proper
streaming of data
- requires #12606

issues
- ~~fonts sometimes only partially upload. i have no clue why, needs to
be fixed~~ fixed now.
- choosing the #bytes is tricky as it should be hardware / framerate
dependent
- many features are not tested (env maps, light probes, etc) - they
won't break unless `RenderAssetBytesPerFrame` is explicitly used though

---------

Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>
Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-04-26 23:43:33 +00:00
Aevyrie
ade70b3925
Per-Object Motion Blur (#9924)
https://github.com/bevyengine/bevy/assets/2632925/e046205e-3317-47c3-9959-fc94c529f7e0

# Objective

- Adds per-object motion blur to the core 3d pipeline. This is a common
effect used in games and other simulations.
- Partially resolves #4710

## Solution

- This is a post-process effect that uses the depth and motion vector
buffers to estimate per-object motion blur. The implementation is
combined from knowledge from multiple papers and articles. The approach
itself, and the shader are quite simple. Most of the effort was in
wiring up the bevy rendering plumbing, and properly specializing for HDR
and MSAA.
- To work with MSAA, the MULTISAMPLED_SHADING wgpu capability is
required. I've extracted this code from #9000. This is because the
prepass buffers are multisampled, and require accessing with
`textureLoad` as opposed to the widely compatible `textureSample`.
- Added an example to demonstrate the effect of motion blur parameters.

## Future Improvements

- While this approach does have limitations, it's one of the most
commonly used, and is much better than camera motion blur, which does
not consider object velocity. For example, this implementation allows a
dolly to track an object, and that object will remain unblurred while
the background is blurred. The biggest issue with this implementation is
that blur is constrained to the boundaries of objects which results in
hard edges. There are solutions to this by either dilating the object or
the motion vector buffer, or by taking a different approach such as
https://casual-effects.com/research/McGuire2012Blur/index.html
- I'm using a noise PRNG function to jitter samples. This could be
replaced with a blue noise texture lookup or similar, however after
playing with the parameters, it gives quite nice results with 4 samples,
and is significantly better than the artifacts generated when not
jittering.

---

## Changelog

- Added: per-object motion blur. This can be enabled and configured by
adding the `MotionBlurBundle` to a camera entity.

---------

Co-authored-by: Torstein Grindvik <52322338+torsteingrindvik@users.noreply.github.com>
2024-04-25 01:16:02 +00:00
re0312
0f27500e46
Improve par_iter and Parallel (#12904)
# Objective

- bevy usually use `Parallel::scope` to collect items from `par_iter`,
but `scope` will be called with every satifified items. it will cause a
lot of unnecessary lookup.

## Solution

- similar to Rayon ,we introduce `for_each_init` for `par_iter` which
only be invoked when spawn a task for a group of items.

---

## Changelog

- added  `for_each_init`

## Performance
`check_visibility `  in  `many_foxes ` 

![image](https://github.com/bevyengine/bevy/assets/45868716/030c41cf-0d2f-4a36-a071-35097d93e494)
 
~40% performance gain in `check_visibility`.

---------

Co-authored-by: James Liu <contact@jamessliu.com>
2024-04-23 12:05:34 +00:00
Brezak
de875fdc4c
Make AppExit more specific about exit reason. (#13022)
# Objective

Closes #13017.

## Solution

- Make `AppExit` a enum with a `Success` and `Error` variant.
- Make `App::run()` return a `AppExit` if it ever returns.
- Make app runners return a `AppExit` to signal if they encountered a
error.

---

## Changelog

### Added

- [`App::should_exit`](https://example.org/)
- [`AppExit`](https://docs.rs/bevy/latest/bevy/app/struct.AppExit.html)
to the `bevy` and `bevy_app` preludes,

### Changed

- [`AppExit`](https://docs.rs/bevy/latest/bevy/app/struct.AppExit.html)
is now a enum with 2 variants (`Success` and `Error`).
- The app's [runner
function](https://docs.rs/bevy/latest/bevy/app/struct.App.html#method.set_runner)
now has to return a `AppExit`.
-
[`App::run()`](https://docs.rs/bevy/latest/bevy/app/struct.App.html#method.run)
now also returns the `AppExit` produced by the runner function.


## Migration Guide

- Replace all usages of
[`AppExit`](https://docs.rs/bevy/latest/bevy/app/struct.AppExit.html)
with `AppExit::Success` or `AppExit::Failure`.
- Any custom app runners now need to return a `AppExit`. We suggest you
return a `AppExit::Error` if any `AppExit` raised was a Error. You can
use the new [`App::should_exit`](https://example.org/) method.
- If not exiting from `main` any other way. You should return the
`AppExit` from `App::run()` so the app correctly returns a error code if
anything fails e.g.
```rust
fn main() -> AppExit {
    App::new()
        //Your setup here...
        .run()
}
```

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2024-04-22 16:48:18 +00:00
François Mockers
c40b485095
use a u64 for MeshPipelineKey (#13015)
# Objective

- `MeshPipelineKey` use some bits for two things
- First commit in this PR adds an assertion that doesn't work currently
on main
- This leads to some mesh topology not working anymore, for example
`LineStrip`
- With examples `lines`, there should be two groups of lines, the blue
one doesn't display currently

## Solution

- Change the `MeshPipelineKey` to be backed by a `u64` instead, to have
enough bits
2024-04-21 20:01:45 +00:00
BD103
b3d3daad5a
Fix Clippy lints on WASM (#13030)
# Objective

- Fixes #13024.

## Solution

- Run `cargo clippy --target wasm32-unknown-unknown` until there are no
more errors.
  - I recommend reviewing one commit at a time :)

---

## Changelog

- Fixed Clippy lints for `wasm32-unknown-unknown` target.
- Updated `bevy_transform`'s `README.md`.
2024-04-20 09:15:42 +00:00
Kanabenki
1df41b79dd
Expose desired_maximum_frame_latency through window creation (#12954)
# Objective

- Closes #12930.

## Solution

- Add a corresponding optional field on `Window` and `ExtractedWindow`

---

## Changelog

### Added

- `wgpu`'s `desired_maximum_frame_latency` is exposed through window
creation. This can be used to override the default maximum number of
queued frames on the GPU (currently 2).

## Migration Guide

- The `desired_maximum_frame_latency` field must be added to instances
of `Window` and `ExtractedWindow` where all fields are explicitly
specified.
2024-04-19 23:09:30 +00:00
Brezak
f68bc01544
Run CheckVisibility after all the other visibility system sets have… (#12962)
# Objective

Make visibility system ordering explicit. Fixes #12953.

## Solution

Specify `CheckVisibility` happens after all other `VisibilitySystems`
sets have happened.

---------

Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com>
2024-04-18 20:33:29 +00:00
andristarr
2b3e3341d6
separating finite and infinite 3d planes (#12426)
# Objective

Fixes #12388

## Solution

- Removing the plane3d and adding rect3d primitive mesh
2024-04-18 14:13:22 +00:00
Victor
11afe16079
Fix extensionless image loading panic (#13005)
Remake of #12938 targeting main
2024-04-17 15:13:33 +00:00
Brezak
368c5cef1a
Implement clone for most bundles. (#12993)
# Objective

Closes #12985.

## Solution

- Derive clone for most types with bundle in their name.
- Bundle types missing clone:
-
[`TextBundle`](https://docs.rs/bevy/latest/bevy/prelude/struct.TextBundle.html)
(Contains
[`ContentSize`](https://docs.rs/bevy/latest/bevy/ui/struct.ContentSize.html)
which can't be cloned because it itself contains a `Option<MeasureFunc>`
where
[`MeasureFunc`](https://docs.rs/taffy/0.3.18/taffy/node/enum.MeasureFunc.html)
isn't clone)
-
[`ImageBundle`](https://docs.rs/bevy/latest/bevy/prelude/struct.ImageBundle.html)
(Same as `TextBundle`)
-
[`AtlasImageBundle`](https://docs.rs/bevy/latest/bevy/prelude/struct.AtlasImageBundle.html)
(Will be deprecated in 0.14 there's no point)
2024-04-16 16:37:09 +00:00
BD103
7b8d502083
Fix beta lints (#12980)
# Objective

- Fixes #12976

## Solution

This one is a doozy.

- Run `cargo +beta clippy --workspace --all-targets --all-features` and
fix all issues
- This includes:
- Moving inner attributes to be outer attributes, when the item in
question has both inner and outer attributes
  - Use `ptr::from_ref` in more scenarios
- Extend the valid idents list used by `clippy:doc_markdown` with more
names
  - Use `Clone::clone_from` when possible
  - Remove redundant `ron` import
  - Add backticks to **so many** identifiers and items
    - I'm sorry whoever has to review this

---

## Changelog

- Added links to more identifiers in documentation.
2024-04-16 02:46:46 +00:00
Patrick Walton
1141e731ff
Implement alpha to coverage (A2C) support. (#12970)
[Alpha to coverage] (A2C) replaces alpha blending with a
hardware-specific multisample coverage mask when multisample
antialiasing is in use. It's a simple form of [order-independent
transparency] that relies on MSAA. ["Anti-aliased Alpha Test: The
Esoteric Alpha To Coverage"] is a good summary of the motivation for and
best practices relating to A2C.

This commit implements alpha to coverage support as a new variant for
`AlphaMode`. You can supply `AlphaMode::AlphaToCoverage` as the
`alpha_mode` field in `StandardMaterial` to use it. When in use, the
standard material shader automatically applies the texture filtering
method from ["Anti-aliased Alpha Test: The Esoteric Alpha To Coverage"].
Objects with alpha-to-coverage materials are binned in the opaque pass,
as they're fully order-independent.

The `transparency_3d` example has been updated to feature an object with
alpha to coverage. Happily, the example was already using MSAA.

This is part of #2223, as far as I can tell.

[Alpha to coverage]: https://en.wikipedia.org/wiki/Alpha_to_coverage

[order-independent transparency]:
https://en.wikipedia.org/wiki/Order-independent_transparency

["Anti-aliased Alpha Test: The Esoteric Alpha To Coverage"]:
https://bgolus.medium.com/anti-aliased-alpha-test-the-esoteric-alpha-to-coverage-8b177335ae4f

---

## Changelog

### Added

* The `AlphaMode` enum now supports `AlphaToCoverage`, to provide
limited order-independent transparency when multisample antialiasing is
in use.
2024-04-15 20:37:52 +00:00
Robert Swain
5f05e75a70
Fix 2D BatchedInstanceBuffer clear (#12922)
# Objective

- `cargo run --release --example bevymark -- --benchmark --waves 160
--per-wave 1000 --mode mesh2d` runs slower and slower over time due to
`no_gpu_preprocessing::write_batched_instance_buffer<bevy_sprite::mesh2d::mesh::Mesh2dPipeline>`
taking longer and longer because the `BatchedInstanceBuffer` is not
cleared

## Solution

- Split the `clear_batched_instance_buffers` system into CPU and GPU
versions
- Use the CPU version for 2D meshes
2024-04-15 05:00:43 +00:00
Hexorg
7a9a459a40
Fixed crash when transcoding one- or two-channel KTX2 textures (#12629)
# Objective

Fixes a crash when transcoding one- or two-channel KTX2 textures

## Solution

transcoded array has been pre-allocated up to levels.len using a macros.
Rgb8 transcoding already uses that and addresses transcoded array by an
index. R8UnormSrgb and Rg8UnormSrgb were pushing on top of the
transcoded vec, resulting in first levels.len() vectors to stay empty,
and second levels.len() levels actually being transcoded, which then
resulted in out of bounds read when copying levels to gpu
2024-04-14 14:40:10 +00:00
BD103
aa2ebbb43f
Fix some nightly Clippy lints (#12927)
# Objective

- I daily drive nightly Rust when developing Bevy, so I notice when new
warnings are raised by `cargo check` and Clippy.
- `cargo +nightly clippy` raises a few of these new warnings.

## Solution

- Fix most warnings from `cargo +nightly clippy`
- I skipped the docs-related warnings because some were covered by
#12692.
- Use `Clone::clone_from` in applicable scenarios, which can sometimes
avoid an extra allocation.
- Implement `Default` for structs that have a `pub const fn new() ->
Self` method.
- Fix an occurrence where generic constraints were defined in both `<C:
Trait>` and `where C: Trait`.
  - Removed generic constraints that were implied by the `Bundle` trait.

---

## Changelog

- `BatchingStrategy`, `NonGenericTypeCell`, and `GenericTypeCell` now
implement `Default`.
2024-04-13 02:05:38 +00:00
Patrick Walton
5caf085dac
Divide the single VisibleEntities list into separate lists for 2D meshes, 3D meshes, lights, and UI elements, for performance. (#12582)
This commit splits `VisibleEntities::entities` into four separate lists:
one for lights, one for 2D meshes, one for 3D meshes, and one for UI
elements. This allows `queue_material_meshes` and similar methods to
avoid examining entities that are obviously irrelevant. In particular,
this separation helps scenes with many skinned meshes, as the individual
bones are considered visible entities but have no rendered appearance.

Internally, `VisibleEntities::entities` is a `HashMap` from the `TypeId`
representing a `QueryFilter` to the appropriate `Entity` list. I had to
do this because `VisibleEntities` is located within an upstream crate
from the crates that provide lights (`bevy_pbr`) and 2D meshes
(`bevy_sprite`). As an added benefit, this setup allows apps to provide
their own types of renderable components, by simply adding a specialized
`check_visibility` to the schedule.

This provides a 16.23% end-to-end speedup on `many_foxes` with 10,000
foxes (24.06 ms/frame to 20.70 ms/frame).

## Migration guide

* `check_visibility` and `VisibleEntities` now store the four types of
renderable entities--2D meshes, 3D meshes, lights, and UI
elements--separately. If your custom rendering code examines
`VisibleEntities`, it will now need to specify which type of entity it's
interested in using the `WithMesh2d`, `WithMesh`, `WithLight`, and
`WithNode` types respectively. If your app introduces a new type of
renderable entity, you'll need to add an explicit call to
`check_visibility` to the schedule to accommodate your new component or
components.

## Analysis

`many_foxes`, 10,000 foxes: `main`:
![Screenshot 2024-03-31
114444](https://github.com/bevyengine/bevy/assets/157897/16ecb2ff-6e04-46c0-a4b0-b2fde2084bad)

`many_foxes`, 10,000 foxes, this branch:
![Screenshot 2024-03-31
114256](https://github.com/bevyengine/bevy/assets/157897/94dedae4-bd00-45b2-9aaf-dfc237004ddb)

`queue_material_meshes` (yellow = this branch, red = `main`):
![Screenshot 2024-03-31
114637](https://github.com/bevyengine/bevy/assets/157897/f90912bd-45bd-42c4-bd74-57d98a0f036e)

`queue_shadows` (yellow = this branch, red = `main`):
![Screenshot 2024-03-31
114607](https://github.com/bevyengine/bevy/assets/157897/6ce693e3-20c0-4234-8ec9-a6f191299e2d)
2024-04-11 20:33:20 +00:00
BD103
5c3ae32ab1
Enable clippy::ref_as_ptr (#12918)
# Objective

-
[`clippy::ref_as_ptr`](https://rust-lang.github.io/rust-clippy/master/index.html#/ref_as_ptr)
prevents you from directly casting references to pointers, requiring you
to use `std::ptr::from_ref` instead. This prevents you from accidentally
converting an immutable reference into a mutable pointer (`&x as *mut
T`).
- Follow up to #11818, now that our [`rust-version` is
1.77](11817f4ba4/Cargo.toml (L14)).

## Solution

- Enable lint and fix all warnings.
2024-04-10 20:16:48 +00:00
Patrick Walton
d59b1e71ef
Implement percentage-closer filtering (PCF) for point lights. (#12910)
I ported the two existing PCF techniques to the cubemap domain as best I
could. Generally, the technique is to create a 2D orthonormal basis
using Gram-Schmidt normalization, then apply the technique over that
basis. The results look fine, though the shadow bias often needs
adjusting.

For comparison, Unity uses a 4-tap pattern for PCF on point lights of
(1, 1, 1), (-1, -1, 1), (-1, 1, -1), (1, -1, -1). I tried this but
didn't like the look, so I went with the design above, which ports the
2D techniques to the 3D domain. There's surprisingly little material on
point light PCF.

I've gone through every example using point lights and verified that the
shadow maps look fine, adjusting biases as necessary.

Fixes #3628.

---

## Changelog

### Added
* Shadows from point lights now support percentage-closer filtering
(PCF), and as a result look less aliased.

### Changed
* `ShadowFilteringMethod::Castano13` and
`ShadowFilteringMethod::Jimenez14` have been renamed to
`ShadowFilteringMethod::Gaussian` and `ShadowFilteringMethod::Temporal`
respectively.

## Migration Guide

* `ShadowFilteringMethod::Castano13` and
`ShadowFilteringMethod::Jimenez14` have been renamed to
`ShadowFilteringMethod::Gaussian` and `ShadowFilteringMethod::Temporal`
respectively.
2024-04-10 20:16:08 +00:00
Patrick Walton
11817f4ba4
Generate MeshUniforms on the GPU via compute shader where available. (#12773)
Currently, `MeshUniform`s are rather large: 160 bytes. They're also
somewhat expensive to compute, because they involve taking the inverse
of a 3x4 matrix. Finally, if a mesh is present in multiple views, that
mesh will have a separate `MeshUniform` for each and every view, which
is wasteful.

This commit fixes these issues by introducing the concept of a *mesh
input uniform* and adding a *mesh uniform building* compute shader pass.
The `MeshInputUniform` is simply the minimum amount of data needed for
the GPU to compute the full `MeshUniform`. Most of this data is just the
transform and is therefore only 64 bytes. `MeshInputUniform`s are
computed during the *extraction* phase, much like skins are today, in
order to avoid needlessly copying transforms around on CPU. (In fact,
the render app has been changed to only store the translation of each
mesh; it no longer cares about any other part of the transform, which is
stored only on the GPU and the main world.) Before rendering, the
`build_mesh_uniforms` pass runs to expand the `MeshInputUniform`s to the
full `MeshUniform`.

The mesh uniform building pass does the following, all on GPU:

1. Copy the appropriate fields of the `MeshInputUniform` to the
`MeshUniform` slot. If a single mesh is present in multiple views, this
effectively duplicates it into each view.

2. Compute the inverse transpose of the model transform, used for
transforming normals.

3. If applicable, copy the mesh's transform from the previous frame for
TAA. To support this, we double-buffer the `MeshInputUniform`s over two
frames and swap the buffers each frame. The `MeshInputUniform`s for the
current frame contain the index of that mesh's `MeshInputUniform` for
the previous frame.

This commit produces wins in virtually every CPU part of the pipeline:
`extract_meshes`, `queue_material_meshes`,
`batch_and_prepare_render_phase`, and especially
`write_batched_instance_buffer` are all faster. Shrinking the amount of
CPU data that has to be shuffled around speeds up the entire rendering
process.

| Benchmark              | This branch | `main`  | Speedup |
|------------------------|-------------|---------|---------|
| `many_cubes -nfc`      |      17.259 |  24.529 |  42.12% |
| `many_cubes -nfc -vpi` |     302.116 | 312.123 |   3.31% |
| `many_foxes`           |       3.227 |   3.515 |   8.92% |

Because mesh uniform building requires compute shader, and WebGL 2 has
no compute shader, the existing CPU mesh uniform building code has been
left as-is. Many types now have both CPU mesh uniform building and GPU
mesh uniform building modes. Developers can opt into the old CPU mesh
uniform building by setting the `use_gpu_uniform_builder` option on
`PbrPlugin` to `false`.

Below are graphs of the CPU portions of `many-cubes
--no-frustum-culling`. Yellow is this branch, red is `main`.

`extract_meshes`:
![Screenshot 2024-04-02
124842](https://github.com/bevyengine/bevy/assets/157897/a6748ea4-dd05-47b6-9254-45d07d33cb10)
It's notable that we get a small win even though we're now writing to a
GPU buffer.

`queue_material_meshes`:
![Screenshot 2024-04-02
124911](https://github.com/bevyengine/bevy/assets/157897/ecb44d78-65dc-448d-ba85-2de91aa2ad94)
There's a bit of a regression here; not sure what's causing it. In any
case it's very outweighed by the other gains.

`batch_and_prepare_render_phase`:
![Screenshot 2024-04-02
125123](https://github.com/bevyengine/bevy/assets/157897/4e20fc86-f9dd-4e5c-8623-837e4258f435)
There's a huge win here, enough to make batching basically drop off the
profile.

`write_batched_instance_buffer`:
![Screenshot 2024-04-02
125237](https://github.com/bevyengine/bevy/assets/157897/401a5c32-9dc1-4991-996d-eb1cac6014b2)
There's a massive improvement here, as expected. Note that a lot of it
simply comes from the fact that `MeshInputUniform` is `Pod`. (This isn't
a maintainability problem in my view because `MeshInputUniform` is so
simple: just 16 tightly-packed words.)

## Changelog

### Added

* Per-mesh instance data is now generated on GPU with a compute shader
instead of CPU, resulting in rendering performance improvements on
platforms where compute shaders are supported.

## Migration guide

* Custom render phases now need multiple systems beyond just
`batch_and_prepare_render_phase`. Code that was previously creating
custom render phases should now add a `BinnedRenderPhasePlugin` or
`SortedRenderPhasePlugin` as appropriate instead of directly adding
`batch_and_prepare_render_phase`.
2024-04-10 05:33:32 +00:00
Robert Swain
ab7cbfa8fc
Consolidate Render(Ui)Materials(2d) into RenderAssets (#12827)
# Objective

- Replace `RenderMaterials` / `RenderMaterials2d` / `RenderUiMaterials`
with `RenderAssets` to enable implementing changes to one thing,
`RenderAssets`, that applies to all use cases rather than duplicating
changes everywhere for multiple things that should be one thing.
- Adopts #8149 

## Solution

- Make RenderAsset generic over the destination type rather than the
source type as in #8149
- Use `RenderAssets<PreparedMaterial<M>>` etc for render materials

---

## Changelog

- Changed:
- The `RenderAsset` trait is now implemented on the destination type.
Its `SourceAsset` associated type refers to the type of the source
asset.
- `RenderMaterials`, `RenderMaterials2d`, and `RenderUiMaterials` have
been replaced by `RenderAssets<PreparedMaterial<M>>` and similar.

## Migration Guide

- `RenderAsset` is now implemented for the destination type rather that
the source asset type. The source asset type is now the `RenderAsset`
trait's `SourceAsset` associated type.
2024-04-09 13:26:34 +00:00
Matty
956604e4c7
Meshing for Triangle3d primitive (#12686)
# Objective

- Ongoing work for #10572 
- Implement the `Meshable` trait for `Triangle3d`, allowing 3d triangle
primitives to produce meshes.

## Solution

The `Meshable` trait for `Triangle3d` directly produces a `Mesh`, much
like that of `Triangle2d`. The mesh consists only of a single triangle
(the triangle itself), and its vertex data consists of:
- Vertex positions, which are the triangle's vertices themselves (i.e.
the triangle provides its own coordinates in mesh space directly)
- Normals, which are all the normal of the triangle itself
- Indices, which are directly inferred from the vertex order (note that
this is slightly different than `Triangle2d` which, because of its lower
dimension, has an orientation which can be corrected for so that it
always faces "the right way")
- UV coordinates, which are produced as follows:
1. The first coordinate is coincident with the `ab` direction of the
triangle.
2. The second coordinate maps to be perpendicular to the first in mesh
space, so that the UV-mapping is skew-free.
3. The UV-coordinates map to the smallest rectangle possible containing
the triangle, given the preceding constraints.

Here is a visual demonstration; here, the `ab` direction of the triangle
is horizontal, left to right — the point `c` moves, expanding the
bounding rectangle of the triangle when it pushes past `a` or `b`:

<img width="1440" alt="Screenshot 2024-03-23 at 5 36 01 PM"
src="https://github.com/bevyengine/bevy/assets/2975848/bef4d786-7b82-4207-abd4-ac4557d0f8b8">

<img width="1440" alt="Screenshot 2024-03-23 at 5 38 12 PM"
src="https://github.com/bevyengine/bevy/assets/2975848/c0f72b8f-8e70-46fa-a750-2041ba6dfb78">

<img width="1440" alt="Screenshot 2024-03-23 at 5 37 15 PM"
src="https://github.com/bevyengine/bevy/assets/2975848/db287e4f-2b0b-4fd4-8d71-88f4e7a03b7c">

The UV-mapping of `Triangle2d` has also been changed to use the same
logic.

---

## Changelog

- Implemented `Meshable` for `Triangle3d`.
- Changed UV-mapping of `Triangle2d` to match that of `Triangle3d`.

## Migration Guide

The UV-mapping of `Triangle2d` has changed with this PR; the main
difference is that the UVs are no longer dependent on the triangle's
absolute coordinates, but instead follow translations of the triangle
itself in its definition. If you depended on the old UV-coordinates for
`Triangle2d`, then you will have to update affected areas to use the new
ones which, briefly, can be described as follows:
- The first coordinate is parallel to the line between the first two
vertices of the triangle.
- The second coordinate is orthogonal to this, pointing in the direction
of the third point.

Generally speaking, this means that the first two points will have
coordinates `[_, 0.]`, while the third coordinate will be `[_, 1.]`,
with the exact values depending on the position of the third point
relative to the first two. For acute triangles, the first two vertices
always have UV-coordinates `[0., 0.]` and `[1., 0.]` respectively. For
obtuse triangles, the third point will have coordinate `[0., 1.]` or
`[1., 1.]`, with the coordinate of one of the two other points shifting
to maintain proportionality.

For example: 
- The default `Triangle2d` has UV-coordinates `[0., 0.]`, `[0., 1.]`,
[`0.5, 1.]`.
- The triangle with vertices `vec2(0., 0.)`, `vec2(1., 0.)`, `vec2(2.,
1.)` has UV-coordinates `[0., 0.]`, `[0.5, 0.]`, `[1., 1.]`.
- The triangle with vertices `vec2(0., 0.)`, `vec2(1., 0.)`, `vec2(-2.,
1.)` has UV-coordinates `[2./3., 0.]`, `[1., 0.]`, `[0., 1.]`.

## Discussion

### Design considerations

1. There are a number of ways to UV-map a triangle (at least two of
which are fairly natural); for instance, we could instead declare the
second axis to be essentially `bc` so that the vertices are always `[0.,
0.]`, `[0., 1.]`, and `[1., 0.]`. I chose this method instead because it
is skew-free, so that the sampling from textures has only bilinear
scaling. I think this is better for cases where a relatively "uniform"
texture is mapped to the triangle, but it's possible that we might want
to support the other thing in the future. Thankfully, we already have
the capability of easily expanding to do that with Builders if the need
arises. This could also allow us to provide things like barycentric
subdivision.
2. Presently, the mesh-creation code for `Triangle3d` is set up to never
fail, even in the case that the triangle is degenerate. I have mixed
feelings about this, but none of our other primitive meshes fail, so I
decided to take the same approach. Maybe this is something that could be
worth revisiting in the future across the board.

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: Jakub Marcowski <37378746+Chubercik@users.noreply.github.com>
2024-04-08 23:00:04 +00:00
James Liu
934f2cfadf
Clean up some low level dependencies (#12858)
# Objective
Minimize the number of dependencies low in the tree.

## Solution

* Remove the dependency on rustc-hash in bevy_ecs (not used) and
bevy_macro_utils (only used in one spot).
* Deduplicate the dependency on `sha1_smol` with the existing blake3
dependency already being used for bevy_asset.
 * Remove the unused `ron` dependency on `bevy_app`
* Make the `serde` dependency for `bevy_ecs` optional. It's only used
for serializing Entity.
* Change the `wgpu` dependency to `wgpu-types`, and make it optional for
`bevy_color`.
 * Remove the unused `thread-local` dependency on `bevy_render`.
* Make multiple dependencies for `bevy_tasks` optional and enabled only
when running with the `multi-threaded` feature. Preferably they'd be
disabled all the time on wasm, but I couldn't find a clean way to do
this.

---

## Changelog
TODO 

## Migration Guide
TODO
2024-04-08 19:45:42 +00:00
Hexorg
b9a232966b
Fixed a bug where skybox ddsfile would crash from wgpu (#12894)
Fixed a bug where skybox ddsfile would crash from wgpu while trying to
read past the file buffer.
Added a unit-test to prevent regression.
Bumped ddsfile dependency version to 0.5.2

# Objective

Prevents a crash when loading dds skybox.

## Solution

ddsfile already automatically sets array layers to be 6 for skyboxes.
Removed bevy's extra *= 6 multiplication.

---

This is a copy of
[#12598](https://github.com/bevyengine/bevy/pull/12598) ... I made that
one off of main and wasn't able to make more pull requests without
making a new branch.

---------

Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-04-08 17:16:25 +00:00
Martín Maita
0c78bf3bb0
Moves intern and label modules into bevy_ecs (#12772)
# Objective

- Attempts to solve two items from
https://github.com/bevyengine/bevy/issues/11478.

## Solution

- Moved `intern` module from `bevy_utils` into `bevy_ecs` crate and
updated all relevant imports.
- Moved `label` module from `bevy_utils` into `bevy_ecs` crate and
updated all relevant imports.

---

## Migration Guide

- Replace `bevy_utils::define_label` imports with
`bevy_ecs::define_label` imports.
- Replace `bevy_utils:🏷️:DynEq` imports with
`bevy_ecs:🏷️:DynEq` imports.
- Replace `bevy_utils:🏷️:DynHash` imports with
`bevy_ecs:🏷️:DynHash` imports.
- Replace `bevy_utils::intern::Interned` imports with
`bevy_ecs::intern::Interned` imports.
- Replace `bevy_utils::intern::Internable` imports with
`bevy_ecs::intern::Internable` imports.
- Replace `bevy_utils::intern::Interner` imports with
`bevy_ecs::intern::Interner` imports.

---------

Co-authored-by: James Liu <contact@jamessliu.com>
2024-04-08 15:34:11 +00:00
robtfm
452821dd52
more robust gpu image use (#12606)
# Objective

make morph targets and tonemapping more tolerant of delayed image
loading.

neither of these actually fail currently unless using a bespoke loader
(and even then it would be rare), but i am working on adding throttling
for asset gpu uploads (as a stopgap until we can do proper asset
streaming) and they break with that.

## Solution

when a mesh with morph targets is uploaded to the gpu, the prepare
function uploads the morph target texture if it's available, otherwise
it uploads without morph targets. this is generally fine as long as
morph targets are typically loaded from bytes (in gltf loader), but may
fail for a custom loader if the asset server async-loads the target
texture and the texture is not available yet. the mesh fails to render
and doesn't update when the image is loaded
-> if morph targets are specified but not ready yet, retry mesh upload
next frame

tonemapping `unwrap`s on the lookup table image. this is never a problem
since the image is added via `include_bytes!`, but could be a problem in
future with asset gpu throttling/streaming.
-> if the lookup texture is not yet available, use a fallback
-> in the node, check if the fallback was used before caching the bind
group
2024-04-07 17:18:58 +00:00
Luís Figueiredo
ac91b19118
Fixes #12000: When viewport is set to camera and switched to SizedFul… (#12861)
# Objective

- When viewport is set to the same size as the window on creation, when
adjusting to SizedFullscreen, the window may be smaller than the
viewport for a moment, which caused the arguments to be invalid and
panic.
- Fixes #12000.

## Solution

- The fix consists of matching the size of the viewport to the lower
size of the window ( if the x value of the window is lower, I update
only the x value of the viewport, same for the y value). Also added a
test to show that it does not panic anymore.

---
2024-04-06 02:22:50 +00:00
Multirious
a27ce270d0
Fix broken link in mesh docs (#12872)
# Objective

Fixes #12813

## Solution

Update the link to
`https://github.com/bevyengine/bevy/tree/main/crates/bevy_render/src/mesh/primitives`
2024-04-05 18:22:52 +00:00
Remi Godin
c233d6e0d0
Added method to get waiting pipelines IDs from pipeline cache. (#12874)
# Objective
- Add a way to easily get currently waiting pipelines IDs.

## Solution
- Added a method to get waiting pipelines `CachedPipelineId`.

---------

Co-authored-by: James Liu <contact@jamessliu.com>
2024-04-05 03:46:15 +00:00
James Liu
a4ed1b88b8
Relax BufferVec's type constraints (#12866)
# Objective
Since BufferVec was first introduced, `bytemuck` has added additional
traits with fewer restrictions than `Pod`. Within BufferVec, we only
rely on the constraints of `bytemuck::cast_slice` to a `u8` slice, which
now only requires `T: NoUninit` which is a strict superset of `Pod`
types.

## Solution
Change out the `Pod` generic type constraint with `NoUninit`. Also
taking the opportunity to substitute `cast_slice` with
`must_cast_slice`, which avoids a runtime panic in place of a compile
time failure if `T` cannot be used.

---

## Changelog
Changed: `BufferVec` now supports working with types containing
`NoUninit` but not `Pod` members.
Changed: `BufferVec` will now fail to compile if used with a type that
cannot be safely read from. Most notably, this includes ZSTs, which
would previously always panic at runtime.
2024-04-05 02:11:41 +00:00
Carter Anderson
b27896f875
Disable RAY_QUERY and RAY_TRACING_ACCELERATION_STRUCTURE by default (#12862)
# Objective

See https://github.com/gfx-rs/wgpu/issues/5488 for context and
rationale.

## Solution

- Disables `wgpu::Features::RAY_QUERY` and
`wgpu::Features::RAY_TRACING_ACCELERATION_STRUCTURE` by default. They
must be explicitly opted into now.

---

## Changelog

- Disables `wgpu::Features::RAY_QUERY` and
`wgpu::Features::RAY_TRACING_ACCELERATION_STRUCTURE` by default. They
must be explicitly opted into now.

## Migration Guide

- If you need `wgpu::Features::RAY_QUERY` or
`wgpu::Features::RAY_TRACING_ACCELERATION_STRUCTURE`, enable them
explicitly using `WgpuSettings::features`
2024-04-04 19:20:19 +00:00
Patrick Walton
37522fd0ae
Micro-optimize queue_material_meshes, primarily to remove bit manipulation. (#12791)
This commit makes the following optimizations:

## `MeshPipelineKey`/`BaseMeshPipelineKey` split

`MeshPipelineKey` has been split into `BaseMeshPipelineKey`, which lives
in `bevy_render` and `MeshPipelineKey`, which lives in `bevy_pbr`.
Conceptually, `BaseMeshPipelineKey` is a superclass of
`MeshPipelineKey`. For `BaseMeshPipelineKey`, the bits start at the
highest (most significant) bit and grow downward toward the lowest bit;
for `MeshPipelineKey`, the bits start at the lowest bit and grow upward
toward the highest bit. This prevents them from colliding.

The goal of this is to avoid having to reassemble bits of the pipeline
key for every mesh every frame. Instead, we can just use a bitwise or
operation to combine the pieces that make up a `MeshPipelineKey`.

## `specialize_slow`

Previously, all of `specialize()` was marked as `#[inline]`. This
bloated `queue_material_meshes` unnecessarily, as a large chunk of it
ended up being a slow path that was rarely hit. This commit refactors
the function to move the slow path to `specialize_slow()`.

Together, these two changes shave about 5% off `queue_material_meshes`:

![Screenshot 2024-03-29
130002](https://github.com/bevyengine/bevy/assets/157897/a7e5a994-a807-4328-b314-9003429dcdd2)

## Migration Guide

- The `primitive_topology` field on `GpuMesh` is now an accessor method:
`GpuMesh::primitive_topology()`.
- For performance reasons, `MeshPipelineKey` has been split into
`BaseMeshPipelineKey`, which lives in `bevy_render`, and
`MeshPipelineKey`, which lives in `bevy_pbr`. These two should be
combined with bitwise-or to produce the final `MeshPipelineKey`.
2024-04-01 21:58:53 +00:00
Matty
c8aa3ac7d1
Meshing for Annulus primitive (#12734)
# Objective

Related to #10572 
Allow the `Annulus` primitive to be meshed.

## Solution

We introduce a `Meshable` structure, `AnnulusMeshBuilder`, which allows
the `Annulus` primitive to be meshed, leaving optional configuration of
the number of angular sudivisions to the user. Here is a picture of the
annulus's UV-mapping:
<img width="1440" alt="Screenshot 2024-03-26 at 10 39 48 AM"
src="https://github.com/bevyengine/bevy/assets/2975848/b170291d-cba7-441b-90ee-2ad6841eaedb">

Other features are essentially identical to the implementations for
`Circle`/`Ellipse`.

---

## Changelog

- Introduced `AnnulusMeshBuilder`
- Implemented `Meshable` for `Annulus` with `Output =
AnnulusMeshBuilder`
- Implemented `From<Annulus>` and `From<AnnulusMeshBuilder>` for `Mesh`
- Added `impl_reflect!` declaration for `Annulus` and `Triangle3d` in
`bevy_reflect`

---

## Discussion

### Design considerations

The only interesting wrinkle here is that the existing UV-mapping of
`Ellipse` (and hence of `Circle` and `RegularPolygon`) is non-radial
(it's skew-free, created by situating the mesh in a bounding rectangle),
so the UV-mapping of `Annulus` doesn't limit to that of `Circle` as its
inner radius tends to zero, for instance. I don't see this as a real
issue for `Annulus`, which should almost certainly have this kind of
UV-mapping, but I think we ought to at least consider allowing mesh
configuration for `Circle`/`Ellipse` that performs radial UV-mapping
instead. (In these cases in particular, it would be especially easy,
since we wouldn't need a different parameter set in the builder.)

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2024-04-01 21:55:49 +00:00
BD103
84363f2fab
Remove redundant imports (#12817)
# Objective

- There are several redundant imports in the tests and examples that are
not caught by CI because additional flags need to be passed.

## Solution

- Run `cargo check --workspace --tests` and `cargo check --workspace
--examples`, then fix all warnings.
- Add `test-check` to CI, which will be run in the check-compiles job.
This should catch future warnings for tests. Examples are already
checked, but I'm not yet sure why they weren't caught.

## Discussion

- Should the `--tests` and `--examples` flags be added to CI, so this is
caught in the future?
- If so, #12818 will need to be merged first. It was also a warning
raised by checking the examples, but I chose to split off into a
separate PR.

---------

Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-04-01 19:59:08 +00:00
Jake
abd94480ab
Normalize warning messages with Nvidia drivers (#12833)
# Objective
There are currently 2 different warning messages that are logged when
resizing on Linux with Nvidia drivers (introduced in
70c69cdd51).
Fixes #12830

## Solution
Generalize both to say:
```Couldn't get swap chain texture. This often happens with the NVIDIA drivers on Linux. It can be safely ignored.```
2024-04-01 19:56:56 +00:00
François Mockers
93fd02e8ea
remove DeterministicRenderingConfig (#12811)
# Objective

- Since #12453, `DeterministicRenderingConfig` doesn't do anything

## Solution

- Remove it

---

## Migration Guide

- Removed `DeterministicRenderingConfig`. There shouldn't be any z
fighting anymore in the rendering even without setting
`stable_sort_z_fighting`
2024-04-01 09:32:47 +00:00
Cameron
01649f13e2
Refactor App and SubApp internals for better separation (#9202)
# Objective

This is a necessary precursor to #9122 (this was split from that PR to
reduce the amount of code to review all at once).

Moving `!Send` resource ownership to `App` will make it unambiguously
`!Send`. `SubApp` must be `Send`, so it can't wrap `App`.

## Solution

Refactor `App` and `SubApp` to not have a recursive relationship. Since
`SubApp` no longer wraps `App`, once `!Send` resources are moved out of
`World` and into `App`, `SubApp` will become unambiguously `Send`.

There could be less code duplication between `App` and `SubApp`, but
that would break `App` method chaining.

## Changelog

- `SubApp` no longer wraps `App`.
- `App` fields are no longer publicly accessible.
- `App` can no longer be converted into a `SubApp`.
- Various methods now return references to a `SubApp` instead of an
`App`.
## Migration Guide

- To construct a sub-app, use `SubApp::new()`. `App` can no longer
convert into `SubApp`.
- If you implemented a trait for `App`, you may want to implement it for
`SubApp` as well.
- If you're accessing `app.world` directly, you now have to use
`app.world()` and `app.world_mut()`.
- `App::sub_app` now returns `&SubApp`.
- `App::sub_app_mut`  now returns `&mut SubApp`.
- `App::get_sub_app` now returns `Option<&SubApp>.`
- `App::get_sub_app_mut` now returns `Option<&mut SubApp>.`
2024-03-31 03:16:10 +00:00
Eero Lehtinen
70c69cdd51
Fix crash on Linux Nvidia 550 driver (#12542)
# Objective

Fix crashing on Linux with latest stable Nvidia 550 driver when
resizing. The crash happens at startup with some setups.

Fixes #12199

I think this would be nice to get into 0.13.1

## Solution

Ignore `wgpu::SurfaceError::Outdated` always on this platform+driver.

It looks like Nvidia considered the previous behaviour of not returning
this error a bug:
"Fixed a bug where vkAcquireNextImageKHR() was not returning
VK_ERROR_OUT_OF_DATE_KHR when it should with WSI X11 swapchains"
(https://www.nvidia.com/Download/driverResults.aspx/218826/en-us/)

What I gather from this is that the surface was outdated on previous
drivers too, but they just didn't report it as an error. So behaviour
shouldn't change.

In the issue conversation we experimented with calling `continue` when
this error happens, but I found that it results in some small issues
like bevy_egui scale not updating with the window sometimes. Just doing
nothing seems to work better.

## Changelog

- Fixed crashing on Linux with Nvidia 550 driver when resizing the
window

## Migration Guide

---------

Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 23:10:54 +00:00
James Liu
24030d2a0c
Vectorize reset_view_visibility (#12797)
# Objective
Speed up CPU-side rendering.

## Solution
Use `QueryIter::for_each` and `Mut::bypass_change_detection` to minimize
the total amount of data being written and allow autovectorization to
speed up iteration.

## Performance
Tested against the default `many_cubes`, this results in greater than
15x speed up: 281us -> 18.4us.

![image](https://github.com/bevyengine/bevy/assets/3137680/18369285-843e-4eb6-9716-c99c6f5ea4e2)

As `ViewVisibility::HIDDEN` just wraps false, this is likely just
degenerating into `memset(0)`s on the tables.
2024-03-30 08:28:16 +00:00
Patrick Walton
4dadebd9c4
Improve performance by binning together opaque items instead of sorting them. (#12453)
Today, we sort all entities added to all phases, even the phases that
don't strictly need sorting, such as the opaque and shadow phases. This
results in a performance loss because our `PhaseItem`s are rather large
in memory, so sorting is slow. Additionally, determining the boundaries
of batches is an O(n) process.

This commit makes Bevy instead applicable place phase items into *bins*
keyed by *bin keys*, which have the invariant that everything in the
same bin is potentially batchable. This makes determining batch
boundaries O(1), because everything in the same bin can be batched.
Instead of sorting each entity, we now sort only the bin keys. This
drops the sorting time to near-zero on workloads with few bins like
`many_cubes --no-frustum-culling`. Memory usage is improved too, with
batch boundaries and dynamic indices now implicit instead of explicit.
The improved memory usage results in a significant win even on
unbatchable workloads like `many_cubes --no-frustum-culling
--vary-material-data-per-instance`, presumably due to cache effects.

Not all phases can be binned; some, such as transparent and transmissive
phases, must still be sorted. To handle this, this commit splits
`PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the
logic that today deals with `PhaseItem`s has been moved to
`SortedPhaseItem`. `BinnedPhaseItem` has the new logic.

Frame time results (in ms/frame) are as follows:

| Benchmark                | `binning` | `main`  | Speedup |
| ------------------------ | --------- | ------- | ------- |
| `many_cubes -nfc -vpi` | 232.179     | 312.123   | 34.43%  |
| `many_cubes -nfc`        | 25.874 | 30.117 | 16.40%  |
| `many_foxes`             | 3.276 | 3.515 | 7.30%   |

(`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for
`--vary-per-instance`.)

---

## Changelog

### Changed

* Render phases have been split into binned and sorted phases. Binned
phases, such as the common opaque phase, achieve improved CPU
performance by avoiding the sorting step.

## Migration Guide

- `PhaseItem` has been split into `BinnedPhaseItem` and
`SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need
to migrate them to one of these two types. `SortedPhaseItem` requires
the fewest code changes, but you may want to pick `BinnedPhaseItem` if
your phase doesn't require sorting, as that enables higher performance.

## Tracy graphs

`many-cubes --no-frustum-culling`, `main` branch:
<img width="1064" alt="Screenshot 2024-03-12 180037"
src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55">

`many-cubes --no-frustum-culling`, this branch:
<img width="1064" alt="Screenshot 2024-03-12 180011"
src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c">

You can see that `batch_and_prepare_binned_render_phase` is a much
smaller fraction of the time. Zooming in on that function, with yellow
being this branch and red being `main`, we see:

<img width="1064" alt="Screenshot 2024-03-12 175832"
src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0">

The binning happens in `queue_material_meshes`. Again with yellow being
this branch and red being `main`:
<img width="1064" alt="Screenshot 2024-03-12 175755"
src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96">

We can see that there is a small regression in `queue_material_meshes`
performance, but it's not nearly enough to outweigh the large gains in
`batch_and_prepare_binned_render_phase`.

---------

Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
James Liu
e62a01f403
Make PersistentGpuBufferable a safe trait (#12744)
# Objective
Fixes #12727. All parts that `PersistentGpuBuffer` interact with should
be 100% safe both on the CPU and the GPU: `Queue::write_buffer_with`
zeroes out the slice being written to and when uploading to the GPU, and
all slice writes are bounds checked on the CPU side.

## Solution
Make `PersistentGpuBufferable` a safe trait. Enforce it's correct
implementation via assertions. Re-enable `forbid(unsafe_code)` on
`bevy_pbr`.
2024-03-29 13:14:34 +00:00
Martín Maita
1b7837c0b2
Update image requirement from 0.24 to 0.25 (#12458)
# Objective

- Closes https://github.com/bevyengine/bevy/pull/12415

## Solution

- Refactored code that was changed/deprecated in `image` 0.25.
- Please review this PR carefully since I'm just making the changes
without any context or deep knowledge of the module.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-29 06:40:09 +00:00
James Liu
56bcbb0975
Forbid unsafe in most crates in the engine (#12684)
# Objective
Resolves #3824. `unsafe` code should be the exception, not the norm in
Rust. It's obviously needed for various use cases as it's interfacing
with platforms and essentially running the borrow checker at runtime in
the ECS, but the touted benefits of Bevy is that we are able to heavily
leverage Rust's safety, and we should be holding ourselves accountable
to that by minimizing our unsafe footprint.

## Solution
Deny `unsafe_code` workspace wide. Add explicit exceptions for the
following crates, and forbid it in almost all of the others.

* bevy_ecs - Obvious given how much unsafe is needed to achieve
performant results
* bevy_ptr - Works with raw pointers, even more low level than bevy_ecs.
 * bevy_render - due to needing to integrate with wgpu
 * bevy_window - due to needing to integrate with raw_window_handle
* bevy_utils - Several unsafe utilities used by bevy_ecs. Ideally moved
into bevy_ecs instead of made publicly usable.
 * bevy_reflect - Required for the unsafe type casting it's doing.
 * bevy_transform - for the parallel transform propagation
 * bevy_gizmos  - For the SystemParam impls it has.
* bevy_assets - To support reflection. Might not be required, not 100%
sure yet.
* bevy_mikktspace - due to being a conversion from a C library. Pending
safe rewrite.
* bevy_dynamic_plugin - Inherently unsafe due to the dynamic loading
nature.

Several uses of unsafe were rewritten, as they did not need to be using
them:

* bevy_text - a case of `Option::unchecked` could be rewritten as a
normal for loop and match instead of an iterator.
* bevy_color - the Pod/Zeroable implementations were replaceable with
bytemuck's derive macros.
2024-03-27 03:30:08 +00:00
James Liu
a0f492b2dd
Fix CI for wasm atomics (#12730)
# Objective
CI is currently broken because of `DiagnosticsRecorder` not being Send
and Sync as required by Resource.

## Solution
Wrap `DiagnosticsRecorder` internally with a `WgpuWrapper`.
2024-03-26 14:26:21 +00:00
Ian Kettlewell
b35974010b
Get Bevy building for WebAssembly with multithreading (#12205)
# Objective

This gets Bevy building on Wasm when the `atomics` flag is enabled. This
does not yet multithread Bevy itself, but it allows Bevy users to use a
crate like `wasm_thread` to spawn their own threads and manually
parallelize work. This is a first step towards resolving #4078 . Also
fixes #9304.

This provides a foothold so that Bevy contributors can begin to think
about multithreaded Wasm's constraints and Bevy can work towards changes
to get the engine itself multithreaded.

Some flags need to be set on the Rust compiler when compiling for Wasm
multithreading. Here's what my build script looks like, with the correct
flags set, to test out Bevy examples on web:

```bash
set -e
RUSTFLAGS='-C target-feature=+atomics,+bulk-memory,+mutable-globals' \
     cargo build --example breakout --target wasm32-unknown-unknown -Z build-std=std,panic_abort --release
 wasm-bindgen --out-name wasm_example \
   --out-dir examples/wasm/target \
   --target web target/wasm32-unknown-unknown/release/examples/breakout.wasm
 devserver --header Cross-Origin-Opener-Policy='same-origin' --header Cross-Origin-Embedder-Policy='require-corp' --path examples/wasm
```

A few notes:

1. `cpal` crashes immediately when the `atomics` flag is set. That is
patched in https://github.com/RustAudio/cpal/pull/837, but not yet in
the latest crates.io release.

That can be temporarily worked around by patching Cpal like so:
```toml
[patch.crates-io]
cpal = { git = "https://github.com/RustAudio/cpal" }
```

2. When testing out `wasm_thread` you need to enable the `es_modules`
feature.

## Solution

The largest obstacle to compiling Bevy with `atomics` on web is that
`wgpu` types are _not_ Send and Sync. Longer term Bevy will need an
approach to handle that, but in the near term Bevy is already configured
to be single-threaded on web.

Therefor it is enough to wrap `wgpu` types in a
`send_wrapper::SendWrapper` that _is_ Send / Sync, but panics if
accessed off the `wgpu` thread.

---

## Changelog

- `wgpu` types that are not `Send` are wrapped in
`send_wrapper::SendWrapper` on Wasm + 'atomics'
- CommandBuffers are not generated in parallel on Wasm + 'atomics'

## Questions
- Bevy should probably add CI checks to make sure this doesn't regress.
Should that go in this PR or a separate PR? **Edit:** Added checks to
build Wasm with atomics

---------

Co-authored-by: François <mockersf@gmail.com>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: daxpedda <daxpedda@gmail.com>
Co-authored-by: François <francois.mockers@vleue.com>
2024-03-25 19:10:18 +00:00
Tygyh
e9343b052f
Support calculating normals for indexed meshes (#11654)
# Objective

- Finish #3987

## Solution

- Rebase and fix typo.

Co-authored-by: Robert Bragg <robert@sixbynine.org>
2024-03-25 19:09:24 +00:00