Mirrors/bevy

mirror of https://github.com/bevyengine/bevy synced 2025-01-01 15:58:52 +00:00

Author	SHA1	Message	Date
atlv	c29e67153b	Expose Pipeline Compilation Zero Initialize Workgroup Memory Option (#16301 ) # Objective - wgpu 0.20 made workgroup vars stop being zero-init by default. this broke some applications (cough foresight cough) and now we workaround it. wgpu exposes a compilation option that zero initializes workgroup memory by default, but bevy does not expose it. ## Solution - expose the compilation option wgpu gives us ## Testing - ran examples: 3d_scene, compute_shader_game_of_life, gpu_readback, lines, specialized_mesh_pipeline. they all work - confirmed fix for our own problems --- </details> ## Migration Guide - add `zero_initialize_workgroup_memory: false,` to `ComputePipelineDescriptor` or `RenderPipelineDescriptor` structs to preserve 0.14 functionality, add `zero_initialize_workgroup_memory: true,` to restore bevy 0.13 functionality.	2024-11-08 21:42:37 +00:00
robtfm	262f471934	move mesh uniform construction out of Core3d (#16188 ) # Objective gpu based mesh uniform construction in the `GpuPreprocessNode` is currently in `Core3d`. The node iterates all views and schedules the uniform construction for each. so - when there are multiple 3d cameras, it runs multiple times on each view - if a view wants to render meshes but doesn't use the `Core3d` graph, the camera must run later than at least one `Core3d`-based camera (or add the node to its own graph, duplicating the work) - If views want to share mesh uniforms there is no way to avoid running the preprocessing for every view ## Solution - move the node to the top level of the rendergraph, before the camera driver node - make the `PreprocessBindGroup` `clone`able, and add a `SkipGpuPreprocessing` component to allow opting out per view	2024-11-04 15:29:11 +00:00
Zachary Harrold	d70595b667	Add `core` and `alloc` over `std` Lints (#15281 ) # Objective - Fixes #6370 - Closes #6581 ## Solution - Added the following lints to the workspace: - `std_instead_of_core` - `std_instead_of_alloc` - `alloc_instead_of_core` - Used `cargo +nightly fmt` with [item level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A) to split all `use` statements into single items. - Used `cargo clippy --workspace --all-targets --all-features --fix --allow-dirty` to _attempt_ to resolve the new linting issues, and intervened where the lint was unable to resolve the issue automatically (usually due to needing an `extern crate alloc;` statement in a crate root). - Manually removed certain uses of `std` where negative feature gating prevented `--all-features` from finding the offending uses. - Used `cargo +nightly fmt` with [crate level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A) to re-merge all `use` statements matching Bevy's previous styling. - Manually fixed cases where the `fmt` tool could not re-merge `use` statements due to conditional compilation attributes. ## Testing - Ran CI locally ## Migration Guide The MSRV is now 1.81. Please update to this version or higher. ## Notes - This is a _massive_ change to try and push through, which is why I've outlined the semi-automatic steps I used to create this PR, in case this fails and someone else tries again in the future. - Making this change has no impact on user code, but does mean Bevy contributors will be warned to use `core` and `alloc` instead of `std` where possible. - This lint is a critical first step towards investigating `no_std` options for Bevy. --------- Co-authored-by: François Mockers <francois.mockers@vleue.com>	2024-09-27 00:59:59 +00:00
Clar Fon	efda7f3f9c	Simpler lint fixes: makes `ci lints` work but disables a lint for now (#15376 ) Takes the first two commits from #15375 and adds suggestions from this comment: https://github.com/bevyengine/bevy/pull/15375#issuecomment-2366968300 See #15375 for more reasoning/motivation. ## Rebasing (rerunning) ```rust git switch simpler-lint-fixes git reset --hard main cargo fmt --all -- --unstable-features --config normalize_comments=true,imports_granularity=Crate cargo fmt --all git add --update git commit --message "rustfmt" cargo clippy --workspace --all-targets --all-features --fix cargo fmt --all -- --unstable-features --config normalize_comments=true,imports_granularity=Crate cargo fmt --all git add --update git commit --message "clippy" git cherry-pick e6c0b94f6795222310fb812fa5c4512661fc7887 ```	2024-09-24 11:42:59 +00:00
Zachary Harrold	bc13161416	Migrated `NonZero` to `NonZero<>` (#14978 ) # Objective - Fixes #14974 ## Solution - Replace all* instances of `NonZero` with `NonZero<>` ## Testing - CI passed locally. --- ## Notes Within the `bevy_reflect` implementations for `std` types, `impl_reflect_value!()` will continue to use the type aliases instead, as it inappropriately parses the concrete type parameter as a generic argument. If the `ZeroablePrimitive` trait was stable, or the macro could be modified to accept a finite list of types, then we could fully migrate.	2024-08-30 02:37:47 +00:00
Rob Parrett	e46e246581	Fix a few "repeated word" typos (#13955 ) # Objective Stumbled on one of these and went digging for more ## Solution ```diff - word word + word ```	2024-06-20 21:35:20 +00:00
Patrick Walton	16531fb3e3	Implement GPU frustum culling. (#12889 ) This commit implements opt-in GPU frustum culling, built on top of the infrastructure in https://github.com/bevyengine/bevy/pull/12773. To enable it on a camera, add the `GpuCulling` component to it. To additionally disable CPU frustum culling, add the `NoCpuCulling` component. Note that adding `GpuCulling` without `NoCpuCulling` currently does nothing useful. The reason why `GpuCulling` doesn't automatically imply `NoCpuCulling` is that I intend to follow this patch up with GPU two-phase occlusion culling, and CPU frustum culling plus GPU occlusion culling seems like a very commonly-desired mode. Adding the `GpuCulling` component to a view puts that view into indirect mode. This mode makes all drawcalls indirect, relying on the mesh preprocessing shader to allocate instances dynamically. In indirect mode, the `PreprocessWorkItem` `output_index` points not to a `MeshUniform` instance slot but instead to a set of `wgpu` `IndirectParameters`, from which it allocates an instance slot dynamically if frustum culling succeeds. Batch building has been updated to allocate and track indirect parameter slots, and the AABBs are now supplied to the GPU as `MeshCullingData`. A small amount of code relating to the frustum culling has been borrowed from meshlets and moved into `maths.wgsl`. Note that standard Bevy frustum culling uses AABBs, while meshlets use bounding spheres; this means that not as much code can be shared as one might think. This patch doesn't provide any way to perform GPU culling on shadow maps, to avoid making this patch bigger than it already is. That can be a followup. ## Changelog ### Added * Frustum culling can now optionally be done on the GPU. To enable it, add the `GpuCulling` component to a camera. * To disable CPU frustum culling, add `NoCpuCulling` to a camera. Note that `GpuCulling` doesn't automatically imply `NoCpuCulling`.	2024-04-28 12:50:00 +00:00
Patrick Walton	11817f4ba4	Generate `MeshUniform`s on the GPU via compute shader where available. (#12773 ) Currently, `MeshUniform`s are rather large: 160 bytes. They're also somewhat expensive to compute, because they involve taking the inverse of a 3x4 matrix. Finally, if a mesh is present in multiple views, that mesh will have a separate `MeshUniform` for each and every view, which is wasteful. This commit fixes these issues by introducing the concept of a mesh input uniform and adding a mesh uniform building compute shader pass. The `MeshInputUniform` is simply the minimum amount of data needed for the GPU to compute the full `MeshUniform`. Most of this data is just the transform and is therefore only 64 bytes. `MeshInputUniform`s are computed during the extraction phase, much like skins are today, in order to avoid needlessly copying transforms around on CPU. (In fact, the render app has been changed to only store the translation of each mesh; it no longer cares about any other part of the transform, which is stored only on the GPU and the main world.) Before rendering, the `build_mesh_uniforms` pass runs to expand the `MeshInputUniform`s to the full `MeshUniform`. The mesh uniform building pass does the following, all on GPU: 1. Copy the appropriate fields of the `MeshInputUniform` to the `MeshUniform` slot. If a single mesh is present in multiple views, this effectively duplicates it into each view. 2. Compute the inverse transpose of the model transform, used for transforming normals. 3. If applicable, copy the mesh's transform from the previous frame for TAA. To support this, we double-buffer the `MeshInputUniform`s over two frames and swap the buffers each frame. The `MeshInputUniform`s for the current frame contain the index of that mesh's `MeshInputUniform` for the previous frame. This commit produces wins in virtually every CPU part of the pipeline: `extract_meshes`, `queue_material_meshes`, `batch_and_prepare_render_phase`, and especially `write_batched_instance_buffer` are all faster. Shrinking the amount of CPU data that has to be shuffled around speeds up the entire rendering process. \| Benchmark \| This branch \| `main` \| Speedup \| \|------------------------\|-------------\|---------\|---------\| \| `many_cubes -nfc` \| 17.259 \| 24.529 \| 42.12% \| \| `many_cubes -nfc -vpi` \| 302.116 \| 312.123 \| 3.31% \| \| `many_foxes` \| 3.227 \| 3.515 \| 8.92% \| Because mesh uniform building requires compute shader, and WebGL 2 has no compute shader, the existing CPU mesh uniform building code has been left as-is. Many types now have both CPU mesh uniform building and GPU mesh uniform building modes. Developers can opt into the old CPU mesh uniform building by setting the `use_gpu_uniform_builder` option on `PbrPlugin` to `false`. Below are graphs of the CPU portions of `many-cubes --no-frustum-culling`. Yellow is this branch, red is `main`. `extract_meshes`: ![Screenshot 2024-04-02 124842](https://github.com/bevyengine/bevy/assets/157897/a6748ea4-dd05-47b6-9254-45d07d33cb10) It's notable that we get a small win even though we're now writing to a GPU buffer. `queue_material_meshes`: ![Screenshot 2024-04-02 124911](https://github.com/bevyengine/bevy/assets/157897/ecb44d78-65dc-448d-ba85-2de91aa2ad94) There's a bit of a regression here; not sure what's causing it. In any case it's very outweighed by the other gains. `batch_and_prepare_render_phase`: ![Screenshot 2024-04-02 125123](https://github.com/bevyengine/bevy/assets/157897/4e20fc86-f9dd-4e5c-8623-837e4258f435) There's a huge win here, enough to make batching basically drop off the profile. `write_batched_instance_buffer`: ![Screenshot 2024-04-02 125237](https://github.com/bevyengine/bevy/assets/157897/401a5c32-9dc1-4991-996d-eb1cac6014b2) There's a massive improvement here, as expected. Note that a lot of it simply comes from the fact that `MeshInputUniform` is `Pod`. (This isn't a maintainability problem in my view because `MeshInputUniform` is so simple: just 16 tightly-packed words.) ## Changelog ### Added * Per-mesh instance data is now generated on GPU with a compute shader instead of CPU, resulting in rendering performance improvements on platforms where compute shaders are supported. ## Migration guide * Custom render phases now need multiple systems beyond just `batch_and_prepare_render_phase`. Code that was previously creating custom render phases should now add a `BinnedRenderPhasePlugin` or `SortedRenderPhasePlugin` as appropriate instead of directly adding `batch_and_prepare_render_phase`.	2024-04-10 05:33:32 +00:00

8 commits