mirror of
https://github.com/bevyengine/bevy
synced 2024-11-22 20:53:53 +00:00
a0faf9cd01
### Builder changes
- Increased meshlet max vertices/triangles from 64v/64t to 255v/128t (meshoptimizer won't allow 256v, sadly). This gives us a much greater percentage of meshlets with the max triangle count (128). Still not perfect: we still end up with some tiny <=10 triangle meshlets that never really get simplified, but it's progress.
- Removed the error target limit. Now we allow meshoptimizer to simplify as much as possible. No reason to cap this, as the cluster culling code will choose a good LOD level anyways. Again, this leads to higher quality LOD trees.
- After some discussion and consulting the Nanite slides again, changed meshlet group error from _adding_ the max child's error to the group error, to doing `group_error = max(group_error, max_child_error)`. Error is already cumulative between LODs, as the edges we're collapsing during simplification get longer each time.
- Bumped the 65% simplification threshold to allow up to 95% of the original geometry (i.e. accept simplification as valid even if we only simplified 5% of the triangles). This gives us closer to log2(initial_meshlet_count) LOD levels, and fewer meshlet roots in the DAG.

Still more work to be done in the future here. Maybe trying METIS for meshlet building instead of meshoptimizer. Using ~8 clusters per group instead of ~4 might also make a big difference. The Nanite slides say that they have 8-32 meshlets per group, suggesting some kind of heuristic. Unfortunately, meshopt's compute_cluster_bounds won't work with large groups at the moment (https://github.com/zeux/meshoptimizer/discussions/750#discussioncomment-10562641), so it's hard to test.

Based on discussion from https://github.com/bevyengine/bevy/discussions/14998, https://github.com/zeux/meshoptimizer/discussions/750, and Discord.
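
The group error rule and the relaxed acceptance threshold described above can be sketched roughly as follows. This is a minimal illustration, not the actual builder code: the function names `combine_group_error` and `simplification_accepted`, their signatures, and the choice of an inclusive 95% bound are all assumptions made for the example.

```rust
/// Hypothetical helper: combine the error reported for a simplified group
/// with the errors already stored on the meshlets that were merged into it.
/// Takes the maximum rather than adding the two, matching
/// `group_error = max(group_error, max_child_error)`.
fn combine_group_error(group_error: f32, child_errors: &[f32]) -> f32 {
    // Largest error among the children (0.0 if there are none).
    let max_child_error = child_errors.iter().copied().fold(0.0_f32, f32::max);
    group_error.max(max_child_error)
}

/// Hypothetical acceptance check: a simplification attempt counts as valid
/// when the result keeps at most 95% of the original triangles.
/// Whether the real bound is inclusive is an assumption here.
fn simplification_accepted(original_triangles: usize, simplified_triangles: usize) -> bool {
    // Integer form of `simplified / original <= 0.95`, avoiding float rounding.
    simplified_triangles * 100 <= original_triangles * 95
}
```

With these definitions, a group whose own error is 0.5 and whose worst child error is 0.8 ends up with 0.8 rather than 1.3, and a group that only shrinks from 100 to 95 triangles is still accepted.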

### Runtime changes
- cluster:triangle packed IDs are now stored as 25:7 bits instead of 26:6, as the max triangles per cluster is now 128 instead of 64.
- Hardware raster now spawns 128 * 3 vertices instead of 64 * 3 vertices to account for the new max triangles limit.
- Hardware raster now outputs NaN triangles (0 / 0) instead of zero-positioned triangles for extra vertex invocations over the cluster triangle count. I don't think it really makes a difference, but I did it anyways.
- Software raster now uses 128 threads per workgroup instead of 64. Each thread now loads, projects, and caches a vertex (vertices 0-127), and then, if needed, does so again (vertices 128-254). Each thread then rasterizes one of 128 triangles.
- Fixed a bug with `needs_dispatch_remap`. I had the condition backwards in my last PR; I probably committed it by accident after testing the non-default code path on my GPU.
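
To make the 25:7 split concrete, here is a minimal sketch of packing and unpacking such an ID in a 32-bit integer. The function and constant names are hypothetical, and placing the cluster ID in the high 25 bits with the triangle ID in the low 7 bits is an assumption for illustration; the commit message only states the bit widths.

```rust
/// Bits reserved for the triangle index: 7 bits cover 0..=127 (128 triangles).
const TRIANGLE_BITS: u32 = 7;
/// Mask selecting the low 7 bits (0x7F).
const TRIANGLE_MASK: u32 = (1 << TRIANGLE_BITS) - 1;
/// Bits left for the cluster index in a 32-bit ID.
const CLUSTER_BITS: u32 = 32 - TRIANGLE_BITS;

/// Hypothetical helper: pack a cluster index and a triangle index into one u32.
/// Assumes the cluster index occupies the high bits.
fn pack_cluster_triangle(cluster_id: u32, triangle_id: u32) -> u32 {
    debug_assert!(cluster_id < (1 << CLUSTER_BITS));
    debug_assert!(triangle_id <= TRIANGLE_MASK);
    (cluster_id << TRIANGLE_BITS) | triangle_id
}

/// Hypothetical helper: recover `(cluster_id, triangle_id)` from a packed ID.
fn unpack_cluster_triangle(packed: u32) -> (u32, u32) {
    (packed >> TRIANGLE_BITS, packed & TRIANGLE_MASK)
}
```

Moving one bit from the cluster field to the triangle field halves the addressable cluster count (2^25 instead of 2^26) in exchange for indexing all 128 triangles of a cluster.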
| | | |
---|---|---|
3d_scene.rs | ||
3d_shapes.rs | ||
3d_viewport_to_world.rs | ||
animated_material.rs | ||
anisotropy.rs | ||
anti_aliasing.rs | ||
atmospheric_fog.rs | ||
auto_exposure.rs | ||
blend_modes.rs | ||
bloom_3d.rs | ||
clearcoat.rs | ||
color_grading.rs | ||
deferred_rendering.rs | ||
depth_of_field.rs | ||
fog.rs | ||
fog_volumes.rs | ||
generate_custom_mesh.rs | ||
irradiance_volumes.rs | ||
lighting.rs | ||
lightmaps.rs | ||
lines.rs | ||
load_gltf.rs | ||
load_gltf_extras.rs | ||
meshlet.rs | ||
motion_blur.rs | ||
orthographic.rs | ||
parallax_mapping.rs | ||
parenting.rs | ||
pbr.rs | ||
post_processing.rs | ||
reflection_probes.rs | ||
render_to_texture.rs | ||
rotate_environment_map.rs | ||
scrolling_fog.rs | ||
shadow_biases.rs | ||
shadow_caster_receiver.rs | ||
skybox.rs | ||
spherical_area_lights.rs | ||
split_screen.rs | ||
spotlight.rs | ||
ssao.rs | ||
ssr.rs | ||
texture.rs | ||
tonemapping.rs | ||
transmission.rs | ||
transparency_3d.rs | ||
two_passes.rs | ||
update_gltf_scene.rs | ||
vertex_colors.rs | ||
visibility_range.rs | ||
volumetric_fog.rs | ||
wireframe.rs |