Gonçalo Rica Pais da Silva 9c0fca072d
Optimise Entity with repr align & manual PartialOrd/Ord (#10558)
# Objective

- Follow up on https://github.com/bevyengine/bevy/pull/10519, diving
deeper into optimising `Entity` due to the `derive`d `PartialOrd`
`partial_cmp` not being optimal with codegen:
https://github.com/rust-lang/rust/issues/106107
- Fixes #2346.

## Solution

Given the previous PR's solution and the existing LLVM codegen bug,
there seemed to be room for further optimisation of `Entity`. When I
explored providing a manual `PartialOrd` impl, the resulting codegen was
initially no better than the derived version. However, once `Entity` was
given `#[repr(align(8))]`, the codegen improved remarkably, and improved
further once the fields in `Entity` were rearranged to correspond to a
`u64` layout (Rust doesn't automatically reorder fields correctly, it
seems). The field order and `align(8)` additions also improved `to_bits`
codegen to a single `mov` op. In turn, this led me to replace the
previous "non-shortcircuiting" impl of `PartialEq::eq` with a direct
`to_bits` comparison.

The result was remarkably better codegen across the board, even for
hashtable lookups.

The current baseline codegen is as follows:
https://godbolt.org/z/zTW1h8PnY

Assuming the following example struct, which mirrors the existing
`Entity` definition:

```rust
#[derive(Clone, Copy, Eq, PartialEq, PartialOrd, Ord)]
pub struct FakeU64 {
    high: u32,
    low: u32,
}
```

the output for `to_bits` is as follows:

```
example::FakeU64::to_bits:
        shl     rdi, 32
        mov     eax, esi
        or      rax, rdi
        ret
```

Changing the struct to:
```rust
#[derive(Clone, Copy, Eq)]
#[repr(align(8))]
pub struct FakeU64 {
    low: u32,
    high: u32,
}
```
and providing manual implementations for `PartialEq`/`PartialOrd`/`Ord`,
`to_bits` now optimises to:
```
example::FakeU64::to_bits:
        mov     rax, rdi
        ret
```
The full codegen example for this PR is here for reference:
https://godbolt.org/z/n4Mjx165a
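
For readers who do not want to open the Godbolt link, the manual implementations mentioned above could look roughly like the sketch below. This is an illustration built on the `FakeU64` stand-in, not a copy of the code in this PR: the `new` constructor and the exact bodies of the trait impls are assumptions, chosen so that every comparison goes through `to_bits`.

```rust
use std::cmp::Ordering;

// Stand-in for `Entity`, using the layout described above:
// 8-byte alignment, low half first.
#[derive(Clone, Copy, Debug, Eq)]
#[repr(align(8))]
pub struct FakeU64 {
    low: u32,
    high: u32,
}

impl FakeU64 {
    // Hypothetical constructor, added only so the sketch is usable.
    pub fn new(high: u32, low: u32) -> Self {
        Self { low, high }
    }

    // Packs both halves into one `u64`, with `high` in the upper 32 bits.
    pub fn to_bits(self) -> u64 {
        (u64::from(self.high) << 32) | u64::from(self.low)
    }
}

// Equality as a single 64-bit comparison instead of two 32-bit ones.
impl PartialEq for FakeU64 {
    fn eq(&self, other: &Self) -> bool {
        self.to_bits() == other.to_bits()
    }
}

// `partial_cmp` defers to `cmp`, so there is one source of truth for ordering.
impl PartialOrd for FakeU64 {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Ordering on the packed bits: `high` is compared first, then `low`.
impl Ord for FakeU64 {
    fn cmp(&self, other: &Self) -> Ordering {
        self.to_bits().cmp(&other.to_bits())
    }
}

// Same shape as the `greater_than` function in the assembly listings.
pub fn greater_than(a: FakeU64, b: FakeU64) -> bool {
    a > b
}
```

Comparing the packed bits gives the same result as comparing `high` and then `low`, which is the ordering the derived impl produced when `high` was declared first.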

To highlight, `gt` comparison goes from
```
example::greater_than:
        cmp     edi, edx
        jae     .LBB3_2
        xor     eax, eax
        ret
.LBB3_2:
        setne   dl
        cmp     esi, ecx
        seta    al
        or      al, dl
        ret
```
to
```
example::greater_than:
        cmp     rdi, rsi
        seta    al
        ret
```

As explained on Discord by @scottmcm :

>The root issue here, as far as I understand it, is that LLVM's
middle-end is inexplicably unwilling to merge loads if that would make
them under-aligned. It leaves that entirely up to its target-specific
back-end, and thus a bunch of the things that you'd expect it to do that
would fix this just don't happen.
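
To make the alignment point concrete, the sketch below compares the layout of the two struct shapes discussed earlier. The type names `Unaligned` and `Aligned` and the helper `layout_report` are hypothetical, introduced only for this illustration. On mainstream targets a pair of `u32` fields is 8 bytes wide but only 4-byte aligned, so treating it as one 64-bit load would be an under-aligned load; `#[repr(align(8))]` removes that obstacle without changing the size.

```rust
use std::mem::{align_of, size_of};

// Shape of the original struct: two `u32` fields, default alignment.
#[allow(dead_code)]
pub struct Unaligned {
    high: u32,
    low: u32,
}

// Shape after the change: same fields, alignment raised to 8 bytes.
#[allow(dead_code)]
#[repr(align(8))]
pub struct Aligned {
    low: u32,
    high: u32,
}

// Returns `(size, alignment)` in bytes for each struct, in declaration order.
pub fn layout_report() -> [(usize, usize); 2] {
    [
        (size_of::<Unaligned>(), align_of::<Unaligned>()),
        (size_of::<Aligned>(), align_of::<Aligned>()),
    ]
}
```

On targets where `u32` is 4-byte aligned, both structs occupy 8 bytes, and only the alignment differs (4 versus 8).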

## Benchmarks

Before discussing benchmarks, everything was tested on the following
specs:

- AMD Ryzen 7950X 16C/32T CPU
- 64GB 5200 RAM
- AMD RX7900XT 20GB Gfx card
- Manjaro KDE on Wayland

I made use of the new entity hashing benchmarks to see how this PR would
improve things there. I first benchmarked an implementation that keeps
the existing "non shortcircuit" `PartialEq` implementation but includes
the alignment and field ordering changes; this is the `ord_shortcircuit`
column. The `to_bits` `PartialEq` implementation is the `ord_to_bits`
column. The `main_ord` column is the existing baseline from the `main`
branch.


![Screenshot_20231114_132908](https://github.com/bevyengine/bevy/assets/3116268/cb9090c9-ff74-4cc5-abae-8e4561332261)

My machine is not well set up for benchmarking, so some results are
within noise, but there is a clear improvement with the
non-shortcircuiting implementation, and even further optimisation with
the `to_bits` implementation.

On my machine, a fair number of the stress tests were not showing any
difference (indicating other bottlenecks), but I was able to get a clear
difference with `many_foxes` with a fox count of 10,000:

Test with `cargo run --example many_foxes --features
bevy/trace_tracy,wayland --release -- --count 10000`:


![Screenshot_20231114_144217](https://github.com/bevyengine/bevy/assets/3116268/89bdc21c-7209-43c8-85ae-efbf908bfed3)

On average, the framerate improved from about 28-29 FPS to 30-32 FPS.
"This trace" represents the current PR's perf, while "External trace"
represents the `main` branch baseline.

## Changelog

Changed: micro-optimised `Entity` alignment and field ordering, and
provided manual `PartialOrd`/`Ord` impls, to help LLVM optimise further.

## Migration Guide

Any `unsafe` code relying on field ordering of `Entity` or sufficiently
cursed shenanigans should change to reflect the different internal
representation and alignment requirements of `Entity`.
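
As an illustration of a layout-independent alternative, the sketch below round-trips a value through its packed `u64` form instead of reinterpreting the struct's memory. The `Handle` type is a hypothetical stand-in, not Bevy's `Entity`, and its `to_bits`/`from_bits` bodies are assumptions written for this example. Code written this way keeps working if the field order or alignment changes again.

```rust
// Hypothetical stand-in type; the field order here is an implementation
// detail that callers never observe.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(align(8))]
pub struct Handle {
    low: u32,
    high: u32,
}

impl Handle {
    // Packs the two halves into a `u64`, `high` in the upper 32 bits.
    pub fn to_bits(self) -> u64 {
        (u64::from(self.high) << 32) | u64::from(self.low)
    }

    // Rebuilds a `Handle` from its packed form; the inverse of `to_bits`.
    pub fn from_bits(bits: u64) -> Self {
        Self {
            low: bits as u32,          // truncating cast keeps the lower 32 bits
            high: (bits >> 32) as u32, // upper 32 bits
        }
    }
}
```

Going through explicit conversion functions like these avoids any dependence on how the fields are laid out in memory.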

Co-authored-by: james7132 <contact@jamessliu.com>
Co-authored-by: NathanW <nathansward@comcast.net>
2023-11-18 20:04:37 +00:00
.cargo Change recommended linker: zld to lld for MacOS (#7496) 2023-02-06 18:24:12 +00:00
.github add test on Android 14 / Pixel 8 (#10148) 2023-10-17 14:52:11 +00:00
assets Added Method to Allow Pipelined Asset Loading (#10565) 2023-11-16 17:47:31 +00:00
benches Optimize Entity::eq (#10519) 2023-11-14 02:06:21 +00:00
crates Optimise Entity with repr align & manual PartialOrd/Ord (#10558) 2023-11-18 20:04:37 +00:00
docs Add "update screenshots" to release checklist (#10369) 2023-11-04 18:43:15 +00:00
docs-template Improve WebGPU unstable flags docs (#10163) 2023-10-18 17:30:44 +00:00
errors Add some more docs for bevy_text. (#9873) 2023-10-27 18:53:57 +00:00
examples Added Method to Allow Pipelined Asset Loading (#10565) 2023-11-16 17:47:31 +00:00
src Schedule-First: the new and improved add_systems (#8079) 2023-03-18 01:45:34 +00:00
tests Deferred Renderer (#9258) 2023-10-12 22:10:38 +00:00
tools examples showcase: use patches instead of sed for wasm hacks (#10601) 2023-11-17 22:21:12 +00:00
.gitattributes Enforce linux-style line endings for .rs and .toml (#3197) 2021-11-26 21:05:35 +00:00
.gitignore Fix example showcase (#10366) 2023-11-04 01:33:51 +00:00
Cargo.toml Added Method to Allow Pipelined Asset Loading (#10565) 2023-11-16 17:47:31 +00:00
CHANGELOG.md 0.12 Changelog (#10361) 2023-11-04 01:57:29 +00:00
clippy.toml Use clippy::doc_markdown more. (#10286) 2023-10-27 22:49:02 +00:00
CODE_OF_CONDUCT.md Update CODE_OF_CONDUCT.md 2020-08-19 20:25:58 +01:00
CONTRIBUTING.md Add examples page build instructions (#8413) 2023-04-17 16:13:24 +00:00
CREDITS.md Add morph targets (#8158) 2023-06-22 20:00:01 +00:00
deny.toml check for all-features with cargo-deny (#10544) 2023-11-14 13:51:19 +00:00
LICENSE-APACHE Let the project page support GitHub's new ability to display open source licenses (#4966) 2022-06-08 17:55:57 +00:00
LICENSE-MIT Let the project page support GitHub's new ability to display open source licenses (#4966) 2022-06-08 17:55:57 +00:00
README.md Fix orphaned contributing paragraph (#10174) 2023-10-18 15:52:04 +00:00
rustfmt.toml Cargo fmt with unstable features (#1903) 2021-04-21 23:19:34 +00:00

Bevy


What is Bevy?

Bevy is a refreshingly simple data-driven game engine built in Rust. It is free and open-source forever!

WARNING

Bevy is still in the early stages of development. Important features are missing. Documentation is sparse. A new version of Bevy containing breaking changes to the API is released approximately once every 3 months. We provide migration guides, but we can't guarantee migrations will always be easy. Use only if you are willing to work in this environment.

MSRV: Bevy relies heavily on improvements in the Rust language and compiler. As a result, the Minimum Supported Rust Version (MSRV) is generally close to "the latest stable release" of Rust.

Design Goals

  • Capable: Offer a complete 2D and 3D feature set
  • Simple: Easy for newbies to pick up, but infinitely flexible for power users
  • Data Focused: Data-oriented architecture using the Entity Component System paradigm
  • Modular: Use only what you need. Replace what you don't like
  • Fast: App logic should run quickly, and when possible, in parallel
  • Productive: Changes should compile quickly ... waiting isn't fun

About

  • Features: A quick overview of Bevy's features.
  • News: A development blog that covers our progress, plans and shiny new features.

Docs

  • The Bevy Book: Bevy's official documentation. The best place to start learning Bevy.
  • Bevy Rust API Docs: Bevy's Rust API docs, which are automatically generated from the doc comments in this repo.
  • Official Examples: Bevy's dedicated, runnable examples, which are great for digging into specific concepts.
  • Community-Made Learning Resources: More tutorials, documentation, and examples made by the Bevy community.

Community

Before contributing or participating in discussions with the community, you should familiarize yourself with our Code of Conduct.

  • Discord: Bevy's official discord server.
  • Reddit: Bevy's official subreddit.
  • GitHub Discussions: The best place for questions about Bevy, answered right here!
  • Bevy Assets: A collection of awesome Bevy projects, tools, plugins and learning materials.

Contributing

If you'd like to help build Bevy, check out the Contributor's Guide. For simple problems, feel free to open an issue or PR and tackle it yourself!

For more complex architecture decisions and experimental mad science, please open an RFC (Request For Comments) so we can brainstorm together effectively!

Getting Started

We recommend checking out The Bevy Book for a full tutorial.

Follow the Setup guide to ensure your development environment is set up correctly. Once set up, you can quickly try out the examples by cloning this repo and running the following commands:

```sh
# Switch to the correct version (latest release, default is main development branch)
git checkout latest
# Runs the "breakout" example
cargo run --example breakout
```

To draw a window with standard functionality enabled, use:

```rust
use bevy::prelude::*;

fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        .run();
}
```

Fast Compiles

Bevy can be built just fine using the default configuration on stable Rust. However, for really fast iterative compiles, you should enable the "fast compiles" setup by following the instructions here.

Libraries Used

Bevy is only possible because of the hard work put into these foundational technologies:

  • wgpu: modern / low-level / cross-platform graphics library based on the WebGPU API.
  • glam-rs: a simple and fast 3D math library for games and graphics
  • winit: cross-platform window creation and management in Rust

Bevy Cargo Features

This list outlines the different cargo features supported by Bevy. These allow you to customize the Bevy feature set for your use-case.

Third Party Plugins

Plugins are very welcome to extend Bevy's features. Guidelines are available to help integration and usage.

Thanks and Alternatives

Additionally, we would like to thank the Amethyst, macroquad, coffee, ggez, Fyrox, and Piston projects for providing solid examples of game engine development in Rust. If you are looking for a Rust game engine, it is worth considering all of your options. Each engine has different design goals, and some will likely resonate with you more than others.

This project is tested with BrowserStack.

License

Bevy is free, open source and permissively licensed! Except where noted (below and/or in individual files), all code in this repository is dual-licensed under either:

  • MIT License (LICENSE-MIT)
  • Apache License, Version 2.0 (LICENSE-APACHE)

at your option. This means you can select the license you prefer! This dual-licensing approach is the de-facto standard in the Rust ecosystem and there are very good reasons to include both.

Some of the engine's code carries additional copyright notices and license terms due to their external origins. These are generally BSD-like, but exact details vary by crate: If the README of a crate contains a 'License' header (or similar), the additional copyright notices and license terms applicable to that crate will be listed. The above licensing requirement still applies to contributions to those crates, and sections of those crates will carry those license terms. The license field of each crate will also reflect this. For example, bevy_mikktspace has code under the Zlib license (as well as a copyright notice when choosing the MIT license).

The assets included in this repository (for our examples) typically fall under different open licenses. These will not be included in your game (unless copied in by you), and they are not distributed in the published bevy crates. See CREDITS.md for the details of the licenses of those files.

Your contributions

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.