bevy/benches/benches/bevy_reflect/function.rs
Gino Valente d21c7a1911
bevy_reflect: Function Overloading (Generic & Variadic Functions) (#15074)
# Objective

Currently function reflection requires users to manually monomorphize
their generic functions. For example:

```rust
fn add<T: Add<Output=T>>(a: T, b: T) -> T {
    a + b
}

// We have to specify the type of `T`:
let reflect_add = add::<i32>.into_function();
```

This PR doesn't aim to solve that problem—this is just a limitation in
Rust. However, it also means that reflected functions can only ever work
for a single monomorphization. If we wanted to support other types for
`T`, we'd have to create a separate function for each one:

```rust
let reflect_add_i32 = add::<i32>.into_function();
let reflect_add_u32 = add::<u32>.into_function();
let reflect_add_f32 = add::<f32>.into_function();
// ...
```

So in addition to requiring manual monomorphization, we also lose the
benefit of having a single function handle multiple argument types.

If a user wanted to create a small modding script that utilized function
reflection, they'd have to either:
- Store all sets of supported monomorphizations and require users to
call the correct one
- Write out some logic to find the correct function based on the given
arguments

While the first option would work, it wouldn't be very ergonomic. The
second option is better, but it adds additional complexity to the user's
logic—complexity that `bevy_reflect` could instead take on.

## Solution

Introduce [function
overloading](https://en.wikipedia.org/wiki/Function_overloading).

A `DynamicFunction` can now be overloaded with other `DynamicFunction`s.
We can rewrite the above code like so:

```rust
let reflect_add = add::<i32>
    .into_function()
    .with_overload(add::<u32>)
    .with_overload(add::<f32>);
```

When invoked, the `DynamicFunction` will attempt to find a matching
overload for the given set of arguments.

And while I went into this PR only looking to improve generic function
reflection, I accidentally added support for variadic functions as well
(hence why I use the broader term "overload" over "generic").

```rust
// Supports 1 to 4 arguments
let multiply_all = (|a: i32| a)
    .into_function()
    .with_overload(|a: i32, b: i32| a * b)
    .with_overload(|a: i32, b: i32, c: i32| a * b * c)
    .with_overload(|a: i32, b: i32, c: i32, d: i32| a * b * c * d);
```

This is simply an added bonus to this particular implementation. ~~Full
variadic support (i.e. allowing for an indefinite number of arguments)
will be added in a later PR.~~ I actually decided to limit the maximum
number of arguments to 63 to supplement faster lookups, a reduced memory
footprint, and faster cloning.

### Alternatives & Rationale

I explored a few options for handling generic functions. This PR is the
one I feel the most confident in, but I feel I should mention the others
and why I ultimately didn't move forward with them.

#### Adding `GenericDynamicFunction`

**TL;DR:** Adding a distinct `GenericDynamicFunction` type unnecessarily
splits and complicates the API.

<details>
<summary>Details</summary>

My initial explorations involved a dedicated `GenericDynamicFunction` to
contain and handle the mappings.

This was initially started back when `DynamicFunction` was distinct from
`DynamicClosure`. My goal was to not prevent us from being able to
somehow make `DynamicFunction` implement `Copy`. But once we reverted
back to a single `DynamicFunction`, that became a non-issue.

But that aside, the real problem was that it created a split in the API.
If I'm using a third-party library that uses function reflection, I have
to know whether to request a `DynamicFunction` or a
`GenericDynamicFunction`. I might not even know ahead of time which one
I want. It might need to be determined at runtime.

And if I'm creating a library, I might want a type to contain both
`DynamicFunction` and `GenericDynamicFunction`. This might not be
possible if, for example, I need to store the function in a `HashMap`.

The other concern is with `IntoFunction`. Right now `DynamicFunction`
trivially implements `IntoFunction` since it can just return itself. But
what should `GenericDynamicFunction` do? It could return itself wrapped
into a `DynamicFunction`, but then the API for `DynamicFunction` would
have to account for this. So then what was the point of having a
separate `GenericDynamicFunction` anyways?

And even apart from `IntoFunction`, there's nothing stopping someone
from manually creating a generic `DynamicFunction` through lying about
its `FunctionInfo` and wrapping a `GenericDynamicFunction`.

That being said, this is probably the "best" alternative if we added a
`Function` trait and stored functions as `Box<dyn Function>`.

However, I'm not convinced we gain much from this. Sure, we could keep
the API for `DynamicFunction` the same, but consumers of `Function` will
need to account for `GenericDynamicFunction` regardless (e.g. handling
multiple `FunctionInfo`, a ranged argument count, etc.). And for all
cases, except where using `DynamicFunction` directly, you end up
treating them all like `GenericDynamicFunction`.

Right now, if we did go with `GenericDynamicFunction`, the only major
benefit we'd gain would be saving 24 bytes. If memory ever does become
an issue here, we could swap over. But I think for the time being it's
better for us to pursue a clearer mental model and end-user ergonomics
through unification.

</details>

##### Using the `FunctionRegistry`

**TL;DR:** Having overloads only exist in the `FunctionRegistry`
unnecessarily splits and complicates the API.

<details>
<summary>Details</summary>

Another idea was to store the overloads in the `FunctionRegistry`. Users
would then just call functions directly through the registry (i.e.
`registry.call("my_func", my_args)`).

I didn't go with this option because of how it specifically relies on
the functions being registered. You'd not only always need access to the
registry, but you'd need to ensure that the functions you want to call
are even registered.

It also means you can't just store a generic `DynamicFunction` on a
type. Instead, you'll need to store the function's name and use that to
look up the function in the registry—even if it's only ever used by that
type.

Doing so also removes all the benefits of `DynamicFunction`, such as the
ability to pass it to functions accepting `IntoFunction`, modify it if
needed, and so on.

Like `GenericDynamicFunction` this introduces a split in the ecosystem:
you either store `DynamicFunction`, store a string to look up the
function, or force `DynamicFunction` to wrap your generic function
anyways. Or worse yet: have `DynamicFunction` wrap the lookup function
using `FunctionRegistryArc`.

</details>

#### Generic `ArgInfo`

**TL;DR:** Allowing `ArgInfo` and `ReturnInfo` to store the generic
information introduces a footgun when interpreting `FunctionInfo`.

<details>
<summary>Details</summary>

Regardless of how we represent a generic function, one thing is clear:
we need to be able to represent the information for such a function.

This PR does so by introducing a `FunctionInfoType` enum to wrap one or
more `FunctionInfo` values.

Originally, I didn't do this. I had `ArgInfo` and `ReturnInfo` allow for
generic types. This allowed us to have a single `FunctionInfo` to
represent our function, but then I realized that it actually lies about
our function.

If we have two `ArgInfo` that both allow for either `i32` or `u32`, what
does this tell us about our function? It turns out: nothing! We can't
know whether our function takes `(i32, i32)`, `(u32, u32)`, `(i32,
u32)`, or `(u32, i32)`.

It therefore makes more sense to just represent a function with multiple
`FunctionInfo` since that's really what it's made up of.

</details>

#### Flatten `FunctionInfo`

**TL;DR:** Flattening removes additional per-overload information some
users may desire and prevents us from adding more information in the
future.

<details>
<summary>Details</summary>

Why don't we just flatten multiple `FunctionInfo` into just one that can
contain multiple signatures?

This is something we could do, but I decided against it for a few
reasons:
- The only thing we'd be able to get rid of for each signature would be
the `name`. While not enough to not do it, it doesn't really suggest we
*have* to either.
- Some consumers may want access to the names of the functions that make
up the overloaded function. For example, to track a bug where an
undesirable function is being added as an overload. Or to more easily
locate the original function of an overload.
- We may eventually allow for more information to be stored on
`FunctionInfo`. For example, we may allow for documentation to be stored
like we do for `TypeInfo`. Consumers of this documentation may want
access to the documentation of each overload as they may provide
documentation specific to that overload.

</details>

## Testing

This PR adds lots of tests and benchmarks, and also adds to the example.

To run the tests:

```
cargo test --package bevy_reflect --all-features
```

To run the benchmarks:

```
cargo bench --bench reflect_function --all-features
```

To run the example:

```
cargo run --package bevy --example function_reflection --all-features
```

### Benchmarks

One of my goals with this PR was to leave the typical case of
non-overloaded functions largely unaffected by the changes introduced in
this PR. ~~And while the static size of `DynamicFunction` has increased
by 17% (from 136 to 160 bytes), the performance has generally stayed the
same~~ The static size of `DynamicFunction` has decreased from 136 to
112 bytes, while calling performance has generally stayed the same:

|                                     | `main` | 7d293ab | 252f3897d |
|-------------------------------------|--------|---------|-----------|
| `into/function`                     | 37 ns  | 46 ns   | 142 ns    |
| `with_overload/01_simple_overload`  | -      | 149 ns  | 268 ns    |
| `with_overload/01_complex_overload` | -      | 332 ns  | 431 ns    |
| `with_overload/10_simple_overload`  | -      | 1266 ns | 2618 ns   |
| `with_overload/10_complex_overload` | -      | 2544 ns | 4170 ns   |
| `call/function`                     | 57 ns  | 58 ns   | 61 ns     |
| `call/01_simple_overload`           | -      | 255 ns  | 242 ns    |
| `call/01_complex_overload`          | -      | 595 ns  | 431 ns    |
| `call/10_simple_overload`           | -      | 740 ns  | 699 ns    |
| `call/10_complex_overload`          | -      | 1824 ns | 1618 ns   |

For the overloaded function tests, the leading number indicates how many
overloads there are: `01` indicates 1 overload, `10` indicates 10
overloads. The `complex` cases have 10 unique generic types and 10
arguments, compared to the `simple` 1 generic type and 2 arguments.

I aimed to prioritize the performance of calling the functions over
creating them, hence creation speed tends to be a bit slower.

There may be other optimizations we can look into but that's probably
best saved for a future PR.

The important bit is that the standard ~~`into/function`~~ and
`call/function` benchmarks show minimal regressions. Since the latest
changes, `into/function` does have some regressions, but again the
priority was `call/function`. We can probably optimize `into/function`
if needed in the future.

---

## Showcase

Function reflection now supports [function
overloading](https://en.wikipedia.org/wiki/Function_overloading)! This
can be used to simulate generic functions:

```rust
fn add<T: Add<Output=T>>(a: T, b: T) -> T {
    a + b
}

let reflect_add = add::<i32>
    .into_function()
    .with_overload(add::<u32>)
    .with_overload(add::<f32>);

let args = ArgList::default().push_owned(25_i32).push_owned(75_i32);  
let result = func.call(args).unwrap().unwrap_owned();  
assert_eq!(result.try_take::<i32>().unwrap(), 100);  
  
let args = ArgList::default().push_owned(25.0_f32).push_owned(75.0_f32);  
let result = func.call(args).unwrap().unwrap_owned();  
assert_eq!(result.try_take::<f32>().unwrap(), 100.0);
```

You can also simulate variadic functions:

```rust
#[derive(Reflect, PartialEq, Debug)]
struct Player {
    name: Option<String>,
    health: u32,
}

// Creates a `Player` with one of the following:  
// - No name and 100 health  
// - A name and 100 health  
// - No name and custom health  
// - A name and custom health
let create_player = (|| Player {
        name: None,
        health: 100,
    })
    .into_function()
    .with_overload(|name: String| Player {
        name: Some(name),
        health: 100,
    })
    .with_overload(|health: u32| Player {
        name: None,
        health
    })
    .with_overload(|name: String, health: u32| Player {
        name: Some(name),
        health,
    });

let args = ArgList::default()
    .push_owned(String::from("Urist"))
    .push_owned(55_u32);
    
let player = create_player
    .call(args)
    .unwrap()
    .unwrap_owned()
    .try_take::<Player>()
    .unwrap();
	
assert_eq!(
    player,
    Player {
        name: Some(String::from("Urist")),
        health: 55
    }
);
```
2024-12-10 01:51:47 +00:00

388 lines
15 KiB
Rust

use bevy_reflect::func::{ArgList, IntoFunction, IntoFunctionMut, TypedFunction};
use criterion::{criterion_group, criterion_main, BatchSize, Criterion};
criterion_group!(benches, typed, into, call, overload, clone);
criterion_main!(benches);
fn add(a: i32, b: i32) -> i32 {
a + b
}
fn typed(c: &mut Criterion) {
c.benchmark_group("typed")
.bench_function("function", |b| {
b.iter(|| add.get_function_info());
})
.bench_function("closure", |b| {
let capture = 25;
let closure = |a: i32| a + capture;
b.iter(|| closure.get_function_info());
})
.bench_function("closure_mut", |b| {
let mut capture = 25;
let closure = |a: i32| capture += a;
b.iter(|| closure.get_function_info());
});
}
fn into(c: &mut Criterion) {
c.benchmark_group("into")
.bench_function("function", |b| {
b.iter(|| add.into_function());
})
.bench_function("closure", |b| {
let capture = 25;
let closure = |a: i32| a + capture;
b.iter(|| closure.into_function());
})
.bench_function("closure_mut", |b| {
let mut _capture = 25;
let closure = move |a: i32| _capture += a;
b.iter(|| closure.into_function_mut());
});
}
fn call(c: &mut Criterion) {
c.benchmark_group("call")
.bench_function("trait_object", |b| {
b.iter_batched(
|| Box::new(add) as Box<dyn Fn(i32, i32) -> i32>,
|func| func(75, 25),
BatchSize::SmallInput,
);
})
.bench_function("function", |b| {
let add = add.into_function();
b.iter_batched(
|| ArgList::new().push_owned(75_i32).push_owned(25_i32),
|args| add.call(args),
BatchSize::SmallInput,
);
})
.bench_function("closure", |b| {
let capture = 25;
let add = (|a: i32| a + capture).into_function();
b.iter_batched(
|| ArgList::new().push_owned(75_i32),
|args| add.call(args),
BatchSize::SmallInput,
);
})
.bench_function("closure_mut", |b| {
let mut capture = 25;
let mut add = (|a: i32| capture += a).into_function_mut();
b.iter_batched(
|| ArgList::new().push_owned(75_i32),
|args| add.call(args),
BatchSize::SmallInput,
);
});
}
fn overload(c: &mut Criterion) {
fn add<T: std::ops::Add<Output = T>>(a: T, b: T) -> T {
a + b
}
fn complex<T0, T1, T2, T3, T4, T5, T6, T7, T8, T9>(
_: T0,
_: T1,
_: T2,
_: T3,
_: T4,
_: T5,
_: T6,
_: T7,
_: T8,
_: T9,
) {
}
c.benchmark_group("with_overload")
.bench_function("01_simple_overload", |b| {
b.iter_batched(
|| add::<i8>.into_function(),
|func| func.with_overload(add::<i16>),
BatchSize::SmallInput,
);
})
.bench_function("01_complex_overload", |b| {
b.iter_batched(
|| complex::<i8, i16, i32, i64, i128, u8, u16, u32, u64, u128>.into_function(),
|func| {
func.with_overload(complex::<i16, i32, i64, i128, u8, u16, u32, u64, u128, i8>)
},
BatchSize::SmallInput,
);
})
.bench_function("03_simple_overload", |b| {
b.iter_batched(
|| add::<i8>.into_function(),
|func| {
func.with_overload(add::<i16>)
.with_overload(add::<i32>)
.with_overload(add::<i64>)
},
BatchSize::SmallInput,
);
})
.bench_function("03_complex_overload", |b| {
b.iter_batched(
|| complex::<i8, i16, i32, i64, i128, u8, u16, u32, u64, u128>.into_function(),
|func| {
func.with_overload(complex::<i16, i32, i64, i128, u8, u16, u32, u64, u128, i8>)
.with_overload(complex::<i32, i64, i128, u8, u16, u32, u64, u128, i8, i16>)
.with_overload(complex::<i64, i128, u8, u16, u32, u64, u128, i8, i16, i32>)
},
BatchSize::SmallInput,
);
})
.bench_function("10_simple_overload", |b| {
b.iter_batched(
|| add::<i8>.into_function(),
|func| {
func.with_overload(add::<i16>)
.with_overload(add::<i32>)
.with_overload(add::<i64>)
.with_overload(add::<i128>)
.with_overload(add::<u8>)
.with_overload(add::<u16>)
.with_overload(add::<u32>)
.with_overload(add::<u64>)
.with_overload(add::<u128>)
},
BatchSize::SmallInput,
);
})
.bench_function("10_complex_overload", |b| {
b.iter_batched(
|| complex::<i8, i16, i32, i64, i128, u8, u16, u32, u64, u128>.into_function(),
|func| {
func.with_overload(complex::<i16, i32, i64, i128, u8, u16, u32, u64, u128, i8>)
.with_overload(complex::<i32, i64, i128, u8, u16, u32, u64, u128, i8, i16>)
.with_overload(complex::<i64, i128, u8, u16, u32, u64, u128, i8, i16, i32>)
.with_overload(complex::<i128, u8, u16, u32, u64, u128, i8, i16, i32, i64>)
.with_overload(complex::<u8, u16, u32, u64, u128, i8, i16, i32, i64, i128>)
.with_overload(complex::<u16, u32, u64, u128, i8, i16, i32, i64, i128, u8>)
.with_overload(complex::<u32, u64, u128, i8, i16, i32, i64, i128, u8, u16>)
.with_overload(complex::<u64, u128, i8, i16, i32, i64, i128, u8, u16, u32>)
.with_overload(complex::<u128, i8, i16, i32, i64, i128, u8, u16, u32, u64>)
},
BatchSize::SmallInput,
);
})
.bench_function("01_nested_simple_overload", |b| {
b.iter_batched(
|| add::<i8>.into_function(),
|func| func.with_overload(add::<i16>),
BatchSize::SmallInput,
);
})
.bench_function("03_nested_simple_overload", |b| {
b.iter_batched(
|| add::<i8>.into_function(),
|func| {
func.with_overload(
add::<i16>
.into_function()
.with_overload(add::<i32>.into_function().with_overload(add::<i64>)),
)
},
BatchSize::SmallInput,
);
})
.bench_function("10_nested_simple_overload", |b| {
b.iter_batched(
|| add::<i8>.into_function(),
|func| {
func.with_overload(
add::<i16>.into_function().with_overload(
add::<i32>.into_function().with_overload(
add::<i64>.into_function().with_overload(
add::<i128>.into_function().with_overload(
add::<u8>.into_function().with_overload(
add::<u16>.into_function().with_overload(
add::<u32>.into_function().with_overload(
add::<u64>
.into_function()
.with_overload(add::<u128>),
),
),
),
),
),
),
),
)
},
BatchSize::SmallInput,
);
});
c.benchmark_group("call_overload")
.bench_function("01_simple_overload", |b| {
b.iter_batched(
|| {
(
add::<i8>.into_function().with_overload(add::<i16>),
ArgList::new().push_owned(75_i8).push_owned(25_i8),
)
},
|(func, args)| func.call(args),
BatchSize::SmallInput,
);
})
.bench_function("01_complex_overload", |b| {
b.iter_batched(
|| {
(
complex::<i8, i16, i32, i64, i128, u8, u16, u32, u64, u128>
.into_function()
.with_overload(
complex::<i16, i32, i64, i128, u8, u16, u32, u64, u128, i8>,
),
ArgList::new()
.push_owned(1_i8)
.push_owned(2_i16)
.push_owned(3_i32)
.push_owned(4_i64)
.push_owned(5_i128)
.push_owned(6_u8)
.push_owned(7_u16)
.push_owned(8_u32)
.push_owned(9_u64)
.push_owned(10_u128),
)
},
|(func, args)| func.call(args),
BatchSize::SmallInput,
);
})
.bench_function("03_simple_overload", |b| {
b.iter_batched(
|| {
(
add::<i8>
.into_function()
.with_overload(add::<i16>)
.with_overload(add::<i32>)
.with_overload(add::<i64>),
ArgList::new().push_owned(75_i32).push_owned(25_i32),
)
},
|(func, args)| func.call(args),
BatchSize::SmallInput,
);
})
.bench_function("03_complex_overload", |b| {
b.iter_batched(
|| {
(
complex::<i8, i16, i32, i64, i128, u8, u16, u32, u64, u128>
.into_function()
.with_overload(
complex::<i16, i32, i64, i128, u8, u16, u32, u64, u128, i8>,
)
.with_overload(
complex::<i32, i64, i128, u8, u16, u32, u64, u128, i8, i16>,
)
.with_overload(
complex::<i64, i128, u8, u16, u32, u64, u128, i8, i16, i32>,
),
ArgList::new()
.push_owned(1_i32)
.push_owned(2_i64)
.push_owned(3_i128)
.push_owned(4_u8)
.push_owned(5_u16)
.push_owned(6_u32)
.push_owned(7_u64)
.push_owned(8_u128)
.push_owned(9_i8)
.push_owned(10_i16),
)
},
|(func, args)| func.call(args),
BatchSize::SmallInput,
);
})
.bench_function("10_simple_overload", |b| {
b.iter_batched(
|| {
(
add::<i8>
.into_function()
.with_overload(add::<i16>)
.with_overload(add::<i32>)
.with_overload(add::<i64>)
.with_overload(add::<i128>)
.with_overload(add::<u8>)
.with_overload(add::<u16>)
.with_overload(add::<u32>)
.with_overload(add::<u64>)
.with_overload(add::<u128>),
ArgList::new().push_owned(75_u8).push_owned(25_u8),
)
},
|(func, args)| func.call(args),
BatchSize::SmallInput,
);
})
.bench_function("10_complex_overload", |b| {
b.iter_batched(
|| {
(
complex::<i8, i16, i32, i64, i128, u8, u16, u32, u64, u128>
.into_function()
.with_overload(
complex::<i16, i32, i64, i128, u8, u16, u32, u64, u128, i8>,
)
.with_overload(
complex::<i32, i64, i128, u8, u16, u32, u64, u128, i8, i16>,
)
.with_overload(
complex::<i64, i128, u8, u16, u32, u64, u128, i8, i16, i32>,
)
.with_overload(
complex::<i128, u8, u16, u32, u64, u128, i8, i16, i32, i64>,
)
.with_overload(
complex::<u8, u16, u32, u64, u128, i8, i16, i32, i64, i128>,
)
.with_overload(
complex::<u16, u32, u64, u128, i8, i16, i32, i64, i128, u8>,
)
.with_overload(
complex::<u32, u64, u128, i8, i16, i32, i64, i128, u8, u16>,
)
.with_overload(
complex::<u64, u128, i8, i16, i32, i64, i128, u8, u16, u32>,
)
.with_overload(
complex::<u128, i8, i16, i32, i64, i128, u8, u16, u32, u64>,
),
ArgList::new()
.push_owned(1_u8)
.push_owned(2_u16)
.push_owned(3_u32)
.push_owned(4_u64)
.push_owned(5_u128)
.push_owned(6_i8)
.push_owned(7_i16)
.push_owned(8_i32)
.push_owned(9_i64)
.push_owned(10_i128),
)
},
|(func, args)| func.call(args),
BatchSize::SmallInput,
);
});
}
fn clone(c: &mut Criterion) {
c.benchmark_group("clone").bench_function("function", |b| {
let add = add.into_function();
b.iter(|| add.clone());
});
}