This commit makes skinned meshes batchable on platforms other than WebGL
2. On supported platforms, it replaces the two uniform buffers used for
joint matrices with a pair of storage buffers containing all matrices
for all skinned meshes packed together. The indices into the buffer are
stored in the mesh uniform and mesh input uniform. The GPU mesh
preprocessing step copies the indices in if that step is enabled.
On the `many_foxes` demo, I observed a frame time decrease from 15.470ms
to 11.935ms. This is the result of reducing the `submit_graph_commands`
time from an average of 5.45ms to 0.489ms, an 11x speedup in that
portion of rendering.
![Screenshot 2024-12-01
192838](https://github.com/user-attachments/assets/7d2db997-8939-466e-8b9e-050d4a6a78ee)
This is what the profile looks like for `many_foxes` after these
changes.
![Screenshot 2024-12-01
193026](https://github.com/user-attachments/assets/68983fc3-01b8-41fd-835e-3d93cb65d0fa)
---------
Co-authored-by: François Mockers <mockersf@gmail.com>