Bevy ECS Evolution

The goal of this post is to explore each major version change to Bevy’s ECS framework and trace some of its original roots.

V1

The initial version of bevy’s ECS was a fork of the hecs library and heavily inspired by Legion.

Bevy ECS V1, hecs, Legion, Flecs, and Unity DOTS are all archetypal ECS.

hecs is short for “A handy ECS”. Its goals were:

  • fast traversals
  • a simple interface
  • a small dependency closure
  • exclusion of externally-implementable functionality

And when we look at some hecs code we can see the influence clearly:

let mut world = World::new();
// Nearly any type can be used as a component with zero boilerplate
let a = world.spawn((123, true, "abc"));
let b = world.spawn((42, false));
// Systems can be simple for loops
for (id, (number, &flag)) in world.query_mut::<(&mut i32, &bool)>() {
  if flag { *number *= 2; }
}
// Random access is simple and safe
assert_eq!(*world.get::<&i32>(a).unwrap(), 246);
assert_eq!(*world.get::<&i32>(b).unwrap(), 42);

And here is some modern (0.10) bevy code expressing something similar:

use bevy_ecs::prelude::*;

#[derive(Component)]
struct Position { x: f32, y: f32 }
#[derive(Component)]
struct Velocity { x: f32, y: f32 }

let mut world = World::new();

let entity = world
    .spawn((Position { x: 0.0, y: 0.0 }, Velocity { x: 1.0, y: 0.0 }))
    .id();

let entity_ref = world.entity(entity);
let position = entity_ref.get::<Position>().unwrap();
let velocity = entity_ref.get::<Velocity>().unwrap();

hecs internally tracks groups of entities which all have the same components. Each group has a dense, contiguous array for each type of component. When a system accesses all entities with a certain set of components, a fast linear traversal can be made through each group having a superset of those components. This is effectively a columnar database, and has the same benefits: the CPU can accurately predict memory accesses, bypassing unneeded data, maximizing cache use and minimizing latency.
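
As a rough mental model (toy code, not hecs’ actual internals), each group behaves like a struct of densely packed, row-aligned columns, one per component type:

struct ArchetypeGroup {
    // Row-aligned columns: index i of each Vec belongs to the same entity
    entities: Vec<u32>,
    positions: Vec<[f32; 2]>,
    velocities: Vec<[f32; 2]>,
}

impl ArchetypeGroup {
    // Iterating a group is a pair of linear scans over contiguous memory,
    // which is what makes archetypal iteration so cache friendly
    fn integrate(&mut self, dt: f32) {
        for (position, velocity) in self.positions.iter_mut().zip(&self.velocities) {
            position[0] += velocity[0] * dt;
            position[1] += velocity[1] * dt;
        }
    }
}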

hecs uses components, bundles, archetypes, worlds, and commands.

Bevy’s initial fork was focused on a handful of key innovations:

  1. Adding systems as Rust functions
  2. Adding global and local resources as a new type of component
  3. Adding a scheduler to run the core game loop (single threaded)
  4. Removing generational entity ids and replacing them with UUIDs for serialization
  5. Adding the ability to query entity indices directly
  6. Exposing more of the core of hecs as public to build higher level APIs
  7. Roughly doubling the performance of hecs

There was also inspiration taken from Legion (boxed system traits + schedules + resources) and Shipyard (system functions + using archetypes to store resources + queryable entities).

At this stage there were actually only two types of functional systems: foreach systems and query systems.

Of particular note is how they implemented the original IntoSystem macros that form the core of Bevy systems as Rust functions.

Most of the inspiration comes from the original Legion implementation, which has since been removed.

Here is the original system definition:

pub trait System: Send + Sync {
    fn name(&self) -> Cow<'static, str>;
    fn id(&self) -> SystemId;
    fn thread_local_execution(&self) -> ThreadLocalExecution;
    fn run(&mut self, world: &World, resources: &Resources);
    fn run_thread_local(&mut self, world: &mut World, resources: &mut Resources);
    fn initialize(&mut self, _resources: &mut Resources) {}
}

And here is what adding systems to an app looked like:

App::build()
  .add_system(my_system.system())
  .add_system(my_thread_local_system.thread_local_system())
  .run();
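
The real implementation uses macros to generate these conversions for functions taking many different combinations of parameters, but the core trick is a conversion trait plus a wrapper struct. Here is a heavily simplified, hypothetical sketch (not Bevy’s actual code) of what the .system() call boils down to:

use std::any::type_name;

struct World;
struct Resources;

// A trimmed down stand-in for the System trait above
trait System {
    fn name(&self) -> &'static str;
    fn run(&mut self, world: &World, resources: &Resources);
}

// Wraps a plain function so it can be stored as a boxed System
struct FunctionSystem<F: FnMut(&World, &Resources)> {
    func: F,
}

impl<F: FnMut(&World, &Resources)> System for FunctionSystem<F> {
    fn name(&self) -> &'static str {
        type_name::<F>()
    }

    fn run(&mut self, world: &World, resources: &Resources) {
        (self.func)(world, resources);
    }
}

// The .system() call is just a conversion into that wrapper
trait IntoSystem {
    fn system(self) -> Box<dyn System>;
}

impl<F: FnMut(&World, &Resources) + 'static> IntoSystem for F {
    fn system(self) -> Box<dyn System> {
        Box::new(FunctionSystem { func: self })
    }
}

fn my_system(_world: &World, _resources: &Resources) {
    // game logic goes here
}

fn main() {
    let world = World;
    let resources = Resources;

    let mut system = my_system.system();
    println!("running {}", system.name());
    system.run(&world, &resources);
}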

The original functional system implementation had a few problems:

  • system params must follow an arbitrary set of order rules
  • hard to understand
  • slow to compile

It’s not until 0.3.0 that these were worked out. It’s around then that the distinction between “foreach” systems and query systems was dropped.

There were some early quirks, like Query::iter/entity being quite awkward:

for (a, b) in &mut query.iter() {}
// Or this:
for (a, b) in query.iter().iter() {}

// This is query.get in modern bevy
if let Ok(mut result) = query.entity(entity) {
  if let Some((a, b)) = result.get() { }
}
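
For comparison, here is roughly how the same patterns read in modern (0.10) Bevy, reusing the Position and Velocity components defined earlier (the system names are just illustrative):

// Iteration no longer needs the extra iter() call
fn movement(mut query: Query<(&mut Position, &Velocity)>) {
    for (mut position, velocity) in query.iter_mut() {
        position.x += velocity.x;
        position.y += velocity.y;
    }
}

// Random access by Entity is Query::get, which returns a Result
fn print_positions(entities: Query<Entity, With<Velocity>>, positions: Query<&Position>) {
    for entity in entities.iter() {
        if let Ok(position) = positions.get(entity) {
            println!("{:?} is at ({}, {})", entity, position.x, position.y);
        }
    }
}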

One of the problems encountered early on was that Transform was a set of components that could be queried independently (Translation, Rotation, and Scale as separate components).

This helped with parallelism but ended up being unwieldy, with the combined output constantly lagging one frame behind.

Taking some more inspiration from Legion, the Transform we know today was born.

See this discussion for more information.

Bevy’s own task system, bevy_tasks, was also added in place of the original rayon integration for multithreading.

Change detection also comes up around this time, letting systems react only to components that have actually changed.
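
In the modern API this surfaces as query filters like Changed<T> and Added<T>, which restrict a query to components modified since the system last ran. A small sketch, again reusing the earlier Position component:

// Only yields entities whose Position was added or mutated since this
// system last ran
fn react_to_movement(query: Query<(Entity, &Position), Changed<Position>>) {
    for (entity, position) in query.iter() {
        println!("{:?} moved to ({}, {})", entity, position.x, position.y);
    }
}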

Generational indexes on entities made a comeback.

App system ergonomics got vastly improved. Oh wait, due to safety concerns it got rolled back temporarily.

The algorithm for parallel execution gets a v2. It looks like this:

1. Evaluate run criteria of system sets.
2. If any of the sets have changed, rebuild the cached scheduling data.
3. Run exclusive systems that want to be at the start of stage.
4. If the world's archetypes have changed since the last time parallel systems
   were run, update their access and the relevant part of scheduling data.
5. Enter outer compute pool scope.
6. Prepare parallel systems - spawn tasks, reset counters, queue systems with
   no dependencies, etc.
7. Try running a queued thread-local system. Queue its dependants if this was
   the last system they were waiting on.
8. Enter inner compute pool scope.
9. Try starting some queued thread-agnostic systems.
10. See if any thread-agnostic systems have finished, queue their dependants
    if those were the last systems they were waiting on.
11. Exit inner scope.
12. If there are any queued or running systems, continue from step 7.
13. Exit outer scope.
14. Merge in command buffers.
15. Run exclusive systems that want to be at the end of stage.
16. Re-evaluate run criteria, continue from step 3 if needed.
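
Stripped of the task pool, access checking, and thread-local handling, steps 6 through 12 are essentially dependency counting. Here is a toy sketch of that core loop (hypothetical code, not Bevy’s actual executor):

use std::collections::VecDeque;

// Each system tracks how many unfinished dependencies it has and which
// systems are waiting on it
struct SystemNode {
    name: &'static str,
    remaining_dependencies: usize,
    dependants: Vec<usize>,
}

fn run_stage(mut systems: Vec<SystemNode>) {
    // Queue every system with no dependencies (step 6)
    let mut queue: VecDeque<usize> = systems
        .iter()
        .enumerate()
        .filter(|(_, system)| system.remaining_dependencies == 0)
        .map(|(index, _)| index)
        .collect();

    // Keep going while anything is queued (steps 7-12); the real executor
    // spawns these as tasks on a compute pool instead of running them inline
    while let Some(index) = queue.pop_front() {
        println!("running {}", systems[index].name);

        // Queue any dependants whose last dependency just finished (step 10)
        for dependant in systems[index].dependants.clone() {
            systems[dependant].remaining_dependencies -= 1;
            if systems[dependant].remaining_dependencies == 0 {
                queue.push_back(dependant);
            }
        }
    }
}

fn main() {
    // input -> movement -> render
    run_stage(vec![
        SystemNode { name: "input", remaining_dependencies: 0, dependants: vec![1] },
        SystemNode { name: "movement", remaining_dependencies: 1, dependants: vec![2] },
        SystemNode { name: "render", remaining_dependencies: 1, dependants: vec![] },
    ]);
}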

Reflection got added to help with scripting and serialization, and to improve ergonomics by not requiring end users to constantly derive so many traits.

Initially, schedules had quite a few limitations:

  • Only one Schedule allowed
  • Very static: you are limited to using the tools we gave you (stages are lists of systems, you can add stages to schedules)
  • Couldn’t switch between schedules at runtime
  • Couldn’t easily support “fixed timestep” scenarios

Schedule V2 fixed these. This is where we get some of the modern Bevy ideas like nested schedules, system stages, run criteria, and app states.
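
Those ideas survive in reshaped form today. As a rough sketch of their modern (0.10) descendants, using the full bevy crate and a hypothetical AppState:

use bevy::prelude::*;

// App states are now a first class derive
#[derive(States, Debug, Clone, Copy, Default, Eq, PartialEq, Hash)]
enum AppState {
    #[default]
    Menu,
    InGame,
}

// Only runs while the app is in AppState::InGame
fn gameplay() {}

fn main() {
    App::new()
        .add_state::<AppState>()
        .add_system(gameplay.run_if(in_state(AppState::InGame)))
        .run();
}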

V2

The overall goal was to improve the performance and versatility of Bevy ECS.

The implementation of World gets an overhaul and breaks free from the original hecs definition.

Improvements on the ECS are focused around generalizing the storage and retrieval of components to handle more use cases:

“Developers selecting an ECS framework are stuck with a hard choice. Select an “archetypal” framework with “fast iteration everywhere” but without the ability to cheaply add/remove components, or select a “sparse set” framework to cheaply add/remove components but with slower iteration performance.”

Archetypal ECS:

  • Stores components in “tables” with static schemas. Each “column” stores components of a given type. Each “row” is an entity.
  • Each “archetype” has its own table. Adding/removing an entity’s component changes the archetype.
  • Enables super-fast Query iteration due to its cache-friendly data layout
  • Comes at the cost of more expensive add/remove operations for an entity’s components, because all components need to be copied to the new archetype’s “table”

Sparse Set ECS:

  • Stores components of the same type in densely packed arrays, which are sparsely indexed by densely packed unsigned integers (Entity ids)
  • Query iteration is slower than Archetypal ECS because each entity’s component could be at any position in the sparse set. This “random access” pattern isn’t cache friendly. Additionally, there is an extra layer of indirection because you must first map the entity id to an index in the component array.
  • Adding/removing components is a cheap, constant time operation

Bevy ECS V2 introduced a hybrid storage model: Table for archetype-based storage and SparseSet for components that are added and removed frequently.
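
In today’s Bevy that choice is exposed per component through an attribute on the Component derive. A small sketch against the 0.10 API, using hypothetical Health and Stunned components:

use bevy_ecs::prelude::*;

// Defaults to dense "table" storage: fast to iterate, more expensive to add/remove
#[derive(Component)]
struct Health(f32);

// Opts into sparse set storage: slower to iterate, but cheap to add and
// remove, which suits markers that come and go every frame
#[derive(Component)]
#[component(storage = "SparseSet")]
struct Stunned;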

Archetypes become simply metadata. They store:

The ComponentId of each of the Archetype components (and that component’s storage type)

  • Archetypes are uniquely defined by their component layouts
  • For example: entities with “table” components [A, B, C] and “sparse set” components [D, E] will always be in the same archetype.

The TableId associated with the archetype

  • For now each archetype has exactly one table (which can have no components)
  • There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it:
    • Ex: an entity with “table storage” components [A, B, C] and “sparse set” components [D, E] will share the same [A, B, C] table as an entity with [A, B, C] table components and [F] sparse set components.
    • This 1->Many relationship is how we preserve fast “cache friendly” iteration performance when possible (more on this later)

A list of entities that are in the archetype and the row id of the table they are in

ArchetypeComponentIds

  • unique densely packed identifiers for (ArchetypeId, ComponentId) pairs
  • used by the schedule executor for cheap system access control

The “Archetype Graph Edges” (more on these below)

Archetypes are usually quite expensive to compute. Normally querying becomes slower as the number of archetypes goes up.

To cache the results, Bevy starts building a graph of them. Bundles are the preferred way to add components as they reduce how often this cache needs to be refreshed.

To achieve this, Query and SystemParam are now stateful, holding information that helps cache their results and ensure proper initialization.

Components finally got added, which gives us world.components() and stores mappings from ComponentId to ComponentInfo as well as from TypeId to ComponentId.
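
Here is a small sketch of poking at that metadata through the 0.10 API, with a hypothetical Position component:

use bevy_ecs::prelude::*;
use std::any::TypeId;

#[derive(Component)]
struct Position;

fn main() {
    let mut world = World::new();

    // Registering the component assigns it a ComponentId...
    let id = world.init_component::<Position>();

    // ...which the Components collection can map back from the Rust TypeId,
    // along with the component's metadata
    let components = world.components();
    assert_eq!(components.get_id(TypeId::of::<Position>()), Some(id));
    println!("{:?}", components.get_info(id).map(|info| info.name()));
}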

The only difference between a Resource and a Component in V1 of Bevy was that a Resource was required to be unique. Now resources get merged into World with a familiar API, and the difference between them becomes even slighter:

world.insert_resource(1); // resources are keyed by their type (here i32)
world.insert_resource(2.0); // one resource per type (here f64)
let a = world.get_resource::<i32>().unwrap();
let mut b = world.get_resource_mut::<f64>().unwrap();
*b = 3.0;

They have their own ComponentId (and their own resource TypeId->ComponentId scope, so they don’t conflict with components of the same type), and they are stored in a special “resource archetype”.

We also got our familiar builder pattern for entities:

// spawn now takes no inputs and returns an EntityMut
let entity = world.spawn()
    .insert(A) // insert a single component into the entity
    .insert_bundle((B, C)) // insert a bundle of components into the entity
    .id(); // id returns the Entity id

Around now we get the bureaucratic machinery that Bevy is famous for up and running.

System sets and run criteria get a V2 update which allows us to:

  • Reuse run criteria within a stage.
  • Pipe output of one run criteria as input to another.
  • Assign labels, dependencies, run criteria, and ambiguity sets to many systems at the same time.

V3

Future work

  • Relations
  • Configurable and more reliable events
  • Changes to Plugin to make it more configurable
  • WorldQueryData vs WorldQueryFilter split
  • One-shot systems
  • Entity niching
  • Dynamic components
  • Runtime insertion / removal of systems
  • Scoped EntityMut equivalent for WorldQuery
