Bevy Archetypes
Bevy is an archetypal ECS. Archetypes enable efficient storage and processing of entities with similar behavior by grouping them based on their component composition.
An Archetype
is a unique combination of components. Bevy stores these combinations on a Table
.
By placing components that are accessed together in the same memory location we are helping our memory access be more efficient.
When a query is used as a system parameter Bevy calculates the matching Archetype
for the set of components it is asking for. This lets Bevy calculate which systems can run in parallel (see below).
When you add a component to an entity you are moving that entity to a new archetype.
A world has only one Archetype
for each unique combination of components.
An Archetype
stores some metadata about itself:
// https://docs.rs/bevy/latest/bevy/ecs/archetype/struct.Archetype.html
pub struct Archetype {
id: ArchetypeId,
table_id: TableId,
edges: Edges,
entities: Vec<ArchetypeEntity>,
components: ImmutableSparseSet<ComponentId, ArchetypeComponentInfo>,
}
Archetypes
are stored in a particular world and are only locally unique to that World
:
// https://docs.rs/bevy/latest/bevy/ecs/archetype/struct.Archetypes.html
pub struct Archetypes {
pub(crate) archetypes: Vec<Archetype>,
archetype_component_count: usize,
by_components: bevy_utils::HashMap<ArchetypeComponents, ArchetypeId>
}
Starting without archetypes
Imagine we have a naive implementation of an ECS that stores components as arrays and entities as the index:
Entity ID | Health | Player | Position | Enemy |
---|---|---|---|---|
1 | 100.0 | X | (3, 2) | |
2 | 72.0 | (3, 1) | X | |
3 | 40.0 | X | (4, 2) | |
4 | 20.0 | (3, 0) | X |
The gap in player and the gap in entity is very awkward for our memory access. If we wanted to write vectorized code to work on these arrays then the empty values would present a serious problem.
"Vectorized code" means code that operates on entire arrays or vectors of data at once, rather than processing individual elements sequentially.
Vectorized operations can be performed efficiently using hardware optimizations like SIMD (Single Instruction, Multiple Data) instructions. These instructions allow simultaneous execution of the same operation on multiple data elements, often resulting in improved performance.
However, in the given scenario, because the entity ids do not match up with the array indices (our empty spaces), it becomes challenging to write vectorized code that requires both arrays of Player
and Enemy
.
Vectorized operations usually rely on the assumption that corresponding elements in different arrays have the same index. If the entity ids in A and B are not aligned with the array indices, it becomes difficult to perform operations that depend on matching elements between the two arrays.
Improving the layout of our components
What if instead we stored the components how they are likely to be used?
Entity ID | Health | Player | Position |
---|---|---|---|
1 | 100.0 | X | (3, 2) |
3 | 40.0 | X | (4, 2) |
Entity ID | Health | Enemy | Position |
---|---|---|---|
2 | 100.0 | X | (3, 2) |
4 | 20.0 | X | (3, 0) |
Now we have two tables, but each table forms a contiguous array of elements that can be optimized upon retrieval.
The two tables are each a different Archetype
. One for the Player
and one for the Enemy
.
So we are storing our components in arrays of the same type, but inside of different tables representing a type (as in the sum of all its components) of an entity. We shall call this "type" an Archetype
.
Archetypes help find disjointed queries
A Component
could be present in many Archetype
. This is done by using a ArchetypeComponentId
and this many to many relationship is how the parallel scheduler figures out disjointed read-only queries that can be run at the same time.
At first glance you would think this shouldn't be able to run in parallel:
fn move_players(
player_positions: Query<&mut Position, With<Player>>,
) {
// ...
}
fn move_enemies(
enemy_positions: Query<&mut Position, Without<Player>>
) {
// ...
}
If components were simply stored and accessed on giant Vec<T>
then the player_positions
will have already borrowed the Position
components and enemy_positions
should not run in parallel.
However because we have one table for each Archetype
they won't conflict and can be run in parallel even though both queries return &mut Position
.
This is because their ArchetypeComponentId
would be different even though the ComponentId
of the Position
would be the same.
In other words, each Component
has only one ComponentId
but can have many ArchetypeComponentId
. It would be similar to a many-to-many relationship between Component
and Archetype
in a SQL database:
Component <-- ComponentId <-- ArchetypeComponentId --> ArchetypeId --> Archetype
When and how do archetypes get updated?
An Archetype
comes into existence when we spawn or insert a Bundle
.
Given a set of components stored in either a Table
or SparseSet
or some combination of the two, Bevy will call Archetypes::get_id_or_insert
which allocates or finds a TableId
which it will use to create a new Archetype
.
When we add or remove bundles from an Entity
it moves it to a new Archetype
table.
These moves can be quite expensive so Bevy caches these moves in Edges
. Next time a bundle gets added we can fetch for a matching edge and find the target Archetype
to move it to without computing it from scratch.
How does bevy know our archetypes have changed?
Every time we add or remove components the ArchetypeGeneration
of our Archetypes
changes.
Bevy uses the length of the current archetypes to create a generational index that guarantees a unique index as we add or remove archetypes.
We use this generation when we have to update our archetype indices so that when we query for things they are where they are supposed to be.