Bevy is an archetypal ECS. Archetypes enable efficient storage and processing of entities with similar behavior by grouping them based on their component composition.
An Archetype
is a unique combination of components stored on a Table
.
By placing components that are accessed together into one memory layout we
are helping our memory access be more efficient.
When a query is used as a system
parameter it calculates the matching Archetype
for the set of components it is
asking for. This lets Bevy calculate which systems can run in parallel (see
below).
When you add a component to an entity you are moving that entity to a new archetype.
A world has only one Archetype
for
each unique combination of components.
An Archetype
stores some metadata about itself:
// https://github.com/bevyengine/bevy/blob/60773e6787d177e97458f9fcf118985906762b2a/crates/bevy_ecs/src/archetype.rs#L307
pub struct Archetype {
id: ArchetypeId,
table_id: TableId,
edges: Edges,
entities: Vec<ArchetypeEntity>,
components: ImmutableSparseSet<ComponentId, ArchetypeComponentInfo>,
}
Archetypes
are stored in a particular world and are only locally unique to that
World
:
// https://github.com/bevyengine/bevy/blob/60773e6787d177e97458f9fcf118985906762b2a/crates/bevy_ecs/src/archetype.rs#L615
pub struct Archetypes {
pub(crate) archetypes: Vec<Archetype>,
pub(crate) archetype_component_count: usize,
archetype_ids: bevy_utils::HashMap<ArchetypeIdentity, ArchetypeId>,
}
Starting without archetypes
Imagine we have a naive implementation of an ECS that stores components as arrays and entities as the index:
Entity ID | Health | Player | Position | Enemy |
---|---|---|---|---|
1 | 100.0 | X | (3, 2) | |
2 | 72.0 | (3, 1) | X | |
3 | 40.0 | X | (4, 2) | |
4 | 20.0 | (3, 0) | X |
The gap in player and the gap in entity is very awkward. If we wanted to write vectorized code to work on these arrays they would present a serious problem.
“Vectorized code” means code that operates on entire arrays or vectors of data at once, rather than processing individual elements sequentially.
Vectorized operations can be performed efficiently using hardware optimizations like SIMD (Single Instruction, Multiple Data) instructions. These instructions allow simultaneous execution of the same operation on multiple data elements, often resulting in improved performance.
However, in the given scenario, because the entity ids do not match up with the
array indices (our empty spaces), it becomes challenging to write vectorized code that requires
both arrays of Player
and Enemy
.
Vectorized operations usually rely on the assumption that corresponding elements in different arrays have the same index. If the entity ids in A and B are not aligned with the array indices, it becomes difficult to perform operations that depend on matching elements between the two arrays.
Improving the layout of our components
What if instead we stored the components how they are likely to be used?
Entity ID | Health | Player | Position |
---|---|---|---|
1 | 100.0 | X | (3, 2) |
3 | 40.0 | X | (4, 2) |
Entity ID | Health | Enemy | Position |
---|---|---|---|
2 | 100.0 | X | (3, 2) |
4 | 20.0 | X | (3, 0) |
Now we have two tables, but each table forms a contiguous array of elements that can be optimized upon retrieval.
The two tables are each a different Archetype
. The first Archetype
is made
up of Health
, Player
, and Position
. The second is Health
, Enemy
and
Position
.
So we are no longer storing our components in simple arrays of the same type. We
are storing our components in arrays of the same type, but inside of different
tables representing a “type” (as in the sum of all its components) of an entity.
We shall call this “type” an Archetype
.
Archetypes help find disjointed queries
A Component
could be present in many Archetype
. This is done by using a
ArchetypeComponentId
and this many to many relationship is how the parallel
scheduler figures out disjointed read-only queries that can be run at the same
time.
At first glance you would think this shouldn’t be able to run in parallel:
fn move_players(
player_positions: Query<&mut Position, With<Player>>,
) {
// ...
}
fn move_enemies(
enemy_positions: Query<&mut Position, Without<Player>>
) {
// ...
}
If components were simply stored and accessed on giant Vec<T>
then the
player_positions
will have already borrowed the Position
components and
enemy_positions
should not run in parallel.
However because we have one table for each Archetype
they won’t conflict and
can be run in parallel even though both queries return &mut Position
. This is
because their ArchetypeComponentId
would be different even though the
ComponentId
of the Position
would be the same.
In other words each Component
has only one ComponentId
but can have many
ArchetypeComponentId
. It would be similar to a many-to-many relationship
between Component
and Archetype
in a SQL database:
Component <-- ComponentId <-- ArchetypeComponentId --> ArchetypeId --> Archetype
When and how do archetypes get updated?
An Archetype
comes into existence when we spawn or insert a Bundle
.
Given a set of components stored in either a Table
or SparseSet
or some
combination of the two, Bevy will call Archetypes::get_id_or_insert
which
allocates or finds a TableId
which it will use to create a new Archetype
.
When we add or remove bundles from an Entity
it moves it to a new Archetype
table.
These moves can be quite expensive so Bevy caches these moves in Edges
. Next
time a bundle gets added we can fetch for a matching edge and find the target
Archetype
to move it to without computing it from scratch.
How does bevy know our archetypes have changed?
Everytime we add or remove components the ArchetypeGeneration
of our
Archetypes
changes:
// https://github.com/bevyengine/bevy/blob/60773e6787d177e97458f9fcf118985906762b2a/crates/bevy_ecs/src/archetype.rs#L615
// An opaque generational id that changes every time the set of [`Archetypes`] changes.
#[derive(Debug, Copy, Clone, Eq, PartialEq, Ord, PartialOrd)]
pub struct ArchetypeGeneration(usize);
impl Archetypes {
// ...
// Returns the current archetype generation. This is an ID indicating the current set of archetypes
// that are registered with the world.
#[inline]
pub fn generation(&self) -> ArchetypeGeneration {
ArchetypeGeneration(self.archetypes.len())
}
// ...
}
We use this generation when we have to update our archetype indices so that when we query for things they are where they are supposed to be:
// See: https://github.com/bevyengine/bevy/blob/main/crates/bevy_ecs/src/query/state.rs#L177
// See: https://github.com/bevyengine/bevy/blob/main/crates/bevy_ecs/src/system/function_system.rs#L489
// Something similar happens in QueryState implementations
impl<Param: SystemParam> SystemState<Param> {
// ...
#[inline]
pub fn update_archetypes_unsafe_world_cell(&mut self, world: UnsafeWorldCell) {
let archetypes = world.archetypes();
let new_generation = archetypes.generation();
let old_generation = std::mem::replace(&mut self.archetype_generation, new_generation);
let archetype_index_range = old_generation.value()..new_generation.value();
for archetype_index in archetype_index_range {
Param::new_archetype(
&mut self.param_state,
&archetypes[ArchetypeId::new(archetype_index)],
&mut self.meta,
);
}
}
// ...
}