Tainted \\ Coders

Bevy Archetypes

Last updated:

Bevy is an archetypal ECS. Archetypes enable efficient storage and processing of entities with similar behavior by grouping them based on their component composition.

An Archetype is a unique combination of components stored on a Table. By placing components that are accessed together into one memory layout we are helping our memory access be more efficient.

When a query is used as a system parameter it calculates the matching Archetype for the set of components it is asking for. This lets Bevy calculate which systems can run in parallel (see below).

When you add a component to an entity you are moving that entity to a new archetype.

A world has only one Archetype for each unique combination of components.

An Archetype stores some metadata about itself:

// https://github.com/bevyengine/bevy/blob/60773e6787d177e97458f9fcf118985906762b2a/crates/bevy_ecs/src/archetype.rs#L307
pub struct Archetype {
    id: ArchetypeId,
    table_id: TableId,
    edges: Edges,
    entities: Vec<ArchetypeEntity>,
    components: ImmutableSparseSet<ComponentId, ArchetypeComponentInfo>,
}

Archetypes are stored in a particular world and are only locally unique to that World:

// https://github.com/bevyengine/bevy/blob/60773e6787d177e97458f9fcf118985906762b2a/crates/bevy_ecs/src/archetype.rs#L615
pub struct Archetypes {
    pub(crate) archetypes: Vec<Archetype>,
    pub(crate) archetype_component_count: usize,
    archetype_ids: bevy_utils::HashMap<ArchetypeIdentity, ArchetypeId>,
}

Starting without archetypes

Imagine we have a naive implementation of an ECS that stores components as arrays and entities as the index:

Entity ID Health Player Position Enemy
1 100.0 X (3, 2)  
2 72.0   (3, 1) X
3 40.0 X (4, 2)  
4 20.0   (3, 0) X

The gap in player and the gap in entity is very awkward. If we wanted to write vectorized code to work on these arrays they would present a serious problem.

“Vectorized code” means code that operates on entire arrays or vectors of data at once, rather than processing individual elements sequentially.

Vectorized operations can be performed efficiently using hardware optimizations like SIMD (Single Instruction, Multiple Data) instructions. These instructions allow simultaneous execution of the same operation on multiple data elements, often resulting in improved performance.

However, in the given scenario, because the entity ids do not match up with the array indices (our empty spaces), it becomes challenging to write vectorized code that requires both arrays of Player and Enemy.

Vectorized operations usually rely on the assumption that corresponding elements in different arrays have the same index. If the entity ids in A and B are not aligned with the array indices, it becomes difficult to perform operations that depend on matching elements between the two arrays.

Improving the layout of our components

What if instead we stored the components how they are likely to be used?

Entity ID Health Player Position
1 100.0 X (3, 2)
3 40.0 X (4, 2)
Entity ID Health Enemy Position
2 100.0 X (3, 2)
4 20.0 X (3, 0)

Now we have two tables, but each table forms a contiguous array of elements that can be optimized upon retrieval.

The two tables are each a different Archetype. The first Archetype is made up of Health, Player, and Position. The second is Health, Enemy and Position.

So we are no longer storing our components in simple arrays of the same type. We are storing our components in arrays of the same type, but inside of different tables representing a “type” (as in the sum of all its components) of an entity. We shall call this “type” an Archetype.

Archetypes help find disjointed queries

A Component could be present in many Archetype. This is done by using a ArchetypeComponentId and this many to many relationship is how the parallel scheduler figures out disjointed read-only queries that can be run at the same time.

At first glance you would think this shouldn’t be able to run in parallel:

fn move_players(
    player_positions: Query<&mut Position, With<Player>>,
) {
    // ...
}

fn move_enemies(
    enemy_positions: Query<&mut Position, Without<Player>>
) {
    // ...
}

If components were simply stored and accessed on giant Vec<T> then the player_positions will have already borrowed the Position components and enemy_positions should not run in parallel.

However because we have one table for each Archetype they won’t conflict and can be run in parallel even though both queries return &mut Position. This is because their ArchetypeComponentId would be different even though the ComponentId of the Position would be the same.

In other words each Component has only one ComponentId but can have many ArchetypeComponentId. It would be similar to a many-to-many relationship between Component and Archetype in a SQL database:

Component <-- ComponentId <-- ArchetypeComponentId --> ArchetypeId --> Archetype

When and how do archetypes get updated?

An Archetype comes into existence when we spawn or insert a Bundle.

Given a set of components stored in either a Table or SparseSet or some combination of the two, Bevy will call Archetypes::get_id_or_insert which allocates or finds a TableId which it will use to create a new Archetype.

When we add or remove bundles from an Entity it moves it to a new Archetype table.

These moves can be quite expensive so Bevy caches these moves in Edges. Next time a bundle gets added we can fetch for a matching edge and find the target Archetype to move it to without computing it from scratch.

How does bevy know our archetypes have changed?

Everytime we add or remove components the ArchetypeGeneration of our Archetypes changes:

// https://github.com/bevyengine/bevy/blob/60773e6787d177e97458f9fcf118985906762b2a/crates/bevy_ecs/src/archetype.rs#L615

// An opaque generational id that changes every time the set of [`Archetypes`] changes.
#[derive(Debug, Copy, Clone, Eq, PartialEq, Ord, PartialOrd)]
pub struct ArchetypeGeneration(usize);

impl Archetypes {
    // ...
    // Returns the current archetype generation. This is an ID indicating the current set of archetypes
    // that are registered with the world.
    #[inline]
    pub fn generation(&self) -> ArchetypeGeneration {
        ArchetypeGeneration(self.archetypes.len())
    }
    // ...
}

We use this generation when we have to update our archetype indices so that when we query for things they are where they are supposed to be:

// See: https://github.com/bevyengine/bevy/blob/main/crates/bevy_ecs/src/query/state.rs#L177
// See: https://github.com/bevyengine/bevy/blob/main/crates/bevy_ecs/src/system/function_system.rs#L489

// Something similar happens in QueryState implementations
impl<Param: SystemParam> SystemState<Param> {
    // ...
    #[inline]
    pub fn update_archetypes_unsafe_world_cell(&mut self, world: UnsafeWorldCell) {
        let archetypes = world.archetypes();
        let new_generation = archetypes.generation();
        let old_generation = std::mem::replace(&mut self.archetype_generation, new_generation);
        let archetype_index_range = old_generation.value()..new_generation.value();

        for archetype_index in archetype_index_range {
            Param::new_archetype(
                &mut self.param_state,
                &archetypes[ArchetypeId::new(archetype_index)],
                &mut self.meta,
            );
        }
    }
    // ...
}

Read more