Tainted\\Coders

Bevy Archetypes

Bevy version: 0.14Last updated:

Bevy is an archetypal ECS. Archetypes enable efficient storage and processing of entities with similar behavior by grouping them based on their component composition.

An Archetype is a unique combination of components. Bevy stores these combinations on a Table.

By placing components that are accessed together in the same memory location we are helping our memory access be more efficient.

When a query is used as a system parameter Bevy calculates the matching Archetype for the set of components it is asking for. This lets Bevy calculate which systems can run in parallel (see below).

When you add a component to an entity you are moving that entity to a new archetype.

A world has only one Archetype for each unique combination of components.

An Archetype stores some metadata about itself:

// https://docs.rs/bevy/latest/bevy/ecs/archetype/struct.Archetype.html
pub struct Archetype {
  id: ArchetypeId,
  table_id: TableId,
  edges: Edges,
  entities: Vec<ArchetypeEntity>,
  components: ImmutableSparseSet<ComponentId, ArchetypeComponentInfo>,
}

Archetypes are stored in a particular world and are only locally unique to that World:

// https://docs.rs/bevy/latest/bevy/ecs/archetype/struct.Archetypes.html
pub struct Archetypes {
  pub(crate) archetypes: Vec<Archetype>,
  archetype_component_count: usize,
  by_components: bevy_utils::HashMap<ArchetypeComponents, ArchetypeId>
}

Starting without archetypes

Imagine we have a naive implementation of an ECS that stores components as arrays and entities as the index:

Entity IDHealthPlayerPositionEnemy
1100.0X(3, 2)
272.0(3, 1)X
340.0X(4, 2)
420.0(3, 0)X

The gap in player and the gap in entity is very awkward for our memory access. If we wanted to write vectorized code to work on these arrays then the empty values would present a serious problem.

"Vectorized code" means code that operates on entire arrays or vectors of data at once, rather than processing individual elements sequentially.

Vectorized operations can be performed efficiently using hardware optimizations like SIMD (Single Instruction, Multiple Data) instructions. These instructions allow simultaneous execution of the same operation on multiple data elements, often resulting in improved performance.

However, in the given scenario, because the entity ids do not match up with the array indices (our empty spaces), it becomes challenging to write vectorized code that requires both arrays of Player and Enemy.

Vectorized operations usually rely on the assumption that corresponding elements in different arrays have the same index. If the entity ids in A and B are not aligned with the array indices, it becomes difficult to perform operations that depend on matching elements between the two arrays.

Improving the layout of our components

What if instead we stored the components how they are likely to be used?

Entity IDHealthPlayerPosition
1100.0X(3, 2)
340.0X(4, 2)
Entity IDHealthEnemyPosition
2100.0X(3, 2)
420.0X(3, 0)

Now we have two tables, but each table forms a contiguous array of elements that can be optimized upon retrieval.

The two tables are each a different Archetype. One for the Player and one for the Enemy.

So we are storing our components in arrays of the same type, but inside of different tables representing a type (as in the sum of all its components) of an entity. We shall call this "type" an Archetype.

Archetypes help find disjointed queries

A Component could be present in many Archetype. This is done by using a ArchetypeComponentId and this many to many relationship is how the parallel scheduler figures out disjointed read-only queries that can be run at the same time.

At first glance you would think this shouldn't be able to run in parallel:

fn move_players(
  player_positions: Query<&mut Position, With<Player>>,
) {
  // ...
}

fn move_enemies(
  enemy_positions: Query<&mut Position, Without<Player>>
) {
  // ...
}

If components were simply stored and accessed on giant Vec<T> then the player_positions will have already borrowed the Position components and enemy_positions should not run in parallel.

However because we have one table for each Archetype they won't conflict and can be run in parallel even though both queries return &mut Position.

This is because their ArchetypeComponentId would be different even though the ComponentId of the Position would be the same.

In other words, each Component has only one ComponentId but can have many ArchetypeComponentId. It would be similar to a many-to-many relationship between Component and Archetype in a SQL database:

Component <-- ComponentId <-- ArchetypeComponentId --> ArchetypeId --> Archetype

When and how do archetypes get updated?

An Archetype comes into existence when we spawn or insert a Bundle.

Given a set of components stored in either a Table or SparseSet or some combination of the two, Bevy will call Archetypes::get_id_or_insert which allocates or finds a TableId which it will use to create a new Archetype.

When we add or remove bundles from an Entity it moves it to a new Archetype table.

These moves can be quite expensive so Bevy caches these moves in Edges. Next time a bundle gets added we can fetch for a matching edge and find the target Archetype to move it to without computing it from scratch.

How does bevy know our archetypes have changed?

Every time we add or remove components the ArchetypeGeneration of our Archetypes changes.

Bevy uses the length of the current archetypes to create a generational index that guarantees a unique index as we add or remove archetypes.

We use this generation when we have to update our archetype indices so that when we query for things they are where they are supposed to be.