Bevy ECS
ECS is an acronym for Entity Component System and is a programming paradigm where we store and access our data in a way that maximizes performance.
A good way to imagine how it all fits together is to think of an in-memory database. Entities and components make up the data:
Entity | Health | Position
---|---|---
1 | Health(100.0) | Position(5., 5.)
- Entities: an identifier for a row
- Components: the data in each column of a row
- Systems: all the behavior
This, however, is just a helpful fiction. In reality, components are stored in arrays, one array per component type:
let health: Vec<Health> = vec![Health(100.0), Health(90.0)];
let position: Vec<Position> = vec![Position(0., 0.), Position(1., 1.)];
We can collect all the columns for an entity by reaching into each array and grabbing the nth component that matches the Entity's ID.
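A minimal sketch of that lookup, reusing the hypothetical Health and Position tuple structs from the table above:
struct Health(f32);
struct Position(f32, f32);

fn main() {
    // One array per component type; the index is the entity's ID.
    let healths: Vec<Health> = vec![Health(100.0), Health(90.0)];
    let positions: Vec<Position> = vec![Position(5.0, 5.0), Position(1.0, 1.0)];

    // "Entity 0" is nothing more than index 0 into every column.
    let id = 0;
    let (health, position) = (&healths[id], &positions[id]);
    println!("health: {}, position: ({}, {})", health.0, position.0, position.1);
}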
This way of thinking is probably not as intuitive as something like Model View Controller (MVC), where we would group the ideas of entities, components, and systems together inside models.
Breaking them apart means the programmer has a harder time keeping the entire system in their head. We accept that trade-off to increase performance by reducing CPU cache misses.
Squeezing performance out of our CPU
To squeeze speed out of our computers we need to be careful about the way we store our data in memory. Our CPU is going to be more performant if we can get it to give us data from its cache instead of fetching it from RAM.
The more we can keep our data in contiguous arrays, the better our CPU will be at using its cache, which avoids unnecessary trips to the computer's much slower main memory.
A contiguous array means that every item in the array is both useful and sequential (no gaps or nulls), so when we access it the CPU can correctly predict what we will need next.
We are being more sympathetic to our CPU, and in exchange our CPU saves us time on memory access.
The big idea in ECS is to store everything in contiguous arrays in a way that matches how we would access them in our game logic.
Game engines loop over our game logic, and each tick we want to do something with the things in our game, usually modifying the data they hold, like moving their positions. This advances the state of our game.
In a classic object-oriented game we might lay out our player like this:
struct Position {
    x: f32,
    y: f32,
}

struct Velocity {
    dx: f32,
    dy: f32,
}

struct Points(f32);

struct Player {
    points: Points,
    position: Position,
    velocity: Velocity,
}

fn main() {
    let mut players: Vec<Player> = vec![
        Player {
            points: Points(1.0),
            position: Position { x: 0.0, y: 0.0 },
            velocity: Velocity { dx: 1.0, dy: 1.0 },
        },
        Player {
            points: Points(1.0),
            position: Position { x: 0.0, y: 0.0 },
            velocity: Velocity { dx: 2.0, dy: 2.0 },
        },
        // More players...
    ];
    loop {
        // Iterate over all players and update their points and positions
        for player in players.iter_mut() {
            player.points.0 += 1.0;
            player.position.x += player.velocity.dx;
            player.position.y += player.velocity.dy;
        }
    }
}
This approach is intuitive and easy to read, but our CPU has a much harder time using its cache when the memory is laid out this way.
Efficient memory layout
Our memory layout from the above example would currently look something like this:
Player 1: [Points1, Position1, Velocity1]
Player 2: [Points2, Position2, Velocity2]
Player 3: [Points3, Position3, Velocity3]
When your CPU fetches data from memory and puts it into the cache it does so in a fixed block called a cache line. Common cache line sizes range from 32 to 512 bytes, with 64 bytes being a prevalent choice in modern CPUs.
Your CPU will grab the whole cache line, even if only a portion of the data is actually needed. In doing so, your CPU is betting that things stored together in memory are likely to be accessed together; this is called spatial locality.
So we can imagine that currently our cache lines look like:
Cache Line 1: [Points1, Position1]
Cache Line 2: [Velocity1, Points2]
Cache Line 3: [Position2, Velocity2]
... etc
When you load a variable, it is pulled into the CPU cache, and when you access it again later your CPU will first check its cache.
If you loaded other data in the interim, your variable may have been evicted and will have to be re-fetched from RAM. This is called a cache miss.
When we iterate over each player and fetch its data, we bounce all over RAM and evict useful cache lines on every iteration.
So what's the alternative?
Well, we could store our game data in a structure of arrays (SoA) instead of our previous array of structures (AoS).
struct Entity {
    id: u32,
}

struct PositionComponent {
    x: f32,
    y: f32,
}

struct VelocityComponent {
    dx: f32,
    dy: f32,
}

struct World {
    entities: Vec<Entity>,
    positions: Vec<PositionComponent>,
    velocities: Vec<VelocityComponent>,
}

fn update_positions(world: &mut World) {
    // Walk both component arrays in lockstep; the nth position and
    // the nth velocity belong to the nth entity.
    for (position, velocity) in world.positions.iter_mut().zip(world.velocities.iter()) {
        position.x += velocity.dx;
        position.y += velocity.dy;
    }
}
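Putting it together, a minimal hand-rolled usage might look like this. Every column is kept the same length, so a single index reaches across all of them:
fn main() {
    let mut world = World {
        entities: vec![Entity { id: 0 }, Entity { id: 1 }],
        positions: vec![
            PositionComponent { x: 0.0, y: 0.0 },
            PositionComponent { x: 1.0, y: 1.0 },
        ],
        velocities: vec![
            VelocityComponent { dx: 1.0, dy: 1.0 },
            VelocityComponent { dx: 2.0, dy: 2.0 },
        ],
    };

    update_positions(&mut world);
    println!("entity 0 moved to ({}, {})", world.positions[0].x, world.positions[0].y);
}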
Now when we load our components, our memory is laid out like:
Velocities: [Velocity1, Velocity2, Velocity3]
Positions: [Position1, Position2, Position3]
Loading each component array sequentially matches the memory access pattern the CPU predicts, so our cache lines are far less likely to be thrashed. Since a position is only 8 bytes (two f32s), a single 64-byte cache line now holds eight useful positions at once.
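This is essentially the layout Bevy's ECS maintains for us. As a sketch (assuming a recent Bevy version), we derive Component and write a plain function over a Query, and Bevy iterates the matching component columns:
use bevy::prelude::*;

#[derive(Component)]
struct Position { x: f32, y: f32 }

#[derive(Component)]
struct Velocity { dx: f32, dy: f32 }

// Bevy hands us the matching columns; iteration stays sequential and cache-friendly.
fn apply_velocity(mut query: Query<(&mut Position, &Velocity)>) {
    for (mut position, velocity) in query.iter_mut() {
        position.x += velocity.dx;
        position.y += velocity.dy;
    }
}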
Entities help us avoid passing references to our data
Okay, so by using the entity and component parts of our ECS we get better memory performance. But there is another question: how do we manage our references and pointers?
This can be particularly painful in Rust, which requires you to manage the lifetimes of your references.
In our example above we got rid of our Player struct and it became implicit: the concept of the player became an Entity with a Player component.
To rebuild the Player in our game world we just access the nth item of each of the arrays according to the Entity's ID. All the components together make up the total representation of an entity's data.
This is a powerful abstraction because we can avoid passing around references to the data in our arrays. Instead we can pass around the entity's index, and when we want the data we can request it from one place.
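A sketch of the idea, reusing the World struct from above with a hypothetical homing missile: instead of holding a reference to the target's position (and managing its lifetime), we hold the target's entity index and look the data up when we need it:
// Hypothetical component: `target` is an entity index, not a reference,
// so there are no lifetimes to manage.
struct HomingMissile {
    target: usize,
}

fn target_position(world: &World, missile: &HomingMissile) -> (f32, f32) {
    // Fetch the target's data by index at the moment we need it.
    let position = &world.positions[missile.target];
    (position.x, position.y)
}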
By localizing our memory access within our systems, we can run disjoint queries of our data in parallel with each other for even more performance gains.
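For example (a sketch assuming a recent Bevy version, with hypothetical Player and Enemy marker components), these two systems write to different components, so Bevy's scheduler is free to run them on separate threads:
use bevy::prelude::*;

#[derive(Component)]
struct Player;

#[derive(Component)]
struct Enemy;

#[derive(Component)]
struct Position { x: f32, y: f32 }

#[derive(Component)]
struct Health(f32);

// Writes only Position data.
fn move_players(mut query: Query<&mut Position, With<Player>>) {
    for mut position in query.iter_mut() {
        position.x += 1.0;
    }
}

// Writes only Health data: disjoint from move_players, so it can run in parallel.
fn damage_enemies(mut query: Query<&mut Health, With<Enemy>>) {
    for mut health in query.iter_mut() {
        health.0 -= 1.0;
    }
}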
Archetypes help combinations of components stay together in memory
Our systems typically iterate over entities based on the groups of components they have. However, the components an entity needs can end up scattered across many different arrays or structures of arrays.
To get around this, some ECS frameworks (Bevy included) introduce archetypes.
Archetypes address this inefficiency by organizing entities with the same component composition into one table. Each archetype owns a table representing a specific combination of components, and entities sharing that combination are placed into the archetype's table.
This means entities with the same component composition are stored in contiguous memory locations, which allows systems to access the necessary components in a sequential, cache-friendly manner.
By grouping entities this way, archetypes also eliminate the redundancy of naive component storage. Each archetype has its own set of component arrays, and entities within the same archetype share those array instances, which reduces memory overhead compared to storing components for each entity individually.
Archetypes also enable batch processing. Systems can process many entities within the same archetype at once, taking advantage of data parallelism. This is beneficial for operations such as updating positions, applying physics, or performing AI calculations.
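As a simplified illustration (hand-rolled, not Bevy's actual implementation, which uses type-erased columns), you can picture one table per unique combination of components:
struct Position { x: f32, y: f32 }
struct Velocity { dx: f32, dy: f32 }
struct Health(f32);

// Archetype table for entities with exactly {Position, Velocity}.
struct PositionVelocityTable {
    entities: Vec<u32>,
    positions: Vec<Position>,
    velocities: Vec<Velocity>,
}

// Entities that also have Health live in a separate table, so every
// column in every table stays dense and contiguous.
struct PositionVelocityHealthTable {
    entities: Vec<u32>,
    positions: Vec<Position>,
    velocities: Vec<Velocity>,
    healths: Vec<Health>,
}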
So that explains why Bevy chose an archetypal ECS as the core of its framework. Let's see how it actually works in Bevy.
Archetypes and bundles form a graph. Adding or removing a bundle moves an Entity to a new Archetype. Edges are used to cache the results of these moves.
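For instance (a sketch assuming a recent Bevy version, with a hypothetical Frozen marker component), inserting or removing a component is exactly the kind of operation that moves an entity along this graph:
use bevy::prelude::*;

#[derive(Component)]
struct Velocity { dx: f32, dy: f32 }

#[derive(Component)]
struct Frozen;

fn freeze_all(mut commands: Commands, query: Query<Entity, With<Velocity>>) {
    for entity in query.iter() {
        // Adding Frozen changes the entity's component set, so Bevy
        // moves it to the {Velocity, Frozen} archetype.
        commands.entity(entity).insert(Frozen);
    }
}

fn thaw_all(mut commands: Commands, query: Query<Entity, With<Frozen>>) {
    for entity in query.iter() {
        // Removing Frozen moves it back along a cached archetype edge.
        commands.entity(entity).remove::<Frozen>();
    }
}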