Bevy ECS | Tainted Coders

ECS is an acronym for Entity Component System and is a programming paradigm (like Model View Controller) where we store and access our data in a way that maximizes performance.

As with any programming paradigm we name and box things taxonomically:

Entities represent the "things" in your world, its actually just an ID
Components represent the data on your "things"
Systems enumerate the components and affect the rest of your world

This way of thinking is probably not as intuitive as something like MVC programming where we group the idea of entities, components and systems together.

Breaking them apart means the programmer is more sympathetic to the machine at the cost of being able to easily understand the whole system at once.

In an ECS you are encouraged to break up your game into smaller parts that operate on small collections of these components. This means each piece of functionality is perhaps easier to write in isolation, but harder to understand the whole puzzle.

So why bother? Performance.

Squeezing performance out of our CPU

To squeeze speed out of our computers we need to be careful about the way we store our data in memory. Our CPU is going to be more performant if we can get it to give us data from its cache instead of fetching it from RAM which is much more expensive.

The more we can keep our data in contiguous arrays, the better your CPU will be at using its cache which avoids unnecessary trips to our computers much slower memory.

When I say "contiguous array" I mean every item in our array should be both useful and sequential so when we access it our CPU cache can predict our intention correctly.

We are being more sympathetic to our CPU and in exchange our CPU is working harder to save us time accessing memory.

An array of [1, 0, 0, 0, 5, 0, 0, 7] could be contiguous if the 0 means something. But if we are using the 0 as a null value slot then our access patterns would not match what the cache expects.

The big idea in ECS is to store everything in contiguous arrays in a way that matches how we would use them in our game logic.

Game engines loop over our game logic and each tick we want to do something with the "things" in our game. Usually modifying the data they hold, like move its position. This advances the state of our game.

In a classic object oriented game we might layout our player like this:

struct Position {
  x: f32,
  y: f32,
}

struct Velocity {
  dx: f32,
  dy: f32,
}

struct Points(f32);

struct Player {
  points: Points(f32),
  position: Position,
  velocity: Velocity,
}

fn main() {
  let players: Vec<Player> = vec![
    Player {
      points: Points(1.0),
      position: Position { x: 0.0, y: 0.0 },
      velocity: Velocity { dx: 1.0, dy: 1.0 }
    },
    Player {
      points: Points(1.0),
      position: Position { x: 0.0, y: 0.0 },
      velocity: Velocity { dx: 2.0, dy: 2.0 },
    },
    // More players...
  ];

  loop {
    // Iterate over all entities and update their positions
    for player in players.iter() {
      player.points += 1.0
      player.position.x += player.velocity.dx;
      player.position.y += player.velocity.dy;
    }
  }
}

This approach is quite concise and easy to read, but it takes a hit in terms of its performance.

Efficient memory layout

Our memory layout from the above example would currently look like:

Player 1: [Points1, Position1, Velocity1]
Player 2: [Points2, Position2, Velocity2]
Player 3: [Points3, Position3, Velocity3]

When your CPU fetches data from memory and puts it into the cache it does so in a fixed block called a cache line. Common cache line sizes range from 32 to 512 bytes, with 64 bytes being a prevalent choice in modern CPUs.

Your CPU will grab the whole cache line, even if only a portion of the data is actually needed.

In doing so your CPU is guessing that things you store together in memory are likely to be accessed together, this is called spatial locality.

So we can imagine that currently our cache lines look like:

Cache Line 1: [Points1, Position1]
Cache Line 2: [Velocity1, Points2]
Cache Line 3: [Position2, Velocity2]
... etc

When you load a variable onto the stack, you load it into the cache. Then when you access it again later your CPU will first check its cache.

If you had loaded other data in the interim it will have been evicted and have to be fetched from memory, this is called a cache miss.

When we iterate over the players above and fetch the data for each player, we would be bouncing all over our cache trying to fetch the missing data. This would be evicting other cache lines and getting cache misses resulting in worse performance.

Because we are accessing entities which hold a lot of data and enumerating all of the data for each entity our cache is likely getting obliterated.

So whats the alternative?

Well, we could store our game data into structures of arrays (SoA) instead of our previous array of structures (AoS).

struct Entity {
  id: u32,
}

struct PositionComponent {
  x: f32,
  y: f32,
}

struct VelocityComponent {
  dx: f32,
  dy: f32,
}

struct World {
  entities: Vec<Entity>,
  positions: Vec<PositionComponent>,
  velocities: Vec<VelocityComponent>,
}

fn update_positions_and_velocities(world: &mut World) {
  for (position, velocity) in world.positions.iter_mut().zip(world.velocities.iter()) {
    position.x += velocity.dx;
    position.y += velocity.dy;
  }
}

Now when we load our components, our memory is laid out like:

Velocities: [Velocity1, Velocity2, Velocity3]
Positions: [Position1, Position2, Position3]

By storing components of the same type together, and using the entity's ID as the index in each array, we can access only the data we need.

So when we enumerate these arrays our memory access patters match what the CPU is predicting and our cache lines are less likely to be thrashed.

We load each component onto the stack by accessing it sequentially which maximizes the efficiency of our cache.

Entities help us avoid passing references to our data

Okay so by using the entities and components part of our ECS we get better memory performance. But there is also the idea of how do we manage our references and pointers?

In our example above we got rid of our Player and it became implicit. The concept of the player became an entity with a Player component stored in the index matching our entity's ID.

To rebuild the thing in our game world we just access the nth item of each of the arrays.

This is a powerful abstraction because we can avoid passing around references to the data on our arrays. Instead we can pass around this index of our entities and when we want the data we can request it from one place.

By localizing our memory access within our systems we can perform disjointed queries of our data in parallel with each other for even more performance gains.

Archetypes help combinations of components stay together in memory

Our systems are typically iterating over entities based on the groups of components they have. However these arrays can be scattered in different arrays or structures of arrays.

To get around this some ECS frameworks (Bevy included) introduce archetypes.

Archetypes address this inefficiency by organizing entities with similar component compositions into one table. Each archetype owns a table which represents a specific combination of components. Entities that share the same component composition are placed into the table of that archetype.

Archetypes ensure that entities with similar component compositions are stored in contiguous memory locations. This allows systems to access the necessary components in a sequential and cache-friendly manner.

By grouping entities with similar component compositions, archetypes eliminate redundancy in naive component storage.

Each archetype has its own set of component arrays, and entities within the same archetype share the same array instances. This reduces memory overhead compared to storing components for each entity individually.

Archetypes also enable batch processing of entities with similar component compositions. Systems can process multiple entities within the same archetype simultaneously, taking advantage of data parallelism. This can be beneficial for operations such as updating positions, applying physics, or performing AI calculations.

So that explains why Bevy chose to use an archetypal ECS as the core of their framework. Lets see how it actually works specifically in Bevy.

Archetypes and bundles form a graph. Adding or removing a bundle moves an Entity to a new Archetype. Edges are used to cache the results of these moves.