Rust Sparse Sets
Imagine we want to store a collection of integer entity IDs:

```rust
let entity_ids = vec![1, 2, 3];
```

Vectors are fast at adding and removing elements at the end: both are `O(1)`. They are also fast at clearing, again `O(1)`, because the vector simply sets its length to 0; it doesn't actually clean up the memory it previously used.

Unfortunately, vectors are slow at membership checks: finding out whether a value is present means scanning the elements, an `O(n)` operation. That is a problem in an ECS, where we are constantly iterating over entities and retrieving them by their ID.
There are some problems, though, with using a `Vec` to store our integers:
- Slow membership checks
- Duplicate membership is allowed
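Both problems are easy to demonstrate with a quick sketch (the helper function here is just for illustration):

```rust
// Demonstrates the two `Vec` problems: O(n) membership checks
// and no protection against duplicates.
fn count_occurrences(ids: &[i32], target: i32) -> usize {
    // Walks every element, just like `contains` does: O(n).
    ids.iter().filter(|&&id| id == target).count()
}

fn main() {
    let mut entity_ids = vec![1, 2, 3];
    assert!(entity_ids.contains(&2)); // O(n) scan under the hood

    // Nothing stops us from pushing the same ID twice.
    entity_ids.push(2);
    assert_eq!(count_occurrences(&entity_ids, 2), 2);
}
```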
So we could try using a `HashSet` instead:

```rust
use std::collections::HashSet;

fn main() {
    let mut entity_ids: HashSet<i32> = HashSet::new();
    entity_ids.insert(5);
    entity_ids.insert(10);
    entity_ids.insert(15);
}
```
A `HashSet` lets us get rid of the two big problems we had with our `Vec`. Checking for membership becomes an `O(1)` operation, and because keys are hashed and must be unique, duplicate membership is impossible.

We did lose a little along the way: our elements are now in a non-deterministic order during iteration, and we lost the contiguous memory layout of our `Vec`.

We essentially traded efficient membership checks for higher memory consumption, because a `HashSet` requires extra memory for its internal data structures, such as buckets and cached hashes. For game dev we want to be careful with our memory budget, so the added overhead of a `HashSet` might not be worthwhile.
Rust's standard library actually used to offer a third option: a `BitSet`, specifically for sets of small unsigned integers. It was deprecated in Rust 1.3.0 and has since been removed from the standard library, so let's revive it:
```rust
struct BitSet {
    bits: Vec<u64>,
}

impl BitSet {
    fn new(size: usize) -> Self {
        // One u64 word holds 64 bits, so round up to a whole word.
        let num_words = (size + 63) / 64;
        BitSet {
            bits: vec![0; num_words],
        }
    }

    fn add(&mut self, num: usize) {
        let word_idx = num / 64;
        let bit_idx = num % 64;
        self.bits[word_idx] |= 1 << bit_idx;
    }

    fn remove(&mut self, num: usize) {
        let word_idx = num / 64;
        let bit_idx = num % 64;
        self.bits[word_idx] &= !(1 << bit_idx);
    }

    fn contains(&self, num: usize) -> bool {
        let word_idx = num / 64;
        let bit_idx = num % 64;
        (self.bits[word_idx] & (1 << bit_idx)) != 0
    }

    fn clear(&mut self) {
        self.bits.iter_mut().for_each(|word| *word = 0);
    }

    fn iterate(&self) {
        // Every possible value has to be probed, even the absent ones.
        for num in 0..self.bits.len() * 64 {
            if self.contains(num) {
                // Perform operations on the current number
                println!("Number {} is present in the set.", num);
            }
        }
    }
}
```
In this implementation, the `BitSet` struct holds a vector of `u64` values, where each `u64` word stores 64 bits. The `new` function sizes the bit vector based on the given capacity, and the `add`, `remove`, and `contains` methods operate on it using bitwise operations.
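The index arithmetic is worth spelling out with a concrete value. To locate the bit for a number, we split it into a word index and a bit offset within that word; for example, storing the value 130:

```rust
fn main() {
    let num: usize = 130;
    let word_idx = num / 64; // which u64 word the bit lives in
    let bit_idx = num % 64;  // which bit within that word
    assert_eq!(word_idx, 2); // 130 = 2 * 64 + 2
    assert_eq!(bit_idx, 2);

    // Setting the bit: OR with a mask that has only that bit set.
    let mut word: u64 = 0;
    word |= 1 << bit_idx;
    assert_eq!(word, 0b100);
}
```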
The advantage of a `BitSet` is that we now get `O(1)` membership checking AND we keep our fast adding and removing of elements, along with a compact, contiguous memory layout.
Unfortunately, two important operations are slow:

- Iterating over all the elements in the set takes `O(m)` time
- So does clearing the set, also `O(m)`

where `m` is the maximum integer we can store in the vector, i.e. the `size` parameter we passed to `new`.
A `BitSet` always reserves one bit for every possible value up to `m`, regardless of how many elements are actually present. When the set is densely populated this is extremely compact: one bit per element is hard to beat. When the set is sparsely populated, though, the majority of bits are 0 (representing the absence of elements), so we are paying memory for elements that aren't there, and every iteration or clear has to walk past all of those empty bits anyway.
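A quick back-of-the-envelope calculation (the maximum ID here is made up for illustration): if entity IDs can go up to 1,000,000, the bit vector's size depends only on that maximum, not on how many entities actually exist:

```rust
fn main() {
    let max_id: usize = 1_000_000;
    let num_words = (max_id + 63) / 64; // u64 words needed, rounded up
    let bytes = num_words * 8;          // 8 bytes per u64 word
    assert_eq!(num_words, 15_625);
    // ~125 KB whether we store 10 IDs or 900,000 of them.
    assert_eq!(bytes, 125_000);
}
```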
In an ECS, iteration and clearing happen frequently. So if our `BitSet` is sparsely populated, we run right into the limitations of the representation.
How can we improve the situation? Well, it turns out there is a clever trick that keeps the fast operations fast while fixing the slow ones: the sparse set.
```rust
struct SparseSet<T> {
    dense: Vec<T>,
    sparse: Vec<Option<usize>>,
    count: usize,
}

// We need `Into<usize>` so a value can index into the sparse array.
impl<T: Copy + Eq + Into<usize>> SparseSet<T> {
    fn new() -> Self {
        SparseSet {
            dense: Vec::new(),
            sparse: Vec::new(),
            count: 0,
        }
    }

    fn insert(&mut self, value: T) {
        if self.contains(value) {
            return; // Value already exists in the set
        }
        let index = self.count;
        // Grow the sparse array if needed; never shrink it.
        if self.sparse.len() <= value.into() {
            self.sparse.resize(value.into() + 1, None);
        }
        self.dense.push(value);
        self.sparse[value.into()] = Some(index);
        self.count += 1;
    }

    fn remove(&mut self, value: T) {
        if let Some(&Some(i)) = self.sparse.get(value.into()) {
            if i < self.count - 1 {
                // Swap the removed element with the last element
                // in the dense array, and fix up its sparse entry.
                let last_value = self.dense[self.count - 1];
                self.dense.swap(i, self.count - 1);
                self.sparse[last_value.into()] = Some(i);
            }
            self.dense.pop();
            self.sparse[value.into()] = None;
            self.count -= 1;
        }
    }

    fn contains(&self, value: T) -> bool {
        match self.sparse.get(value.into()) {
            Some(&Some(i)) => i < self.count && self.dense[i] == value,
            _ => false,
        }
    }

    fn clear(&mut self) {
        self.dense.clear();
        self.sparse.clear();
        self.count = 0;
    }

    fn iter(&self) -> std::slice::Iter<'_, T> {
        self.dense.iter()
    }
}
```
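To see the swap-remove trick in action, here is a condensed, `usize`-only variant of the same structure (trimmed down so the example is self-contained), together with a small usage run:

```rust
// A condensed, usize-only sparse set for demonstration purposes.
struct SparseSet {
    dense: Vec<usize>,
    sparse: Vec<Option<usize>>,
}

impl SparseSet {
    fn new() -> Self {
        SparseSet { dense: Vec::new(), sparse: Vec::new() }
    }

    fn insert(&mut self, value: usize) {
        if self.contains(value) {
            return; // no duplicates
        }
        if self.sparse.len() <= value {
            self.sparse.resize(value + 1, None);
        }
        self.sparse[value] = Some(self.dense.len());
        self.dense.push(value);
    }

    fn remove(&mut self, value: usize) {
        if let Some(&Some(i)) = self.sparse.get(value) {
            // Move the last dense element into the vacated slot.
            let last = *self.dense.last().unwrap();
            self.dense.swap_remove(i);
            self.sparse[last] = Some(i);
            self.sparse[value] = None;
        }
    }

    fn contains(&self, value: usize) -> bool {
        matches!(self.sparse.get(value), Some(&Some(_)))
    }
}

fn main() {
    let mut set = SparseSet::new();
    set.insert(4);
    set.insert(8);
    set.insert(4); // duplicate, ignored
    assert!(set.contains(4) && set.contains(8));
    assert_eq!(set.dense.len(), 2);

    set.remove(4); // 8 is swapped into 4's old slot
    assert!(!set.contains(4));
    assert!(set.contains(8));
    assert_eq!(set.dense, vec![8]);
}
```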
The trick here is that we have two arrays:

- A dense array storing our domain values (the `entity_ids`)
- A sparse array storing, for each value, its index into the dense array

We also keep an `index` that marks how much of the dense array currently counts as being in the set:
```
        |------- in the set ------|
                                |-- index = 8
                                v
dense:  [0, 1, 2, 3, 4, 5, 6, 7, 8]
         |  |  |  |  |  |  |  |  |
sparse: [0, 1, 2, 3, 4, 5, 6, 7, 8]
```
The sparse array holds the index of each value within the dense array:

```
dense[sparse[i]] = i
```
To remove a value, we swap it with the last item in the set and then decrement our index so that the set no longer covers it:
```
        |---- in the set ----| removed
                             |-- index = 7
                             v
dense:  [0, 1, 2, 3, 8, 5, 6, 7, 4]
         |  |  |  |  \  |  |  |  /
         |  |  |  |    \|  |  |/
         |  |  |  |     |\ | /|
         |  |  |  |     |  X  |
         |  |  |  |     |/ | \|
         |  |  |  |    /|  |  |\
         |  |  |  |  /  |  |  |  \
sparse: [0, 1, 2, 3, 8, 5, 6, 7, 4]
```
When we then check for membership, we ask the sparse array for the value's index within the dense array. Asking for `sparse[4]` gives us index `8` into the dense array; because `8` is greater than our `index`, we know this item has been removed from the set. On the other hand, `sparse[8]` gives us index `4`, which is less than our `index`, so we know `8` is still contained in the set.
Other important operations become `O(1)` this way too: "remove all values except 3" involves simply swapping 3 to the front of the dense array and setting the index to 0.
Another clever thing is that we can "backtrack" our changes without losing the previous state by simply resetting the `index` to some higher value. Because the old values were never actually cleaned up, we can recover them.
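Note this only works if `clear` resets the index rather than clearing the vectors, unlike the implementation above. A minimal sketch of the idea using a plain dense array and an index over it:

```rust
fn main() {
    // Dense array plus an index marking how many entries are "in" the set.
    let dense = vec![10, 20, 30, 40];

    // O(1) clear: set the index to 0; the old values stay in memory.
    let mut len = 0;
    assert!(dense[..len].is_empty());

    // O(1) backtrack: restore the index and the old contents reappear,
    // because clearing never actually touched the dense array.
    len = dense.len();
    assert_eq!(&dense[..len], &[10, 20, 30, 40]);
}
```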
Comparing it to our `BitSet`:

| Operation  | Bit vector | Sparse set |
|------------|------------|------------|
| is-member  | O(1)       | O(1)       |
| add-member | O(1)       | O(1)       |
| clear-set  | O(m)       | O(1)       |
| iterate    | O(m)       | O(n)       |
Our sparse set matches or beats the bit vector on every operation. However, we did trade away some memory: instead of a single bit per possible value, we now store two vectors of word-sized entries.
Sparse sets are also great for work queues, where we want to iterate over the set in insertion order so that any new jobs added during processing are visited later. In contrast, our `BitSet` iterates in integer order, so newly added elements might be missed.