When we have two objects on the heap that need to reference each other we can reach for a smart pointer.
Code like this is robust to memory corruption (we can check when we dereference) but performance takes a big hit without lots of optimization since most code is full of cache misses (I’ll explain later).
At a higher level memory managers for your programs tend to keep pools for
specific allocation ranges, eg, a pool for < 24 bytes, a pool for <= 64, etc up
to, say, a megabyte after which it might be delegated to the OS, such as
VirtualAlloc
on Windows, directly.
This keeps objects of similar or the same sizes together, but it does not keep objects of the same type together, because a memory manager is not type-aware.
For more performance we need to carefully pack our data into an aligned array that takes up a contiguous memory space to avoid large sequential iterations.
But then once we have these aligned arrays how should we handle access?
CPU Cache
Your CPU is constantly caching frequently accessed data and instructions on a small memory component located on your processor chip. Its acting as a buffer between your CPU and RAM (which is slower) to reduce the need to fetch it from memory.
When your CPU needs to access data it first checks this cache to see if the data is already present. If the data is found (a “cache hit”) then it doesn’t need to go any further and use the relatively slower memory.
A “cache miss” is when the CPU does not hold this data any more. It needs to both fetch the data from your RAM and make room for this data by evicting existing data from its cache (if it needs to).
Caches are layered (L1, L2, etc) depending on the architecture of the CPU. These caches operate on the observations:
A. Spatial Locality: There is a tendency of a program to access data that is close to previously accessed data
B. Temporal Locality: There is a tendency of a program to access the same data repeatedly over a short period
By exploiting these common situations your CPU is saving itself from doing any more work than it needs to.
So if we are dereferencing our pointers all the time we are breaking the assumptions the CPU is trying to optimize for (by fetching data far away on the heap) and therefore enabling more cache misses.
Consider the following example:
let array = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
let mut sum = 0;
for i in 0..10_000 {
sum += array[i % 10];
}
Because we are accessing the array in a non sequential order we are breaking the assumptions of spatial locality leading to cache misses.
Or take a classic recursive function:
fn calculate_factorial(n: u64) -> u64 {
if n == 0 {
return 1;
} else {
let previous_result = calculate_factorial(n - 1);
return n * previous_result;
}
}
Every time our function calls itself its pushing data onto the stack evicting our old stack and causing potential cache misses. We broke the temporal locality by constantly wiping out our stack context.
With pointers and dereferencing them we are accessing data that is outside of any assumptions our CPU is making about spatial locality.
Handles
This pattern works well in data oriented architectures where systems operate on arrays of data located close together in memory. Very relevent to game development where we are creating and destroying thousands (or millions) of components in memory.
To implement this pattern we need to:
- Move memory management into self contained systems
- Group items of the same type into arrays
- When we create items, return a
Handle
to the item (not a pointer) - Store the index in the handle as a
x
bit number that encodes information about the position in the index and additional memory safety checks - Only convert handles to pointers when absolutely necessary
By doing this we can allow the users of our libraries to interact with pointers but prevent them from having to manually allocate, reduces the passing around of raw pointers to a minimum and most importantly removes the ownership pointers have to their underlying data.
Because we are grouping items of the same type into arrays and only holding the
indexes of those arrays in a Handle
, we are no longer holding pointers to the
underlying object, only an index to its (hopeful) position.
If user code actually needs the pointer its forced to look it up by passing a handle and receiving a pointer. But our users have to be careful to not store these pointers anywhere and not pass them across function calls.
All the runtime safety checks should be handled in this lookup function.
When we get back the pointer, because memory allocation is all done through systems, we can be sure of the pointers validity until those systems run and update their arrays.
But there is a problem: how do we guarantee that even if the array has a valid item for the index that it actually is the item we want? What if items were removed and then added?
Thats why we store the HandleID
(the index) as a certain number of bits with
additional room to encode this kind of information. Part of the bits encode the
index and part of the bits encode a unique key.
Here is an example of a 16 bit ID which would be stored in a HandleID
of type
u16
:
The Index Unique
[---------] [-----]
10110101010 001010
When we convert the Handle
to a pointer we use the first part to check the
arrays index, and the second part to check that we have the right item (and not
another different item thats been switched in since last update).
Handles made this way are quite powerful. We can share object references across processes like this.
But there is still a nagging concern: what happens if two items share their unique bits by chance? They would appear to be the validly pointing to the right object but return something else.
This is where something like a generational index would come in where we can increment a counter every time an array slot is reallocated to guarantee we generate a different unique ID.
Handles can be combined with closures to allow us to relocate the memory of actively used objects without fear of dangling references:
let mut allocation = MyAllocator::<Foo>::new();
let handle_a = allocation.allocate();
let handle_b = allocation.allocate();
allocation.expose::<&mut Foo, &Foo>(handle_a, handle_b, |foo_a, foo_b| {
foo_a.some_mutating_method();
println!(foo_b);
});
allocation.compact(); // Rearrange the allocated memory for myfoo and myfoo2
allocation.expose::<&mut Foo>(handle_a, |foo_a| {
// Still valid!
foo_a.some_mutating_method();
});
Handles in the Erlang VM
Interestingly these types of handles are used as process IDs PID
in erlang.
The PID
acts as a handle to interact with the different processes.
When one process wants to interact with another process, it typically sends a
message to the target process using the PID
as the recipient. The receiving
process can then use the PID handle to identify the sender and respond
accordingly.
Handles in Bevy ECS
In Bevy’s ECS we get handles for our
assets Handle<T>
.
For our assets we create an Asset<Font>
and receive back a Handle<Font>
which is a unique index on our Assets
collection and we can then access with
Assets::get
or Assets::get_mut
.
The actual definition of the HandleId
:
pub enum HandleId {
// A handle id of a loaded asset.
Id(Uuid, u64),
// A handle id of a pending asset.
AssetPathId(AssetPathId),
}
Which has a tuple type containing a UUID
and a u64
.
When we generate a HandleId
we use its TYPE_UUID
as its index and a random
number as its unique key:
impl HandleId {
// Creates a random id for an asset of type `T`.
#[inline]
pub fn random<T: Asset>() -> Self {
HandleId::Id(T::TYPE_UUID, fastrand::u64(..))
}
}
Its not quite the same as the implementation mentioned at the start of this article because they are manually implementing some reference counting. But using Bevy uses these handles to avoid passing pointers as well.
EntityId
is probably even more like these handles but I haven’t yet dug in
deep enough to say for sure.