Rust Lifetimes

This article is a personal rewriting of: https://www.lurklurk.org/effective-rust/lifetimes.html

History

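Lifetimes build on the ideas behind RAII (Resource Acquisition Is Initialization), a C++ idiom that ties the release of a resource to the scope of the object that owns it. RAII on its own does not stop you from creating dangling pointers; Rust's lifetimes add a compile-time check that does.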

A dangling pointer is one that refers to memory that is no longer valid, for example a value on the stack whose stack frame has since been popped, or a value that has been moved or freed.

The Stack

While your program is running the memory it uses is divided into chunks:

A fixed size for your program’s code
A fixed size for your program’s global constant data
A variable sized heap
A variable sized stack

The stack and the heap are typically placed at opposite ends of the free memory and grow towards each other:

(code) (global data) (heap) --(grows)--> free memory <--(grows)-- (stack)

The stack holds parameters, local variables and temporary values, grouped into stack frames. When a function is called, a new stack frame is added to the stack.

The CPU then updates the stack pointer, a register that points to the current stack frame.

When your function returns, the stack pointer is reset to where it was before the call, which would be the caller’s stack frame.

So as calls nest deeper, we push stack frames onto the stack and move the stack pointer to the newest one.

When we return we do the reverse and pop stack frames off the stack and update our stack pointer.

Dangling references

Now when we invoke some function, return, then invoke another function, the stack frame for the second function will typically reuse the same memory that the first function's frame did.

So if we have a pointer to an ephemeral value in a stack frame, and that value goes out of scope while the pointer is still around, we have a dangling pointer.

To get around this, Rust uses lifetimes to reject, at compile time, any code that would create a dangling reference. RAII-style destructors separately clean up values automatically when they go out of scope.
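As a rough sketch of what that rejection looks like (the function names here are made up for illustration), the commented-out function tries to return a reference into its own stack frame, while the second version hands the value itself to the caller:

```rust
// Rejected by the compiler: `local` lives in this function's stack frame,
// which is popped on return, so the reference would dangle.
//
// fn dangling() -> &'static i32 {
//     let local = 42;
//     &local // error[E0515]: cannot return reference to local variable `local`
// }

// Accepted: the value itself is moved out to the caller, so nothing
// points back into the popped stack frame.
fn owned() -> i32 {
    let local = 42;
    local
}
```

Uncommenting `dangling` should make the whole program fail to compile, which is the point: the mistake never reaches a running binary.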

The entire lifetime system is just a constraint-solving system that tries to minimize the region of every reference.

Crafting lifetimes

One constraint of the Rust system is that borrow checking is done on each function body independently, using only the signatures of the functions it calls. Otherwise the checking would be computationally infeasible.

So within a function body we don’t need to write lifetimes out, and wouldn’t want to, because the compiler has all the information it needs there. At the type or API level, such as function signatures and struct definitions, the compiler is missing this information and we sometimes have to supply it.

Every type that includes a & has an associated lifetime, written with a label such as 'a.

In early versions of Rust you had to write these out explicitly everywhere, but the compiler has since been improved to figure them out automatically most of the time.

The borrow checker ensures that a value is not moved while it is borrowed, and not mutated except through a single mutable borrow.

The lifetime is the period during which the referenced value is guaranteed to stay in the same place. Once the referenced value is moved or dropped, the reference can no longer be used.
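Here is a minimal sketch of that rule, assuming a made-up helper `total_len` that takes ownership of a vector. Moving the vector is allowed only because the reference into it is not used afterwards:

```rust
// Hypothetical helper that takes ownership of its argument.
fn total_len(words: Vec<String>) -> usize {
    words.iter().map(|w| w.len()).sum()
}

fn borrow_then_move() -> (usize, usize) {
    let words = vec![String::from("stack"), String::from("heap")];

    let first = &words[0];       // shared borrow of `words` starts here
    let first_len = first.len(); // last use of `first`, so the borrow ends here

    // Moving `words` is fine now: no reference into it is used afterwards.
    let total = total_len(words);

    // Using `first` at this point would be rejected (error[E0505]),
    // because `words` would have been moved while still borrowed:
    // println!("{}", first);

    (first_len, total)
}
```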

There are two kinds of lifetimes we can talk about:

The lifetime of a reference, corresponding to the span of time in which that reference is used.
The lifetime of a value, corresponding to the span of time before that value gets freed (or, put another way, before the destructor for the value runs).

The second kind, the lifetime of a value, is known as the value’s scope. These two concepts, lifetimes and scopes, are related to each other. If you make a reference to a value, the lifetime of that reference cannot outlive the scope of that value.
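A small sketch of the relationship, with variable and function names chosen only for illustration. The reference `r` is used after the inner block ends, so it may point at `outer` but not at `inner`:

```rust
fn read_through_reference() -> i32 {
    let outer = 1; // scope of `outer`: the whole function body
    let r: &i32;
    {
        let inner = 2; // scope of `inner`: this block only
        r = &outer;    // fine: `outer` is still in scope at every use of `r`
        // r = &inner; // rejected (error[E0597]): `inner` does not live long enough
        let _ = inner;
    }
    *r // `r` is used here, after `inner` is gone
}
```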

fn foo() {
    let mut data = vec!['a', 'b', 'c']; // --+ 'scope
    capitalize(&mut data[..]);          //   |
//  ^~~~~~~~~~~~~~~~~~~~~~~~~ 'lifetime //   |
    data.push('d');                     //   |
    data.push('e');                     //   |
    data.push('f');                     //   |
} // <---------------------------------------+

fn capitalize(data: &mut [char]) {
    // Upper-case every character in place (ASCII only).
    for c in data.iter_mut() {
        *c = c.to_ascii_uppercase();
    }
}

A scope usually corresponds to some block. A lifetime can span much less than that, even an individual expression as in the example above.

The lifetime of the borrow in the example is confined to just the call to capitalize.

Although lifetimes are everywhere you don’t actually get to specify them. There is no way to tell the compiler you want the lifetime to exist only for lines 10-20.

Instead we assign arbitrary labels like 'a or 'b.

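Giving two references the same lifetime label does not mean the values they point to live for exactly the same period. It means there is a single region, named by that label, during which both references must be valid.

A simple function that takes a reference and returns a reference derived from it uses the same lifetime for the input and the output: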

pub fn first<'a>(data: &'a [Item]) -> Option<&'a Item> { data.first() }

But Rust has lifetime elision rules, meaning we can let the compiler fill them in automatically:

pub fn first(data: &[Item]) -> Option<&Item> { data.first() }
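To see that the two spellings behave the same, here is a self-contained sketch. The `Item` definition (a single `u32` field called `contents`) and the names `first_explicit` and `first_elided` are assumptions made so both versions can sit side by side:

```rust
// Assumed definition of `Item`, for illustration only.
#[derive(Debug, PartialEq)]
pub struct Item {
    pub contents: u32,
}

// Explicit form: the output borrows from `data` for the same lifetime 'a.
pub fn first_explicit<'a>(data: &'a [Item]) -> Option<&'a Item> {
    data.first()
}

// Elided form: with a single input reference, the compiler assigns
// its lifetime to the output for us.
pub fn first_elided(data: &[Item]) -> Option<&Item> {
    data.first()
}
```

Both functions have the same type as far as callers are concerned; the elided one is just less to read.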

A definition like:

pub fn smaller<'a>(left: &'a Item, right: &'a Item) -> &'a Item { if left.contents < right.contents { left } else { right } }

This looks like it requires the two inputs to have exactly the same lifetime. But remember, the purpose of lifetimes is only to ensure that references do not outlive the values they point to. The compiler is free to pick 'a as any region for which both inputs are valid.

So if you keep the output of this function around, both inputs must live at least as long as it does. Holding on to the output expands the region our inputs must be valid for.

{
    let outer = Item { contents: 7 };
    {
        let inner = Item { contents: 8 };
        {
            let min = smaller(&inner, &outer);
            println!("smaller of {:?} and {:?} is {:?}", inner, outer, min);
        } // `min` dropped
    } // `inner` dropped
} // `outer` dropped
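The fragment above can be turned into something self-contained along these lines. The `Item` definition and the wrapper function `smaller_contents` are assumptions added for the sketch:

```rust
// Assumed definition of `Item`, for illustration only.
#[derive(Debug, PartialEq)]
pub struct Item {
    pub contents: u32,
}

pub fn smaller<'a>(left: &'a Item, right: &'a Item) -> &'a Item {
    if left.contents < right.contents { left } else { right }
}

// Hypothetical wrapper that mirrors the nested scopes above.
pub fn smaller_contents() -> u32 {
    let outer = Item { contents: 7 };
    let result;
    {
        let inner = Item { contents: 8 };
        // 'a is chosen as a region where both `inner` and `outer` are valid,
        // which here can be no larger than this block.
        let min = smaller(&inner, &outer);
        // Copy the number out while both items are still alive.
        result = min.contents;
        // Keeping `min` itself and using it after this block would be
        // rejected, because `min` might point at `inner`.
    }
    result
}
```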

The elision rules can be summarized as three cases.

One input, one or more outputs: assume outputs have the “same” lifetime as the input.

fn f(x: &Item) -> (&Item, &Item)
// ... is equivalent to ...
fn f<'a>(x: &'a Item) -> (&'a Item, &'a Item)

Multiple inputs, no output: assume all the inputs have different lifetimes.

fn f(x: &Item, y: &Item, z: &Item) -> i32
// ... is equivalent to ...
fn f<'a, 'b, 'c>(x: &'a Item, y: &'b Item, z: &'c Item) -> i32

Multiple inputs including &self, one or more outputs: assume output lifetime(s) are the “same” as &self’s lifetime.

fn f(&self, y: &Item, z: &Item) -> &Thing
// ... is equivalent to ...
fn f<'a, 'b, 'c>(&'a self, y: &'b Item, z: &'c Item) -> &'a Thing
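The &self case is the one most likely to matter in practice, so here is a hedged sketch of it. `Holder`, the fields of `Item` and `Thing`, and `method_elision` are all invented for the example. Because the output is tied to &self only, it can outlive the other two arguments:

```rust
// Assumed definitions, for illustration only.
pub struct Item {
    pub contents: u32,
}

pub struct Thing {
    pub value: u32,
}

pub struct Holder {
    thing: Thing,
}

impl Holder {
    // Elided signature: the output gets the lifetime of `&self`,
    // not the lifetime of `y` or `z`.
    pub fn f(&self, y: &Item, z: &Item) -> &Thing {
        let _ = (y.contents, z.contents);
        &self.thing
    }
}

pub fn method_elision() -> u32 {
    let holder = Holder { thing: Thing { value: 10 } };
    let thing;
    {
        let y = Item { contents: 1 };
        let z = Item { contents: 2 };
        thing = holder.f(&y, &z);
    } // `y` and `z` are dropped here; `thing` only borrows from `holder`
    thing.value
}
```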

On the heap, lifetimes are tied to the scope of the owner. A value in a Box<T> lives exactly as long as the variable that owns the Box, so the data on the heap is freed when that variable on the stack goes out of scope (unless ownership is moved elsewhere first).
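One way to observe this, sketched with a made-up `Noisy` type whose destructor flips a flag, is to check the flag before and after the owning variable goes out of scope:

```rust
use std::cell::Cell;

// Hypothetical type that records when its destructor runs.
struct Noisy<'a> {
    dropped: &'a Cell<bool>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

fn box_dropped_with_owner() -> (bool, bool) {
    let dropped = Cell::new(false);
    let before;
    {
        // The `Noisy` value lives on the heap; `boxed` on the stack owns it.
        let boxed = Box::new(Noisy { dropped: &dropped });
        before = boxed.dropped.get(); // still alive inside the scope
    } // `boxed` goes out of scope: the heap value is dropped and freed
    (before, dropped.get())
}
```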

A data structure that holds references needs a lifetime parameter for them, and the structure cannot outlive the items it refers to:

pub struct ReferenceHolder<'a> {
    pub index: usize,
    pub item: &'a Item,
}

// Annotation includes lifetimes of all fields
pub struct MultiRefHolder<'a, 'b> {
    pub left: ReferenceHolder<'a>,
    pub right: ReferenceHolder<'b>, // Could choose 'a instead here
}

Anything that contains a reference, no matter how deeply nested, is only valid for the lifetime of the item referred to.
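As a usage sketch for a reference-holding structure, here is a made-up function `largest` that returns a ReferenceHolder borrowing from a slice. The `Item` definition is again an assumption:

```rust
// Assumed definition of `Item`, for illustration only.
pub struct Item {
    pub contents: u32,
}

pub struct ReferenceHolder<'a> {
    pub index: usize,
    pub item: &'a Item,
}

// Hypothetical helper: find the item with the largest `contents`.
// The returned holder borrows from `items`, so it cannot outlive the slice.
pub fn largest<'a>(items: &'a [Item]) -> Option<ReferenceHolder<'a>> {
    items
        .iter()
        .enumerate()
        .max_by_key(|(_, item)| item.contents)
        .map(|(index, item)| ReferenceHolder { index, item })
}
```

The 'a in the return type is what tells callers that the holder is only usable while the slice is.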

A practical takeaway is to prefer data structures that own their contents where possible. Where data must be shared, consider a smart pointer such as Rc<T> in place of a plain reference.
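A minimal sketch of both options, with type names invented for the example. Neither struct needs a lifetime parameter, because neither holds a plain reference:

```rust
use std::rc::Rc;

// Assumed definition of `Item`, for illustration only.
pub struct Item {
    pub contents: u32,
}

// Owns its item outright.
pub struct OwningHolder {
    pub index: usize,
    pub item: Item,
}

// Shares its item through a reference-counted smart pointer.
pub struct SharedHolder {
    pub index: usize,
    pub item: Rc<Item>,
}

// Hypothetical constructor: two holders sharing one heap-allocated item.
pub fn make_shared() -> (SharedHolder, SharedHolder) {
    let item = Rc::new(Item { contents: 7 });
    let left = SharedHolder { index: 0, item: Rc::clone(&item) };
    let right = SharedHolder { index: 1, item };
    (left, right)
}
```

The trade-off is a reference count updated at run time in exchange for signatures with no lifetimes in them.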

Borrowing Rules

To make the borrow checker possible some rules must be followed:

The lifetime of any reference must be no longer than the scope of the item it refers to.
There can be unlimited immutable references to the item OR a single mutable reference, but never both.
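The second rule can be sketched like this (the function name is made up). Several shared references coexist, and the mutable reference is only taken once the shared ones are no longer used:

```rust
fn shared_then_exclusive() -> Vec<i32> {
    let mut data = vec![1, 2, 3];

    // Any number of shared references may coexist.
    let a = &data;
    let b = &data;
    let sum = a[0] + b[1]; // last use of `a` and `b`

    // A single mutable reference is allowed once the shared ones are done.
    let m = &mut data;
    m.push(sum);

    // Reading `a` at this point would be rejected (error[E0502]): `data`
    // cannot be borrowed as mutable while it is also borrowed as immutable.

    data
}
```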