Tainted\\Coders

Rust Macros

Rust offers three types of procedural macros (a form of metaprogramming for Rust):

  1. Custom #[derive] macros that specify code added with the derive attribute used on structs and enums
  2. Attribute-like macros that define custom attributes on any item
  3. Function-like macros that look like function calls but operate on the tokens specified as their argument

Macros are more flexible than functions: they can take a variable number of arguments, accept arbitrary token patterns, and generate code at compile time.
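For example, a function must declare a fixed signature, while a macro can accept any number of arguments. The sum! macro below is a hypothetical declarative macro written for illustration, not part of the standard library:

```rust
// A recursive declarative macro that accepts any number of expressions,
// something an ordinary function signature cannot express.
macro_rules! sum {
    () => { 0 };
    ($head:expr $(, $tail:expr)*) => { $head + sum!($($tail),*) };
}

fn main() {
    assert_eq!(sum!(), 0);
    assert_eq!(sum!(1, 2, 3), 6);
    println!("ok");
}
```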

Built-in derive macros

Some macros are special: the ones shipped with Rust are handled directly by the compiler.

Because they are built in, there is no token serialization and no dynamic library call out to a separate proc-macro crate.

Examples: #[derive(Clone)], #[derive(Copy)]

Derived traits are implemented (expanded) by the compiler automatically and annotated with #[automatically_derived], which helps the compiler differentiate between our code and its own.

#[derive(Clone)]
struct FruitBowl {
    content: Vec<Fruit>, // assumes a `Fruit` type that also implements Clone
    needs_cleaning: bool,
}

Would be expanded by the compiler to:

#[automatically_derived]
impl ::core::clone::Clone for FruitBowl {
    #[inline]
    fn clone(&self) -> FruitBowl {
        FruitBowl {
            content: ::core::clone::Clone::clone(&self.content),
            needs_cleaning: ::core::clone::Clone::clone(&self.needs_cleaning),
        }
    }
}

Declarative Macros

Declarative macros (macro_rules!) are similar to match expressions: each rule matches a pattern of tokens and substitutes its body. They are built on the compiler's syntax extension mechanism, and they are stricter and less powerful than procedural macros.

For one, they cannot reason about types or generics, since they only see tokens. Captured fragments are treated as a whole and cannot be broken down further within the macro body.
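The match-like behavior can be seen in a small example. The describe! macro below is illustrative; its arms are tried top to bottom, just like match arms:

```rust
// Each arm is a token pattern, tried in order until one matches.
macro_rules! describe {
    (zero) => { "nothing" };          // matches the literal token `zero`
    ($n:literal) => { "a literal" };  // matches any literal, e.g. 42
    ($e:expr) => { "an expression" }; // matches any expression, e.g. 1 + 1
}

fn main() {
    assert_eq!(describe!(zero), "nothing");
    assert_eq!(describe!(42), "a literal");
    assert_eq!(describe!(1 + 1), "an expression");
    println!("ok");
}
```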

Procedural Macros

A procedural macro is essentially a Rust function that the compiler runs at compile time.

Procedural macros currently need to live in their own dedicated crate (one with the proc-macro crate type); this restriction may be lifted in the future.

Procedural macros are Rust functions that take input TokenStream(s) and produce an output TokenStream.

For an example invocation foo!(bar), the input token stream would consist of the bar token. The returned token stream replaces the macro invocation.

3 types of procedural macros:

  1. Function-like macros
#[proc_macro]
pub fn foo(body: TokenStream) -> TokenStream { ... }

fn main() {
    foo!( foo bar baz );
}
  2. Custom derive procedural macros
#[proc_macro_derive(Bar)]
pub fn bar(body: TokenStream) -> TokenStream { ... }

#[derive(Bar)]
struct S;
  3. Custom attributes
#[proc_macro_attribute]
pub fn baz(
   attr: TokenStream,
   item: TokenStream
) -> TokenStream { ... }

#[baz]
fn some_item() {}

Tokenization

The first stage of compilation is tokenization. During tokenization macros are not yet evaluated, unlike in C/C++, where the preprocessor expands macros at this early stage.

The main token types are identifiers, punctuation, and literals.

foo! {
    ------------- Token A (identifier)
    ||| --------- Token B (punctuation)
    ||| | ------- Token C (literal)
    ||| | |||||-- Token D (punctuation)
    ||| | ||||||
    vvv v vvvvvv
    bar = "123";
}

Each token also carries a span recording where it appears in the source, which the compiler needs in order to report errors at the correct location.

Abstract Syntax Tree

Next comes parsing, where tokens are turned into an Abstract Syntax Tree (AST). Macros are expanded after this step.

Token trees sit between raw tokens and the AST. Grouping delimiters form their branches, which guarantees matching braces and keeps tokens grouped correctly.

The Rust parser treats macro invocations as black boxes to be evaluated later, so their bodies can contain code that is not valid Rust. The parser records the tokens and their structure but makes no attempt to interpret them.
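This black-box treatment means a macro can accept input that would never parse as Rust, as long as the delimiters are balanced. The swallow_tokens! macro below is defined purely for illustration:

```rust
// `$($t:tt)*` matches any sequence of token trees, so the macro body
// can contain tokens that are not valid Rust code.
macro_rules! swallow_tokens {
    ($($t:tt)*) => { "swallowed" };
}

fn main() {
    // This argument list would never parse as Rust on its own,
    // but the parser only checks that the delimiters balance.
    let s = swallow_tokens!(these ; ; tokens => are not valid rust);
    assert_eq!(s, "swallowed");
    println!("ok");
}
```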

Expansion

After the AST is constructed, but before the compiler begins any semantic analysis, an expansion step expands all syntax extensions. Expansion takes place in passes: a macro's output may itself contain macro invocations, which are expanded on subsequent passes.
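These passes can be observed with two nested declarative macros (names here are illustrative): the expansion of outer! still contains a macro call, which a later pass expands.

```rust
macro_rules! inner {
    () => { 2 };
}

macro_rules! outer {
    // The output of `outer!` contains another macro invocation,
    // which is expanded on a subsequent pass.
    () => { inner!() + 1 };
}

fn main() {
    assert_eq!(outer!(), 3);
    println!("ok");
}
```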

Creating procedural macros

Procedural macros are commonly written with the syn, quote, and proc-macro2 crates.

proc-macro2: A wrapper around the compiler's proc-macro API. It brings proc-macro-like functionality to other contexts such as build.rs and main.rs, and makes procedural macros unit testable.

quote: Exposes the quote! macro, which lets you create token streams by writing the desired output as ordinary source syntax, while interpolating tokens directly into it.

syn: Parses a stream of tokens into a syntax tree of Rust source code.
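Putting the three together, a minimal custom derive might look like the sketch below. The Hello trait-less impl and all names are illustrative assumptions; this code must live in a crate with proc-macro = true in its Cargo.toml, so it is a sketch rather than a standalone runnable program.

```rust
use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, DeriveInput};

#[proc_macro_derive(Hello)]
pub fn derive_hello(input: TokenStream) -> TokenStream {
    // syn parses the raw input tokens into a structured syntax tree.
    let input = parse_macro_input!(input as DeriveInput);
    let name = &input.ident;

    // quote! builds the output token stream, interpolating `#name`.
    let expanded = quote! {
        impl #name {
            pub fn hello() -> String {
                format!("Hello from {}", stringify!(#name))
            }
        }
    };

    expanded.into()
}
```

With this derive in scope, annotating a struct S with #[derive(Hello)] would generate an S::hello() method at compile time.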