Rust for Rubyists

This is a WIP collection of notes on what I found interesting in the contrast between Rust and Ruby.

Ruby is a beautiful, high-level language. You don’t manage memory yourself and you don’t worry about the stack vs the heap. You focus mostly on crafting a DSL to solve your domain-specific problems.

Ruby is so high level that you can go a ways into your career without dealing with these lower level concerns.

So for a soft introduction let’s build up our knowledge of Ruby’s internals to make Rust easier to compare and contrast.

Stack and the Heap

In Ruby when we have code like:

a_string = "Hello world"

We are actually allocating a C struct called RString on the heap, and only a reference to that string is bound to the variable a_string.

Because everything in Ruby is an object, and because objects are allocated on the heap, almost everything in Ruby lives on the heap. The exceptions are a few special values (small integers, symbols, nil, true and false are encoded directly in the reference) and the references themselves, which we bind in our local stack frame (the scope of a method).

These objects are deallocated by the garbage collector once nothing reachable refers to them any more, for example after the stack frames holding the references are gone.
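
A quick way to see that variables hold references rather than the strings themselves is to bind two names to one object. This is only a minimal sketch; the variable names alias_ref and copy are made up for illustration:

```ruby
a_string = +"Hello"        # unary plus gives us an unfrozen String
alias_ref = a_string       # copies the reference, not the RString

alias_ref << " world"      # mutate the object through the second reference

puts a_string                    # prints "Hello world"
puts a_string.equal?(alias_ref)  # prints "true": both names point at one object

copy = a_string.dup        # dup allocates a second object on the heap
copy << "!"
puts a_string              # still prints "Hello world"
puts copy                  # prints "Hello world!"
```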

Memory management

In Ruby, everything is an object and, apart from a few immediate values such as small integers, objects are allocated on the heap.

The memory allocated for Ruby objects is managed by the Ruby interpreter, which relies on a garbage collector to reclaim objects that are no longer reachable. The reference interpreter (MRI) uses a mark-and-sweep collector for this, not reference counting.

When you allocate an object in Ruby with SomeClass.new you are handed back a pointer with a C type of VALUE, which is typically declared as typedef unsigned long VALUE;.

Underneath, Ruby’s interpreter uses an RVALUE structure which contains fields to store the object’s instance variables, flags, and a pointer to the object’s class.

RVALUE is a union of the following types:

struct RObject 	all things for which none of the following applies
struct RClass 	class object
struct RFloat 	floating-point numbers
struct RString 	string
struct RArray 	array
struct RRegexp 	regular expression
struct RHash 	hash table
struct RFile 	IO, File, Socket, etc…
struct RData 	all the classes defined at C level, except the ones mentioned above
struct RStruct 	Ruby’s Struct class
struct RBignum 	big integers 

The Ruby interpreter takes care of managing this memory and ensuring proper deallocation through the garbage collector, which works out which objects are still reachable and frees the rest.
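
The collector can be poked at from Ruby itself. The sketch below assumes MRI (CRuby), where GC.start, GC.count, and GC.stat are available; the exact numbers vary from run to run, so none are shown:

```ruby
runs_before = GC.count           # number of GC runs so far in this process

# Allocate a lot of short-lived strings; nothing keeps a reference to them.
100_000.times { "temporary " + "string" }

GC.start                         # ask for a full collection right now
runs_after = GC.count

puts "GC ran #{runs_after - runs_before} time(s) during this snippet"
puts "live heap slots: #{GC.stat(:heap_live_slots)}"
```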

When you define a Struct in Ruby it’s actually just a high-level object backed by one of the real C structs from the union above.

However a struct in Rust is much lower level. It would be like defining your own RObject.

Both modules and classes in Ruby use the same underlying C struct type: RClass.

Defining methods

Ruby adds any defined method to the class corresponding to the current lexical scope, unless we add a prefix.

A lexical scope is represented as a struct in C called a cref which points to:

nd_next: The parent lexical scope
nd_clss: The target class

For a method definition without a prefix, the default target is the target class of the lexical scope the def appears in.

When Ruby reads class or module or instance_eval and friends it generates a new lexical scope and points nd_next to the last one.

class Dog
  def speak
    "woof"
  end
end

When we write class Dog, Ruby creates a new lexical scope and assigns the nd_clss pointer to the RClass of Dog.

When Ruby reads our def keyword it uses this nd_clss pointer to find the proper method table to add the method to.

On the first pass of the compiler the method definition is turned into a snippet held by the YARV code. Later when that YARV code executes Ruby assigns this code to the target class according to the pointer, saving this method in its RClass table.
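
We can watch the target class at work from plain Ruby. In this sketch, class_eval with a block opens a scope whose target class is Dog, so a def inside it lands in Dog’s method table as well (sit is a made-up method name):

```ruby
class Dog
  def speak
    "woof"
  end
end

# Only methods stored in Dog's own method table, not inherited ones.
p Dog.instance_methods(false)       # prints [:speak]

# class_eval with a block makes Dog the target class for def.
Dog.class_eval do
  def sit
    "sits"
  end
end

p Dog.instance_methods(false).sort  # prints [:sit, :speak]
p Dog.new.sit                       # prints "sits"
```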

What about class methods?

class Dog
  def self.speak
    "woof"
  end
end

Here when Ruby encounters def it notices our prefix. The prefix changes the algorithm for finding the target class: Ruby sets the target class to be the class of the prefix. We used self as our prefix, and inside class Dog self is Dog, so the target is whatever Dog’s klass pointer refers to, which is its metaclass (where we define class methods).

That’s quite interesting because it means we can define methods outside of their classes using the same target class lookup:

class Dog; end

def Dog.speak
  "woof"
end

Or we can create a new lexical scope:

class Dog
  # Ruby reads the `self` part first and finds its class to set as the lexical
  # scope's target class.
  class << self
    def speak
      "woof"
    end
  end
end

We get the same class method as above defined on Dog in either case.
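
To check that claim, here is a small sketch that defines speak in each of the three styles and asks each class what lives on its metaclass. The class names PrefixDog, OutsideDog, and ScopeDog are made up so the three variants can sit side by side:

```ruby
class PrefixDog
  def self.speak
    "woof"
  end
end

class OutsideDog; end

def OutsideDog.speak
  "woof"
end

class ScopeDog
  class << self
    def speak
      "woof"
    end
  end
end

[PrefixDog, OutsideDog, ScopeDog].each do |klass|
  # singleton_methods(false) lists the methods stored on the metaclass itself
  p [klass, klass.singleton_methods(false), klass.speak]
end
# each printed line shows the class, [:speak], and "woof"
```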

We can also define a method on a single instance; no other instance of Dog will be able to call it:

dog = Dog.new

def dog.shout
  "WOOF"
end

But how? Wouldn’t Ruby search for the prefix’s class and define this on Dog? Well, because the prefix is an object and not a class, Ruby creates a hidden “singleton class” for that object and defines the method there.

Okay but it actually keeps going:

dog = Dog.new

class << dog
  def shout
    "WOOF"
  end
end

Now we can see the metaprogramming call class << can be fed more than just self to create a new lexical scope for the defined methods, thereby setting the target class (which holds the method table to add this to).
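
The hidden singleton class is easy to inspect. A minimal sketch (other_dog is just a second instance added for comparison):

```ruby
class Dog; end

dog = Dog.new
other_dog = Dog.new

def dog.shout
  "WOOF"
end

p dog.shout                                    # prints "WOOF"
p dog.singleton_methods                        # prints [:shout]
p dog.singleton_class.instance_methods(false)  # prints [:shout]

# The method lives on dog's singleton class, not on Dog itself.
p other_dog.respond_to?(:shout)                # prints false
p Dog.instance_methods(false)                  # prints []
```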

Passing messages

In Ruby we say we are sending “messages” but what does that really mean?

When we call a method on an object, Ruby looks up the method in the m_tbl (method table) of the object’s class, and if it can’t find it there it moves up to the superclass and repeats the process, until eventually something is found or we get a NoMethodError.

When we send a message by calling a method we:

Look through the RClass method table
If no method found, set the current class to the RClass of our superclass and try again
Continue until found and call the method or raise an error
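
The steps above can be sketched in plain Ruby. find_owner is a hypothetical helper, not something Ruby provides, and it is simplified: it follows superclass only, so it skips included modules and singleton classes, and it only sees public and protected methods.

```ruby
class Animal
  def speak
    "..."
  end
end

class Dog < Animal
  def fetch
    "fetching"
  end
end

# Hypothetical helper that mimics the lookup loop by hand.
def find_owner(klass, name)
  current = klass
  while current
    # instance_methods(false) plays the role of this class's own m_tbl
    return current if current.instance_methods(false).include?(name)
    current = current.superclass   # nil once we walk past BasicObject
  end
  nil                              # Ruby itself would raise NoMethodError here
end

p find_owner(Dog, :fetch)   # prints Dog
p find_owner(Dog, :speak)   # prints Animal
p find_owner(Dog, :nope)    # prints nil

# Ruby's own answer for comparison
p Dog.instance_method(:speak).owner  # prints Animal
```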

During the execution the method body has access to the object’s instance variables and other state loaded from the heap.

The method execution itself happens within the context of the object’s class and can access both class and instance methods from the context of self.

 struct RBasic {
     VALUE flags;
     const VALUE klass; // A reference to the class of which we are an instance
 };

 struct RObject {
     struct RBasic basic; // Holds a pointer to our class inside
     union {
         struct {
             uint32_t numiv; // Number of instance variables
             VALUE *ivptr; // A pointer to an array of instance variables
             void *iv_index_tbl; // shortcut for RCLASS_IV_INDEX_TBL(rb_obj_class(obj))
         } heap;
         VALUE ary[ROBJECT_EMBED_LEN_MAX];
     } as;
 };

Knowing all this we can see an insight into all Ruby objects: they are just a combination of a class pointer and an array of instance variables.
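
Both halves of that pair are visible from Ruby. In this sketch the Dog class and the @tricks variable are made up for illustration:

```ruby
class Dog
  def initialize(name, age)
    @name = name
    @age = age
  end
end

dog = Dog.new("Rex", 3)

p dog.class                          # prints Dog (the klass pointer)
p dog.instance_variables             # prints [:@name, :@age]
p dog.instance_variable_get(:@name)  # prints "Rex"

# Instance variables belong to the object, so adding one here
# does not affect any other Dog.
dog.instance_variable_set(:@tricks, ["sit"])
p dog.instance_variables                 # prints [:@name, :@age, :@tricks]
p Dog.new("Fido", 1).instance_variables  # prints [:@name, :@age]
```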

Dynamic vs static dispatch

In Ruby all methods are dynamically dispatched by walking the RClass method definitions until something is found. When it does find something it adds it to a method cache for faster lookup next time, and clears this cache when new methods are defined, even at runtime.

This kind of dynamic dispatch is at the heart of Ruby’s amazing metaprogramming for method overriding, method_missing, and other magic.
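
A small sketch of what that buys us. Ghost and its say_ prefix convention are invented for this example; method_missing, respond_to_missing?, and define_method are the real hooks:

```ruby
class Ghost
  # Called only after the normal lookup walked the whole chain and failed.
  def method_missing(name, *args)
    if name.to_s.start_with?("say_")
      name.to_s.delete_prefix("say_")
    else
      super
    end
  end

  # Keeps respond_to? honest about the methods we fake above.
  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("say_") || super
  end
end

ghost = Ghost.new
p ghost.say_boo              # prints "boo"

# Methods can also be added while the program is running.
Ghost.define_method(:haunt) { "haunting" }
p ghost.haunt                # prints "haunting"

begin
  ghost.vanish               # not defined anywhere, and no say_ prefix
rescue NoMethodError => e
  p e.class                  # prints NoMethodError
end
```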

However Rust uses (mostly) static dispatch, where method resolution is determined at compile time based on the declared type of the variable or expression. Dynamic dispatch is opt-in, through trait objects (dyn Trait).

This moves the cost of method lookup from runtime to compile time, which improves performance.

Parameters

Ruby has both positional and named arguments, but it didn’t always. Before 2.0 there were only positional arguments.

A common pattern at the time was to pass a hash as the last argument to hold something like keyword arguments:

def initialize(post, options = {})
  @post = post
  @style = options[:style]
end

In Ruby 2.0 we finally got keyword arguments which had better errors for missing keys than passing in the hash explicitly as a positional argument.

This is the reason that we have to put our keyword arguments last and our positional arguments first.
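
The difference in error behaviour is easy to demonstrate. HashStyle and KeywordStyle are made-up classes for this sketch, and the required keyword form (style: with no default) assumes Ruby 2.1 or later:

```ruby
class HashStyle
  attr_reader :style

  def initialize(post, options = {})
    @post = post
    @style = options[:style]
  end
end

class KeywordStyle
  attr_reader :style

  def initialize(post, style:)
    @post = post
    @style = style
  end
end

# With the hash style a forgotten option silently becomes nil...
p HashStyle.new("post").style                   # prints nil

# ...while a forgotten required keyword fails loudly at the call site.
begin
  KeywordStyle.new("post")
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end

p KeywordStyle.new("post", style: :bold).style  # prints :bold
```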

We have a ton of flexibility in specifying the types of parameters we can receive:

class Rectangle
  def initialize(meta = {}, width:, height: 1)
    @width = width
    @height = height
    @meta = meta
  end

  def area
    @width * @height
  end
end
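
Ruby can also report which kinds of parameters a method accepts. The describe method below is a throwaway example that uses every kind at once:

```ruby
def describe(a, b = 2, *rest, c:, d: 4, **opts, &blk)
  [a, b, rest, c, d, opts]
end

p method(:describe).parameters
# prints [[:req, :a], [:opt, :b], [:rest, :rest], [:keyreq, :c],
#         [:key, :d], [:keyrest, :opts], [:block, :blk]] (on one line)

p describe(1, c: 3)   # prints [1, 2, [], 3, 4, {}]

# Extra positionals go to *rest, unknown keywords go to **opts.
result = describe(1, 9, 8, 7, c: 3, e: 5)
p result[2]           # prints [8, 7]
p result[5].keys      # prints [:e]
```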

Rust on the other hand is quite limited. There are only positionals.

There are default values, but they come from implementing the Default trait for your types. We also have to spread them into any struct we create, using the struct update syntax (..Default::default()).

struct Rectangle {
    width: u32,
    height: u32,
}

impl Rectangle {
    fn new(width: u32, height: u32) -> Rectangle {
        Rectangle { width, height }
    }

    // Methods are just functions that take a reference (or owned value) of self
    fn area(&self) -> u32 {
        self.width * self.height
    }
}

impl Default for Rectangle {
    fn default() -> Self {
        Self::new(0, 0)
    }
}

In the example above we can never call Rectangle::new with only the width unless we create a different function with a new name, because Rust doesn’t have function overloading.

In Rust we care deeply about the types we can accept, and we can be more sure of our inputs because the compiler will tell us when it thinks we are wrong.

Classes

A Ruby class is a Ruby object that also contains method definitions, attribute names, a superclass pointer, and a constants table.

Classes hold references to their hierarchy by pointing to their superclass; if none is given then it’s set to the Object class.

Classes are created as the RClass struct and hold a table of their instance methods and a pointer to their superclass.

For class methods Ruby actually automatically creates a metaclass that holds them. Ruby sets this metaclass to be the class of your class, as in it sets the klass pointer in the RClass struct to point to this metaclass:

class Dog
  def self.speak
    "Woof"
  end
end

Dog.singleton_class #=> #<Class:Dog> (this is the metaclass)
Dog.singleton_class.instance_methods(false) #=> [:speak]

Okay, but what about class variables (not to be confused with class instance variables)? As a refresher, class variables are the ones with two @ signs:

class Animal
  @speak = "..." # A class INSTANCE variable @speak
  @@shout = "..." # A class variable @@shout

  def self.speak
    @speak
  end

  def self.shout
    @@shout
  end
end

class Dog < Animal
  @speak = "Woof" # A class INSTANCE variable @speak
  @@shout = "WOOF" # A class variable @@shout
end

Dog.speak #=> "Woof"
Dog.shout #=> "WOOF"

# Okay, that all looks good... But wait...

Animal.shout #=> "WOOF", uh oh...

Well, actually class variables and class instance variables both end up in the RClass under the iv_tbl. However, how Ruby gets or sets each type of variable is different.

When you get or set a class variable (@@) Ruby looks up the ancestor chain (the linked list of superclasses) for a class variable with the same name, and gets or sets it on that class instead.
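
We can ask each class which variables it actually stores. This sketch repeats the Animal and Dog setup in a trimmed form and uses the reflection methods class_variables and instance_variable_get:

```ruby
class Animal
  @speak = "..."
  @@shout = "..."
end

class Dog < Animal
  @speak = "Woof"
  @@shout = "WOOF"   # finds @@shout on Animal and overwrites it there
end

p Animal.class_variables(false)        # prints [:@@shout]
p Dog.class_variables(false)           # prints [] (nothing stored on Dog)
p Dog.class_variables                  # prints [:@@shout] (found via Animal)
p Animal.class_variable_get(:@@shout)  # prints "WOOF"

# Class instance variables stay separate per class.
p Animal.instance_variable_get(:@speak)  # prints "..."
p Dog.instance_variable_get(:@speak)     # prints "Woof"
```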

Rust does not have a built-in object-oriented programming (OOP) system with classes and inheritance. It does have polymorphism, but it gets there through traits and generics rather than through a class hierarchy.

In Ruby objects are instances of classes, and classes define both the data (attributes) and behavior (methods) of the objects. Inheritance allows classes to inherit attributes and behavior from parent classes, and polymorphism enables objects to be treated as instances of their parent classes or interfaces.

But in Rust, the primary abstraction mechanism is the “struct,” which allows you to define custom data structures with associated fields. However, unlike classes in OOP languages, a struct definition holds only data: methods are attached separately in impl blocks, and there is no inheritance.

Instead, Rust embraces a different paradigm called “composition over inheritance.” It encourages the use of smaller, reusable components that can be composed together to create more complex behavior. This approach is facilitated by Rust’s strong support for functions, traits, and generic programming.

Ruby actually also has “composition over inheritance” through its modules, which are the closest thing Ruby has to Rust’s traits.

Modules

A Ruby module is a Ruby object that also contains method definitions, a superclass pointer, and a constants table.

Underneath modules are also RClass structs. There is very little difference between them and classes other than not being able to spawn instances.
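
The family resemblance shows up in Ruby’s own class hierarchy, where Class is a subclass of Module. A minimal sketch, with Walks and Dog as placeholder names:

```ruby
module Walks; end
class Dog; end

p Walks.class              # prints Module
p Dog.class                # prints Class
p Class.superclass         # prints Module

# The practical difference: only classes can be instantiated.
p Dog.respond_to?(:new)    # prints true
p Walks.respond_to?(:new)  # prints false
```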

We know that the superclass we store in RClass makes our class hierarchy a linked list structure. Each class points to its superclass and so on until reaching BasicObject, whose superclass is nil.

So when you include a module Ruby is inserting a copy of the module into this linked list by having the copy point to the superclass and the class doing the including now points to this copy.

Of course extend does the same, but by changing your metaclass’s superclass to a copy of the module instead. And prepend is some black magic where Ruby surgically creates a copy of the class (the “origin” class), moves the methods into it, and places it after the prepended module in the chain.
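
The resulting chain can be read back with ancestors. In this sketch Walks, Loud, and Swims are empty placeholder modules:

```ruby
module Walks; end
module Loud; end
module Swims; end

class Dog
  include Walks   # inserted after Dog in the chain
  prepend Loud    # inserted before Dog in the chain
  extend Swims    # inserted into the metaclass's chain instead
end

p Dog.ancestors.take(3)                  # prints [Loud, Dog, Walks]
p Dog.singleton_class.ancestors.take(2)  # prints [#<Class:Dog>, Swims]

p Dog.include?(Swims)                    # prints false
p Dog.singleton_class.include?(Swims)    # prints true
```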

Compiling

Compiling in Ruby happens in 3 separate acts:

Tokenization -> tokens
Parsing -> AST nodes
Compiling

First Ruby turns the characters on the screen into “tokens” by reading the characters in the called Ruby file one at a time.

Then the parser takes these tokens and turns them into a structured representation of your code called an Abstract Syntax Tree (AST).

In 1.8 and earlier your code was executed directly from the AST: Ruby walked the nodes of the tree and executed each one.

In 1.9 YARV was added, which compiles your program to an intermediate form that runs on the Ruby VM. So YARV outputs instructions and the step by step execution still happens, but it reads from the YARV instructions, not the AST.
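
All three stages can be observed from Ruby. This sketch assumes MRI, since Ripper and RubyVM::InstructionSequence are specific to it, and Ripper is a separate view of the tokens and tree rather than the interpreter’s internal data. The printed output differs between Ruby versions, so it is not reproduced here:

```ruby
require "ripper"

source = "1 + 2"

# 1. Tokenization: each entry is [[line, column], type, token, state]
tokens = Ripper.lex(source)
p tokens.map { |_pos, type, token, _state| [type, token] }

# 2. Parsing: a nested array (S-expression) view of the AST
ast = Ripper.sexp(source)
p ast

# 3. Compiling: the YARV instructions the VM will execute
iseq = RubyVM::InstructionSequence.compile(source)
puts iseq.disasm

p iseq.eval   # prints 3
```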

Blocks and Closures

When you define a block in Ruby it creates an rb_block_t and saves an ep pointer to the location of the current stack frame and its local variables. This is how we have closures in Ruby.

Blocks also store a pointer to self which is the value of self where the block was defined.

When you define a lambda the current stack frame is copied to the heap AND the current stack frame is now modified to point to this copied environment. Calling the lambda sets the new stack frame’s ep to point to that copy on the heap, which is how it gets its captured variables back.
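
The effect of that copy is that captured locals outlive the method that created them. A minimal sketch, where make_counter and Dog#class_reporter are made-up names:

```ruby
def make_counter
  count = 0                       # local to this call's stack frame
  increment = -> { count += 1 }   # both lambdas capture the same environment
  read = -> { count }
  [increment, read]
end

increment, read = make_counter    # make_counter's frame has returned by now

increment.call
increment.call
p read.call                       # prints 2

# Every call to make_counter gets a fresh environment.
_, other_read = make_counter
p other_read.call                 # prints 0

# The lambda also remembers self from where it was defined.
class Dog
  def class_reporter
    -> { self.class }
  end
end

p Dog.new.class_reporter.call     # prints Dog
```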
