Rust for Rubyists
This is a WIP collection of notes for what I found interesting in the contrast between Rust and Ruby.
Ruby is a beautiful but high-level language. You don't manage memory yourself, and you don't worry about the stack versus the heap. You focus mostly on crafting a DSL to solve your domain's specific problems.
Ruby is so high-level that you can go a long way into your career without dealing with these lower-level concerns.
So for a soft introduction, let's build up our knowledge of Ruby's internals to make Rust easier to compare and contrast.
Stack and the Heap
In Ruby when we have code like:
a_string = "Hello world"
We are actually allocating a C struct called RString
onto the heap, and only a reference to that string is bound to the variable a_string
.
Because everything in Ruby is an object (small integers and a few other immediate values such as symbols, nil, true, and false are kinda special), and because objects are allocated on the heap, almost everything in Ruby lives on the heap, except the references we bind in our local stack frame (the scope of a method).
These objects are deallocated by the garbage collector once they can no longer be reached from any live reference, such as the ones on the stack.
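To make the "variables hold references" idea concrete, here is a small sketch. The variable names another_ref and copy are just for illustration. It shows two variables bound to the same heap object, and dup allocating a second one.

```ruby
a_string = +"Hello world"   # unary + gives us a mutable (unfrozen) string
another_ref = a_string      # copies the reference, not the string itself

another_ref << "!"          # mutates the single shared object

puts a_string                       # Hello world!
puts a_string.equal?(another_ref)   # true (same object)

copy = a_string.dup         # dup allocates a second string object
copy << "?"

puts a_string               # Hello world!
puts copy                   # Hello world!?
```

Assignment never copies the string. Only an explicit dup (or clone) creates a new object.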
Memory management
In Ruby, everything is an object and objects are always allocated on the heap.
The memory allocated for Ruby objects is managed by the Ruby interpreter, and it relies on a garbage collector to handle memory deallocation when objects are no longer reachable (CRuby uses a tracing mark-and-sweep collector, not reference counting).
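As a rough illustration (assuming CRuby, since the GC module's counters are implementation specific), we can watch the collector run by allocating a pile of short-lived objects and then forcing a collection:

```ruby
collections_before = GC.count     # number of GC runs so far

# Allocate objects that become unreachable immediately.
100_000.times { Object.new }

GC.start                          # force a full collection
collections_after = GC.count

puts collections_after > collections_before   # true
```

We never free anything ourselves. The objects simply become unreachable and the collector reclaims them on a later run.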
When you allocate an object in Ruby SomeClass.new
you are handed back a pointer with a C type of VALUE
which is a typedef unsigned long VALUE;
.
Underneath Ruby's interpreter uses an RVALUE
structure which contains fields to store the object's instance variables, flags, and a pointer to the object's class.
RVALUE
is a union of the following types:
struct RObject all things for which none of the following applies
struct RClass class object
struct RFloat floating-point numbers
struct RString string
struct RArray array
struct RRegexp regular expression
struct RHash hash table
struct RFile IO, File, Socket, etc…
struct RData all the classes defined at C level, except the ones mentioned above
struct RStruct Ruby’s Struct class
struct RBignum big integers
The Ruby interpreter takes care of managing this memory and ensuring proper deallocation through the garbage collector, which traces which objects are still reachable.
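We can peek at these internal types from Ruby itself. This sketch assumes CRuby, where ObjectSpace.count_objects reports live heap slots keyed by internal type tags such as :T_STRING and :T_CLASS, which line up with the structs listed above.

```ruby
counts = ObjectSpace.count_objects

# Print how many heap slots are currently used by a few internal types.
[:T_OBJECT, :T_CLASS, :T_STRING, :T_ARRAY, :T_HASH].each do |type|
  puts "#{type}: #{counts[type]}"
end
```

The exact numbers vary from run to run, so treat them as a snapshot rather than something to rely on.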
When you define a Struct
in Ruby, it's actually just a high-level object built on top of the real C structs underneath, drawn from the common union of the above types.
However a struct
in Rust is much lower level. It would be like defining your own RObject
.
Both modules and classes in Ruby use the same underlying C struct type: RClass
.
Defining methods
Ruby adds any defined method to the class corresponding to the current lexical scope, unless we add a prefix.
A lexical scope is represented as a struct in C called a cref
which points to:
- nd_next: The parent lexical scope
- nd_clss: The target class
The default target class for a method definition without a prefix is the target class of the current lexical scope.
When Ruby reads class
or module
or instance_eval
and friends it generates a new lexical scope and points nd_next
to the last one.
class Dog
  def speak
    "woof"
  end
end
When we put class Dog
Ruby creates a new lexical scope and assigns the nd_clss
pointer to the RClass
of Dog
.
When Ruby reads our def
keyword it uses this nd_clss
pointer to find the proper method table to add the method to.
On the first pass of the compiler the method definition is turned into a snippet held by the YARV code. Later when that YARV code executes Ruby assigns this code to the target class according to the pointer, saving this method in its RClass
table.
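A quick sketch of the target class in action. The sit method is just an illustration. class_eval opens a scope whose target class is Dog, so a bare def inside it lands in Dog's method table as well.

```ruby
class Dog
  def speak
    "woof"
  end
end

# class_eval sets the target class to Dog, so this def is added to Dog too.
Dog.class_eval do
  def sit
    "sitting"
  end
end

puts Dog.instance_methods(false).sort.inspect   # [:sit, :speak]
puts Dog.instance_method(:speak).owner          # Dog
```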
What about class methods?
class Dog
  def self.speak
    "woof"
  end
end
Here when Ruby encounters def
it notices our prefix. The prefix changes the algorithm for finding the target class. Ruby will set the target class to be the class of the prefix. We used self
as our prefix, so the target class is whatever self
has as its klass
pointer, which is our metaclass (where class methods are defined).
That's quite interesting because it means we can define methods outside of their classes using the same target class lookup:
class Dog; end
def Dog.speak
  "woof"
end
Or we can create a new lexical scope:
class Dog
  # Ruby reads the `self` part first and finds its class to set as the lexical
  # scope's target class.
  class << self
    def speak
      "woof"
    end
  end
end
We get the same class method as above defined on Dog
in either case.
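Here is a sketch checking that all three spellings end up in the same place, the metaclass of Dog. The method names sit and roll are made up for the example.

```ruby
class Dog
  def self.speak
    "woof"
  end
end

def Dog.sit
  "sitting"
end

class Dog
  class << self
    def roll
      "rolling"
    end
  end
end

# All three live in Dog's metaclass, not in Dog's own method table.
puts Dog.singleton_methods.sort.inspect                         # [:roll, :sit, :speak]
puts Dog.singleton_class.instance_methods(false).sort.inspect   # [:roll, :sit, :speak]
puts Dog.instance_methods(false).inspect                        # []
```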
We can also define methods on single instances; no other instance of Dog will be able to call them:
dog = Dog.new
def dog.shout
  "WOOF"
end
But how? Wouldn't Ruby look up the prefix's class and define this on Dog
? Well, because the prefix is an object and not a class, Ruby creates a hidden "singleton class" to define this method on.
Okay but it actually keeps going:
dog = Dog.new
class << dog
  def shout
    "WOOF"
  end
end
Now we can see the metaprogramming call class <<
can be fed more than just self to create a new lexical scope for the defined methods, thereby setting the target class (which holds the method table to add this to).
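A small sketch to confirm where these methods land. The second instance other and the whisper method are only there for illustration.

```ruby
class Dog; end

dog = Dog.new
other = Dog.new

def dog.shout
  "WOOF"
end

class << dog
  def whisper
    "woof..."
  end
end

# Both methods live in dog's hidden singleton class, which sits in front of Dog.
puts dog.singleton_methods.sort.inspect     # [:shout, :whisper]
puts dog.singleton_class.superclass         # Dog
puts other.respond_to?(:shout)              # false
```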
Passing messages
In Ruby we say we are sending "messages" but what does that really mean?
When we call a method on an object, Ruby looks up the method in the m_tbl
(method table) of the object's class, and if it can't find it there, it moves up to its super
and repeats the process, until eventually something is found or we get a NoMethodError
.
When we send a message by calling a method we:
- Look through the RClass method table
- If no method is found, set the current class to the RClass of our superclass and try again
- Continue until the method is found and called, or raise an error
During the execution the method body has access to the object's instance variables and other state loaded from the heap.
The method execution itself happens within the context of the object's class and can access both class and instance methods from the context of self
.
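The lookup steps above can be sketched in plain Ruby. lookup_owner is a hypothetical helper, not something Ruby provides, and it only follows superclass, so it skips included modules and singleton classes that the real lookup also visits.

```ruby
class Animal
  def breathe
    "inhale"
  end
end

class Dog < Animal
  def speak
    "woof"
  end
end

# Hypothetical helper: walk the superclass chain until a class defines `name`.
def lookup_owner(klass, name)
  current = klass
  while current
    return current if current.instance_methods(false).include?(name)
    current = current.superclass   # nil once we walk past BasicObject
  end
  nil
end

puts lookup_owner(Dog, :speak)           # Dog
puts lookup_owner(Dog, :breathe)         # Animal
puts lookup_owner(Dog, :fly).inspect     # nil
puts Dog.new.method(:breathe).owner      # Animal (Ruby's own answer)
```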
struct RBasic {
    VALUE flags;
    const VALUE klass; // A reference to the class of which we are an instance
};
struct RObject {
    struct RBasic basic; // Holds a pointer to our class inside
    union {
        struct {
            uint32_t numiv; // Number of instance variables
            VALUE *ivptr; // A pointer to an array of instance variables
            void *iv_index_tbl; // shortcut for RCLASS_IV_INDEX_TBL(rb_obj_class(obj))
        } heap;
        VALUE ary[ROBJECT_EMBED_LEN_MAX];
    } as;
};
Knowing all this we can see an insight into all Ruby objects: they are just a combination of a class pointer and arrays of instance variables.
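From the Ruby side, those two pieces are exactly what reflection hands back. A minimal sketch, where the @mood variable is added only for illustration:

```ruby
class Dog
  def initialize(name, age)
    @name = name
    @age = age
  end
end

dog = Dog.new("Rex", 3)

puts dog.class                               # Dog (the class pointer)
puts dog.instance_variables.inspect          # [:@name, :@age]
puts dog.instance_variable_get(:@name)       # Rex

# Instance variables can even be added after the fact.
dog.instance_variable_set(:@mood, "happy")
puts dog.instance_variables.inspect          # [:@name, :@age, :@mood]
```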
Dynamic vs static dispatch
In Ruby all methods are dynamically dispatched by walking the RClass
method definitions until something is found. When it does find something it adds it to a method cache for faster lookup next time, and clears this cache when new methods are defined, even at runtime.
This kind of dynamic dispatch is at the heart of Ruby's amazing metaprogramming for method overriding, method_missing
, and other magic.
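A short sketch of what runtime lookup buys us. The try_ prefix convention is made up for this example. It shows method_missing catching unknown messages and a method being redefined while the program runs.

```ruby
class Dog
  def speak
    "woof"
  end

  # Called only after the normal lookup fails.
  def method_missing(name, *args)
    return "Dog can't #{name}" if name.to_s.start_with?("try_")
    super
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("try_") || super
  end
end

dog = Dog.new
puts dog.speak      # woof
puts dog.try_fly    # Dog can't try_fly

# Reopen the class at runtime: existing instances see the new method.
class Dog
  def speak
    "WOOF"
  end
end

puts dog.speak      # WOOF
```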
However, Rust uses (mostly) static dispatch, where method resolution is determined at compile time based on the declared type of the variable or expression. Dynamic dispatch is opt-in through trait objects (dyn Trait).
This moves the cost of method lookup from runtime to compile time, which improves performance.
Parameters
Ruby has both positional and named (keyword) arguments, but it didn't always. Before 2.0 there were only positional arguments.
A common pattern at the time was to pass a hash as the last argument to hold something like keyword arguments:
def initialize(post, options = {})
  @post = post
  @style = options[:style]
end
In Ruby 2.0 we finally got keyword arguments, which give better errors for unknown keys than passing in a hash explicitly as a positional argument (required keywords, which raise when a key is missing, followed in 2.1).
This is the reason that we have to put our keyword arguments last and our positional arguments first.
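To see the difference in error handling, here is a sketch with two made-up classes, HashPost and KeywordPost. A misspelled key slips silently through the options hash but is rejected by keyword arguments.

```ruby
class HashPost
  attr_reader :style

  def initialize(post, options = {})
    @post = post
    @style = options[:style]
  end
end

class KeywordPost
  attr_reader :style

  def initialize(post, style:)
    @post = post
    @style = style
  end
end

# Typo in the key: the hash version quietly ends up with nil.
typo = HashPost.new("hello", stlye: "bold")
p typo.style   # nil

# The keyword version refuses the call instead.
begin
  KeywordPost.new("hello", stlye: "bold")
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```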
We have a ton of flexibility in specifying the types of parameters we can receive:
class Rectangle
  # Positional parameters have to come before keyword parameters.
  def initialize(meta = {}, width:, height: 1)
    @width = width
    @height = height
    @meta = meta
  end

  def area
    @width * @height
  end
end
Rust, on the other hand, is quite limited here: function parameters are positional only.
There are no default parameter values either. Defaults come from implementing the trait Default
for your types. We also have to spread those defaults into any struct we create using struct update syntax (..Default::default()).
struct Rectangle {
    width: u32,
    height: u32,
}

impl Rectangle {
    fn new(width: u32, height: u32) -> Rectangle {
        Rectangle { width, height }
    }

    // Methods are just functions that take a reference (or owned value) of self
    fn area(&self) -> u32 {
        self.width * self.height
    }
}

impl Default for Rectangle {
    fn default() -> Self {
        // No trailing semicolon: the expression is the return value
        Self::new(0, 0)
    }
}
In the example above we can never call Rectangle::new
by passing only the width unless we create a different function with a new name, because Rust doesn't have function overloading or default parameter values.
In Rust we care deeply about the types we can accept, and we can be more sure of our inputs because the compiler will tell us when it thinks we are wrong.
Classes
A Ruby class is a Ruby object that also contains method definitions, attribute names, a superclass pointer, and a constants table.
Classes hold references to their hierarchy by pointing to their superclass; if none is given then it's set to the Object
class.
Classes are created as the RClass
struct and hold a table of their instance methods and a pointer to their superclass.
For class methods Ruby actually automatically creates a metaclass that holds them. Ruby sets this metaclass to be the class of your class, as in it sets the klass
pointer in the RClass
struct to point to this metaclass:
class Dog
  def self.speak
    "Woof"
  end
end
Dog.singleton_class #=> #<Class:Dog> (this is the metaclass)
Dog.singleton_class.instance_methods(false) #=> [:speak]
Okay, but what about class variables (not to be confused with class instance variables)? As a refresher, class variables are the ones prefixed with two @ signs, as in @@shout:
class Animal
  @speak = "..." # A class INSTANCE variable @speak
  @@shout = "..." # A class variable @@shout

  def self.speak
    @speak
  end

  def self.shout
    @@shout
  end
end

class Dog < Animal
  @speak = "Woof" # A class INSTANCE variable @speak
  @@shout = "WOOF" # A class variable @@shout
end
Dog.speak #=> "Woof"
Dog.shout #=> "WOOF"
# Okay, that all looks good... But wait...
Animal.shout #=> "WOOF", uh oh...
Well, actually, class variables and class instance variables both end up in the RClass
under the iv_tbl
. However, how Ruby gets or sets each type of variable is different.
When you get or set a class variable (@@
) it will look up the ancestor chain (a linked list of superclasses) for a class variable with the same name and get or set it on that class instead. Only when no ancestor has one does a set create the variable on the current class.
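We can check where each variable actually lives with a bit of reflection. This is a sketch reusing the Animal and Dog names from above, trimmed down to just the variables.

```ruby
class Animal
  @speak = "..."     # class instance variable, stored on Animal itself
  @@shout = "..."    # class variable, also stored on Animal
end

class Dog < Animal
  @speak = "Woof"    # a separate variable, stored on Dog
  @@shout = "WOOF"   # finds Animal's @@shout and overwrites it
end

puts Animal.class_variables(false).inspect     # [:@@shout]
puts Dog.class_variables(false).inspect        # [] (nothing stored on Dog)
puts Animal.class_variable_get(:@@shout)       # WOOF

puts Animal.instance_variable_get(:@speak)     # ...
puts Dog.instance_variable_get(:@speak)        # Woof
```

Passing false to class_variables asks only for variables stored on that class itself, which makes the sharing visible.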
Rust does not have a built-in class-based object-oriented programming (OOP) system: there are no classes and no inheritance, and polymorphism comes from traits and generics instead.
In Ruby objects are instances of classes, and classes define both the data (attributes) and behavior (methods) of the objects. Inheritance allows classes to inherit attributes and behavior from parent classes, and polymorphism enables objects to be treated as instances of their parent classes or interfaces.
But in Rust, the primary abstraction mechanism is the "struct," which allows you to define custom data structures with associated fields. Unlike classes in OOP languages, a struct definition holds only data: methods are attached in separate impl blocks, and there is no struct inheritance.
Instead, Rust embraces a different paradigm called "composition over inheritance." It encourages the use of smaller, reusable components that can be composed together to create more complex behavior. This approach is facilitated by Rust's strong support for functions, traits, and generic programming.
Ruby actually also has "composition over inheritance" through its modules, which are the closest thing Ruby has to Rust's traits.
Modules
A Ruby module is a Ruby object that also contains method definitions, a superclass pointer, and a constants table.
Underneath modules are also RClass
structs. There is very little difference between them and classes other than not being able to spawn instances.
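A quick sketch of how close the two are. Walkable is a made-up module for illustration.

```ruby
module Walkable
  def walk
    "walking"
  end
end

class Dog
  include Walkable
end

puts Walkable.class               # Module
puts Dog.class                    # Class
puts Class.superclass             # Module (a class is a specialised module)
puts Walkable.respond_to?(:new)   # false (modules can't spawn instances)
puts Dog.new.walk                 # walking
```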
We know that the superclass we store in RClass
makes our class hierarchy a linked list structure. Each class points to its superclass and so on until reaching BasicObject
, whose superclass is nil
.
So when you include
a module Ruby is inserting a copy of the module into this linked list by having the copy point to the superclass and the class doing the including now points to this copy.
Of course extend
also does the same, but by changing your metaclass's superclass to the module instead. And prepend
is some black magic where it surgically creates a clone of the class (the "origin" class), moves the methods into it, and places it after the prepended module in the chain.
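We can watch the linked list change with ancestors. In this sketch Friendly, Loud, and Registry are made-up module names.

```ruby
module Friendly
  def greet
    "hi"
  end
end

module Loud
  def speak
    super.upcase   # calls the class's own speak, further down the chain
  end
end

module Registry
  def kind
    "animal"
  end
end

class Dog
  include Friendly   # inserted after Dog in the chain
  prepend Loud       # inserted before Dog in the chain
  extend Registry    # inserted into the metaclass's chain

  def speak
    "woof"
  end
end

puts Dog.ancestors.take(3).inspect          # [Loud, Dog, Friendly]
puts Dog.new.speak                          # WOOF
puts Dog.kind                               # animal
puts Dog.singleton_class.ancestors[1]       # Registry
```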
Compiling
Compiling in Ruby happens in 3 separate acts:
- Tokenization -> tokens
- Parsing -> AST nodes
- Compiling
First, Ruby turns the characters in the source file into "tokens" by reading them one at a time.
Then the parser takes these tokens and turns them into a structured representation of your code called an Abstract Syntax Tree (AST).
In Ruby 1.8 and earlier, your code was executed by walking the nodes in the AST and evaluating each one directly.
Starting with 1.9, YARV was added, which compiles your program to an intermediate form that runs on the Ruby VM. So YARV outputs instructions and then the actual step-by-step execution still happens, but it reads from the YARV instructions, not the AST.
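All three acts can be inspected from Ruby itself. This sketch assumes CRuby, since Ripper and RubyVM::InstructionSequence are specific to that implementation, and the exact instruction listing differs between versions.

```ruby
require "ripper"

source = "1 + 2"

# Act 1: tokenization. Keep only the token type and text of each token.
tokens = Ripper.lex(source).map { |token| token[1..2] }
p tokens

# Act 2: parsing into a tree of nested arrays (an S-expression of the AST).
ast = Ripper.sexp(source)
p ast

# Act 3: compiling to YARV instructions, then running them on the VM.
iseq = RubyVM::InstructionSequence.compile(source)
puts iseq.disasm
puts iseq.eval   # 3
```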
Blocks and Closures
When you define a block in Ruby it creates an rb_block_t
and saves an ep
pointer to the location of the current stack frame and its local variables. This is how we have closures in Ruby.
Blocks also store a pointer to self
which is the value of self
where the block was defined.
When you define a lambda, the current stack frame is copied to the heap AND the current stack frame is modified to point to this copied environment. Calling the lambda restores this environment by setting the new stack frame's ep
to point to the data on the heap that was previously set.
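A small sketch of that saved environment in action. make_counter and name_block are made-up names. The local variable count outlives the method call that created it because both lambdas point at the same heap copy of the frame.

```ruby
def make_counter
  count = 0
  increment = lambda { count += 1 }
  read = lambda { count }
  [increment, read]
end

increment, read = make_counter
increment.call
increment.call

puts read.call                                       # 2
puts increment.binding.local_variable_get(:count)    # 2

# Blocks also remember the self they were defined in.
class Dog
  def name_block
    proc { self.class.name }
  end
end

puts Dog.new.name_block.call                         # Dog
```

Both lambdas see the same count, which is what we would expect if they share one environment rather than each holding a copy of the value.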