A History of Compiling Ruby

by Chris Seaton

In the RubyConf 2021 presentation titled 'A History of Compiling Ruby,' Chris Seaton discusses the extensive history of Ruby compilers, highlighting the significant number of attempts to compile Ruby to machine code. While Ruby is primarily known as an interpreted language, Seaton reveals that there have been at least 25 different compiler attempts, showcasing the diversity of approaches and techniques in compiler design.

Key points of the presentation include:

- Development of Ruby Compilers: Seaton explains the evolution of Ruby compilers over the years, detailing how many have focused on just-in-time (JIT) compilation, adapting to Ruby's dynamic nature.

- Active and Historical Compilers: The talk categorizes compilers into 'active' and 'historical,' emphasizing ongoing projects like JRuby, TruffleRuby, and YJIT from Shopify, while also mentioning discontinued efforts such as Rubinius and Topaz.

- Research and Learning: Seaton argues that studying Ruby compilers can provide insights into broader compiler theory and practices, linking many compiler innovations to research conducted in the Ruby context.

- Compiler Architectures: He explains various architectures used in compilers, such as ahead-of-time (AOT) and just-in-time (JIT), illustrating with examples from prominent compilers and their specific mechanisms.

- Future of Ruby Compilers: The presentation also suggests avenues for future compiler research, including enhanced data structures for Ruby programs and polyglot approaches that might combine different languages' compilation processes.

- Practical Implications: Seaton encourages developers to engage with Ruby compiler development, noting the rich opportunities for experimentation and innovation in this field.

Ultimately, the talk is a comprehensive survey of the Ruby compilation landscape that highlights the richness of the topic both historically and in current research. Chris Seaton advocates for the preservation and study of Ruby compilers, indicating their importance in software development and compiler research.

00:00:11.200 Hello, I'm Chris Seaton. I work at Shopify on TruffleRuby, which is an implementation of a highly optimizing Ruby compiler built on the GraalVM and some cool technology known as self-specializing AST interpreters and partial evaluation. I've also worked in the past on another Ruby compiler called Rhizome, which is a just-in-time compiler written in Ruby itself. Additionally, I maintain the Ruby bibliography, which is a list of academic writing on the Ruby programming language at rubybib.org.
00:00:41.840 It's a lovely day today here in Cheshire, United Kingdom. I apologize for not being with you in Denver; I would love to hear about what you're working on and share my work with you.
00:00:53.440 Today, I want to talk to you about compiling Ruby and provide some thoughts on the history of compiling Ruby, including what has been attempted in the past and what people are currently working on. Most people do not think of Ruby as being a compiled language; we seldom discuss the Ruby interpreter. However, a significant amount of work has been done on compiling Ruby over the years, and people have explored some fascinating and diverse approaches to Ruby compilation.
00:01:18.080 I think you will be surprised by how many Ruby compilers exist. We will learn a lot about compilers by examining Ruby due to the diversity of approaches. Even if you focus solely on Ruby, there is much to learn about compilers. Interestingly, we can trace major advances in compiler research of the last couple of decades through Ruby.
00:01:29.680 For my purposes, a Ruby compiler is defined as something that translates Ruby source code to native machine code that can run on your processor. This could involve some intermediate systems that generate machine code on your behalf, but the key point is that it must result in generated machine code. Therefore, I will not include systems like YARV, which compiles to Ruby bytecode that is later interpreted, as that is how MRI operates.
00:01:59.200 Compilation can be done ahead of time, similar to a traditional compiler for a language like C, where you provide a file and it generates an executable for you. Alternatively, it can be done just-in-time, meaning it gets compiled while your program runs, as is the case with Java and the JVM.
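
To make the distinction concrete, here is a toy sketch in Ruby of those two modes. The names, the threshold, and the "machine code" strings are invented purely for illustration; this is not how any real Ruby compiler is implemented.

```ruby
# Toy illustration of ahead-of-time versus just-in-time compilation.

# Ahead-of-time: every method is translated before the program starts.
def compile_ahead_of_time(methods)
  methods.each_with_object({}) do |(name, source), compiled|
    compiled[name] = "machine code for: #{source}"
  end
end

# Just-in-time: interpret first, and compile a method once it becomes "hot".
class ToyJIT
  THRESHOLD = 100  # invented number; real compilers tune this carefully

  def initialize
    @calls    = Hash.new(0)
    @compiled = {}
  end

  def call(name, source)
    return @compiled[name] if @compiled[name]          # run the compiled form

    @calls[name] += 1                                  # profile in the interpreter
    @compiled[name] = "machine code for: #{source}" if @calls[name] >= THRESHOLD
    "interpreting: #{source}"
  end
end

jit = ToyJIT.new
200.times { jit.call(:add, "a + b") }                  # starts interpreted, ends up compiled
```
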
00:02:21.280 There is some judgment and opinion required on my part to interpret and categorize these compilers and to speak about them. I'm not trying to delineate what constitutes a compiler, and I'm also referencing a lot of others' work, which can involve filtering through a significant amount of information. I apologize in advance for any errors or omissions as I discuss what others have accomplished.
00:02:48.239 Did you know that there have been at least 25 different attempts to build a Ruby compiler? I was amazed when I discovered this many compilers while exploring the subject. I believe anyone in the Ruby community would have difficulty naming all 25 of these compiler projects. This is something from which anyone can learn by examining this list. It is actually more than 25, as some compilers have tried different approaches over the years. Specifically, JRuby and Rubinius have experimented with several compilation strategies.
00:03:12.960 So, why is it worth looking at Ruby compilers? Why should we even consider compilers at all? I find compilers fascinating; I've worked on them for a couple of decades. One great thing about compilers is that you can engage in conversations with anyone who is a developer about them. We all use compilers, and everyone has opinions on how languages should be designed, how fast the compiler should run, or how efficient the code should be after it has been compiled.
00:03:46.079 Compilers might seem like a deeply technical topic, but they are conceptually quite simple. They take a file as input, do something internally, and produce a file as output. This straightforward nature makes them fun and easy to understand, encouraging experimentation. It's surprising how many Ruby compilers there are.
00:04:06.240 Why have we seen such a large number of compilers for Ruby? Is there something special about Ruby that has driven this development? There is indeed renewed interest in Ruby compilers; MRI now ships with a just-in-time compiler, and Shopify has introduced a new one for the next version of Ruby.
00:04:22.720 TruffleRuby and JRuby are also noteworthy examples. There seems to be an increase in discussions surrounding compiling Ruby these days, making it logical to reflect on its history and the wealth of existing work from which we can learn. It’s important to examine what has been attempted before, what didn’t succeed, and what has yet to be explored.
00:04:49.760 It’s fascinating that we can trace compiler research through Ruby compilers. You might not think of Ruby as being an academically intensive language, but numerous researchers have applied high-level concepts to Ruby compilers.
00:05:16.560 Finally, many projects I have cataloged in this initiative risk being lost to time. In fact, to get at least one compiler working, I had to ask someone to retrieve a hard drive from their attic to find some source code that would have likely been lost. Therefore, it is crucial to archive these compilers and gather them together.
00:05:32.960 What we are not doing here is comparing which compiler is better or conducting benchmarks or competitions. While opinions will inevitably surface, I strive to present this information fairly and objectively. Here’s a comprehensive list of the 25 compilers. As we progress through the presentation, I’ll break it down further, but it’s valuable to display them all at once.
00:06:20.400 For each compiler, I will identify its name, how long it has been active, whether it runs on top of MRI or some other virtual machine, if it operates just-in-time or ahead-of-time, what general approach it employs towards compilation, what the front-end processes are (the part that takes source code), what kind of interpreter it utilizes to support compilation, what kinds of data structures it employs, and who worked on them.
00:06:44.880 Let's delve deeper and discuss some of the key Ruby compilers currently in use today. First, there's JRuby, which is Ruby built on top of the JVM. It compiles down to machine code via JVM bytecode and is most likely the most widely deployed Ruby compiler in existence. It has been used in production for quite some time and is known to be very stable. There's also TruffleRuby, which operates on top of the GraalVM and compiles down to machine code via Graal's intermediate representation.
00:07:44.440 Next, there's MJIT, the just-in-time compiler that now ships with MRI, part of the effort to make Ruby 3 three times faster. For differentiation from the earlier RTL MJIT, I refer to it as YARV MJIT. There is also YJIT, Shopify's new JIT for MRI expected to be released soon, and the Sorbet Compiler, an ahead-of-time compiler built on top of the Sorbet type checker. These are the key compilers that currently have a realistic chance of being used in production, all backed by dedicated teams.
00:08:16.240 Unfortunately, we've lost some Ruby compilers over the years. Notable among them is Rubinius, a project that aimed to implement Ruby in Ruby itself, although in practice much of it was written in C++. Its compiler was built on LLVM, a framework for developing compilers. Another is Topaz, which implemented Ruby using RPython and the PyPy toolchain, but it is now discontinued.
00:09:08.800 There is also IronRuby, which was Ruby implemented in C# within the .NET ecosystem, as well as Ruby+OMR, which reused compiler components from IBM's J9 JVM, via the OMR project, inside MRI. Regrettably, these projects are currently stalled or discontinued.
00:09:51.680 Additionally, I’d highlight some intriguing niche compilers worth exploring. Hokstad's compiler translates Ruby directly to machine code via textual assembly, offering an easy-to-understand approach. Hyperdrive, by Jacob Matthews, is a tracing just-in-time compiler for Ruby. There’s also Natalie, which compiles Ruby to C++, and RubyX, which targets ARM machine code with a more sophisticated architecture.
00:10:13.760 We can categorize the compilers into several groups: serious compilers that are still under development, those from the past that have been discontinued, and fun niche demos, toys, and experiments.
00:10:42.560 This timeline illustrates the history of Ruby compilers, spanning from 2004 to the present day in 2021, with RubyComp being the earliest and TenderJIT being the most recently identified compiler. I identify three distinct epochs in compiler development.
00:11:04.240 Around 2007, there was a significant surge of compiler implementations, including Hokstad, Ludicrous, MagLev (Ruby on top of a Smalltalk VM), Rubinius, Ruby.NET, YARV to LLVM, and MacRuby. These projects all emerged during this period.
00:11:47.800 The second wave, around 2017, marked the beginning of a lineage for the compiler present in MRI today—MJIT. Then, around 2020-2021, we saw another flurry of new compilers introduced, including YJIT and Sorbet.
00:12:29.280 While some of these compilers have faded away over time, many have persisted. Notably, JRuby has been active since 2006, with the project itself dating back even further, indicating the longevity of some compiler efforts.
00:12:50.320 Certain compilers have formed familial relationships with one another—Rubinius and JRuby were independent projects that contributed ideas to TruffleRuby. TruffleRuby utilized Rubinius’s core libraries and JRuby’s parser and encoding systems, illustrating how advancements can interconnect.
00:13:43.440 Moreover, LLRB was one of the first more serious attempts to develop a just-in-time compiler within MRI, establishing groundwork for the YARV MJIT compiler we see today. RTL MJIT was the initial implementation of MJIT, and it used a different architecture, rewriting YARV's stack-based bytecode into register transfer language (RTL) instructions.
00:14:17.120 Although RTL and RTL MJIT have been discontinued, the author of RTL MJIT is currently exploring the creation of a new just-in-time compiler built on a novel lightweight compiler framework called MIR.
00:14:37.519 To gain a clearer understanding of Ruby compilers, we can classify them into historical compilers and those still being actively developed. While a large number have fallen into disuse, there remains a substantial number that are actively maintained today.
00:15:09.119 We can also distinguish compilers based on whether they compile ahead-of-time or just-in-time. Most Ruby compilers employ just-in-time compilation to accommodate Ruby's dynamic nature, allowing them to generate machine code as the program runs.
00:15:27.440 Nonetheless, some compilers, like MacRuby, offer the flexibility to compile both ahead of time and just in time, while JRuby has evolved to accommodate ahead-of-time compilation.
00:15:43.760 Compilers can also be categorized as serious implementations versus demo implementations—those that signify dedicated efforts from teams to run substantial Ruby code.
00:16:10.880 Additionally, we can distinguish between compilers built on top of MRI and those that operate on custom VMs. The former modifies MRI to include a just-in-time or any other form of compiler, while the latter re-implements Ruby in another language.
00:16:38.360 Four attempts have been made to implement Ruby on top of the JVM, with XRuby and JRuby released around the same time. Ruby+OMR reused components from IBM's J9 JVM but operates within MRI, while TruffleRuby was built atop the GraalVM.
00:17:01.600 Two notable attempts to implement Ruby on the .NET ecosystem were Ruby.NET and IronRuby, which emerged when Microsoft sought to create an ecosystem with several dynamic languages on .NET, a project that has since been discontinued.
00:17:28.560 Now, I’d like to take a deeper look at some of these compilers. Let’s start with Hokstad's compiler, which compiles Ruby code ahead of time to native machine code. This compiler has a straightforward and traditional architecture.
00:18:05.200 Hokstad's compiler scans Ruby source code, identifying words, punctuation, and symbols in a process known as lexing. This produces a stream of tokens that proceed to a parser to create an Abstract Syntax Tree (AST), a data structure that represents the program. The AST then goes through a preprocessor that analyzes certain aspects of it, attempting to understand static elements, before being handed to a generator that creates machine code.
00:18:50.720 The machine code generation step actually produces assembly code, a text-based form of machine code. Finally, the system's assembler and linker run to create an executable. The result is a clear, linear pipeline that uses a single data structure, the AST.
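
To give a feel for that kind of linear, single-data-structure pipeline, here is a deliberately tiny Ruby sketch that handles only integer addition. It is illustrative only and is not taken from Hokstad's code; the assembly it prints is toy text rather than any particular architecture's syntax.

```ruby
# Toy ahead-of-time pipeline: lex -> parse -> generate assembly text.

def lex(source)
  source.scan(/\d+|\+/)                       # tokens: integers and plus signs
end

def parse(tokens)
  node = [:int, Integer(tokens.shift)]        # build a left-leaning AST
  until tokens.empty?
    tokens.shift                              # consume '+'
    node = [:add, node, [:int, Integer(tokens.shift)]]
  end
  node
end

def generate(node)
  case node.first
  when :int
    ["    mov  r0, ##{node[1]}"]
  when :add
    generate(node[1]) +                       # left operand ends up in r0
      ["    push r0"] +
      generate(node[2]) +                     # right operand ends up in r0
      ["    mov  r1, r0", "    pop  r0", "    add  r0, r0, r1"]
  end
end

puts generate(parse(lex("1 + 2 + 3")))        # the system assembler and linker would take it from here
```
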
00:19:14.799 Rubinius presents a fascinating case as it encompasses a complete re-implementation of Ruby, attempting several innovative strategies for compilation over the years. While Rubinius has incorporated just-in-time functionality, unfortunately, the project has stalled.
00:19:49.440 In Rubinius, Ruby source code is lexed in much the same manner as in Hokstad's compiler, producing a stream of tokens. The tokens are then parsed into a C++ data structure, an Abstract Syntax Tree, which a Rubinius library converts into a Ruby data structure so that it can be compiled into bytecode similar in style to YARV bytecode. The code is executed within the Rubinius interpreter, and when a method is called sufficiently often, it reaches a compilation threshold that prompts asynchronous compilation.
00:20:45.760 This process translates the bytecode into LLVM intermediate representation (LLVM IR), enabling sophisticated optimization techniques. Rubinius leverages a series of LLVM passes, including both standard optimizations provided by LLVM and custom passes tailored for Ruby. The resulting machine code is stored in memory rather than output to disk.
00:21:36.160 If an optimization's assumptions turn out to be wrong, the compiled code can deoptimize back to the interpreter seamlessly. You can see how much more ambitious this architecture is than a simple linear pipeline, and how the relationships between the components become correspondingly more complex.
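
The threshold-and-deoptimize mechanics can be sketched in a few lines of Ruby. This is a toy model of the general idea only, with invented names and numbers; it is not Rubinius's implementation, and a plain proc stands in for real machine code.

```ruby
# Toy model of threshold-triggered, asynchronous compilation with a fallback.
class ProfiledMethod
  THRESHOLD = 1_000                     # invented compilation threshold

  def initialize(&interpreted)
    @interpreted = interpreted          # stands in for the bytecode interpreter
    @calls       = 0
    @compiled    = nil
  end

  def call(*args)
    if @compiled
      begin
        return @compiled.call(*args)    # run the "machine code"
      rescue StandardError
        @compiled = nil                 # a speculation failed: deoptimize to the interpreter
      end
    end

    @calls += 1
    compile_in_background if @calls == THRESHOLD
    @interpreted.call(*args)
  end

  private

  def compile_in_background
    Thread.new do
      # A real compiler would translate bytecode to LLVM IR, run optimization
      # passes, and install machine code; here we just install the same proc.
      @compiled = @interpreted
    end
  end
end

add = ProfiledMethod.new { |a, b| a + b }
2_000.times { add.call(1, 2) }          # crosses the threshold and gets "compiled"
```
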
00:22:20.480 Next, let’s look at JRuby, which adds further complexity through its numerous tiers of compilation. JRuby takes Ruby source code, lexes and parses it, and builds its own intermediate representation (IR). Initially, it runs through a startup interpreter, a simple interpreter for the IR. As execution continues, code can move through different states, or tiers, depending on how the running program behaves.
00:23:05.200 A method is marked as 'hot' once it crosses a certain execution threshold, triggering asynchronous compilation on a background thread. In JRuby, a control flow graph is generated and passed through various optimization passes, including custom optimizations designed within JRuby.
00:23:55.760 This leads to the generation of JVM bytecode stored in memory as classes. The bytecode subsequently enters the JVM, where it faces further compilation from the JVM's just-in-time compilers, ultimately resulting in actual machine code. At any of these stages, if unexpected issues arise, the program can revert back to the bytecode interpreter.
00:24:57.440 Consequently, JRuby is capable of up to nine tiers of compilation, striving to optimize Ruby performance and efficiency.
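
One way to picture that tiering is as each method moving through a series of execution states as it gets hotter, with the option of dropping back down if something goes wrong. The sketch below is a made-up Ruby model of the idea of tiers, not JRuby's actual machinery; the tier names and promotion counts are invented.

```ruby
# Toy model of tiered execution: a method is promoted through progressively
# more optimized forms as its call count grows.
TIERS = [
  { name: :startup_interpreter, promote_at: 50 },
  { name: :full_interpreter,    promote_at: 500 },
  { name: :jvm_bytecode,        promote_at: 5_000 },  # then the JVM's own JIT takes over
  { name: :machine_code,        promote_at: nil }
].freeze

class TieredMethod
  def initialize
    @tier  = 0
    @calls = 0
  end

  def call
    @calls += 1
    limit = TIERS[@tier][:promote_at]
    @tier += 1 if limit && @calls >= limit   # promote to the next tier
    TIERS[@tier][:name]                      # which form would run this call
  end

  def deoptimize!
    @tier = 1                                # drop back to the bytecode-style interpreter
  end
end

m = TieredMethod.new
counts = (1..6_000).map { m.call }.tally     # how many calls ran in each tier
p counts
```
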
00:25:29.440 We can track advancements in compiler research through Ruby's ecosystem, where research has introduced or applied concepts such as tracing, meta-tracing, lazy basic block versioning, self-specializing ASTs, and partial evaluation.
00:25:54.240 Tracing originated from a project called Dynamo and involves monitoring the instructions executed as the program runs and generating machine code from that recorded path. Meta-tracing goes a step further: it traces the interpreter itself as it runs a program, which effectively derives a compiler from the interpreter automatically.
00:26:29.280 Lazy basic block versioning compiles basic blocks, straight-line snippets of code with no internal control flow, one at a time as they are first reached, specializing them using the information available at run time. Self-specializing ASTs allow programs to optimize themselves at runtime by rewriting less efficient nodes into versions specialized for the values they actually see. Partial evaluation offers another intriguing avenue, aiming to create a smaller equivalent program by executing parts of the program during its compilation.
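
Self-specializing ASTs are perhaps the easiest of these ideas to sketch in Ruby itself. The class below is a toy with invented names, not TruffleRuby code: an addition node starts out generic, observes the operands it actually receives, rewrites itself into an Integer-specialized form, and falls back to the generic form if that guess ever turns out to be wrong.

```ruby
# Toy self-specializing AST node for `left + right`.
class LiteralNode
  def initialize(value)
    @value = value
  end

  def execute(_frame)
    @value
  end
end

class AddNode
  def initialize(left, right)
    @left, @right = left, right
    @specialized = nil                  # :integer or :generic once values have been seen
  end

  def execute(frame)
    l = @left.execute(frame)
    r = @right.execute(frame)

    case @specialized
    when :integer
      return l + r if l.is_a?(Integer) && r.is_a?(Integer)
      @specialized = :generic           # the guess failed: de-specialize
    when nil
      # First execution: specialize on what we actually saw.
      @specialized = l.is_a?(Integer) && r.is_a?(Integer) ? :integer : :generic
    end

    l + r                               # generic path dispatches to any + method
  end
end

tree = AddNode.new(LiteralNode.new(14), LiteralNode.new(2))
p tree.execute({})                      # => 16, and the node is now Integer-specialized
```
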
00:27:17.760 In Ruby, the application of partial evaluation is happening through TruffleRuby and has been discussed in various RubyConf talks. Some themes to explore include the pursuit of automation in the generation of Ruby compilers, where efforts have been made to create a unified codebase from which multiple compilers can be derived.
00:27:54.960 Over the years there have also been pushes toward more advanced data structures for representing Ruby programs. More advanced data structures can facilitate further optimizations, as they provide deeper insights into a program’s workings and behaviors.
00:28:26.960 In conclusion, there remains tremendous potential in polyglot combinations of compilers. For instance, TruffleRuby is merging a compiler for regular expressions with its Ruby compiler, demonstrating the speed improvements that come from optimizing across the boundary between the two.
00:28:50.960 Future endeavors might consider integrating Ruby with other languages, such as JavaScript or SQL, into a unified just-in-time compilation framework.
00:29:13.760 For anyone interested in compilers, RubyConf offers a wealth of talks on various subjects related to compilers, showcasing their popularity within the Ruby community.
00:29:37.760 I recommend checking out presentations on the YARV MJIT compiler, TruffleRuby, and other Ruby compiler efforts. For resources and discussions, follow compiler authors such as Tim Morgan, who streams his Natalie compiler development, and Vidar Hokstad, who has published an insightful blog series on his Ruby compiler.
00:30:19.760 My Ruby compiler survey is available at rubycompilers.com, where I also maintain a repository of old compilers to help preserve their legacy. If you would like to discuss anything further regarding Ruby compilers, please feel free to reach out to me on Twitter. Thank you for listening!