Torsten Rüger

Compiling Ruby to Binary

Ruby Unconf 2019

00:00:02.540 Welcome everyone! As we learned from the previous session, we're starting punctually instead of six minutes late. It's now 1:00 sharp. Thank you for being here.
00:00:09.380 Great! You all made it back. Thanks for joining us. Up next, we have Torsten. Torsten actually built wooden frames for a living before he started programming. So please give him a very warm welcome.
00:00:29.590 Hi there! Sorry, that was actually the wrong way around. I now build houses after coding for a relatively long time. Today, I’m going to show you how it's possible to compile Ruby to binary. This ties in nicely with what was mentioned earlier about not letting anyone tell us what's possible or not. People often claim that you need to interpret a dynamic language, and that’s just not true.
00:01:09.909 A little about me: you can find me on GitHub, although I'm not very social. I've been coding on and off, not professionally, mostly as a hobby. This project is truly a passion for me. I'm originally from Finland, and here's a typical Finnish way to enjoy your time, which may seem odd in other parts of the world.
00:01:37.719 I run a bed-and-breakfast halfway between Helsinki and Lapland with my wife, which has allowed us to spend some enjoyable time together. In the next half hour, I will show you that compiling Ruby is not difficult, although it can get relatively deep. I'm trying to strike a balance: giving you an understanding of how it can be done, without cheating and in 100% Ruby, when the whole thing amounts to about 10,000 lines of code.
00:02:38.150 I love Ruby and have for quite some time. I hope this can become one of the many contributions the Ruby community makes to the wider programming community. Dynamic languages like Ruby still face a challenge, and it is primarily performance. Does anyone here know Smalltalk? I found Smalltalk's approach quite impressive when I read about it in the 90s.
00:03:20.970 The major problem Ruby and most other dynamic languages face is speed, particularly compared with languages that big companies pour lots of resources into making fast. I experienced this firsthand when I bought my first Raspberry Pi. Essentially, Ruby turned my laptop into a Raspberry Pi and a Raspberry Pi into an Arduino. That shouldn't happen, and it all comes from being interpreted. This is the classic two-language problem: having to choose between a nice language and a fast one. With this project, I hope we won't have to make that choice anymore.
00:04:34.710 By compiling Ruby, we can address these performance issues effectively. A faster implementation means we wouldn't have to choose between beauty and speed. I often refer to this phenomenon as the 'Rails effect', where developers empowered to modify their tools achieve a significant boost in productivity.
00:05:03.620 Before diving into the compiling part, let's quickly review some basics. Any program takes input and produces output. In MRI Ruby, your source code is the input, and running it produces the output directly. Compiled languages, however, have two stages: first, the compiler processes a source file and produces an output file, which is a binary; second, you run that binary to get the program's output.
00:05:36.530 So if we were to do this in Ruby, we would write a hello world program and run it through the compiler I've written, which produces an executable binary. This works with the code I've created. We can then execute the hello program, and it runs just as you would expect any compiled program to run.
00:06:58.370 To do this with a Ruby compiler, we need a program called ruby, and when we call ruby, that program is itself the output of another compilation process. On some architectures, we first need to compile the compiler itself. How many people are familiar with this concept? It's called bootstrapping, and it's how many decent compilers are structured; the C compiler goes through a similar process in its initial step.
00:07:43.820 I will go deeper into this project today and discuss the different components involved. As of now, basic structures and classes are working. We have a calling convention that is easy to understand compared to the C calling conventions, which I will illustrate later. There's also a memory layout and a process for creating binaries, which I won't cover in detail, since there's a standard (the ELF standard) that specifies how binaries are laid out.
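The talk doesn't show the calling convention at this point. As a purely hypothetical illustration of why an object-based convention can be easier to follow than C's register and stack rules, here is a Ruby sketch that assumes every call gets its own message object linked to its caller. `Message`, `invoke`, and `call_depth` are invented names, and Ruby's own `public_send` stands in for the compiler's real dispatch.

```ruby
# Hypothetical sketch: one message object per call, linked to its caller.
Message = Struct.new(:receiver, :method_name, :arguments, :return_value, :caller_message)

# Creates the callee's message, runs the method, and stores the result in it.
# Ruby's public_send stands in for the compiler's real method dispatch.
def invoke(caller_message, receiver, method_name, *arguments)
  message = Message.new(receiver, method_name, arguments, nil, caller_message)
  message.return_value = receiver.public_send(method_name, *arguments)
  message
end

# Walks the caller links, the way a debugger would show a backtrace.
def call_depth(message)
  depth = 0
  while message.caller_message
    depth += 1
    message = message.caller_message
  end
  depth
end

top = Message.new(nil, :main, [], nil, nil)
sum = invoke(top, 3, :+, 4)
puts sum.return_value  # prints 7
puts call_depth(sum)   # prints 1
```

Because everything about a call lives in one object, the callee finds its receiver and arguments in a single place, and the chain of callers doubles as a backtrace.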
00:08:25.010 Control structures work, including if/else statements and while loops. There are no breaks or continues yet, but they are in progress. Dynamic method sending is key: it involves finding the method to invoke and passing the arguments, while keeping it efficient through caching.
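A minimal sketch of caching at a call site (a monomorphic inline cache), written in plain Ruby rather than taken from RubyX. `CallSite` is a hypothetical name; it keys the cache on the receiver's class and uses Ruby's `instance_method` and `UnboundMethod#bind_call` (Ruby 2.7+) for the slow lookup and the fast call.

```ruby
# Hypothetical monomorphic inline cache for one call site.
class CallSite
  attr_reader :misses

  def initialize(method_name)
    @method_name = method_name
    @cached_class = nil
    @cached_method = nil
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    unless klass.equal?(@cached_class)
      # Cache miss: do the slow lookup and remember the result.
      @misses += 1
      @cached_method = klass.instance_method(@method_name)
      @cached_class = klass
    end
    # Cache hit (or freshly filled): call the remembered method directly.
    @cached_method.bind_call(receiver, *args)
  end
end

site = CallSite.new(:to_s)
site.call(1)    # miss: looks up Integer#to_s
site.call(2)    # hit: same class as before
site.call("a")  # miss: String replaces Integer in the cache
puts site.misses  # prints 2
```

Keying on the class is the weak spot: reopening a class and redefining a method keeps the same class object, so this cache would go stale. Keying on an immutable type instead of the class avoids that.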
00:09:03.660 I have even been able to implement blocks. In Ruby, blocks are passed implicitly, but we can also capture them and pass them on as arguments, and 'yield' and 'return' work as expected. Ruby is a complex language, and although some features haven't been implemented yet, the foundational aspects were the crucial ones to tackle first.
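For reference, these are the block semantics a compiler has to reproduce, shown in plain Ruby (this is the language behavior, not how RubyX compiles it). `twice`, `collect_twice`, and `first_even` are illustrative names.

```ruby
# Implicit block: the method never names it, it just yields.
def twice
  yield 1
  yield 2
end

# Explicit capture: &block turns the implicit block into a Proc we can pass on.
def collect_twice(&block)
  results = []
  twice { |n| results << block.call(n) }
  results
end

# A `return` inside a block returns from the method the block was written in.
def first_even(numbers)
  numbers.each { |n| return n if n.even? }
  nil
end

scaled = collect_twice { |n| n * 10 }
p scaled                   # prints [10, 20]
p first_even([1, 3, 4, 6]) # prints 4
p first_even([1, 3])       # prints nil
```

The non-local `return` in `first_even` is the tricky part for a compiler: the block runs inside `each`, yet its `return` must unwind back out of `first_even`.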
00:10:15.360 Runtime compilation for procs is a bit trickier. Exceptions can be handled similarly, but for them you need a properly defined calling convention. The beauty of writing a new Ruby compiler from scratch is that you can keep it simple and design it for today's needs.
00:10:51.390 Every object in this environment is well-defined. Has anyone worked on extensions or explored the MRI code? Unlike MRI, where everything only pretends to be an object, in RubyX everything really is an object, even integers. This gives a uniform way to handle lower-level functions and data.
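A hypothetical model of "everything is an object": each object is a reference to its type followed by slots, and an integer uses exactly the same layout as any user-defined object. `TypeInfo`, `ModelObject`, and the slot names are assumptions for illustration, not RubyX's actual classes.

```ruby
# Hypothetical uniform object layout: a type reference followed by slots.
TypeInfo = Struct.new(:name, :slot_names)

class ModelObject
  attr_reader :type

  def initialize(type, *values)
    unless values.length == type.slot_names.length
      raise ArgumentError, "#{type.name} expects #{type.slot_names.length} slots"
    end
    @type = type
    @slots = values
  end

  # Looks up a slot by name through the object's type, the same way for every object.
  def get(name)
    index = type.slot_names.index(name) or raise KeyError, "#{type.name} has no slot #{name}"
    @slots[index]
  end
end

INTEGER_TYPE = TypeInfo.new(:Integer, [:value])
POINT_TYPE = TypeInfo.new(:Point, [:x, :y])

five = ModelObject.new(INTEGER_TYPE, 5)
point = ModelObject.new(POINT_TYPE, 1, 2)
puts five.get(:value)  # prints 5
puts point.get(:y)     # prints 2
```

Since an integer is laid out like any other object, code that reads a type or a slot never needs a special case for it.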
00:12:09.200 Ruby's dynamic nature brings its own challenges. We can change classes at any moment, so equating classes with types causes problems, because types must remain immutable. Each object has a type, which never changes, while classes keep evolving; managing the references between them is therefore crucial.
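A minimal sketch of the class/type split, assuming a type is simply a frozen list of instance-variable names and a class holds a reference to its current type. `Type`, `ClassModel`, and `InstanceModel` are hypothetical names. Adding an instance variable to the class swaps in a new type, while objects created earlier keep the old one.

```ruby
# Hypothetical sketch: types are immutable, classes point at their current type.
Type = Struct.new(:class_name, :ivar_names) do
  # Returns a new frozen type with one more instance variable; self is untouched.
  def with_ivar(name)
    Type.new(class_name, (ivar_names + [name]).freeze).freeze
  end
end

InstanceModel = Struct.new(:type)

class ClassModel
  attr_reader :current_type

  def initialize(name)
    @current_type = Type.new(name, [].freeze).freeze
  end

  # Reopening the class and adding an instance variable swaps in a new type.
  def add_ivar(name)
    @current_type = @current_type.with_ivar(name)
  end

  # New objects get the type the class has at creation time.
  def new_instance
    InstanceModel.new(@current_type)
  end
end

point_class = ClassModel.new(:Point)
old_point = point_class.new_instance
point_class.add_ivar(:x)
new_point = point_class.new_instance
p old_point.type.ivar_names  # prints []
p new_point.type.ivar_names  # prints [:x]
```

Anything compiled against the old type stays valid for the old objects, because that type can never change underneath it.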
00:13:11.520 Let's summarize: like any program, the compiler is structured in layers. It starts with Ruby code, which a Ruby parser turns into an untyped data structure; from that we create a typed version, and ultimately we generate machine code as the compiled output.
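A sketch of the first two layers, with the standard library's `Ripper.sexp` standing in for the Ruby parser (RubyX may use a different parser). It produces the untyped nested-array tree; `type_tree` and `TypedNode` are hypothetical and handle only integer literals and binary operators. Pattern matching with `case`/`in` needs Ruby 3.

```ruby
require "ripper"

# Layer 1: Ruby source -> untyped tree of nested arrays (Ripper stands in for the parser).
def parse(source)
  Ripper.sexp(source)
end

# Layer 2: untyped tree -> typed nodes. Only integer literals and binary operators are handled.
TypedNode = Struct.new(:kind, :type, :value, :children)

def type_tree(sexp)
  case sexp
  in [:program, [statement]]
    type_tree(statement)
  in [:@int, text, _position]
    TypedNode.new(:literal, :Integer, Integer(text), [])
  in [:binary, left, operator, right]
    left_node = type_tree(left)
    right_node = type_tree(right)
    # Toy rule: any operator on two Integers yields Integer (fine for + and *, wrong for <).
    both_int = left_node.type == :Integer && right_node.type == :Integer
    TypedNode.new(:call, both_int ? :Integer : :Object, operator, [left_node, right_node])
  else
    raise NotImplementedError, "no typing rule for #{sexp.inspect}"
  end
end

untyped = parse("1 + 2 * 3")
typed = type_tree(untyped)
puts typed.value  # prints +
puts typed.type   # prints Integer
```

The parser already resolves precedence, so the typed tree has `*` nested under `+`; the typing layer only has to attach types.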
00:14:11.530 The compiler consists of several layers, each of which unfolds the program into more detail. Importantly, I created a virtual object-oriented language to handle Ruby's complexity: it removes redundancies and simplifies operations. It is limited to straightforward constructs like 'if', avoiding the many alternative syntaxes Ruby offers.
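A sketch of the kind of simplification such a reduced language allows: a pass that rewrites `unless` into `if` with swapped branches and `until` into `while` with a negated condition, so later layers only see the smaller set. The array node format (`[:unless, cond, then_body, else_body]` and so on) is an assumption for this sketch, not RubyX's.

```ruby
# Hypothetical desugaring pass over array-shaped nodes, e.g. [:unless, cond, then_body, else_body].
def simplify(node)
  return node unless node.is_a?(Array)

  # Simplify children first so nested unless/until are rewritten too.
  node = node.map { |child| simplify(child) }

  case node
  in [:unless, condition, then_body, else_body]
    # unless c then a else b  ==  if c then b else a
    [:if, condition, else_body, then_body]
  in [:until, condition, body]
    # until c do body  ==  while !c do body
    [:while, [:not, condition], body]
  else
    node
  end
end

tree = [:until, [:lvar, :done], [:unless, [:lvar, :x], [:int, 1], [:int, 2]]]
p simplify(tree)
# prints [:while, [:not, [:lvar, :done]], [:if, [:lvar, :x], [:int, 2], [:int, 1]]]
```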
00:15:43.390 Moving on, the machine layers use instruction lists instead of syntax trees: linear sequences without nested structures such as control structures or variables. The architecture is a minimal machine model that executes these instructions one after another.
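To illustrate the switch from trees to instruction lists, here is a hypothetical `Linearizer` that flattens a nested arithmetic tree into a linear list of register instructions, allocating a fresh register for each intermediate value. The instruction format `[op, dest, src1, src2]` is invented for this sketch.

```ruby
# Hypothetical flattening of an expression tree into a linear instruction list.
class Linearizer
  attr_reader :instructions

  def initialize
    @instructions = []
    @next_register = 0
  end

  # Emits instructions for +tree+ and returns the register holding its value.
  def emit(tree)
    case tree
    in Integer
      dest = fresh_register
      @instructions << [:load, dest, tree]
    in [op, left, right]
      # Operands first, so their registers exist before the operation uses them.
      left_reg = emit(left)
      right_reg = emit(right)
      dest = fresh_register
      @instructions << [op, dest, left_reg, right_reg]
    end
    dest
  end

  private

  def fresh_register
    register = :"r#{@next_register}"
    @next_register += 1
    register
  end
end

linearizer = Linearizer.new
result = linearizer.emit([:add, 1, [:mul, 2, 3]])   # 1 + 2 * 3
linearizer.instructions.each { |instruction| p instruction }
# [:load, :r0, 1]
# [:load, :r1, 2]
# [:load, :r2, 3]
# [:mul, :r3, :r1, :r2]
# [:add, :r4, :r0, :r3]
p result  # prints :r4
```

The nesting of the tree survives only as the order of the instructions and the registers they share.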
00:16:09.990 The ARM and RISC architectures are implemented for broader compatibility, and I aim to create an interpreter that makes developing the compiler itself easier. As a result, debugging binaries becomes more manageable, although the process might seem daunting at first.
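A minimal sketch of why an interpreter helps: a hypothetical `TinyInterpreter` that executes a linear list of register instructions one `step` at a time, so you can inspect registers between instructions, much like single-stepping in a debugger. The instruction set (`:load`, `:add`, `:mul`) is invented for illustration.

```ruby
# Hypothetical interpreter for a tiny register instruction list.
class TinyInterpreter
  attr_reader :registers, :pc

  def initialize(program)
    @program = program
    @registers = Hash.new(0)
    @pc = 0
  end

  def done?
    @pc >= @program.length
  end

  # Executes exactly one instruction, like single-stepping in a debugger.
  def step
    op, dest, a, b = @program.fetch(@pc)
    @registers[dest] =
      case op
      when :load then a
      when :add  then @registers[a] + @registers[b]
      when :mul  then @registers[a] * @registers[b]
      else raise ArgumentError, "unknown instruction #{op}"
      end
    @pc += 1
  end

  def run
    step until done?
    self
  end
end

# 1 + 2 * 3 as a linear instruction list.
program = [[:load, :r0, 1], [:load, :r1, 2], [:load, :r2, 3],
           [:mul, :r3, :r1, :r2], [:add, :r4, :r0, :r3]]
interpreter = TinyInterpreter.new(program)
interpreter.step
p interpreter.registers[:r0]  # prints 1
interpreter.run
p interpreter.registers[:r4]  # prints 7
```

Stepping through this program by hand, `r3` holds 6 after the fourth instruction and `r4` holds 7 at the end; checking that in an interpreter is far easier than inside a running binary.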
00:17:52.091 With a visual interface for the interpreter, debugging becomes intuitive. The project is structured around tests, which helps a lot, and there's an online visualization tool that demonstrates this.
00:19:31.630 The visualizer is about 500 lines of code and uses Opal, a Ruby-to-JavaScript compiler, to show what happens inside the RubyX project. With it, you can single-step through the compiled code, watching objects being loaded into registers and checking that everything evaluates correctly.
00:20:21.850 In this debugging environment, you can see each instruction and its outcome, such as 'hello world' being printed and the return values the executed code produces. Testing this way gives comprehensive visual feedback.
00:21:31.530 There's time for one last question. Yes, let's take one. So you're asking about benchmarks? This isn't a full implementation yet, but I ran some preliminary micro benchmarks. They indicate it runs about 50% slower than C, but runtime compilation can increase speed substantially through inlining.