BYOJ: Build your own JIT with Ruby

A talk from RubyConfTH 2023, held in Bangkok, Thailand on October 6-7, 2023.
Find out more and register for updates for our next conference at https://rubyconfth.com/

00:00:07.720 Hello, everyone. Good afternoon! For those who are arriving from lunch, don't worry, you're just in time. Today, I'll be talking about building your own Just-In-Time compiler, or JIT for short.
00:00:14.120 So, what is a compiler? There are basically two types of compilers. One is the ahead-of-time compiler, used by languages like C, C++, and Rust. It first compiles your code into machine code, which you can then run whenever you wish. A Just-In-Time compiler, on the other hand, compiles your code while it's running. It monitors how often you call certain methods, and if a method becomes 'hot', meaning it's called frequently enough, the JIT compiles it into machine code.
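
To make the idea of a 'hot' method concrete, here is a minimal sketch of call counting. The names HotCounter and CALL_THRESHOLD are made up for illustration, and the threshold of 3 is an arbitrary assumption; real JITs count calls inside the VM and use their own thresholds.

```ruby
# Toy model of a JIT's "hotness" check.
# HotCounter and CALL_THRESHOLD are hypothetical names, not part of any real JIT.
class HotCounter
  CALL_THRESHOLD = 3 # arbitrary value chosen for this sketch

  attr_reader :compiled

  def initialize
    @calls = Hash.new(0) # method name => number of calls seen so far
    @compiled = []       # methods we have "compiled", in order
  end

  # Record one call; "compile" the method the moment it becomes hot.
  def record_call(name)
    @calls[name] += 1
    @compiled << name if @calls[name] == CALL_THRESHOLD
  end
end

def simulate_calls
  counter = HotCounter.new
  5.times { counter.record_call(:f) } # f crosses the threshold
  2.times { counter.record_call(:g) } # g stays cold
  counter.compiled
end

p simulate_calls # => [:f]
```

Only the method that crosses the threshold is handed to the compiler; everything else keeps running in the interpreter.
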
00:00:28.039 This technique is used by languages like Ruby, Python, and Java. Ruby, in particular, has had an interesting journey with Just-In-Time compilers. The first one Ruby shipped was MJIT, which, as mentioned earlier today, was a good start but did not speed up Rails applications. As a result, we didn't see the performance improvements we wanted in our Ruby applications, which led to a new JIT compilation technique called lazy basic block versioning. This technique is used by YJIT, which significantly boosts Ruby's performance. YJIT was first released in Ruby 3.1, was declared production-ready in Ruby 3.2, and has continued to get faster with each release.
00:01:12.280 There's also another one called RJIT. The interesting part about RJIT is that, unlike the other JIT compilers, it is written in Ruby itself. This lets you use it like any other class or gem and customize it to create your own JIT compiler. So, let's take a look at how we can use this.
00:01:40.319 When we say compile down to machine code, we need to consider which machine we are compiling for. Essentially, there are two kinds of machines: physical machines, meaning CPUs such as Intel chips and the Apple M1, and virtual machines, which execute your code in software. In the context of Ruby, the virtual machine is the Ruby VM, also known as YARV (Yet Another Ruby VM). Let's dive into how these machines work under the hood.
00:02:10.560 Let's say you have a simple program that calculates 1 plus 2. This gets translated into instructions that the Ruby VM understands. Since YARV is a stack-based machine, you must first push values onto a stack before you can operate on them. The Program Counter (PC) keeps track of where we are in the program. So, initially, we push 1 onto the stack, which increments the stack pointer, and then we push 2. Upon reaching the add instruction, we pop those two values from the stack, calculate their sum, and push the result back onto the stack. This is how a simple addition works in YARV.
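
The push, push, add sequence can be sketched as a tiny interpreter. This is only an illustration: the instruction names :push and :add are simplified stand-ins chosen for this sketch, not real YARV opcodes.

```ruby
# A toy stack machine. :push and :add are simplified stand-ins for VM instructions.
def run_stack_machine(program)
  stack = []
  pc = 0 # program counter: index of the next instruction to run
  while pc < program.size
    op, arg = program[pc]
    case op
    when :push
      stack.push(arg) # the top of the stack moves up by one slot
    when :add
      right = stack.pop
      left = stack.pop
      stack.push(left + right) # two operands are replaced by one result
    else
      raise ArgumentError, "unknown instruction: #{op}"
    end
    pc += 1
  end
  stack.last
end

p run_stack_machine([[:push, 1], [:push, 2], [:add]]) # => 3
```
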
00:03:07.680 Now, let's explore a physical machine. A physical machine is typically register-based, meaning it uses registers, which are small storage locations inside your CPU. Registers are the fastest storage a computer has. Unlike on a stack-based machine, you don't push values to a stack; instead, you manipulate registers directly. For instance, we can move 1 into the R1 register, move 2 into another register, and add the second register to the first. That is the essential difference in how the two machine types handle values.
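
For comparison, the same 1 plus 2 on a register-style machine might look like the sketch below. The register names :r1 and :r2 and the two-operand :mov / :add forms are assumptions made for illustration; they only loosely mirror real CPU instructions.

```ruby
# A toy register machine: every instruction names its destination register.
def run_register_machine(program)
  regs = Hash.new(0) # register name => current value
  program.each do |op, dst, src|
    # The source operand is either another register (a Symbol) or an immediate value.
    value = src.is_a?(Symbol) ? regs[src] : src
    case op
    when :mov then regs[dst] = value
    when :add then regs[dst] += value
    else raise ArgumentError, "unknown instruction: #{op}"
    end
  end
  regs
end

regs = run_register_machine([[:mov, :r1, 1], [:mov, :r2, 2], [:add, :r1, :r2]])
p regs[:r1] # => 3
```

There is no push or pop here: the operands live in named registers, and the result overwrites one of them.
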
00:04:08.560 Next, let's try JIT compiling a simple method. Suppose we have a method named `f` that returns nil. According to our stack machine model, we push nil onto the stack and then return it. But where does nil go? Who are we returning nil to? To understand this, we need to look deeper into Ruby's VM. The Ruby VM works with instruction sequences, the sets of instructions your Ruby code gets compiled into. Running any Ruby program with the `--dump=insns` option prints those instructions, grouped into blocks. If we look at the block at the bottom, it says `putnil` and `leave`. But when we leave the method, it is still not clear where we return to, so let's look at stack traces next.
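
You can also inspect the instructions from inside Ruby. The sketch below assumes CRuby, since RubyVM::InstructionSequence is specific to that implementation, and the exact text of the dump varies between Ruby versions. The helper name disassemble is made up for this example.

```ruby
# Requires CRuby: RubyVM::InstructionSequence is not available on other Rubies.
# `disassemble` is a hypothetical helper name used only in this sketch.
def disassemble(source)
  RubyVM::InstructionSequence.compile(source).disasm
end

# Prints one block per instruction sequence: the top level and the method f.
puts disassemble("def f\n  nil\nend\n")
```

The block for `f` should list the `putnil` and `leave` instructions discussed above.
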
00:06:00.720 You've most likely seen the error report Ruby prints when it hits an unhandled exception. The report shows the error along with the chain of calls that led to it, tracing your program's path back to the top level. To produce that trace, the VM keeps a data structure for each level of the call chain that records things like the current method and the current location in the program. These structures are stacked, one per call, which is what makes it possible to walk back to the top level. They are called control frames. Each frame holds details such as the program counter, the stack pointer, and the value of 'self', which is what lets the VM handle returns and exits in your Ruby code.
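
A rough model of control frames, written as plain Ruby objects, may help here. ControlFrame and FrameStack are hypothetical names for this sketch, and the fields are a simplification; the VM's real frames are C structures with more fields.

```ruby
# Simplified model of control frames. The real VM stores these as C structs.
ControlFrame = Struct.new(:method_name, :pc, :sp, :self_obj)

class FrameStack
  def initialize
    @frames = []
  end

  # Calling a method pushes a new frame for it.
  def push(method_name, self_obj)
    @frames.push(ControlFrame.new(method_name, 0, 0, self_obj))
  end

  # Returning from a method pops its frame, exposing the caller's frame again.
  def pop
    @frames.pop
  end

  # Walking the frames from newest to oldest gives a backtrace.
  def backtrace
    @frames.reverse.map(&:method_name)
  end
end

def demo_frames
  frames = FrameStack.new
  frames.push(:main, :top)
  frames.push(:outer, :top)
  frames.push(:inner, :top)
  during_call = frames.backtrace
  frames.pop # inner returns
  [during_call, frames.backtrace]
end

p demo_frames # => [[:inner, :outer, :main], [:outer, :main]]
```
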
00:09:05.360 When you return from a method, you pop its control frame, and that's how you get back to the caller. Now, let's discuss how we can use this to build our JIT compiler. Since RJIT is written in Ruby, we can use it to build our own JIT from scratch. It offers several utilities. Essentially, we replace the existing JIT compiler with our own compiler class. To do this, we create a class called JITCompiler and define a compile method, and RJIT takes care of the boilerplate for us.
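
As a rough picture of what a compile method receives, the sketch below walks an instruction sequence and lists its instruction names. ToyCompiler is a hypothetical stand-in and does not reproduce the real JIT's interface; it assumes CRuby and relies on RubyVM::InstructionSequence#to_a, whose last element holds the instruction list.

```ruby
# ToyCompiler is a hypothetical stand-in, not the real JIT's interface.
# Requires CRuby for RubyVM::InstructionSequence.
class ToyCompiler
  # Returns the names of the VM instructions found in the given iseq.
  def compile(iseq)
    body = iseq.to_a.last # last element of the array form is the instruction list
    # Instructions are arrays; line numbers, labels, and events are not.
    insns = body.select { |entry| entry.is_a?(Array) }
    insns.map(&:first)
  end
end

def instruction_names(source)
  ToyCompiler.new.compile(RubyVM::InstructionSequence.compile(source))
end

p instruction_names("nil")
```

A real compile method would emit machine code for each instruction instead of collecting names, but the loop over instructions has the same shape.
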
00:10:02.120 The Assembler class here helps with generating machine code, but you can opt for other gems, like Fiddle, for machine code generation from Ruby. Let's revisit our method that returns nil, which has two instructions: `putnil` and `leave`. We fetch each instruction and turn it into an object we can work with, then check the instruction type and write the corresponding assembly code. On a register-based machine, `putnil` essentially becomes a move instruction that stores nil in the register standing in for the next stack slot, so we emulate the VM stack with registers.
00:11:33.839 To implement the `leave` instruction, we start by incrementing the control frame pointer (CFP), which pops the current control frame, and we update the execution context to match so the VM knows which frame is current. We then put the topmost stack value where the caller expects the return value and return from our assembly, which is how we get Ruby's implicit return behavior. With these two instructions in place, we can move on to compiling more complex methods.
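
Putting the two instructions together, the translation can be sketched as a function that emits assembly-like text. Everything here is illustrative: the register names in STACK_REGS, the symbols Qnil, CONTROL_FRAME_SIZE, and CFP_OFFSET, and the choice of rax for the return value are assumptions for this sketch, and a real JIT would emit machine code bytes rather than strings.

```ruby
# Hypothetical registers standing in for VM stack slots.
STACK_REGS = %w[r8 r9 r10 r11].freeze

# Translate a list of VM instructions into pseudo-assembly lines.
def emit_pseudo_asm(insns)
  asm = []
  stack_size = 0 # how many emulated stack slots are in use
  insns.each do |name, *_operands|
    case name
    when :putnil
      asm << "mov #{STACK_REGS[stack_size]}, Qnil" # "push" nil into the next slot
      stack_size += 1
    when :leave
      raise "stack is empty" if stack_size.zero?
      asm << "add cfp, CONTROL_FRAME_SIZE"            # pop the control frame
      asm << "mov [ec + CFP_OFFSET], cfp"             # tell the VM which frame is current
      asm << "mov rax, #{STACK_REGS[stack_size - 1]}" # top of stack becomes the return value
      asm << "ret"
    else
      raise NotImplementedError, "unsupported instruction: #{name}"
    end
  end
  asm
end

puts emit_pseudo_asm([[:putnil], [:leave]])
```

Tracking stack_size at compile time is what lets registers stand in for the VM stack: each push picks the next register, and `leave` reads the last one used.
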
00:13:44.480 Let's look at a more complex case: a method that computes the sum of two and three. Its instruction sequence pushes both values onto the stack and then performs the addition, as demonstrated earlier. In the compiled code, instead of pushing to a stack, we keep the values in registers and add them there directly. As in the earlier addition example, the two operands are consumed and replaced by one result, so we decrement our stack size accordingly, and the sum ends up in the register that holds the top of the stack.
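
One detail worth knowing when adding in registers, which is CRuby background rather than something stated in the talk: CRuby stores small Integers as tagged values of the form 2n + 1, so adding two of them in a register needs a correction of minus one. The helper names below are made up for this worked example.

```ruby
# CRuby represents a small Integer n as the tagged value (n << 1) | 1.
# to_tagged, from_tagged, and tagged_add are hypothetical helper names.
def to_tagged(n)
  (n << 1) | 1
end

def from_tagged(value)
  value >> 1
end

# (2a + 1) + (2b + 1) - 1 == 2(a + b) + 1, which is the tagged form of a + b.
def tagged_add(a, b)
  a + b - 1
end

sum = tagged_add(to_tagged(2), to_tagged(3))
p from_tagged(sum) # => 5
```

Without the subtraction, the register would hold 12 instead of 11, which is not a valid tagged Integer and would not decode to 5.
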
00:16:02.080 In benchmarks, both MJIT and YJIT can significantly outperform the interpreter, running up to twice as fast thanks to the optimizations built into Ruby and the dedicated work of contributors from Shopify. So JIT compilation can give a noticeable boost in execution time for Ruby applications. And because these JITs can be adapted, developers can specialize them and extend them to fit their own projects.
00:18:43.320 For example, say I write a method that computes 2 + 3, run it, and get 420. What went wrong at the JIT level? It means my JIT changes were implemented incorrectly, so the addition now always returns 420. This shows how low-level changes, like monkey patching at the instruction level, can produce bizarre results, and why the JIT logic needs to be precise. Finally, I would like to extend my sincere gratitude to Takashi Kokubun. This talk wouldn't have been possible without his invaluable contributions and support, and I highly recommend checking out the 'Ruby JIT Challenge' repository, where I gathered many insights for this session.
00:20:03.240 Thank you!