Performance

Summarized using AI

Some Assembly Required

Aaron Patterson • November 08, 2022 • Denver, CO

In the presentation titled "Some Assembly Required," Aaron Patterson, an engineer at Shopify and a prominent figure in the Ruby community, dives into the process of creating a Just-In-Time (JIT) compiler named TenderJIT purely in Ruby. The talk, delivered at RubyConf 2021, outlines not just the theoretical aspects of JIT compilation, but also practical steps involved in building one from scratch.

Key points discussed in the video include:
- Overview of JIT Compilation: Aaron explains the concept of JIT compilation, which involves generating machine code at runtime. He emphasizes that any language capable of assembling byte sequences can eventually build a JIT compiler.
- Goals of TenderJIT: The project aims to be fun, purely written in Ruby, potentially installable as a gem, and to enhance performance. It primarily acted as a learning tool for Aaron to deepen his understanding of compiler design.
- Understanding Ruby's Virtual Machine (YARV): The talk provides insights into YARV, Ruby’s stack-based virtual machine, comparing it with actual CPUs that are register-based. He highlights the differences in how these machines handle operations such as addition.
- Machine Code Generation: An essential aspect of building a JIT is generating and executing machine code. Aaron introduces Fisk, an x86 assembler in Ruby, and demonstrates how to assemble code and execute it dynamically.
- Automating Code Translation: Aaron discusses the requirement for converting Ruby code into machine code automatically. He introduces a mini interpreter, showcasing how it can execute basic operations by translating Ruby bytecode into machine code via a virtual stack that maps stack-based values to CPU registers.
- Performance Optimizations: Toward the presentation's conclusion, Aaron elaborates on some optimizations that can be made in the compiled code, including reducing unnecessary instructions and directly returning computed values.

The culmination of the talk reveals that with the right components—a method for accessing Ruby bytecode, an assembler for x86, and the ability to write directly to memory—one can successfully implement a pure Ruby JIT. The key takeaway is an openness for attendees to contribute to TenderJIT or develop their JIT compilers using the methodologies presented.

Aaron closes the talk by encouraging community engagement in both TenderJIT and related projects, highlighting the collaborative spirit within the Ruby community.

Some Assembly Required
Aaron Patterson • November 08, 2022 • Denver, CO

Some Assembly Required by Aaron Patterson

Let's write a JIT for Ruby, in Ruby! We're going to learn how a JIT works from the ground up by building TenderJIT, a pure Ruby JIT compiler. First, we'll learn how Ruby's virtual machine works, then we'll learn how to plug a JIT in to the virtual machine. Finally, we'll generate machine code from our Ruby programs. By the end of the presentation we'll have a working JIT that converts Ruby code in to machine code all written in pure Ruby! But don't forget: some assembly is required!

RubyConf 2021

00:00:10.639 This mic hot? Oh my gosh, we're on! All right, um, hello everybody.
00:00:17.760 Welcome to RubyConf! I'm so happy to be the first person presenting to you here today. Actually, let's get going because I have 138 slides and only 29 minutes left, so let's move. Actually, all of you folks coming in from the back, don't worry, you're not late, you're just in time.
00:00:47.600 I'm terribly out of practice with speaking since I haven't presented in person for a while. I work for a company called Shopify and I really enjoy working there. One of the reasons I like it is because they actually pay us to talk shop.
00:01:12.960 All right, thank you! Yes, I'm known as Tenderlove online. If you don't recognize me, this is my icon on the internet. I decided to dress like that today so you'd be able to recognize me more easily. I'm on the Ruby Core Team and also the Rails Core Team, and I'm extremely happy to be here.
00:01:33.759 I'm really happy to be seeing all of you in three dimensions and in HD. For those of you watching on the live stream, I actually have my iPad set up here with Discord open; so if you type some stuff in there, I might see it.
00:01:58.000 I love local things! Anytime I go somewhere different, I try different local specialties. Last night, I had some Coors Banquet because that’s a local beer, and I was excited about that. Also, I went to lunch the other day with a couple of friends. We sat down, and I asked the waitress, "What is local here?" She thought I meant locally sourced food, but I meant local specialties.
00:02:26.239 Anyway, I was reading the menu and one thing caught my eye, which was chili colorado. I thought, 'Wow, it has Colorado in the name, so this must be something local!' But it turns out it is a traditional Mexican dish, not chili from Colorado; it just means it's red. This confused me because I feel like that's the default color—if you say chili, it just means red in my head. I was kind of disappointed. If you have recommendations for local specialties I should try, please let me know; I would love to try them.
00:03:01.920 Now, I'm going to talk to you a little bit about a project I've been working on called TenderJIT. You can find the repository here. It is a JIT for Ruby that is actually written in Ruby. This isn't the first JIT for Ruby that's written in Ruby; there are a couple of others I want to mention.
00:03:26.239 One of the predecessors is Rhizome by Chris Seaton, who's one of my colleagues. It's not really designed to be used as a JIT per se, but it shows you how JITs are supposed to work so you can understand how to build one. The other one, which I think is really interesting, is called the Ludicrous JIT Compiler. Chris told me about this project and it’s about 12 years old. It is probably the first JIT for Ruby written in Ruby, though it does have a C dependency, whereas TenderJIT does not.
00:03:52.640 TenderJIT's goals are simple. I’m not going to talk about TenderJIT the whole time, but I want to tell you a bit about the goals of the project. First off, it is to be fun! I think it is a fun project, so I will check that goal off—not to toot my own horn, but I enjoy the code that I write…unless I revisit it a few years later and think, 'This is awful!'
00:04:57.920 The other goal is to be written in pure Ruby, and indeed it is currently written in pure Ruby. I also want it to be installable as a gem, which it isn't right now, but it will be at some point in the future. The final goal is to speed things up. Maybe this isn't necessarily a hard and fast goal.
00:05:31.040 So why did I write this or work on this project? I mainly started this project essentially as a learning tool. My team at work was working on a project called WYJIT and you should check out Maxime's presentation later on in the conference to learn more about that JIT compiler. To be honest, I didn’t feel very comfortable working on this JIT compiler because I'm not a JIT expert necessarily.
00:06:20.560 So, I started working on TenderJIT to level up my skills and become better at my job. I wanted a low-stress place to practice and figure things out. The other reason was to see if I could build this thing. For me, programming is a creative and fun endeavor. I love to program, and many times I’ll write a project just to see if I can do it. This is one of those cases.
00:06:54.160 I think maybe people are asking if this actually works. Here’s some sample code to show you. It doesn’t automatically compile stuff yet—you have to specify, 'Hey, please go compile this thing,' and it will compile it.
00:07:04.880 If we run this and compare the JITted version versus the interpreter version, you'll see that the speed is actually better. The base case takes about 10 seconds to run, whereas the TenderJIT compiler takes about four seconds. Of course, people are probably asking, 'Well, how does this compare to a real JIT compiler?' So I benchmarked that too.
00:07:42.160 If we run it with -for YJ, you’ll see that it’s half the speed of TenderJIT. So, maybe Detroit is not that fast, but I think this can be fixed in the future, as actually the design of my JIT compiler is basically 90% borrowed from WYJIT. In the future, I think the top speed could probably match that of YJ.
00:08:24.320 Anyway, TenderJIT isn't why I'm here. What I really want to talk about is building a JIT compiler from scratch using pure Ruby. What I hope is that, by the end of this presentation, you'll be able to build your own TenderJIT, or at least contribute to TenderJIT or even contribute to WYJIT since the design is very similar.
00:09:07.600 The first question I want to answer, though, is 'What is a JIT?' To me, a JIT is something that emits machine code at runtime. Then you have to ask, 'Okay, what is machine code?' I think this is a very interesting question. To me, machine code is just a sequence of bytes.
00:09:44.720 The CPU interprets that. I really want to emphasize that machine code is just a sequence of bytes. So, any language that can put together a sequence of bytes is also capable of producing a JIT compiler. I think it might be fun in the future to write, say, a JIT compiler in SQL or even possibly in PostScript.
00:10:05.480 I’ve always had this idea: what if we could take all those printers that are just sitting around in offices and turn them into Bitcoin miners? We’d all be rich, right?
00:10:30.640 Anyway, we can actually assemble; we can put together sequences of binary data in Ruby. That's not a problem. Here's an example of doing that: we have a string, we just put this print on the string, and actually, if we pipe the sequence of bytes to a disassembler, we'll see that, in fact, it's machine code.
00:10:56.360 So we actually have machine code output here, and we're doing it at runtime. So does that mean we wrote a JIT? Maybe. What we're going to be doing today is writing a template compiler. A template compiler is exactly what it says: it's a compiler that uses templates.
00:11:32.960 I don’t mean this in a computer science jargon way; I mean it in a really literal way. For example, think about a Rails application where you have an ERB template—much of the HTML is just static HTML, but you have some sections that are dynamic, where you insert dynamic values. We’re basically doing the same thing, but instead of building up an HTML page, we’ll be doing this with machine code.
00:11:56.880 So let's build a JIT! We're going to build a legit JIT, and we need to use source control for this. So this is going to be a legit JIT stored in Git.
00:12:30.839 Okay, no laughs. Fine. If we refactor this sentence a little bit, then we can factor out the JIT. We're repeating that here, so we've got two legit JITs here in Git.
00:12:37.200 Am I too old for this audience? Thank you, thank you!
00:12:43.920 So, in order to build a JIT, we basically need two things: first, a way to generate machine code at runtime. The other thing we need is a way to translate our Ruby code into machine code, and in order to do this, we need to have some kind of translation process where we understand what we’re translating from and to.
00:13:27.840 I want to talk a little bit about the different machines that we’re dealing with. We have YARV, which stands for Yet Another Ruby VM, and that’s the virtual machine that backs your Ruby program. So we have a virtual machine there, and then we have another processor which is your actual CPU.
00:13:56.079 We need to be able to translate from one machine to another, and to do that, we need to understand the differences between these two machines. YARV is what we call a stack machine: it’s a stack-based virtual machine. Here’s an example: let’s say we have some sample code like 5 + 3. Maybe the compiler compiles that into three instructions: push 5, push 3, and then plus.
00:14:30.799 When the virtual machine executes this, it first calls push to put 5 onto the stack, then calls push again to put 3 onto the stack. The plus instruction pops two values off the stack, computes the addition, and then pushes the result back onto the stack.
00:15:20.720 This is a good intro to Ruby internals; so we have some vocabulary for talking about this stuff. This arrow right here points at the instructions. This is called a program counter (PC) which keeps track of what we're executing right now.
00:15:56.320 This other arrow is the stack pointer (SP) which keeps track of where the top of the stack is. In Ruby, we’re always pointing at the next place we're going to write to. Now, we can see these YARV instructions. The virtual machine code that I put here on the slide isn't actually YARV code, but we can see the real YARV bytecode.
00:16:26.240 Here's an example: we have a sample program that’s just 5 + 3, just like we looked at. We can call Ruby with '--dump' and it will show the instructions for that program, which is very similar to the fake instructions I showed earlier.
00:16:50.000 What's really nice about these stack machines is we don’t have to track where to store stuff or where temporary variables go. We can treat this as if we have infinite stack depth. It might not actually be infinite, but we can believe that. For example, if we call a function and pass a bunch of different values, our stack might be seven deep.
00:17:21.760 The next thing to look at is the CPU, which is a register machine. The register machine works a bit differently. It has a limited number of registers it can deal with. It also has a stack, but we mainly deal with registers in the CPU.
00:17:47.680 Rather than storing values in the stack, we'll store temporary values in registers. Here’s another sample program: 5 + 3. In the machine code for that, we say that we are copying 5 into register R1 and copying 3 into register R2. Then we want to add R1 and R2 together. In assembly language syntax, R1 will hold the result.
00:18:18.080 Next, we need to generate machine code. Now that we have an idea of the two different types of machines we're dealing with, we need to convert one into the other. I wrote an x86 assembler called Fisk, which you can check out on my GitHub. It allows you to write assembly and put it together at runtime.
00:18:56.440 Here's an example of it: on the left-hand side is the Fisk Ruby code that we write, and on the right-hand side is the disassembled output. You can see here we're writing 5 to R9 in the Ruby code, and on the right is the machine code.
00:19:29.040 Now that we have this machine code, we need to run these bytes. It’s cool that we assembled it and could disassemble it, but we want it to execute. To get this to run, we need to somehow point the CPU at those bytes, and Fisk actually comes with a helper to let you do that.
00:20:06.120 So here’s an example of using that helper. We write some assembly code, then allocate some executable memory to point the CPU at. We ask Fisk to write the bytes to that executable memory, and finally call a function returned to us from this method.
00:20:41.600 When we do that, we see the output is 8—we were able to write machine code into executable memory and execute it. What's nice is that since this returns a lambda, we can pass that into a method and say we want to define it.
00:21:17.520 Did we do it? I think we did! Yes! In pure Ruby, we did a JIT. I have to admit, we should put this 'pure Ruby' in quotes because yes, technically this is Ruby code, and this goes into machine code. Yet it looks a lot like machine code.
00:21:43.360 I don’t really want to put this inside my Rails controllers; I feel we might come across some maintainability issues. So, what we would like is some kind of automatic conversion—we want to write this method "five plus three" and have it automatically translated into machine code. We need to write a YARV to x86 converter, and that’s what we’re going to do now.
00:22:24.720 With the remaining time, it's pretty easy for us to dump the instructions with a command, but we can’t programmatically access them that way. There is a class called Ruby VM Instruction Sequence that gives us programmatic access to those instructions, and we can say we want the instructions for our add method.
00:23:13.920 With these, we can write our own mini YARV. Our mini YARV has a case statement for each of the instructions we’ll deal with, like put object and opt plus. Each of these blocks manipulates a stack. Remember, Ruby's virtual machine is a stack-based machine, so we’ve written our own stack using an array.
00:24:09.760 If we run this code, we see that we push 5 and 3, pop off 5 and 3, and finally return 8. Now, we want to convert this into machine code that the CPU can execute. The secret is this stack object. Returning to our point, the Ruby VM is a stack-based virtual machine, the CPU is a register-based machine.
00:25:04.800 We need to create essentially a virtual stack. The virtual stack maps registers to positions in the stack. This virtual stack only has a maximum depth of two. Whenever we push onto the stack, it will return the value or the location where we write. When we pop it, it returns where we should read.
00:25:56.480 Using this interface, we can integrate with our virtual machine here. When we call push, we write the value to our virtual stack and then when we do the ad, we read, add the two values together, and write back. Let's step through the CPU state during this process as well.
00:26:46.120 Now we can refactor this code to compile a particular method. We want to create a callable. We can now call the fast add method and see the result is 8. We’re nearly done! We’ve converted Ruby to YARV and YARV to x86.
00:27:30.560 But let’s optimize this quickly: every time we do this add with a copy, we can swap the values. We don't need to copy values into R10 and then immediately read them—since we know that A + B = B + A, we can simplify the operation.
00:28:16.920 When returning from a function, we know the stack depth is always one. We can change our virtual stack to manage only RAX and R9. From this, we know that the last value will always end up in RAX, and we'll eliminate even more instructions.
00:28:59.680 We have additional optimizations we could explore down the road, but in essence, we’ve built a legitimate JIT! Yes!
00:29:43.800 Now, we must define a new method—a new function to use this. In order to achieve that, I want to dive into how the virtual machine works. Here’s a VM entry point: VM exec. It calls MJIT exec, which manages whether to run normally or through JIT.
00:30:24.640 We can create a method that tells it we want the contents of memory, using Fiddle library—it allows us to read memory at a specific address. For example, if we have a frozen string, while trying to write to it would fail, we can use Fiddle to access its underlying object and write to that part of memory instead.
00:31:18.960 Here's how it works: you take a string, access its underlying pointer, and say to write there in memory. You can mutate the frozen string. Now if we create a method that assembles and writes to an address, we can control the outcome neatly.
00:32:01.200 So we need to write every memory address and understand where that information goes. Thus, we will utilize DWARF data to understand the layout of memory and assign our functions methodically.
00:32:42.560 That's the end goal—to successfully compile and run a JIT in Ruby that's functional and efficient! Thank you! Please check out the GitHub repository for TenderJIT, and contribute to the project!
00:33:27.760 And, thanks to Shopify for employing me and the organizers of RubyConf for such a wonderful opportunity. Thank you all for coming here in person and seeing me today. It was great to see you all in 3D!
Explore all talks recorded at RubyConf 2021
+95