Keynote: YJIT's Three Languages: The Fun of Code That Writes Code

00:00:15.780 Before I say anything else, I want to express my gratitude. I mean, I've traveled around the world to be here, and look at all these people in this room! All the speakers who have come here to try and make everybody better and all of you who showed up to improve yourselves—wow! I keep going to different places in the world, and people show up and do that. I'm genuinely honored to be here.

00:00:38.780 In return, I will try hard not to waste your time because you've all brought a lot of it here. I'll do everything I can to do right by you. So who am I? I work at Shopify, specifically in the area of performance optimization. If you go to speed.yjit.org, you can view various wiggly graphs about performance, which I contributed to. Additionally, I wrote a book called 'Rebuilding Rails.' Some of you have said nice things about it, and I appreciate that. I'm currently writing a book called 'Rebuilding HTTP,' which follows the same general idea: learning HTTP by building it yourself.

00:01:04.680 More importantly, I'm the guy with the bear stickers. If you want some, just ask me. There was a stack out there, and I might bring out more later. However, I might not, so just come to me for them! Now, this might shock everyone who has been listening to me for the last few seconds—I tend to talk fast. If you think the person next to you can't quite keep up, don't hesitate to yell out, 'Slow down!' I would appreciate it; it would truly be a favor to me.

00:01:30.479 I love questions. This topic can be a little complicated. If you have a question, the person next to you probably does too, so please ask! I'd love to answer it. The worst that happens is that I say I'm not answering that, like if you ask for my ATM PIN. I'm not answering that today! There will also be a resource URL toward the end of the talk. When I put up snippets of code or URLs, you won't have to type them in feverishly. You'll get one chance to type the URL at the end to access everything at once in your browser. I don't give the resource URL at the beginning because I'd rather you pay attention and ask me questions.

00:02:25.560 Okay, let's discuss multiple representations all at once. It sounds complex, but why should you care? Why is this worth 40 minutes of your time, which you could be using for anything else? Well, a lot of software development involves keeping multiple representations in your head simultaneously, and we're going to talk in more detail about what I mean by that. I'll provide a very specific example, but much of what you do involves juggling multiple concepts at once.

00:02:37.200 I believe YJIT is a really good teaching example of this. It just so happens to be what I work on, and I think at least some of you probably care about YJIT because it will make your Ruby faster. It also serves as a fantastic example of the cool functionality we can implement. Additionally, YJIT does a lot of translating between representations, which is another excellent way to learn. Those of you who work on interpreters and compilers certainly understand what I mean here.

00:03:15.060 So, what exactly is YJIT? It's a just-in-time (JIT) compiler built into CRuby. It has been integrated into CRuby for almost a year now; it will be a year in two weeks. We have been working on it longer than that, of course. YJIT makes your frequently used code run faster by compiling a method after it's been executed a sufficient number of times. It waits to compile the method because waiting lets it gather more information about the actual values and types being passed in. That information enables significant optimizations.
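
To make this concrete, here is a minimal sketch for checking whether the current Ruby process has YJIT turned on. It assumes CRuby 3.1 or later, where the `RubyVM::YJIT` module exists; the helper name `yjit_status` is made up for this illustration.

```ruby
# Report whether this Ruby process is running with YJIT turned on.
# RubyVM::YJIT only exists on CRuby builds that include YJIT, so the
# lookup is guarded. `yjit_status` is a hypothetical helper name.
def yjit_status
  return :unavailable unless defined?(RubyVM::YJIT)
  return :unavailable unless RubyVM::YJIT.respond_to?(:enabled?)

  RubyVM::YJIT.enabled? ? :enabled : :disabled
end

puts "YJIT: #{yjit_status}"
```

Running the script with `ruby --yjit script.rb` should report `enabled` on a build that includes YJIT; without the flag it will typically report `disabled`.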

00:03:42.180 When I refer to 'CRuby,' I mean the same thing as 'Matz's Ruby': just plain Ruby or vanilla Ruby. I do not mean JRuby or Rubinius; those are different implementations. Now, you might be wondering whether Ruby has had a JIT compiler for longer than this. Indeed, MJIT is the older one, and it is still available. You can use either one; Ruby 3.1 includes both, and Ruby 3.2 will as well. However, MJIT is more experimental at this point, while YJIT is preferred for speed reasons.

00:04:06.840 So, how does YJIT work? When you start your Ruby process, nothing is compiled; the interpreter is simply there. Once a method has been called a certain number of times, YJIT recognizes that you will probably keep calling it and compiles it. The default threshold for the new YJIT is 30 calls, and you can change it. YJIT turns Ruby code into native machine code, waiting as long as possible to do that. It does so by running your method and compiling it piece by piece, so that it knows the types and values of everything involved by the time it hits the 30th call.
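
The call-threshold idea can be sketched with a toy model. This is not how YJIT is implemented: `ToyJit` is a hypothetical class that only counts calls and swaps in a stand-in "compiled" lambda once the threshold of 30 is reached, whereas real YJIT emits machine code.

```ruby
# Toy model only: counts calls and switches to a "compiled" path after a
# threshold. Real YJIT emits machine code; this just swaps in a copy of
# the same proc to stand in for it.
class ToyJit
  CALL_THRESHOLD = 30 # mirrors the default mentioned in the talk

  attr_reader :calls

  def initialize(&interpreted)
    @interpreted = interpreted
    @compiled = nil
    @calls = 0
  end

  def compiled?
    !@compiled.nil?
  end

  def call(*args)
    @calls += 1
    # "Compile" lazily, once the method has proven to be hot.
    @compiled ||= @interpreted.dup if @calls >= CALL_THRESHOLD
    (@compiled || @interpreted).call(*args)
  end
end

double = ToyJit.new { |x| x * 2 }
29.times { double.call(1) }
puts double.compiled? # false
double.call(1)
puts double.compiled? # true
```

With real YJIT, the threshold is set on the command line with `--yjit-call-threshold=N` rather than in code.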

00:04:42.180 YJIT frequently swaps back and forth between interpreting and compiling. However, it is not concurrent: it's the same process switching back and forth repeatedly. The switching isn't only for compilation; there are cases that the compiled code does not handle, and the interpreter handles those. YJIT tries to focus on the most commonly executed code paths, where it can provide the most speed improvement. Many operations are slow, weird, or just not frequently used, and we choose to let the interpreter handle those instead.

00:05:13.860 Sometimes you have been passing an integer into a method continually, and then you change your mind and start passing a class instead. YJIT runs into trouble: 'Oh, well, that isn't what we've compiled,' so the interpreter takes over. Sometimes assumptions simply break. You've repeatedly used Time.now for an operation, and suddenly you change its definition by monkey-patching it. YJIT will essentially wipe the slate clean at that point and will pick up the new definition once the method is invoked again. After 30 more calls, it will compile it again with the updated definition.
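
The observable side of that invalidation can be shown in plain Ruby. The sketch below uses a made-up `Greeter` class rather than `Time.now`, to avoid patching a core class. It calls a method often enough to be hot, redefines it, and confirms the new definition takes effect. Whether the method was actually compiled depends on running with YJIT enabled; the results should be the same either way.

```ruby
# Hypothetical class used only for this illustration.
class Greeter
  def greeting
    "hello"
  end
end

g = Greeter.new
# Call the method more than 30 times so it would count as hot.
before = Array.new(40) { g.greeting }.uniq

# Monkey-patch: any compiled code built on the old definition is now stale
# and must not be used again.
class Greeter
  def greeting
    "bonjour"
  end
end

after = Array.new(40) { g.greeting }.uniq
p before # => ["hello"]
p after  # => ["bonjour"]
```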

00:06:04.140 This may sound a bit like YJIT theory, and I work on this all day, so I could talk about it forever. However, let's keep it practical. YJIT is based on what's called 'basic block versioning.' We'll explore how that works in detail, but it originates from a research paper by Maxime Chevalier-Boisvert, who is now my tech lead. She worked with Alan Wu to build the original prototype of YJIT, and that effort has grown into the team I work with. The ECOOP 2015 paper describes it in detail, and it is linked from the resource URL I will provide.

00:06:41.280 The idea is simple: YJIT takes a method, divides it into basic blocks (small chunks of functionality), and compiles those into machine code. The first implementation was for JavaScript, but Maxime came to Shopify and thought of ways to improve Ruby's performance. Shopify permitted her to pursue that. YJIT compiles what Ruby calls 'ISEQ' instructions, which are the bytecode instructions, into native code by breaking them into smaller, manageable chunks.
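
As a rough illustration of what dividing a method into basic blocks means, the sketch below splits a toy instruction list at branches and branch targets. The instruction list, the `BRANCHES` set, and the `basic_blocks` helper are all assumptions made up for this example; they are simplified stand-ins and not YJIT's actual data structures.

```ruby
# Toy instruction list: [opcode, operand]. Branch operands are the index
# of the instruction to jump to. Simplified stand-in for real bytecode.
INSNS = [
  [:getlocal, :flag],  # 0
  [:branchunless, 6],  # 1
  [:putself],          # 2
  [:putstring, "Yes"], # 3
  [:send, :puts],      # 4
  [:leave],            # 5
  [:putnil],           # 6
  [:leave]             # 7
].freeze

BRANCHES = %i[branchunless branchif jump].freeze

# A block starts at instruction 0, at every branch target, and right
# after every branch or leave.
def basic_blocks(insns)
  leaders = [0]
  insns.each_with_index do |(op, arg), i|
    branch = BRANCHES.include?(op)
    next unless branch || op == :leave

    leaders << arg if branch
    leaders << i + 1 if i + 1 < insns.size
  end
  starts = leaders.uniq.sort
  starts.each_with_index.map do |start, n|
    finish = (starts[n + 1] || insns.size) - 1
    insns[start..finish]
  end
end

basic_blocks(INSNS).each_with_index do |block, i|
  puts "block #{i}: #{block.map(&:first).join(', ')}"
end
# block 0: getlocal, branchunless
# block 1: putself, putstring, send, leave
# block 2: putnil, leave
```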

00:07:09.900 You can explore the details of how YJIT works if you have a development build. If you're curious about how to get one, that will be included in the resource URL. A snippet of code allows you to dump all the assembly instructions. If you're one of those people who love to debug assembly, we will be thrilled to show you what we've been working on. Just ask us! But remember, this only works on development builds of YJIT; regular builds will only show you the Ruby bytecodes.

00:07:55.680 Now that you're a bit more acquainted with what YJIT does, I can explain the multiple representations concept and why I think you should care about it. I repeat this because it is central to not wasting your time today. As a software developer, you likely juggle many programming languages. Rails developers juggle additional domain-specific languages, and others work with APIs, each of which acts like a language used in a specific context.

00:08:45.180 This juggling act of languages is part of being a good developer. The longer you practice, the better you become at it. I have a feeling that many people in this room excel at this skill, not just those who work on compilers and interpreters. This brings us back to YJIT. It's intriguing because compilers spell out their languages explicitly: they define Ruby bytecode, and they define how each of their other working languages differs from it. Concrete examples beat abstract theory, and YJIT is a concrete example of a mixture of languages that gets tremendous power from their combination.

00:09:59.640 Ruby produces bytecode whether or not you use YJIT. Ruby inherently operates this way; YJIT changes what happens to the bytecode afterward. The bytecode targets a stack machine, meaning it pushes arguments onto the stack and pops return values off once the calculation is done. Even with YJIT involved, this remains the underlying mechanism of how Ruby functions. Ruby can provide you with a bytecode disassembly, and the snippet for that will be in the resource URL. You can easily do this to see the bytecode your Ruby method produces.
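
One way to get such a disassembly is CRuby's `RubyVM::InstructionSequence` API. This is a sketch and may not be the exact snippet from the talk's resource page; the method name `example` is made up, and the exact instruction names in the listing vary between Ruby versions.

```ruby
# A small method to inspect; `example` is a hypothetical name.
def example(flag)
  puts self
  puts "Yes" if flag
end

# CRuby-only API: returns a human-readable listing of the method's bytecode.
listing = RubyVM::InstructionSequence.of(method(:example)).disasm
puts listing
```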

00:10:36.540 Let's take a very basic method that simply prints 'self' and, if its single input parameter is true, makes another print call. The bytecode of this simple Ruby method is shown in the highlighted column, where you can see how the Ruby translates into bytecode instructions. The bytecodes include 'putself' and 'putstring "Yes"' followed by 'opt_send_without_block.' After that, you see a routine pop to clear the stack, as with any stack machine.
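
To show how those instructions use the stack, here is a toy interpreter for four simplified instructions. It is an assumption-laden sketch: `run_bytecode` and `Recorder` are hypothetical names, and `:send` here is a simplified stand-in for `opt_send_without_block` that takes a method name and an argument count.

```ruby
# Stand-in receiver that records what it is asked to print.
class Recorder
  attr_reader :lines

  def initialize
    @lines = []
  end

  def puts(text)
    @lines << text
    nil
  end
end

# Toy stack machine: each instruction is [opcode, operand, argc].
def run_bytecode(insns, receiver)
  stack = []
  insns.each do |op, arg, argc|
    case op
    when :putself   then stack.push(receiver)
    when :putstring then stack.push(arg.dup)
    when :send
      # Pop the arguments, then the receiver; push the call's return value.
      args = stack.pop(argc)
      target = stack.pop
      stack.push(target.public_send(arg, *args))
    when :pop       then stack.pop
    else raise ArgumentError, "unknown instruction: #{op}"
    end
  end
  stack
end

program = [[:putself], [:putstring, "Yes"], [:send, :puts, 1], [:pop]]
recorder = Recorder.new
leftover = run_bytecode(program, recorder)
p recorder.lines # => ["Yes"]
p leftover       # => []
```

The final `:pop` discards the `nil` returned by the call, which is why the stack ends up empty.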

00:11:21.600 The 'branchunless' instruction is a conditional jump taken if its condition is false, which here lets us skip to the end of the method. When YJIT compiles these bytecodes, they turn into sequential chunks of machine code. The chunks are laid out to favor straight-line flow: in assembly, linear execution is good for performance, so you want machine code to run straight through as much as possible. The more you can avoid unnecessary jumps, the more efficient your code execution can be.

00:12:15.900 Now, for those of you who are more curious about the bytecode, Kevin Newton is currently working on an extensive series that delves deeply into this topic. It's linked in the resource URL as well. Now, let us look at a different method: a compact one that still makes a method call based on checking a condition. It yields several basic blocks, which shows how many separate pieces even a single method can have. The number of blocks might seem large relative to the method's simplicity, but splitting into small blocks can yield outstanding performance benefits.

00:13:31.200 As I just discussed, multiple languages coming together is fascinating because it enriches functionality. The code I am about to show is a simplified yet accurate portrayal of a segment of YJIT. It would function perfectly well as shown, and it reflects the state of YJIT as of last month. Here's a condensed form of the Rust code. For some of you, Rust will be familiar, and others may not care; it's still just a language with types attached.

00:14:10.920 This is the string concatenation operator. It is a good candidate for YJIT to optimize: the basic operation is quick, but executing it through a full interpreter call incurs delays, so you can gain a lot here. YJIT's implementation of the built-in string concat has several parts, and we'll discuss each of them. It receives a context, which includes what block we are compiling and the instruction within that block, indicating our position in the compilation process.

00:15:02.220 This process requires knowing the types, so we need to verify at runtime that what we're working on is a string. If it isn't, that leads to an error case, which we prefer to defer to the interpreter. YJIT aims to minimize unnecessary checks; if a type is not the one we expect, we pass control back to the interpreter. If the situation is unclear or results in an error, we simply hand it over to the interpreter's mechanisms instead of complicating the JIT's responsibilities.

00:16:27.960 We generate side exits only when necessary; we won't generate code for unlikely cases. This keeps our space usage down. If the fallback paths are unnecessary, we skip them and focus on the main code flow. So we check types, and we check values on the Ruby stack when we need to, to make sure we're handling the types correctly. This mix of compile-time logic and runtime checks creates efficiencies not often present in traditional compilers.
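
The guard-and-side-exit pattern can be sketched in plain Ruby. This is a toy model under stated assumptions: `GuardedConcat` is a hypothetical class, `+` stands in for the concatenation being specialized, and the slow path stands in for the interpreter. Real YJIT emits the guard as machine code written from Rust.

```ruby
# Toy model: a "compiled" fast path protected by a type guard, with a
# side exit to a general path that stands in for the interpreter.
class GuardedConcat
  attr_reader :side_exits

  def initialize
    @side_exits = 0
  end

  def call(left, right)
    # Guard: the fast path was specialized for String operands only.
    return side_exit(left, right) unless left.is_a?(String) && right.is_a?(String)

    left + right
  end

  private

  # Anything unexpected is handed back to the general-purpose path.
  def side_exit(left, right)
    @side_exits += 1
    left.public_send(:+, right)
  end
end

concat = GuardedConcat.new
p concat.call("foo", "bar") # => "foobar"
p concat.call(1, 2)         # => 3
p concat.side_exits         # => 1
```

The design point is that the fast path stays small: it handles only the expected case and leaves everything else, including errors, to the general path.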

00:17:23.640 This intermingling of languages feels complex, and it can be dizzying for programmers. However, the result is profound power. In fact, 'cheating' becomes the fastest option, as you mix compile-time type checks with runtime optimizations. By knowing what the variables are going to be before the generated code executes, we cut overhead greatly, which leads to fast machine code.

00:18:57.120 This efficiency comes from the fact that YJIT verifies its assumptions; having done so, we can execute the 'simple case' quickly. For this specific example, we are certain that the value will always be a string, which allows us to skip the checks. This back-and-forth between compile-time and runtime inspection greatly reduces the number of type checks we execute, as described earlier. It shows why mixing multiple languages can yield faster performance, and it is what makes YJIT so valuable.
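
A toy model of block versioning may help here. The sketch assumes a single block with two possible compile-time contexts, `:string` (type already proven) and `:unknown`. `BlockVersions` and its hash layout are invented for illustration and do not mirror YJIT's internals.

```ruby
# Toy model of lazy basic block versioning: one block, several versions,
# each keyed by what is already known about the operand's type.
class BlockVersions
  attr_reader :cache

  def initialize
    @cache = {}
  end

  # known_type is the compile-time context: :string or :unknown.
  # A version is only built the first time it is asked for (lazily).
  def version_for(known_type)
    @cache[known_type] ||= compile(known_type)
  end

  private

  def compile(known_type)
    if known_type == :string
      # Type already proven upstream, so no runtime guard is emitted.
      { guards: 0, code: ->(s) { s + "!" } }
    else
      # Type unknown: emit a guard and bail out if it fails.
      { guards: 1, code: ->(s) { s.is_a?(String) ? s + "!" : :side_exit } }
    end
  end
end

versions = BlockVersions.new
p versions.version_for(:string)[:guards]        # => 0
p versions.version_for(:unknown)[:guards]       # => 1
p versions.version_for(:string)[:code].call("hi") # => "hi!"
p versions.version_for(:unknown)[:code].call(5)   # => :side_exit
```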

00:19:57.920 Now, to wrap this up and clarify why it is relevant: there is a balance we must maintain between the cleverness required for high performance and the number of tricks we have to juggle across programming languages. Mixing them poorly leads to confusion and code that is hard to understand. By combining multiple languages deliberately, we find opportunities that yield a lot of power, yet the task lies in keeping it all simple enough to understand.

00:21:20.700 You're now aware of how YJIT moves between several languages. We must work hard to make this system of interactions clear. After all, the objective extends beyond just making it all work; this efficient mechanism should also be transparent to everyone. I hope you can see the value YJIT gets from combining them. Our responsibility is to extend that clarity to everyone who writes code with it.

00:22:14.520 So what's next for YJIT? As you might expect, I face these challenges daily. My job is primarily to validate that the implementation works in practice as we add new features. Memory usage has improved significantly this past year. We've worked hard on garbage collection for generated code, meaning you won't run out of memory even with extended use.

00:22:52.320 I'll also note that YJIT runs well on both x86 and ARM architectures. If you own one of the newer Mac laptops, which run on ARM, you'll find it runs efficiently without breaking a sweat. The version that will be released this Christmas will feature reduced memory usage and further enhancements. You can learn how to use it more effectively through the resource URL, which will provide all the necessary details.

00:23:36.720 For those utilizing YJIT in production, we would love to hear about your experiences. Please share them with us! Your feedback can contribute substantially to our future enhancements. Please do not hesitate to reach out to us regarding YJIT or any particular aspect of the codebase; your insights are invaluable.

00:24:24.360 The CRuby source code includes a link to the lazy BBV paper, the foundational work behind YJIT, and I recommend checking out the resource URL for further details. In summary, I appreciate your attention throughout this discussion of YJIT's languages. It is fulfilling not only to share the improvements that came out of this journey but also to show how the collective effort of overcoming obstacles resonates within this community. Together we can shape a better future in programming language development!

00:25:28.920 Thank you!