RubyKaigi 2023

Fitting Rust YJIT into CRuby

00:00:13.340 There's a lot of complexity in this area, to the point where it's almost fractal. It feels like every time you zoom in, there's more detail.
00:00:22.380 So, we only have finite time today, and I came up with an analogy to help explain this infinite beast. If you squint a little bit, this is actually wordplay—the talk is about fitting in.
00:00:36.480 You can imagine the Rust part of YJIT as either changing schools, moving cities, or maybe even immigrating to a different country. So anyways, I'm going to start by giving you some background on why we wanted to use Rust.
00:00:54.719 A little bit after that, I'll talk about aspects of the code base relevant to plugging in the new language. I’ll compare and contrast C and Rust in this context and discuss how we've handled the differences.
00:01:06.659 To finish, I'll give a simplified model of object file formats and linking. The model has a fair bit of predictive power.
00:01:27.000 Okay, YJIT started as a prototype in September 2020, when RNA strands encased in proteins were flying around. I named the prototype MicroJIT; it wasn't called YJIT back then, and it worked by extending the interpreter at runtime.
00:01:41.220 The C compiler picks a custom calling convention for the blobs of machine code that make up the interpreter. I wrote the hack to recover that calling convention for use at runtime.
00:01:53.280 The actual code generation mechanism was fairly simple. It was basically just straight-up templating. You look at what the instruction is and spit out a predetermined piece of code. If you're interested in more detail, you can check out my talk from last year.
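As a rough illustration of the templating idea, here is a minimal sketch in Rust. The three-instruction stack machine and its names (`Insn`, `template`, `compile`) are hypothetical and are not taken from the prototype; the byte sequences are ordinary x86-64 encodings chosen for the example. The sketch only produces bytes and does not execute them.

```rust
/// Hypothetical stack-machine instructions, standing in for interpreter bytecode.
#[derive(Clone, Copy, Debug)]
enum Insn {
    PushOne,
    Add,
    Leave,
}

/// Return the predetermined x86-64 machine code for one instruction.
fn template(insn: Insn) -> &'static [u8] {
    match insn {
        // push 1
        Insn::PushOne => &[0x6A, 0x01],
        // pop rax; pop rcx; add rax, rcx; push rax
        Insn::Add => &[0x58, 0x59, 0x48, 0x01, 0xC8, 0x50],
        // pop rax; ret
        Insn::Leave => &[0x58, 0xC3],
    }
}

/// Compile by concatenating the template of each instruction in order.
fn compile(insns: &[Insn]) -> Vec<u8> {
    let mut code = Vec::new();
    for &insn in insns {
        code.extend_from_slice(template(insn));
    }
    code
}

fn main() {
    let code = compile(&[Insn::PushOne, Insn::PushOne, Insn::Add, Insn::Leave]);
    println!("{} bytes: {:02x?}", code.len(), code);
}
```

There is no analysis or optimization here: each instruction maps to a fixed blob, which is what makes this style of code generation simple.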
00:02:12.239 After the prototype showed some success, we got the go-ahead from Shopify to spend more time on the project. That's what eventually grew into YJIT and what we shipped in Ruby 3.1 in 2021.
00:02:30.959 Aaron Patterson and Kevin Newton joined the effort that year, and John Hawthorn from GitHub also helped. That rendition of YJIT was written in C, like the rest of CRuby.
00:02:43.739 We started porting YJIT to Rust pretty much at the start of 2022, soon after we released the first version of YJIT. 2022 is when development really sped up.
00:03:01.220 The team grew a little bit; Noah Gibbs, Jimmy Miller, and Kokubun joined us. We were able to do a lot. We reduced memory footprint and improved peak performance. We also added support for ARM chips, which is important for Apple laptops and some server deployments.
00:03:19.500 By the time the release date came around five months ago, last Christmas, we had enough production deployments of YJIT that we felt comfortable removing the experimental label from it. That pretty much takes us to the present day: May 2023.
00:03:37.560 Okay, a bit about why we wanted to use Rust. When we started working on YJIT, the team was new to C. I had some experience, but not that much, especially in JIT compiler contexts, and the rest of the team didn't have much experience with C either.
00:04:04.019 It seemed like we had a lot of learning to do regardless of the language choice. Rust seemed to have a lot of learning resources available, and it came out of the box with a bunch of nice tools and creature comforts, such as not needing to declare functions twice, a built-in test runner, and all that.
00:04:20.959 There were also some more subtle benefits that it offered. We had some classes of bugs that we knew could be ruled out at build time if we approached things correctly with the borrow checker. So anyways, Rust is nice for us, but our users probably don't care.
00:04:46.919 It's a bit of a tautology to say that Ruby users care more about Ruby than Rust, but I feel that most Ruby users don't really interact with the language that implements Ruby underneath. So you know, if you don't use YJIT, you probably also don't look at the C code that implements Ruby that much either.
00:05:15.180 We'll see in a little bit why using Rust can actually be a burden for our users. So what do users care about? I think YJIT tries to build up this facade of being able to magically speed up Ruby code, right? We try to boot quickly so short-lived processes like 'bundle exec' are still snappy.
00:05:35.360 We also try to be completely compatible with existing code so you don't have to change anything to use YJIT. Of course, there's no actual magic here, and like facades for buildings, there are holes that allow you to look into the inside.
00:05:53.820 By nature, generating more code at runtime is going to use more memory than not doing anything, but we try to be efficient anyways in service of this facade. Being able to obtain YJIT in the first place is also a part of that. We care about the end-to-end experience.
00:06:30.840 This leads me to the next chapter: we care about the end-to-end user experience, but there are limits as to what we can influence due to the way users consume Ruby. These environments create some challenges, and YJIT is influenced by these constraints.
00:06:53.160 If you visit the download page on the official website at ruby-lang.org, you'll see mentions of third-party solutions a lot. This is because the only thing officially on offer is the source code, and the page is trying to navigate you to perform source builds.
00:07:20.699 For many users, the first point of contact with YJIT is actually the Ruby build process. Now, they might not actually be building manually; they may be using some tools that build from source for them, but still, they interact with the build process, and using Rust directly impacts this experience.
00:07:30.660 People now have to install Rust tools in order to get YJIT up and running. Adding to the challenge is the incredible diversity of operating systems that Ruby supports. I think this big list from the official website isn't even exhaustive.
00:08:00.479 You can also build Ruby with various build configurations on each one of these platforms, so we'll quickly have a combinatorial explosion. Here are some past build breakages from this diversity.
00:08:20.659 For debugging during development, we use a library called Capstone that parses machine code bytes and prints them out in a format readable for humans. At one point, we tried to detect Capstone on the system during the build process.
00:08:41.760 Unfortunately, just doing that broke builds for some people because the detection code was slightly wrong, and we ended up running it at the wrong time. It was the kind of setup you couldn't even guess at, because it's a multi-dimensional configuration space.
00:09:12.240 In the same vein, we have code that detects whether Rust is available for use. For this, I tried using 'echo' with '\n' because I thought, in Ruby, if you write '\n' inside double quotes, that's a newline, right? That must be coming from the shell. How naive!
00:09:43.740 It works on my machine like that, but if you look a bit more closely, different shells interpret their input differently, and the 'echo' command works differently in different shells. So, that approach doesn't work.
00:10:01.380 I picked these two issues because we had to delete code to fix both of them. This integration work really made me develop a taste for subtractive solutions. With so many options flying around, minimalism sort of becomes a coping mechanism.
00:10:36.060 The way YJIT is distributed kind of amplifies every bit of build system complexity we add, so we try to minimize what we add to avoid breakages. This approach helps serve our users because the build is what they interact with when they install Ruby.
00:11:11.820 But this isn't without downsides. Sometimes we can go overboard. Let me explain. One dimension of the build configuration space is the 'make' program, which takes care of calling out to the various build tools, including the C compiler.
00:11:43.080 For setup, you write makefiles in this custom language. You've probably used GNU Make, the one you typically get on Linux, but there's also BSD Make. Different platforms have different dialects.
00:12:15.060 Now, there's still a common small subset of basic features for implementing incremental builds. The general goal is to rebuild quickly by only compiling the code that you changed. This is important during development since you're editing and rebuilding all the time, and you don't want to rebuild from scratch every time.
00:12:47.100 For the C parts of CRuby, that's what we use to implement incremental builds. For Rust, though, the easiest way is to use Cargo, the tool that builds Rust for you and gives you incremental builds by default.
00:13:07.860 For release builds, we don't make users install Cargo, again to minimize any breakages that might come with that; we just make them use the Rust compiler. Now note this bifurcation between the two modes; it can become a source of build breakages if we're not careful.
00:13:28.740 To keep the differences minimal, I told 'make' not to worry about caching the Rust build. In development mode, Cargo already gives incremental builds, and in release mode, the Rust code is rebuilt from scratch every time.
00:13:55.740 The advantage of this was having just one piece of 'make' setup that works for everything, right? Same code for all the make flavors. Also, across development and release, it's one piece of code for everything.
00:14:23.220 However, this was too barebones. I thought rebuilding from scratch all the time in release mode was fine because it was targeted at users who wanted to build once and be done with it. They just want to install Ruby.
00:14:48.179 The problem is we sometimes also need to test and do development in those configurations, for example when responding to bug reports, because that's the configuration users are in when they report bugs.
00:15:07.920 Anyways, there are a lot of nuances when it comes to deciding what to put in the build process. The maximally minimal solution isn't necessarily the best one. I went overboard with the old setup, and we use a more complex setup now, but it's better.
00:15:29.880 So, back to the moving analogy. When we're going to a new place or getting into larger organizations, we usually can't influence the environment that much, so we have to adapt. The smoothest Ruby installation experience probably doesn't involve building from source at all, right?
00:15:53.160 There are just too many variables, too many things that could go wrong. But YJIT can't really influence how Ruby is distributed, so we adapt to the environment and cope with minimalism.
00:16:20.680 Okay, chapter three. YJIT has to work with the rest of the runtime system to do its job, and to do that, it has to speak C. It also needs to talk to the operating system, and C is usually the language the operating system speaks.
00:16:38.420 Speaking C fluently is very important. Just to put things into perspective, here's a diagram that visualizes the amount of C and Rust in the codebase. Even undercounting some of the C here by excluding some of the headers and some extensions, you can see how Rust is absolutely dwarfed in the codebase.
00:16:55.639 So coming to speaking C, it’s kind of like how there's usually pressure in society for minorities to speak languages foreign to them.
00:17:05.259 Anyways, while talking C, a lot of the time we're dealing with integers. Rust gives a lot of nice contemporary simplifications. Much of it comes from being able to leverage the current hardware landscape.
00:17:24.240 For example, integer types with exact bit widths like u8, i32, and u64 are primitive in Rust, but in C, these types are optional. You can't necessarily trust that when you're writing in C, these types exist.
00:17:45.600 If you read the C standard, you can sort of infer why that's the case. You can see how the committee was trying to accommodate implementations that don't use two's complement representation for signed integers and implementations that have padding bits—bits in the integer that don't represent any part of the number.
00:18:04.500 These machines were more prevalent in 1999 when the standard was first drafted. Nowadays, these machines practically don't exist, so in C the types are optional only in theory. The checks for them end up being checks that never fail.
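To make the contrast concrete, here is a small Rust sketch of what "exact bit widths are primitive" buys you. It only uses the standard library, and the specific checks are my own choice of illustration rather than anything shown in the talk.

```rust
use std::mem::size_of;

fn main() {
    // The number in each type name is its exact width in bits.
    assert_eq!(size_of::<u8>(), 1);
    assert_eq!(size_of::<i32>(), 4);
    assert_eq!(size_of::<u64>(), 8);

    // Signed integers are two's complement with no padding bits,
    // so the ranges are fixed on every platform.
    assert_eq!(i32::MIN, -2_147_483_648);
    assert_eq!(u8::MAX, 255);

    // In two's complement, -1 has every bit set.
    assert_eq!((-1i32).to_ne_bytes(), [0xFF; 4]);

    println!("all width checks passed");
}
```

None of these assertions depends on the target, which is why Rust code does not need feature checks for these types.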
00:18:23.460 So C wants to support 1999 hardware. What does Rust want? Rust prefers explicitness in areas that relate to memory safety. Integer overflow has been shown to be a common gadget for security exploits.
00:18:41.460 So unlike C, Rust does not automatically convert between integer types. Here's an example of where it shows up in YJIT. You know, Ruby methods implemented in C declare the number of arguments they take with a signed integer.
00:19:00.060 Here are two instance methods on the Kernel module. We can see here that 'inspect' takes no arguments (the zero at the end), and we can pass a variable number of arguments to 'singleton_methods' (the negative one at the end).
00:19:19.740 This is what the negative one at the end means, and you can inspect this number from Ruby. It informs what YJIT needs to check against to implement method calls.
00:19:39.060 Now, after YJIT performs the control flow transfer to the actual C code, the C function might perform some additional checks, but that's sort of outside the purview of YJIT. We just take care of transferring the arguments, and we do the first check.
00:19:57.540 The C function then takes care of the rest. In the implementation for passing the arguments, we inevitably read from some array with an index derived from the arity, which is signed because you need to accommodate for the negative one case.
00:20:16.980 Now, it's fine to use signed integers to index in C, where the square brackets are actually just syntactic sugar for adding a number to a pointer and dereferencing the result. So if the number is negative, you just go backwards from the pointer and then dereference.
00:20:37.240 Rust, on the other hand, observes that most of the time, you probably don't want to index backwards in that way. So in Rust, you can't.
00:20:57.680 Combine that with the lack of automatic conversion, and if you try to take the C code and do a one-to-one port to Rust, you get build errors. Seeing this kind of stops you from doing a one-to-one port, because it's akin to Rust giving you a bug report about your code, about something you didn't know about, and it becomes this puzzle to figure out where and how to do the conversion in the code.
00:21:16.680 By the way, there are multiple ways to convert between integers in Rust, possibly for good reason because they have different trade-offs. This was a source of friction for us while porting the code.
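Here is a hedged sketch of the trade-offs between the conversion styles, using only the standard library. The `arg_at` helper is hypothetical and is not code from YJIT; it just mirrors the situation of indexing with a number that is allowed to be negative.

```rust
use std::convert::TryFrom;

/// Look up the argument at a signed offset.
/// Returns `None` when the offset is negative or past the end.
fn arg_at(args: &[u64], offset: i32) -> Option<u64> {
    // `args[offset]` does not build: slices are indexed with `usize`, not `i32`.
    // `usize::try_from` turns the negative case into an explicit error path.
    let index = usize::try_from(offset).ok()?;
    args.get(index).copied()
}

fn main() {
    let args = [10, 20, 30];
    assert_eq!(arg_at(&args, 1), Some(20));
    assert_eq!(arg_at(&args, -1), None);

    // `From` is only implemented where no value can be lost (widening).
    let wide: i64 = i64::from(-1i32);
    assert_eq!(wide, -1);

    // `as` never fails, but it reinterprets silently: -1 becomes the largest u32.
    assert_eq!(-1i32 as u32, u32::MAX);
    // `as` also truncates when narrowing: 300 is 0x12C, and only 0x2C survives.
    assert_eq!(300i32 as u8, 44);

    println!("conversion checks passed");
}
```

Roughly, `From` is for conversions that cannot fail, `TryFrom` forces you to handle the failure, and `as` is the closest to C's behavior, which is why picking one during a port takes some thought.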
00:21:36.540 Speaking of frustrations, I'll talk a bit about Rust's ownership and reference systems. You know, a common pain point with learning Rust is building intuition with the borrow checker.
00:21:55.380 When learning a new language, we try to lean on languages we know. The best C analogies for Rust references are restrict pointers and pointers to const, which is a bit odd because people don't really use restrict pointers in C, and you might not even know what restrict does.
00:22:14.040 It's partly because the rules around restrict pointers in C are too subtle and tricky, and it's hard to review code and say for sure whether it's following the rules.
00:22:32.460 Much of Rust is designed around machine checking these rules and giving clear messages when these rules around restrict pointers are broken. In Rust programming terminology, that’s what they call 'safe'.
00:22:52.620 The main rule with restrict pointers and mutable references in Rust is that they assert exclusiveness. Basically, you can't have two overlapping mutable references referring to the same thing. Rust gives you build errors whenever it detects this.
00:23:10.620 To do this, Rust requires implicit and explicit lifetime information collected from all around the codebase, and it runs it through a complicated constraint solver, similar to Prolog, to analyze it.
00:23:28.920 This is not an analysis you can reasonably ask a human to do by hand. The no-overlapping-references restriction is indeed restrictive; things like doubly linked lists and graphs can't be expressed directly under this restriction.
00:23:46.560 For example, we do have graphs in YJIT. In fact, YJIT's graphs are significant because we retain them, and they contribute to our memory usage. JIT compilers in general actually operate on more data than code.
00:24:05.640 If you think about it, that's not too surprising because you want the code to be short so that it's fast, but you need the data to handle uncommon situations. So anyways, we have graphs in YJIT.
00:24:24.420 How do we overcome the build errors from the no-overlapping rule? At first, we tried using RefCell, which tracks borrows at runtime and crashes the whole program if there are any overlapping mutable references.
00:24:46.920 With borrow checking at build time, if a reference has any possibility of overlapping, the checker has to be conservative and say that they always overlap, so it just can't know and refuses to build.
00:25:09.420 By checking at runtime, a RefCell allows us to check each individual borrow at the time it happens, so if at that moment there's no overlap, then it's fine.
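A minimal sketch of that runtime check, using `std::cell::RefCell` directly. The counter is just a stand-in value; nothing here is YJIT code.

```rust
use std::cell::RefCell;

fn main() {
    let counter = RefCell::new(0);

    {
        let mut first = counter.borrow_mut();
        *first += 1;
        // A second mutable borrow while `first` is alive is rejected at runtime.
        // `borrow_mut()` would panic here; `try_borrow_mut()` reports the
        // same condition as an error instead.
        assert!(counter.try_borrow_mut().is_err());
    } // `first` is dropped here, ending the borrow.

    // No borrow is in flight at this moment, so this one succeeds.
    *counter.borrow_mut() += 1;
    assert_eq!(*counter.borrow(), 2);

    println!("final value: {}", counter.borrow());
}
```

The same two borrows are fine or fatal depending only on whether they overlap in time, which is exactly the property that has to be kept in mind across the whole codebase.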
00:25:29.080 However, by deferring this check to runtime, it becomes our responsibility to manually enforce what the borrow checker does at build time. And as I mentioned, this is a hard job.
00:25:49.620 Again, it's really constraint solving, and there's this implicit lifetime information, and even when it's all explicit, it's scattered around the codebase, making it really hard to do correctly.
00:26:10.320 Not surprisingly, we sometimes got that wrong, and we encountered crashes from it. The other thing about this is that to do this check at runtime, you need some sort of data to keep track of the borrows in flight.
00:26:29.940 But that consumes extra memory, which eats into our facade of magic. The C analogy also shows you how this got a little bit frustrating.
00:26:49.260 The kind of graph representation we wanted to do is probably pretty easy to manage with plain C pointers, which don't have any qualifiers. But with Rust, it's suddenly very hard.
00:27:09.720 So, the current solution we have is to use 'unsafe' Rust, which comes with its own set of complications because the semantics of unsafe Rust code haven't been fully decided yet.
00:27:29.160 Rust has an operational semantics team trying to pin down exactly what the rules are when it comes to unsafe Rust. So nowadays, if you want to follow the rules, the first thing you need to do is figure out what rules are already established and unlikely to change.
00:27:49.320 It’s kind of a messy situation and quite a process, but writing unsafe Rust code today is probably harder than writing unsafe Rust in the future because the uncontroversial rules get their status from being brutally simple.
00:28:08.520 That's why they are non-controversial; they're simple rules that you have to agree with. The final rule set will be more complex but accommodating, making it easier to write code that stays within that.
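For a sense of what graph code with plain pointers looks like in unsafe Rust, here is a small sketch of a two-node cycle built from raw pointers. The `Node` type and `walk_cycle` function are hypothetical and are not YJIT's actual data structures; the sketch sticks to raw pointer reads and writes and never creates overlapping references.

```rust
/// A graph node whose edge is a raw pointer, so cycles are allowed.
struct Node {
    visits: u32,
    next: *mut Node,
}

/// Build a two-node cycle, walk it for `steps` steps starting at the first
/// node, and return how many times each node was visited.
fn walk_cycle(steps: usize) -> (u32, u32) {
    // Allocate both nodes on the heap and take them over as raw pointers.
    let a = Box::into_raw(Box::new(Node { visits: 0, next: std::ptr::null_mut() }));
    let b = Box::into_raw(Box::new(Node { visits: 0, next: std::ptr::null_mut() }));

    // SAFETY (our responsibility, not the compiler's): `a` and `b` come from
    // `Box::into_raw`, are non-null, are only accessed through raw pointers,
    // and each is freed exactly once at the end of this block.
    unsafe {
        (*a).next = b;
        (*b).next = a; // this edge closes the cycle

        let mut cur = a;
        for _ in 0..steps {
            (*cur).visits += 1;
            cur = (*cur).next;
        }

        let result = ((*a).visits, (*b).visits);
        drop(Box::from_raw(a));
        drop(Box::from_raw(b));
        result
    }
}

fn main() {
    // Three steps visit a, b, a.
    assert_eq!(walk_cycle(3), (2, 1));
    println!("visits after 3 steps: {:?}", walk_cycle(3));
}
```

The compiler accepts this without any lifetime reasoning, and in exchange every claim in the `SAFETY` comment has to be upheld by hand.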
00:28:29.880 So I painted a rather bleak picture here about using Rust, but I think the general takeaway here is what to do when the rules are muddy.
00:28:49.560 What I did was find first-party sources as much as possible, read the discussions there, and ask questions. One of the leaders in this space is a professor at ETH Zurich, Ralf Jung.
00:29:07.200 They were very generous with their time when I asked questions. It takes a little bit of a leap of faith to ask questions and does require some effort, but people are helpful.
00:29:29.760 You just have to trust that they'll answer. Now, chapter four: linking.
00:29:49.680 Before I get into it, here's some Unix lore. This is an excerpt from the first edition of the Unix manual from 1971, from before even the first incarnation of the C language. You can see how it refers to ld as a link editor.
00:30:11.640 Nowadays, 'linker' is the more popular choice of name, but 'link editor' has some advantages in that it's a closer description of what actually happens when you run ld.
00:30:31.080 However, there's only so much two words can convey. For example, you might still ask what a link is. But it's still more information. I'm showing you this because it's surprisingly relevant.
00:30:50.700 I actually only have four minutes left, so I'm mainly going to talk about the linking that happens at build time, the type that works with files and takes in a bunch of files.
00:31:07.440 Dynamic linking is really addictive, but I'm not going to go into detail. From fundamental principles, this process isn't necessary; nothing stops you from just taking in the source files and outputting a file at the end.
00:31:22.920 So why have linking as a step at all? The answer is that it helps during development, so you don't have to rebuild everything from scratch every time.
00:31:38.760 Because linking is a simpler computation than compiling, especially when you’re talking about optimizing compilers, incremental builds generally go faster.
00:31:55.560 You just run the linker—it's a simpler computation—and you only run the compiler for the parts you changed.
00:32:11.400 This is what is being edited. This is the call instruction on x64. The first byte tells you that it's a call instruction, and the next four bytes are an offset that encodes what you're calling.
00:32:29.760 Actually, it's a relative address, so you will never see the same address encoded at two different call sites. It always says to go a certain number of bytes forwards or backwards.
00:32:50.820 But anyway, because we've split the code into different object files, you can have situations where you don't know the exact address of what you're calling. You still need to compute this relative address, and at link time, the linker can see all the code at once.
00:33:10.080 This is how it can resolve everything, and this is the editing we mean when we talk about link editing.
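The edit itself can be sketched in a few lines of Rust. This assumes the `call rel32` form of the instruction (opcode `0xE8` followed by a little-endian 32-bit offset measured from the end of the instruction); the `patch_call` helper and the buffer layout are made up for the illustration.

```rust
use std::convert::TryFrom;

/// Opcode of the x86-64 `call rel32` instruction.
const CALL_OPCODE: u8 = 0xE8;
/// One opcode byte plus a four-byte offset.
const CALL_LEN: usize = 5;

/// Patch the `call` at `call_site` so that it targets `target`.
/// Both are byte offsets into `code`.
fn patch_call(code: &mut [u8], call_site: usize, target: usize) {
    assert_eq!(code[call_site], CALL_OPCODE);
    // The offset is relative to the address of the *next* instruction.
    let next_insn = (call_site + CALL_LEN) as i64;
    let rel = i32::try_from(target as i64 - next_insn).expect("target out of rel32 range");
    // The four offset bytes are stored little-endian after the opcode.
    code[call_site + 1..call_site + CALL_LEN].copy_from_slice(&rel.to_le_bytes());
}

fn main() {
    // 32 bytes of `nop`, with two calls whose offsets are not filled in yet.
    let mut code = vec![0x90u8; 32];
    code[0] = CALL_OPCODE;
    code[16] = CALL_OPCODE;

    // Both calls target the function at offset 10.
    patch_call(&mut code, 0, 10);
    patch_call(&mut code, 16, 10);

    // Same target, different bytes: +5 from the first site, -11 from the second.
    assert_eq!(&code[1..5], &[0x05, 0x00, 0x00, 0x00]);
    assert_eq!(&code[17..21], &[0xF5, 0xFF, 0xFF, 0xFF]);
    println!("{:02x?}", &code[..21]);
}
```

Two calls to the same place end up with different encodings, which is why the offsets can only be filled in once the final layout is known.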
00:33:30.540 So what information do we need? Sometimes we need to refer to code that doesn’t have an address yet. We need to be able to talk about that.
00:33:48.360 We also need to, on the other side, provide definitions so you can form the links, and you want to have information to know where to do the patching for those call instructions.
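Those three kinds of information can be put into a toy model. The sketch below is a deliberate simplification and not any real object file format: `ObjectFile`, its fields, and `link` are hypothetical names, and the only relocation kind is a 32-bit relative offset for an x86-64 `call` (opcode `0xE8`).

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

/// A toy object file: machine code plus the metadata a linker needs.
struct ObjectFile {
    code: Vec<u8>,
    /// Definitions: symbol name and its offset in `code`.
    defs: Vec<(&'static str, usize)>,
    /// Relocations: offset of a `call` opcode in `code`, and the symbol it refers to.
    relocs: Vec<(usize, &'static str)>,
}

/// Concatenate the objects and patch every call with its final relative offset.
fn link(objects: &[ObjectFile]) -> Result<Vec<u8>, String> {
    let mut output = Vec::new();
    let mut symbols = HashMap::new();
    let mut pending = Vec::new();

    // Pass 1: lay out the code and record where everything ended up.
    for obj in objects {
        let base = output.len();
        output.extend_from_slice(&obj.code);
        for &(name, offset) in &obj.defs {
            symbols.insert(name, base + offset);
        }
        for &(offset, name) in &obj.relocs {
            pending.push((base + offset, name));
        }
    }

    // Pass 2: all addresses are known now, so edit the links.
    for (call_site, name) in pending {
        let target = *symbols
            .get(name)
            .ok_or_else(|| format!("undefined symbol: {}", name))?;
        // The offset is relative to the end of the 5-byte call instruction.
        let next_insn = (call_site + 5) as i64;
        let rel = i32::try_from(target as i64 - next_insn).map_err(|e| e.to_string())?;
        output[call_site + 1..call_site + 5].copy_from_slice(&rel.to_le_bytes());
    }
    Ok(output)
}

fn main() {
    // `main` calls `helper`, which lives in another object file.
    let caller = ObjectFile {
        code: vec![0xE8, 0, 0, 0, 0, 0xC3], // call <helper>; ret
        defs: vec![("main", 0)],
        relocs: vec![(0, "helper")],
    };
    let callee = ObjectFile {
        code: vec![0xC3], // ret
        defs: vec![("helper", 0)],
        relocs: vec![],
    };

    let linked = link(&[caller, callee]).expect("link failed");
    // `helper` lands at offset 6, one byte past the end of the call.
    assert_eq!(linked, vec![0xE8, 0x01, 0x00, 0x00, 0x00, 0xC3, 0xC3]);
    println!("{:02x?}", linked);
}
```

Even this small model predicts familiar behavior: leave out the object that defines `helper` and `link` fails with an undefined symbol error.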
00:34:06.960 This is how linkers have their own languages, and this is just a model. It looks like I ran out of time, but fitting in is a continuous process and a collaborative one.
00:34:22.320 If you're more interested in linking, I wrote a blog post about the exact issues we ran into. But I didn't even know about these problems, and Nobu pointed them out to me.
00:34:45.060 The general message I want to deliver with this talk is that online we're reduced to text, but there are actual humans behind the screens, and you'd be surprised how often people are happy to help if you just talk to them.
00:35:00.540 It might feel a little bit weird to do this online, where collaboration often just turns into a cycle of reading each other's writing. Still, I think this fitting-in analogy can be helpful if you're looking to get more deeply into open source.