RubyConf 2021

Compiling Ruby to Native Code with Sorbet & LLVM

by Jake Zimmerman and Trevor Elliott

In the RubyConf 2021 talk titled "Compiling Ruby to Native Code with Sorbet & LLVM", Jake Zimmerman and Trevor Elliott from Stripe discuss their innovative approach to improving the performance of Ruby applications, particularly within Stripe's extensive codebase. They focus on developing an ahead-of-time compiler using Sorbet and LLVM aimed at enhancing the latency of their multi-million line Ruby codebase without disrupting ongoing feature development.

Key Points Discussed:

  • Motivation for Performance Improvement:

    • Stripe's API is essential for various business operations involving money management, and faster APIs are preferred by developers.
    • Performance issues in Ruby code were identified as a significant factor in overall latency, necessitating an alternative approach to merely optimizing existing code.
  • Sorbet as a Type Checker:

    • Sorbet is presented as a powerful tool for static typing in Ruby, which provides numerous long-term benefits and allows the compiler to leverage these type annotations for optimization.
  • LLVM Overview:

    • LLVM is introduced as a framework that simplifies compiler construction, making it easier to convert Ruby code into native code.
  • The Compiler Architecture:

    • The Sorbet compiler consists of a type checker and a code generator. It emits code in LLVM IR (intermediate representation) ahead of time, optimizing performance by taking advantage of known types.
    • An important feature is the ability to progressively integrate the compiler, allowing individual files to be compiled rather than requiring an all-or-nothing approach.
  • Implementation and Optimizations:

    • Examples of optimizations include checking type signatures, avoiding virtual machine dispatch, and inlining function calls to improve execution speed.
    • The transformation process from Ruby to efficient C-like constructs is outlined, showcasing how the original high-level code can be compiled into highly optimized native code.
  • Deployment Strategy:

    • The team discusses a careful rollout strategy to ensure minimal disruption, including extensive testing with existing codebases and a blue-green deployment strategy that allows for seamless transitions between old and new code.
    • By analyzing production performance data, the team can iteratively improve their compiler based on real-world metrics rather than relying solely on benchmarks.

Conclusions:

The talk concludes with the presenters emphasizing that the compiler not only enhances performance but does so while allowing developers to continue writing Ruby without worrying about the underlying complexities. The next steps involve increasing adoption across Stripe's codebase and refining the generated code further to achieve more performance gains.

00:00:11.120 My name is Jake, and I'm joined here by Trevor. We both work at Stripe on the Sorbet team, and we're going to be talking about compiling Ruby to native code with Sorbet and LLVM.
00:00:22.740 A couple of definitions here: Sorbet, if you're unfamiliar, is a type checker for Ruby. It's something we're really excited about. You might have seen Matz mention it very early on in the keynote, stating that static typing in Ruby is very much happening right now.
00:00:39.360 If you've ever been curious about whether you should use static types in your Ruby code, my advice to you would be absolutely yes. I encourage you to try Sorbet; it's a lot easier to get started with than you might think. In our experience, and in the experience of several other companies that have started using it, it has a lot of compounding benefits over time.
00:00:56.340 The other definition we want to get out of the way is LLVM. LLVM is a kind of framework or toolkit for building compilers. It takes all the hard parts of building a compiler and puts them behind a nice, accessible interface. Our focus is on taking the source language we want to compile—in this case, Ruby—and converting it into native code using LLVM.
00:01:14.460 So when we combine these two components, we create the Sorbet compiler. I must apologize here, as I am a compiler writer, not a graphic designer. I tried to add wings to our Sorbet logo to make it look like a dragon, but it ended up looking like a sword bat. However, we do have stickers of this logo that you can grab from us after the talk.
00:01:36.299 Our agenda will look something like this: we'll first discuss the motivation for the whole project, starting from Stripe's motivation—why performance matters at a basic level. Then we'll transition to discuss why we believe building a compiler for Ruby is necessary to improve performance at Stripe. After that, I'll hand it off to Trevor, who will explain how the compiler works and detail all the cool things it can do.
00:01:59.040 Finally, we'll come back to talk about how we are actually adopting it in production.
00:02:11.940 So, let's dive into the first point: why Stripe cares about performance. For those who are unfamiliar, Stripe is a software company that provides an API allowing businesses to manage any money-related tasks, such as accepting payments, coordinating payouts, managing taxes, giving loans, receiving loans, and handling invoices and subscriptions.
00:02:22.620 One of the features we consider in our API is performance because if two developers are deciding which API to use to build their business, and one API is faster than the other, they are likely to choose the faster one. Therefore, we want to ensure our API serves our customers as quickly as possible.
00:02:40.680 However, this leads to the question of why we believe we need to build a compiler for Ruby to address that performance gap. To give some context, Stripe uses Ruby extensively. It powers our most critical services, including the Stripe API. Hundreds of engineers write Ruby code at Stripe daily, which amounts to millions of lines of code, all organized within a mono repo.
00:03:03.660 A key point to note is that Stripe's Ruby code is heavily typed. We developed Sorbet and have been adopting it for the past three years, meaning at this point, Stripe's codebase is likely the most typed Ruby codebase in the world. We wanted to leverage this type coverage, and we'll see how the compiler uses those type annotations later in the talk.
00:03:34.800 When we zoom in on where the API spends its time, we find two main components: the I/O component and the Ruby code component. The I/O component includes time spent waiting for the database or partner APIs to respond—for example, waiting on a bank's response. However, a substantial part of an API request's time is simply spent executing Ruby code.
00:04:10.200 Contrary to what some might say about simple CRUD apps being mostly I/O-bound, at Stripe, we want to speed up the Ruby code itself to enhance API speed. The Ruby component of the API request is also quite fragmented. Numerous teams are responsible for maintaining various portions of the code across different API endpoints.
00:04:41.940 Instead of having one large block of code that we could optimize, it's distributed across many small parts, which can be challenging to optimize independently. This fragmented code amounts to contributions from dozens of teams.
00:05:10.320 One potential approach to improving the Ruby portion of the API request would be to identify the slowest requests—for instance, perhaps there's a specific code block that’s inefficient. We could determine which team owns this block of code, ask them to optimize it, and if they do it well, we might achieve a significant performance increase. However, even a 2x speed gain would only marginally improve the end-to-end API latency due to the various other factors involved.
00:05:46.440 If we pursued this piecemeal approach, we would have to ask nearly every team across Stripe to stop working on their prioritized features and bugs to focus on speed improvements for their API endpoints. Imagine the frustration this might cause if someone dictated that your entire roadmap should be replaced with a sole focus on performance improvement.
00:06:03.660 What we sought was a more effective solution—a kind of magic wand that could expedite all the Ruby code simultaneously. By accelerating all the components, every team in the company could return their attention to their core focus: crafting an excellent payments API. This gives us a solid justification for launching the project to enhance Ruby's performance.
00:06:44.700 There were various approaches we could have considered. A common question we get is why not use an alternative Ruby implementation that may be faster, like Truffle Ruby or JRuby.
00:07:09.900 We did explore those options but ultimately concluded that the scale and complexity of the Stripe API made it challenging to undertake some kind of incremental migration. Transitioning the entire Stripe API to run on Truffle Ruby or JRuby at once would be quite a difficult undertaking. Again, we were seeking a way to demonstrate incremental progress.
00:07:47.520 Moreover, one significant downside is that transitioning would require us to leave behind parts of the Ruby ecosystem we had established; we'd have to adopt a JVM-based ecosystem or another alternative Ruby implementation, which would mean giving up much of the operational knowledge we've built up around Ruby.
00:08:03.600 At that point, we decided to consider building our own implementation. One of the big questions was whether we would create a JIT compiler, similar to Truffle Ruby and JRuby, or an ahead-of-time compiler instead.
00:08:42.420 An important advantage of an ahead-of-time compiler, which is what the Sorbet compiler is, is that it allows us to leverage static type information. If we know that a specific variable is always of a certain type, we can optimize the code accordingly.
00:09:09.720 Additionally, ahead-of-time compilers tend to be simpler conceptually. The output of such a compiler is strictly a function of the code provided, whereas a JIT compiler generates code based on the input Ruby code and external factors, making it trickier to track down performance issues or bugs.
00:09:46.740 Importantly, we can also utilize both approaches: using the compiler for the parts that it can optimize effectively while running JIT for others. Trevor will talk more about how our chosen implementation allows the compiler to work alongside the existing Ruby VM while facilitating this incremental migration.
00:10:06.300 With that, I'll hand it off to Trevor to discuss how the compiler functions.
00:10:49.860 Thanks, Jake. As Jake mentioned earlier, the Sorbet compiler consists of two main components: the Sorbet type checker and a code generator developed using LLVM. To illustrate how Sorbet works when added to a Ruby program, let's run through a quick example.
00:11:09.720 Suppose we have a function that takes a single argument, x, and calls map on x, passing a block that increments each element it's given. The way we add Sorbet to this function is by specifying a type signature.
00:11:25.440 This signature indicates that x must always be an array of integers, and that the function must always return an array of integers. After running Sorbet, we see that there are no errors, which is great.
00:11:40.020 However, if we introduce an error—for example, instead of returning the value from calling map on x, we append a string '2'—Sorbet will catch it, reporting that we're returning a string instead of an array. This type of validation demonstrates the kind of type information we leverage in the compilation process.
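The example described here looks roughly like the sketch below. The method and class names and the exact erroneous expression are assumptions; the talk only describes them verbally, and the class wrapper is added just to make the snippet self-contained.

```ruby
# typed: true
require "sorbet-runtime"

class Example
  extend T::Sig

  # The signature described in the talk: x is an array of Integers, and the
  # method returns an array of Integers.
  sig { params(x: T::Array[Integer]).returns(T::Array[Integer]) }
  def self.f(x)
    x.map { |y| y + 1 }
  end

  # One way to write the error described in the talk: appending a string "2"
  # so the method returns a String instead of the mapped array. Sorbet reports
  # an error like "Expected T::Array[Integer] but found String".
  sig { params(x: T::Array[Integer]).returns(T::Array[Integer]) }
  def self.f_bad(x)
    x.map { |y| y + 1 }.to_s + "2"
  end
end

p Example.f([1, 2, 3])  # => [2, 3, 4]
```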
00:12:07.920 The second component of the Sorbet compiler is the code generator, which utilizes LLVM. LLVM has its own language, essentially a target-independent assembly language, which we use. We take type-checked programs from Sorbet and produce programs in LLVM IR.
00:12:53.160 One might expect that adding a backend to a project like Sorbet would require extensive work, but in reality, it has only taken about 10,000 lines of additional C++ code to get to the point where we're generating shared objects. Additionally, there are about 5,000 lines of C code dedicated to runtime support.
00:13:40.080 LLVM comes structured with numerous optimizations built-in, allowing us to take advantage of these additional facilities for free. Many other industrial-strength compilers, like Clang for compiling C and C++ and Apple's Swift compiler, utilize LLVM.
00:14:25.320 The final part of this pipeline is generating native code, where we emit shared objects that utilize the Ruby VM's C API to efficiently interoperate with Ruby code. For instance, we have a simple Ruby program that defines a function, Foo, and we can also equivalently define this function in C.
00:15:18.300 These C extensions are initialized through an Init_ function that the Ruby VM calls when the extension is loaded; it serves as the entry point that registers the extension's Ruby-visible methods.
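As a rough illustration of the shape of that example (the method body is made up, and the comments describe the C side in general terms rather than showing the compiler's actual output):

```ruby
# foo.rb: the Ruby version of the example.
def foo
  "hello"   # body is illustrative; the talk only names the method
end

# The compiled equivalent is a shared object (for example, foo.so) written
# against the Ruby VM's C API. When the VM loads it with require, it calls the
# extension's Init_ function, which registers a C implementation of foo (for
# example via rb_define_global_function), so callers see the same method
# whether the interpreted or the compiled version is loaded.
puts foo
```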
00:15:47.460 Let's revisit the example we discussed while introducing Sorbet. This function calls map on an array, and I'll show how we can compile this program iteratively. The first modification required is to add a compiled sigil at the top of the file.
00:16:14.220 The goal is to allow opting in individual files to compilation instead of requiring an all-or-nothing approach to the entire codebase.
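Concretely, the opt-in is a comment-style sigil at the top of the file. The sketch below follows the talk's description; the exact sigil spelling is an assumption.

```ruby
# typed: true
# compiled: true
#
# Only files carrying the "compiled" sigil are compiled to native code; every
# other file keeps running on the stock interpreter, which is what makes the
# per-file, incremental rollout possible.
require "sorbet-runtime"

class Example
  extend T::Sig

  sig { params(x: T::Array[Integer]).returns(T::Array[Integer]) }
  def self.f(x)
    x.map { |y| y + 1 }
  end
end
```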
00:16:47.700 Next, the compiler checks type signatures unconditionally. We achieve this by conceptually moving the type signature into the method body and turning it into runtime checks that raise an exception if a type test fails. Because those checks are guaranteed to run, the Sorbet compiler can rely on the type information to produce better code.
00:17:19.200 For instance, we raise an exception if x is not an array, and we do the same for the result of x.map to ensure it also returns an array.
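Written out at the Ruby level, that transformation looks roughly like the following. This is a conceptual sketch only: the compiler emits these checks in native code, and the error messages here are made up for illustration.

```ruby
def f(x)
  # Check the argument type up front, as the sig promised.
  raise TypeError, "Expected T::Array[Integer] for x" unless x.is_a?(Array)

  result = x.map { |y| y + 1 }

  # Check the declared return type before handing the value back.
  raise TypeError, "Expected T::Array[Integer] for return value" unless result.is_a?(Array)
  result
end

p f([1, 2, 3])  # => [2, 3, 4]
```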
00:17:49.080 Subsequently, we seek to avoid VM dispatch, which is quite expensive. Because of the type checks, we know that x is an Array, so we can call the implementation in the VM's array.c directly instead of going through method lookup, reducing overhead.
00:18:20.340 Once that is done, we can further optimize by inlining the implementation into the function being compiled, eliminating transitions back to the VM, thus getting a more efficient code path.
00:18:56.880 After that, we can inline the block as well, iterating over x directly and adding one to each element. Each of these optimizations makes the compiled code substantially more efficient than its interpreted Ruby counterpart.
00:19:38.160 The goal here is to keep refining the code to avoid unnecessary checks and calls to the VM, ensuring we finally land with compiled code that effectively performs the Ruby operations with almost no overhead.
00:20:06.180 Ultimately, all these optimizations culminate in generating C code that effectively mirrors the original Ruby function, achieving a significant performance improvement.
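Put together, the end state described above behaves roughly like the hand-written loop below. It is shown as Ruby for readability; the real output is native code via LLVM, and details such as preallocating the result array are assumptions.

```ruby
def f(x)
  raise TypeError, "Expected T::Array[Integer] for x" unless x.is_a?(Array)

  # No call to Array#map, no VM dispatch, no per-element block invocation:
  # just a direct loop over the elements.
  result = Array.new(x.length)
  i = 0
  while i < x.length
    result[i] = x[i] + 1
    i += 1
  end
  result
end

p f([1, 2, 3])  # => [2, 3, 4]
```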
00:20:26.520 With that, I’ll hand it back to Jake, who will discuss how we are adopting this at Stripe.
00:20:50.760 Thanks, Trevor! I hope you all find this as exciting as I do. It is impressive that we can maintain the same Ruby syntax while achieving significant performance benefits behind the scenes—just run x.map and get an optimized C loop.
00:21:14.820 Now let's focus on what it means for this to actually work in production. Our rollout strategy has three main goals: first, to develop a robust plan for when things go wrong, as they inevitably will. Since Stripe is crucial to many businesses daily, we need to minimize disruptions.
00:21:54.700 Secondly, it's essential that we compare performance based on real traffic. It's insufficient to know our compiler speeds up some test benchmark. We want to ensure it effectively improves real-world Ruby applications.
00:22:33.420 Finally, we aim to ensure that every step in the rollout process is incremental. We want to determine the optimal areas for improvement at any point during our migration.
00:23:12.620 Our strategy for addressing potential issues encompasses multiple layers of defense. The first defense is straightforward: we write test cases for any compiler changes to catch bugs early.
00:23:52.140 We have amassed a strong suite of tests over Stripe's ten-year history, allowing us to run the entire suite against the new compiler. If a test fails, this raises a red flag that signals a potential compiler bug.
00:24:33.600 Additionally, we have a staging environment that mimics production as closely as possible without receiving live API traffic. This gives us one more opportunity to detect any potential compiler issues before they affect customers.
00:25:15.420 Next, we utilize blue-green deploys, which means deploying the new code to a separate set of servers (the green servers) and gradually shifting traffic over to those servers. If any issues arise during this migration, we can quickly revert traffic back to the previous blue servers.
00:25:59.520 Finally, if somehow a bug manages to bypass all these layers, we can still disable the compiled code on a specific host with a simple switch and fall back to the interpreter. Running in production this way also lets us iterate on real performance data, using actual traffic to gauge the impact of our compiler changes.
00:26:48.300 We take care to observe the performance impacts directly by tracking execution time on hosts running compiled code versus those running standard interpreted Ruby, allowing us to visualize improvements or regressions in real time.
00:27:31.620 Next, our strategy for deciding what to compile next uses StackProf, a sampling call-stack profiler for Ruby. Since the compiled code still looks like normal Ruby to the profiler, StackProf profiles it just as it would any other Ruby application.
00:28:08.280 By sampling requests and identifying which files consume the most time, we can prioritize optimizing those particular files, ensuring we get the most significant benefit from our efforts.
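For reference, StackProf usage is along these lines; this is a generic sketch, and the handle_request stand-in and sampling parameters are placeholders, not Stripe's actual setup.

```ruby
require "stackprof"

# Stand-in for real work; in production this would be actual request handling.
def handle_request
  (1..1_000).map { |n| n * n }.sum
end

# Sample wall-clock time while the work runs, then print a report showing
# which files and methods the samples landed in.
profile = StackProf.run(mode: :wall, interval: 1000) do
  1_000.times { handle_request }
end

StackProf::Report.new(profile).print_text
```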
00:28:50.760 This concrete punch list tells us which files are most impactful to compile and guides which Ruby language features we prioritize supporting in the compiler.
00:29:30.300 As we move forward, one of our primary short-term goals remains to increase internal adoption of compiled files. We will be compiling more and more sections of code, and once we've locked in a substantial portion of request handling through compiled code, we'll shift our focus to profiling and optimizing the generated code.
00:30:07.100 This iterative process allows us to continuously refine the generated code and improve its performance by leveraging LLVM as effectively as possible.
00:30:51.720 That wraps up our presentation. If you have any questions, we have roughly seven minutes left. Also, don't forget that we have stickers available. If you're interested in working on large scale problems like these that involve complex solutions, Stripe is hiring! Feel free to approach Trevor or me at the end.
00:31:12.020 Thank you!
00:31:36.620 That's actually really cool! Obviously, there are aspects of this work reliant on type information supplied by Sorbet annotations. I'm curious if there are parts that you've discovered are portable outside of a typed Ruby context.
00:32:13.720 That's a great question! One thing we noticed: imagine you have a while loop that runs as native code rather than as interpreted VM bytecode. Evaluating each VM bytecode instruction involves multiple function calls and data structure manipulations, which is inefficient. Just compiling the control flow gives a performance boost regardless of type information.
00:32:49.620 You may have touched on this at the end, but what performance improvements have you observed? Is there a chart you can show us?
00:33:32.960 We nearly prepared a slide showing performance numbers but chose not to, as it could be misleading. Many benchmarks commonly used for Ruby aren't type-checked, so they don't reflect what our compiler can actually do.
00:34:14.100 We haven't shared concrete numbers recently because doing so may inadvertently reveal sensitive information. Last I checked, we saw improvements between 2% and 170% on production traffic.
00:34:54.840 How does the compiler check that an array, in this example, only contains integers without iterating over the entire thing?
00:35:19.800 That's an excellent question. The compiler doesn't iterate over the array; it only checks that the value is an Array. Many optimizations don't need to know the element type at all, and when a later operation does depend on it, the compiler inserts another check at that point.
00:35:54.840 Next question: how does the compiler know to replace a method like array.each with a different construct? Is this something you have to manually specify?
00:36:16.080 The compiler has a mechanism for recognizing specific well-known functions, such as those defined by the Ruby VM, to optimize them. For instance, array.each is a common candidate for replacement with a more efficient loop.
00:36:50.920 In terms of user-defined methods, if you mark them as final in Sorbet, the compiler can optimize them similarly, compiling them to C functions that can be called directly rather than through VM dispatch, yielding significant performance gains.
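A final method in Sorbet looks like the sketch below; the class and method names are made up for illustration.

```ruby
# typed: true
require "sorbet-runtime"

class Fees
  extend T::Sig

  # sig(:final) promises this method is never overridden, which is what lets
  # a compiler bypass dynamic dispatch and call it directly.
  sig(:final) { params(cents: Integer).returns(Integer) }
  def self.with_surcharge(cents)
    cents + 30
  end
end

puts Fees.with_surcharge(100)  # => 130
```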
00:37:30.240 Although Stripe isn't primarily a Rails shop, have you tried the compiler on a Rails app or do you know of others using it, like Shopify?
00:38:02.760 Unfortunately, I don't believe anyone other than Stripe has tried the Sorbet compiler yet.
00:38:06.680 Lastly, concerning the type checks you insert: does the Sorbet compiler take care of removing redundant checks itself, or do you leave that to LLVM?
00:38:33.960 We let LLVM handle that. The way we implement these checks allows LLVM to determine when they're redundant, leading to efficient compiled code without unnecessary checks.
00:39:05.320 One point that is vital: we need a minimum strictness level of typed: true for this to work. Sorbet has varying strictness levels, and they determine how thoroughly the type system inspects method bodies, which is what gives the compiler information it can use for optimizations.
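For context, the strictness level is declared with a sigil at the top of each file, roughly like this; the one-line summary of what each level checks is a simplification.

```ruby
# typed: true
#
# Sorbet strictness sigils, from least to most strict:
#   ignore < false < true < strict < strong
# At "false", Sorbet only reports syntax and constant resolution errors; from
# "true" upward it type-checks method bodies, which is the information the
# compiler relies on.
```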
00:39:30.060 With that, we conclude our session. Thank you all for your interest!