How to Load 1m Lines of Ruby in 5s

How to Load 1m Lines of Ruby in 5s by Andrew Metcalf

Applications written in Ruby, Python and several other popular dynamic languages become very slow to boot as they grow to millions of lines of code. Waiting to reload code in development becomes a major frustration and drain on productivity. This talk will discuss how we reduced the time to boot a service at Stripe from 35s to 5s by statically analyzing dependencies in our codebase to drive an autoloader.

GoRuCo 2017

00:00:15.930 Unfortunately, I found out that the title is a little bit misleading, but hopefully the talk will still be interesting. It addresses the problem without entirely solving it. My name is Andrew, and I work at Stripe on our product infrastructure team. This team is broadly responsible for the core developer experience in our main product codebase, which is almost exclusively written in Ruby.

00:00:22.949 For us, this isn't just about developer tooling; it also encompasses some of the core abstractions and how code fits together, as well as how people are actually authoring code on a day-to-day basis. To give you a sense of what we're working with, our main body of product code and the entire API is contained within a single monorepo, comprising a few million lines of Ruby. It's a macro service architecture, meaning that the API itself is a single service, and most of the logic that processes and creates a charge in Stripe happens within this one service.

00:00:43.960 One of our team's goals is to keep the iteration loop in development tight. I edit code, save it, run a test, and make a test request in development. I want that process to be very fast. With Ruby, we're not slowed down by compilation; we're slowed down by having to load the entire million lines of code each time I hit "Save." People initially described the problem as compiling when, in reality, it was reloading. Developers were waiting 30 to 35 seconds every time they saved their work before they could run a test or make a request against the API. This was a major frustration and a drain on productivity.

00:01:09.850 This issue stems from the fact that our codebase is an organically grown monorepo with tight coupling between all the components. If you touched one component, there was often no clear way to isolate it from needing to reload all components across the codebase. This inherently slows down the development process, as it makes the iteration loop sluggish and requires developers to understand every component of the codebase to grasp what they are doing, given that all the components are deeply intertwined.

00:01:52.210 I want to discuss a project that helped solve these problems. Before doing this talk, I wanted to assess how unique this problem was to Ruby. I ran a completely synthetic benchmark to tell my story: a template that generated a few million lines of roughly equivalent Ruby, Python, and JavaScript code, so I could compare their load times. I found that Python was actually the slowest from a cold start; however, its bytecode caching (.pyc files) made it the fastest for incremental work, which represents most of development.

00:02:19.350 JavaScript proved to be the fastest from a cold start, likely due to the optimized V8 VM in Chrome, which is built to handle JavaScript efficiently over the wire. Unfortunately, Ruby was painfully slow, which explains the predicament we found ourselves in: developers waiting 30 seconds to reload their code. Since Ruby 2.3, it has been possible to precompile code to bytecode through the instruction sequence API, which could be beneficial, though we're not using it yet. There's also a library called Bootsnap by Shopify that implements some loading optimizations and has shown great promise. With those optimizations applied in the benchmark, Ruby can be significantly faster.
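As a rough illustration of the bytecode precompilation mentioned above: CRuby exposes `RubyVM::InstructionSequence`, which can serialize compiled code with `to_binary` and reload it with `load_from_binary`. This is a minimal sketch assuming CRuby 2.3 or later; the source string is made up, and it is not something the talk says Stripe was running.

```ruby
# Sketch only: CRuby-specific API. A real cache would write the binary to disk
# and invalidate it whenever the source file changes.
source = "def add(a, b)\n  a + b\nend\nadd(2, 3)\n"

# Parse and compile the source once.
iseq = RubyVM::InstructionSequence.compile(source)

# Serialize the compiled form to a binary string that could be cached.
binary = iseq.to_binary

# Later, load the cached form back without re-parsing the source, then run it.
cached = RubyVM::InstructionSequence.load_from_binary(binary)
result = cached.eval
```

The saving comes from skipping the parse and compile steps on later loads; the code still has to be evaluated.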

00:03:39.530 I initially theorized that our heavy use of domain-specific languages (DSLs) was to blame, since Ruby code tends to lean on DSLs that run a considerable amount of code at load time. However, in my synthetic benchmark, this turned out not to be the case: Ruby's slow load times came primarily from loading large amounts of code. Over about six months, we cut code-loading time from around 35 seconds down to five seconds by building an autoloader for the entire codebase.

00:04:57.780 This development allowed developers to stop writing "require_relative" statements. Instead, you could simply reference something, and it would work as needed without requiring explicit declarations for every referenced component. I would like to elaborate on how we achieved this.

00:05:30.160 To illustrate the original code-loading issues we faced, let's consider a scenario where, to boot the API and call the "make a charge" endpoint in development, we had to load several files in advance, even if we wouldn't need them immediately. This meant following numerous "require_relative" statements, which resulted in loading most of the codebase. In reality, to simply get the API listening on a socket, I didn't need to load all those files—just the main API file and the charge request file.

00:06:13.500 The key to speeding up our code loading was cutting down what we loaded; that is, not loading what we don't need. This has been a consistent theme in my job: making code load faster is difficult, but simply loading less of it is a more straightforward solution.

00:06:43.700 The approach we adopted involved writing an autoloader that auto-generates stubs. The version I'm showing is simplified; later I'll discuss the specific handling we needed for various edge cases. Essentially, we have a build daemon that runs in the background as you develop. Each time you save a file, it regenerates these stubs.
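The generated stubs themselves are not reproduced in this transcript. As a minimal sketch of the idea, assuming the stubs are built on Ruby's `Module#autoload`, a stub only records which file defines each constant. The `Shop` module, the `charge.rb` contents, and the temporary directory are all hypothetical.

```ruby
require "tmpdir"

dir = Dir.mktmpdir

# Hypothetical definition file; nothing loads it until the constant is referenced.
File.write(File.join(dir, "charge.rb"), <<~RUBY)
  module Shop
    class Charge
      def amount
        100
      end
    end
  end
RUBY

# Hypothetical generated stub: it declares the namespace and records where
# Shop::Charge lives, without loading the file.
module Shop
end
Shop.autoload(:Charge, File.join(dir, "charge.rb"))

pending_before = Shop.autoload?(:Charge) # the registered path while still unloaded
amount = Shop::Charge.new.amount         # the first reference triggers the load
pending_after = Shop.autoload?(:Charge)  # nil once the constant is really defined
```

With stubs like this in place, booting only pays for the files that are actually referenced.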

00:07:06.360 One significant advantage of using autoloaders is that these stubs change only when you modify a definition, such as adding or relocating a class. For the most part, then, the build has little to regenerate, and you can continue coding and running tests without significant interruptions. Generally, you can ignore this running process, which makes code loading feel like magic.

00:07:42.490 Creating this autoloader isn't especially difficult. Once you parse all the files in your codebase into an Abstract Syntax Tree (AST), finding definitions becomes manageable. Definitions have distinct nodes within the AST and are straightforward to identify. At this point, we know, for example, that the Charge definition lives in the file named "charge.rb." When someone references that definition, the autoloader automatically loads "charge.rb" for them. Fortunately, Ruby provides built-in support for autoloading; without it, this entire project would have been significantly more arduous.
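To make the definition-finding step concrete, here is a small sketch that uses Ruby's standard-library `Ripper` parser. The talk does not say which parser Stripe used, so `Ripper` is an assumption, and the helper names (`const_name`, `find_definitions`) and the sample sources are hypothetical.

```ruby
require "ripper"

# Turn the name node of a class/module definition into a string.
def const_name(node)
  case node[0]
  when :const_ref, :var_ref, :top_const_ref then node[1][1]
  when :const_path_ref then "#{const_name(node[1])}::#{node[2][1]}"
  end
end

# Walk the S-expression and collect fully qualified class/module names.
def find_definitions(sexp, nesting = [], found = [])
  return found unless sexp.is_a?(Array)

  if [:class, :module].include?(sexp[0])
    name = const_name(sexp[1])
    found << (nesting + [name]).join("::")
    sexp.drop(2).each { |child| find_definitions(child, nesting + [name], found) }
  else
    sexp.each { |child| find_definitions(child, nesting, found) }
  end
  found
end

# Hypothetical sources, keyed by file name.
sources = {
  "charge.rb" => "module Shop\n  class Charge < Base\n  end\nend\n",
  "refund.rb" => "class Shop::Refund\nend\n",
}

# Map each definition to the file that contains it.
index = {}
sources.each do |file, code|
  find_definitions(Ripper.sexp(code)).each { |definition| index[definition] ||= file }
end
```

An index like this is all the stub generator needs: each entry becomes one autoload registration.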

00:08:17.500 However, we wanted to do more than merely understand definitions; we also needed to understand references within the codebase. Knowing where every class and module is defined is only half the picture. We also wanted to know which definition each reference in the code points to, and therefore how the components connect to one another.

00:08:53.580 This knowledge provides two major benefits. First, it greatly increases our confidence in the analysis, because we can traverse the entire codebase and detect any reference that would fail to resolve in production. Each unresolved reference we find points to a bug, either in the production code or in the static analysis code itself.

00:09:28.040 Second, it enables us to add further functionality to the build daemon—something I will detail later—which allows us to replace complicated metaprogramming with more straightforward code generation. Furthermore, understanding references enhances our pre-loading capability. If we comprehend how components link together through references, we can effectively ascertain all the files a service will ever utilize. By specifying an entry point, we can evaluate all dependencies and include those definitions in production.
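The entry-point idea can be sketched as a graph traversal. This is an illustration under assumptions, not Stripe's implementation: the file names and the shape of the dependency map (file to the files it references) are hypothetical.

```ruby
require "set"

# Hypothetical file-level dependency graph: each file maps to the files it references.
dependencies = {
  "api.rb" => ["charge_request.rb"],
  "charge_request.rb" => ["charge.rb", "customer.rb"],
  "charge.rb" => ["money.rb"],
  "customer.rb" => [],
  "money.rb" => [],
  "admin_report.rb" => ["charge.rb"],
}

# Breadth-first walk from the entry point, collecting every reachable file.
def files_needed_by(entry_point, graph)
  seen = Set.new
  queue = [entry_point]
  until queue.empty?
    file = queue.shift
    next unless seen.add?(file) # add? returns nil when the file was already visited
    queue.concat(graph.fetch(file, []))
  end
  seen
end

needed = files_needed_by("api.rb", dependencies)
```

Here `needed` contains the five files reachable from `api.rb` and leaves out `admin_report.rb`, which the service never references. That set is what a production boot would preload.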

00:10:59.300 This way, we avoid having requests hit uncommon paths that necessitate loading many files and slowing things down significantly. Towards the end of my talk, I will touch on some tools we've created based on this dependency mapping that enables interesting outcomes once we grasp the complete web of dependencies within our codebase.

00:11:42.230 How did we accomplish this resolution? The first method we explored, which consumed several months of my life, involved dynamically loading all the code in the codebase and querying the Ruby VM to see how each reference resolves in its context. However, it turns out that loading a few million lines of code in arbitrary order exposes many ordering assumptions in how our code is loaded, assumptions that were previously hard-coded into where the require_relative statements sat.

00:12:20.300 Loading code this way also has side effects. For example, if a developer was in the middle of working on a script with no guard around its execution, the background process could inadvertently run that code straight from their source tree.

00:12:41.480 Consequently, we shifted towards primarily a static method of resolution, depending on an algorithm that analyzed the code in terms of the AST. The main challenge was figuring out how to resolve references without running the Ruby VM to check context.

00:13:08.520 Let's take a moment to discuss how Ruby manages this resolution. One of the first concepts I grappled with when delving into Ruby was that of nesting. A baffling aspect of nesting is that references resolve differently depending on how the enclosing definitions are written: whether the modules are opened one per line or written together in a single compact name such as A::B.

00:13:50.040 The critical takeaway is that the arrangement in which a reference exists influences what it can connect to; Ruby will traverse the hierarchy to gather all valid ancestors to determine what it can ultimately link to. We can use this principle to create an effective tracking system for references.
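The nesting behavior described above is easy to reproduce in plain Ruby. The module and constant names below are made up for illustration.

```ruby
module Billing
  CURRENCY = "usd"
end

# Nested form: both Billing::Charge and Billing are lexical scopes.
module Billing
  module Charge
    NESTING = Module.nesting # [Billing::Charge, Billing]

    def self.currency
      CURRENCY # found by walking outward to Billing
    end
  end
end

# Compact form: only Billing::Refund is a lexical scope.
module Billing::Refund
  NESTING = Module.nesting # [Billing::Refund]

  def self.currency
    CURRENCY # Billing is never searched, so this raises NameError
  end
end

nested_result = Billing::Charge.currency
compact_result =
  begin
    Billing::Refund.currency
  rescue NameError => e
    e.class
  end
```

The same reference, `CURRENCY`, resolves in one module and fails in the other purely because of how the enclosing modules were opened. A static resolver has to track that difference.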

00:14:41.800 To handle this, we take each reference that needs resolution and apply the nesting and ancestor rules to it. This has proven largely effective across our codebase. However, we did encounter many edge cases, especially in transitioning our codebase to accommodate this system.
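A static version of that lookup can be sketched as follows. This is a deliberately simplified model, not the algorithm from the talk: it handles only single constant names, and the `definitions` set and `ancestors` map are hypothetical inputs that a real analyzer would derive from the AST.

```ruby
require "set"

# Resolve a bare constant name without running any code.
# nesting is innermost-first, like Module.nesting.
def resolve(name, nesting, definitions, ancestors = {})
  # 1. Lexical scopes, innermost first.
  candidates = nesting.map { |scope| "#{scope}::#{name}" }
  # 2. Ancestors of the innermost scope (assumed to be precomputed).
  candidates += ancestors.fetch(nesting.first, []).map { |ancestor| "#{ancestor}::#{name}" }
  # 3. Finally, the top level.
  candidates << name
  candidates.find { |candidate| definitions.include?(candidate) }
end

definitions = Set[
  "Billing", "Billing::CURRENCY", "Billing::Charge", "Billing::Refund",
  "Base", "Base::LIMIT"
]

nested = resolve("CURRENCY", ["Billing::Charge", "Billing"], definitions)
compact = resolve("CURRENCY", ["Billing::Refund"], definitions)
inherited = resolve("LIMIT", ["Billing::Charge", "Billing"], definitions,
                    { "Billing::Charge" => ["Base"] })
```

A `nil` result, as in the compact case here, is exactly the kind of unresolved reference worth reporting.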

00:15:22.890 If I were to start a codebase from scratch, the journey would likely be much more straightforward. Unfortunately, we had millions of lines of existing code when moving to this autoloader strategy, and we had to ensure that the entire Stripe API could function correctly without issues on deployment.

00:15:49.640 This meant we had to develop a series of strategies to make this transition safe. Ultimately, we divided these strategies into three broad categories: our methods for addressing edge cases, employing tools like RuboCop to aid with complex patterns, and ensuring failure points were handled gracefully.

00:16:28.060 I mentioned before that some of what I showed was simplified. In practice, we identified and targeted potential failure points. For instance, in our autoloader files, we instituted a pre-declaration system for constants. A constant that will be needed is declared ahead of time, so that Ruby's autoloader doesn't hit a case where a constant it expects to be defined is missing.

00:17:03.690 We also developed explicit handling for abnormal behavior in edge cases, especially those that arise when dependencies occasionally conflict. This matters a great deal, because we want our autoloader to work reliably.

00:17:46.500 An additional complexity is dealing with dynamic behaviors, such as loading modules at runtime, which can obscure the dependency chain. Through the rules and restrictions we established, we are converting complex code into simpler forms that our autoloader can process efficiently.

00:18:40.030 Upon rolling out this new system, we saw the beneficial effects of its implementation—our operations remained stable, and despite a few teething issues in some services, we managed to resolve them quickly. The question at hand is: What applications did we create as a result of this overhaul? With the improvements made to the autoloader, we were able to significantly reduce boot times, negating the need for many "require_relative" statements across our code.

00:19:50.480 However, we also innovated new use cases for the build tools we developed to capitalize on the capabilities of our autoloader. By laying out specific handling of dependencies, we enabled a selective test executor that allows tests to run on the most relevant changes, saving time and computing resources.
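One way to picture a selective test executor is to invert the dependency graph and walk it from the changed files. This is a sketch under assumptions: the graph, the file names, and the `_test.rb` naming convention are hypothetical, not details from the talk.

```ruby
require "set"

# Return the test files that transitively depend on any changed file.
def tests_affected_by(changed_files, graph)
  # Invert the graph: file => files that reference it.
  dependents = Hash.new { |hash, key| hash[key] = [] }
  graph.each { |file, deps| deps.each { |dep| dependents[dep] << file } }

  seen = Set.new
  queue = changed_files.dup
  until queue.empty?
    file = queue.shift
    next unless seen.add?(file)
    queue.concat(dependents[file])
  end
  seen.select { |file| file.end_with?("_test.rb") }.sort
end

# Hypothetical graph: each file maps to the files it references.
graph = {
  "charge_test.rb" => ["charge.rb"],
  "charge.rb" => ["money.rb"],
  "customer_test.rb" => ["customer.rb"],
  "customer.rb" => [],
  "money.rb" => [],
}

after_money_change = tests_affected_by(["money.rb"], graph)
after_customer_change = tests_affected_by(["customer.rb"], graph)
```

A change to `money.rb` selects only `charge_test.rb`, so the customer tests can be skipped.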

00:20:30.150 The outcome has led to a significant decrease in resource consumption as well as optimized performance across our tests. Various team members have even suggested visualizing the significant dependencies within our codebase through tools such as Graphviz.

00:21:42.830 We whimsically dubbed this initiative "Untangling the Gordian Knot." The resulting visualization shows a tightly coupled system that begs for dissection, and it lets us methodically peel apart highly interdependent components for more modular development.

00:22:23.270 Simultaneously, we began enacting a package organization system that encourages developers to declare namespaces around their code. This ensures only certain constants and modules are publicly available while maintaining control over what they import.
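The talk does not show what a package declaration looks like, so the following is purely a hypothetical model of the idea: each package lists what it exports and which packages it imports, and a checker flags references that cross a boundary without permission. The `Package` struct, the `violation` helper, and all names are invented for illustration.

```ruby
# Hypothetical package model: a namespace plus its public surface and its imports.
Package = Struct.new(:name, :exports, :imports)

# Return a message if the reference breaks a package boundary, or nil if it is allowed.
def violation(reference, from_package, packages)
  owner = packages.find do |pkg|
    reference == pkg.name || reference.start_with?("#{pkg.name}::")
  end
  return nil if owner.nil? || owner == from_package

  unless from_package.imports.include?(owner.name)
    return "#{from_package.name} does not import #{owner.name}"
  end
  unless owner.exports.include?(reference)
    return "#{reference} is not exported by #{owner.name}"
  end
  nil
end

billing = Package.new("Billing", ["Billing::Charge"], [])
reports = Package.new("Reports", [], ["Billing"])
search = Package.new("Search", [], [])
packages = [billing, reports, search]

allowed = violation("Billing::Charge", reports, packages)
private_use = violation("Billing::Ledger", reports, packages)
missing_import = violation("Billing::Charge", search, packages)
```

Because the check only needs resolved references, it can run in the same static analysis pass as the autoloader build.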

00:23:10.970 Our desire for enforceable modularity has since materialized, leading to the establishment of roughly 40 packaged modules—providing developers within the team a controlled means of managing their code. We've also introduced tools that permit detailed introspection over our codebase, allowing the identification of unique definitions and tracing pathways through dependencies.

00:23:54.200 Toward the end of my talk, I want to discuss our future aspirations. In the shorter term, we aim to significantly reduce the time spent on resolving dependencies for autoloading—shifting from our current 250 milliseconds to a target resolution time of just 50 milliseconds. Our long-term goal is to explore the potential of type-checking as a significant enhancement to our static analysis capabilities.

00:24:29.440 In conclusion, the essence of loading a million lines of Ruby in five seconds isn't merely about optimizing the process but streamlining code and refactoring approaches to facilitate efficient analysis.

00:25:04.020 The answers to the challenges we face might seem paradoxical: to enhance load time, it might mean asking developers to refactor and streamline their approaches. Ensuring new developers can comfortably reason about millions of lines of code also points towards clear frameworks and organization across code.

00:25:47.940 If you wish to reach out to me, I’m available at "AG mech F" on Twitter; however, I admit that I may need to remember how to access my own account! Alternatively, you can find me on various platforms related to Stripe or via my email.