A Bundle of Joy: Rewriting for Performance

Local gem management is a key part of our modern workflow, but our tools are layered on historical approaches and requirements. In this talk we explore a ground-up rewrite of local gem management, and ask whether careful implementation (and some feature-cutting) can produce a tool that meets most people's needs while outperforming the current options. In the process, we'll look at specific design choices that make common operations faster, and which might apply to your projects too.

RubyKaigi 2019 https://rubykaigi.org/2019/presentations/_matthewd.html#apr18

00:00:03.710 Okay, let's get started. I like computers. Computers are fast, and I like Ruby. Ruby is slow—well, maybe it's not always the interpreter that's to blame; maybe it's our code.

00:00:10.940 There's a lot of work happening to make Ruby run all of our code faster, which is very exciting. But this isn't a talk about that. Sometimes we can make things much faster by rewriting our code or someone else's.

00:00:36.469 I'm Matthew Draper. I'm on the Rails core team, where I normally look a bit more like this. Sometimes like this; more often like this. I work at Buildkite, the CI service that runs your tests in build environments that you control, at scale.

00:00:58.289 So, here's our slow code. Let's start with a very simple example: running 'rake version.' I want to use the right version for my application, so we need Bundler.

00:01:11.790 None of these numbers are from carefully controlled experiments. I just ran each command a few times on my laptop and picked a representative number. That means they're all using a hot disk and file system cache. Everything would be slower on a first run, but that's harder to measure, and hopefully linear, so I'm ignoring it.

00:01:31.290 Running 'rake version' takes 750 milliseconds. We're going to see a lot of process timings like this, and the interesting number is this one on the right. The others describe what sort of work the CPU was doing, but this is the amount of real-world time we had to wait. So Bundler is clearly doing some work.

00:01:44.700 Let's see what happens if we make it easier by specifying the exact version ourselves. We can skip Bundler and go straight to RubyGems. That takes just over 300 milliseconds, which is much faster, but now we're doing the work instead of the computer.

00:02:03.780 What if we run it without RubyGems? Now it's under 200 milliseconds. So while starting Ruby does take some time, RubyGems—and especially Bundler—are adding quite a lot on top. But we've had to do a lot of work ourselves to see that speed. It works because 'rake version' doesn't have any dependencies, and even then we need to supply a lot of information on the command line. We want our computers to do that work for us—that's why we like them.

00:02:23.329 But computers are fast, and we can see that Ruby can be pretty fast too. I'm old enough to remember the world before Bundler, and I definitely do not want to go back there. But maybe if Ruby is fast, we can shrink the time Bundler takes to choose which version to load without losing the feature.

00:02:46.549 Again though, that's a very simple example—a very simple command. I don't run 'rake version' very often—maybe you do. It's easy to measure and compare because we know 'rake' itself isn't doing much work.

00:03:14.870 It's not where our story starts, though. Our story starts with me doing a Rails upgrade on an old application. I was upgrading from Rails 3.2. Anyone who's done that knows it involves running 'bundle update' a lot, and slowly: expanding the list of gems you're updating until you find a set that can move together, and changing version constraints so you don't upgrade one gem too far into the future before the others catch up.

00:03:37.340 Every time I ran 'bundle update,' it would take a long time: about ten minutes, sometimes. Sometimes it was ten minutes to produce a lock file that I could work on top of, run tests against, and use to find the next thing to change. Other times it was ten minutes to give me a version conflict because I needed to upgrade another gem at the same time. I was spending a lot of time waiting for 'bundle update.'

00:04:04.069 Eventually, I decided that I should do something. I complained about it on Twitter, which made me feel better for a few minutes. But eventually, I had to figure out what was actually going on. Bundler can't be this slow for everyone—someone would have fixed it—and that turned out to be true.

00:04:40.479 Every time I ran 'bundle update,' Bundler was downloading thousands of gemspec files one at a time. That's the sort of thing someone might notice. So, what was going on? My first discovery was that it was downloading these same files every time I ran 'bundle update,' and they were version-specific gem specs, so they couldn't have changed.

00:05:15.159 The easy fix was to open up my local copy of Bundler and hack in a file system cache. It wasn't a proper solution to anything, and it wasn't upstreamable. But after just one or two of those ten-minute runs had filled the cache, future runs were down to two or three minutes.

00:05:39.909 My implementation was just a hack in this case, but adding caching is a very effective way to make something faster: do the slow thing once, and then remember the result. So I'd made the runs much faster, as long as I remembered to use my modified Bundler. But I still had that question: why had no one else had this problem?
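Caching fits this case especially well because, as noted above, version-specific gemspecs can't change, so a cached copy never needs revalidating. The following is a minimal Ruby sketch of a write-once file system cache under that assumption; it is not the hack described in the talk, and the `fetch` callable is a hypothetical stand-in for the real network request.

```ruby
require "digest"
require "fileutils"
require "tmpdir"

# Write-once file system cache for responses that never change, such as
# version-specific gemspecs. `fetch` is a hypothetical callable standing in
# for the real network request.
class ImmutableFileCache
  def initialize(dir, fetch)
    @dir = dir
    @fetch = fetch
    FileUtils.mkdir_p(@dir)
  end

  # Return the cached body for `key`, only calling the fetcher on a miss.
  def get(key)
    path = File.join(@dir, Digest::SHA256.hexdigest(key))
    return File.binread(path) if File.exist?(path)

    body = @fetch.call(key)
    # Write to a temporary file, then rename it into place, so an
    # interrupted run never leaves a truncated entry behind.
    tmp = "#{path}.#{Process.pid}.tmp"
    File.binwrite(tmp, body)
    File.rename(tmp, path)
    body
  end
end

# Usage: the second lookup is served from disk without a network call.
calls = 0
fetcher = ->(key) { calls += 1; "spec for #{key}" }
first = second = nil
Dir.mktmpdir do |dir|
  cache = ImmutableFileCache.new(dir, fetcher)
  first = cache.get("rails-5.2.3.gemspec")
  second = cache.get("rails-5.2.3.gemspec")
end
puts calls # prints 1
```

Because entries are never invalidated, this design is only safe for content that is truly immutable; mutable indexes would need expiry or revalidation.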

00:06:16.000 It was time to debug why those requests were happening. First, some background: Bundler used to make these requests, but it was slow. So a few years ago, Ruby Central paid for the development of a new, more efficient protocol, and I was definitely using a version of Bundler that supported it.

00:06:39.260 So why wasn't it using the new protocol? The answer turned out to be a different cache. RubyGems uses a CDN to store copies of common responses close to users. This way, when a user requests a large set of files during a 'bundle update,' most answers come from the local cache, and only a few requests need to be forwarded to the origin server.

00:06:57.710 If no one nearby has requested those files recently, though, all the requests have to be forwarded along. That's what was happening to me. I discovered the origin server has a very strict rate limit. So when I ran 'bundle update,' it would request too many files, and eventually one of those responses would contain a rate limit error. Seeing this failure, Bundler would switch completely to the old protocol.

00:07:35.900 Anyone using a popular CDN location will only need a few files from the origin, and they'll get to use the new protocol. But if you're on a cold CDN, Bundler silently becomes much worse. This is a server configuration issue, but it also seems like something the client could avoid. I didn't really want to keep patching Bundler; instead, I wondered what I might learn by doing a scratch rewrite of Bundler.
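One way a client could avoid that silent downgrade is to treat a rate-limit response as a reason to slow down and retry that one request, rather than a reason to abandon the protocol for the whole run. A hedged Ruby sketch of that idea: `RateLimited` and the `fetch` callable are hypothetical stand-ins (a real client would map something like an HTTP 429 response onto the error), and the delays are arbitrary.

```ruby
# Hypothetical error raised by the fetcher when the server rate limits us.
class RateLimited < StandardError; end

# Retry a single rate-limited request with exponential backoff, instead of
# switching the whole session to a slower protocol.
def fetch_with_backoff(url, fetch, attempts: 5, base_delay: 0.5)
  tries = 0
  begin
    tries += 1
    fetch.call(url)
  rescue RateLimited
    raise if tries >= attempts # give up after a bounded number of tries
    sleep(base_delay * 2**(tries - 1)) # 0.5s, 1s, 2s, ...
    retry
  end
end

# Usage: a fake server that rate limits the first two requests.
remaining_failures = 2
flaky = lambda do |url|
  if remaining_failures.positive?
    remaining_failures -= 1
    raise RateLimited
  end
  "body of #{url}"
end
body = fetch_with_backoff("https://example.invalid/info/rake", flaky, base_delay: 0)
puts body
```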

00:08:06.690 I knew that even with only downloading the files we really had to, there could be a lot of them. An advantage of a scratch rewrite is that you can make architectural choices that make it easier to do what you need to do, because you have a much better idea of what that is.

00:08:41.729 In this case, I built it to handle concurrent downloads. If the slow thing you're doing is waiting on network activity, that's an excellent candidate for concurrency. It's easy to wait on I/O, so just do it ten times at once. I tried out my own implementation, and it seemed to download a full set of files very quickly.
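A minimal Ruby sketch of "just do it ten times at once": a fixed pool of ten worker threads pulling URLs from a shared queue. It relies on MRI threads releasing the global VM lock while blocked on I/O (or `sleep`), so waiting on ten requests at once costs roughly the same wall-clock time as waiting on one. The `download` callable is a hypothetical stand-in for an HTTP request; this is not Gel's downloader.

```ruby
# Download many URLs with a fixed pool of worker threads. `download` is a
# hypothetical callable standing in for a real HTTP GET.
def download_all(urls, download, concurrency: 10)
  queue = Queue.new
  urls.each { |url| queue << url }
  queue.close # once drained, pop returns nil and the workers exit

  results = {}
  lock = Mutex.new
  workers = Array.new(concurrency) do
    Thread.new do
      while (url = queue.pop)
        body = download.call(url) # blocking I/O releases the GVL
        lock.synchronize { results[url] = body }
      end
    end
  end
  workers.each(&:join)
  results
end

# Usage: 20 simulated requests of 50 ms each take roughly 100 ms in total,
# because ten are in flight at a time.
fake = ->(url) { sleep 0.05; "contents of #{url}" }
urls = (1..20).map { |i| "https://example.invalid/spec/#{i}" }
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
results = download_all(urls, fake)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts format("%d files in %.2fs", results.size, elapsed)
```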

00:09:07.220 Once I'd written a faster downloader fairly easily, I started to wonder if there was a bit of a project here. After you've downloaded some gems, the next thing to do is compile and install them. And as I already had the machinery to run the downloads in parallel, I could parallelize the compilation too.

00:09:38.060 This did need a little more care, because while gems can be downloaded in any order, they can only be compiled once all of their dependencies are available. So I could download and install gems significantly faster than RubyGems and Bundler, which is very nice because those are quite slow operations, especially if you have high network latency.
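The ordering constraint can be sketched as a topological sort run in waves: every gem whose dependencies have all been built goes into the next wave, and each wave is compiled concurrently. This Ruby sketch is a simplification (a smarter scheduler would start each gem as soon as its own dependencies finish, rather than waiting for the whole wave); the dependency graph is illustrative, and `compile` is a hypothetical callable standing in for building a native extension.

```ruby
# Build gems in dependency order, running independent gems concurrently.
# `deps` maps each gem name to the names of the gems it depends on; every
# dependency must itself be a key in `deps`.
def compile_in_order(deps, compile)
  done = []
  remaining = deps.keys
  until remaining.empty?
    # Everything whose dependencies are already built can go in this wave.
    ready = remaining.select { |name| deps[name].all? { |d| done.include?(d) } }
    raise "dependency cycle among: #{remaining.join(', ')}" if ready.empty?

    ready.map { |name| Thread.new { compile.call(name) } }.each(&:join)
    done.concat(ready)
    remaining -= ready
  end
  done
end

# Usage with an illustrative dependency graph.
deps = {
  "mini_portile2" => [],
  "nokogiri" => ["mini_portile2"],
  "ffi" => [],
  "sassc" => ["ffi"],
}
order = compile_in_order(deps, ->(name) { sleep 0.01 }) # pretend to compile
p order # prints ["mini_portile2", "ffi", "nokogiri", "sassc"]
```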

00:10:10.399 But we don't do those every day. What we do a lot is loading files from our installed gems. It's not ten minutes slow, so any improvement isn't going to be life-changing. But even a slight improvement could be noticeable, and this is where our 'rake version' example comes back in.

00:10:39.790 700 milliseconds, 300 milliseconds, and 200 milliseconds: all of those differences are in choosing and then loading the right gem and the right files. We can't do this in parallel, but maybe we can cache. The slow part here is file system access, so it's not so easy.

00:11:20.370 Any cache we introduce will still need to be read from disk, but I experimented with storing specialized data structures that act like a database index or a NoSQL store. The standard Ruby behavior involves looping over the load path and checking each directory for the file we're seeking.

00:11:52.290 I could make 'require' faster by skipping that load path iteration and just doing a hash lookup. Both versions of this code are hugely simplified, but less file system access for each 'require' call means faster loading of files from gems.

00:12:05.850 By precomputing the hash of which files are available, the runtime work is reduced. This is similar to one of Bootsnap's optimizations, but it's system-wide and doesn't need any application context. We've added a lot of complexity here to something that used to be very simple, but it does give us a tiny gain in a frequently called code path.
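Both lookup strategies can be sketched in a few lines of Ruby. These are simplified illustrations, not Ruby's or Gel's actual code: real `require` also handles compiled extensions, already-loaded features, and absolute paths. The helper names `find_by_iteration` and `build_feature_index` are hypothetical.

```ruby
require "fileutils"
require "tmpdir"

# Standard behavior, simplified: check each load path directory in turn,
# hitting the file system once per directory until the file turns up.
def find_by_iteration(feature, load_path)
  load_path.each do |dir|
    candidate = File.join(dir, "#{feature}.rb")
    return candidate if File.file?(candidate)
  end
  nil
end

# Precomputed index, simplified: scan every directory once, ahead of time.
# `||=` keeps the first directory that provides a feature, matching load
# path precedence. Each later lookup is a single hash access.
def build_feature_index(load_path)
  index = {}
  load_path.each do |dir|
    Dir.glob("**/*.rb", base: dir).sort.each do |rel|
      index[rel.delete_suffix(".rb")] ||= File.join(dir, rel)
    end
  end
  index
end

# Usage: two load path entries; only the second provides "rake/version".
iterated = indexed = nil
Dir.mktmpdir do |root|
  first_dir = File.join(root, "first")
  second_dir = File.join(root, "second")
  FileUtils.mkdir_p([first_dir, File.join(second_dir, "rake")])
  File.write(File.join(second_dir, "rake", "version.rb"), "")

  load_path = [first_dir, second_dir]
  iterated = find_by_iteration("rake/version", load_path)
  indexed = build_feature_index(load_path)["rake/version"]
end
puts iterated == indexed # prints true
```

The trade-off matches the talk: building the index costs a full directory scan, which only pays off if it's done once (for example, at install time) and reused across many `require` calls.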

00:12:49.000 Now, this doesn't actually help with our 700 milliseconds, though it's nice for specific use cases: 'rake version' only requires a few files, so the time must mostly be spent loading the Gemfile and lock file. For that, I did write my own implementation, but while I used general knowledge of which Ruby constructs are faster and slower, I didn't make any special implementation decisions.

00:13:04.950 The lock file is already a cache that can be read with a single file system read, so there isn't a lot to algorithmically improve. Any speed gain over Bundler is probably from missing features. I've aimed to get as close as possible to a clean room implementation of Bundler's behavior to keep me focused on the features that matter most to everyday users.
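To illustrate why there's little to improve algorithmically: the lock file already records each gem's exact version, so loading it is one file read plus a line scan. This minimal Ruby parser is a hypothetical sketch, not Gel's or Bundler's parser; it only collects the `name (version)` lines inside `specs:` blocks and ignores everything else (platforms, the DEPENDENCIES section, and so on).

```ruby
# Collect "name (version)" entries from the specs: blocks of a lock file.
# Spec lines are indented four spaces; their own dependencies are indented
# six, so the regexp below skips them.
def locked_versions(lockfile_text)
  versions = {}
  in_specs = false
  lockfile_text.each_line do |line|
    if line.match?(/\A\S/) # a top-level header like GEM or PLATFORMS
      in_specs = false
    elsif line.strip == "specs:"
      in_specs = true
    elsif in_specs && (m = line.match(/\A {4}([^ (]+) \(([^)]+)\)$/))
      versions[m[1]] = m[2] # platform gems keep their suffix, e.g. "-java"
    end
  end
  versions
end

# Usage with a small, illustrative lock file.
lock = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      rake (12.3.2)
      rack-test (1.1.0)
        rack (>= 1.0, < 3)

  PLATFORMS
    ruby

  DEPENDENCIES
    rake
LOCK
versions = locked_versions(lock)
p versions
```

In real use this would be called as `locked_versions(File.read("Gemfile.lock"))`, which is the single file system read mentioned above.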

00:13:41.169 Because of that, I might have simply skipped some features that take up part of that 700 milliseconds. I don't know if that's what's happening, but that's the point: if I haven't noticed it's missing, maybe it's not worth whatever delay it's adding.

00:14:06.910 With gem installation and loading covered, there's a final leg of Bundler's behavior to look at: dependency resolution. A lot of performance work has been done on Bundler's resolver, so I was sure I wouldn't be able to speed it up by just rewriting it. My only hope for a real improvement would be a totally different algorithm, and I'm not that clever.

00:14:37.110 Dependency resolution is a hard problem. It's NP-hard. The challenge is to consider all the possible combinations of versions for the gems listed in the Gemfile and their dependencies, and then find a combination that meets all the known constraints, choosing the best one that favors newer versions.
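Stated as code, the problem is easy to write down and hopeless to run at scale. This brute-force Ruby sketch enumerates every combination of versions, keeps those where every requirement holds, and returns the first one in newest-first order. The number of combinations is the product of each gem's version count, which is why real resolvers search instead. It uses RubyGems' `Gem::Version` and `Gem::Requirement`; the gem data is hypothetical and simplified.

```ruby
require "rubygems" # Gem::Version and Gem::Requirement

# `available` maps gem => version strings; `requirements` maps
# [gem, version] => { other_gem => requirement string }.
def brute_force_resolve(available, requirements)
  names = available.keys
  # Newest first, so the first valid combination is the preferred one.
  # Earlier gems in `names` win when preferences conflict.
  choices = names.map { |n| available[n].sort_by { |v| Gem::Version.new(v) }.reverse }
  first, *rest = choices
  first.product(*rest).each do |combo|
    picked = names.zip(combo).to_h
    valid = picked.all? do |name, version|
      (requirements[[name, version]] || {}).all? do |dep, req|
        picked.key?(dep) &&
          Gem::Requirement.new(req).satisfied_by?(Gem::Version.new(picked[dep]))
      end
    end
    return picked if valid
  end
  nil
end

# Hypothetical data: every quiet_assets release requires rails < 5.
available = {
  "rails" => ["4.2.11", "5.0.7", "5.2.3"],
  "quiet_assets" => ["1.0.3", "1.1.0"],
}
requirements = {
  ["quiet_assets", "1.0.3"] => { "rails" => "< 5" },
  ["quiet_assets", "1.1.0"] => { "rails" => "< 5" },
}
solution = brute_force_resolve(available, requirements)
p solution
```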

00:15:05.890 Luckily, I didn't need to be clever. Around this time last year, Natalie Weizenbaum, who you might know from her work on Sass, published a new dependency resolution algorithm for Dart's package manager, pub, called 'pub-grub.' She had spent quite a lot of time developing it, and along with the implementation inside that package manager, she wrote a detailed article describing how it works.

00:15:31.560 Even more luckily, John Hawthorn had seen that and ported the algorithm as a Ruby gem. I didn't know anything about this, so my experiments sat untouched for some time. They were interesting standalone thoughts but nothing more.

00:16:01.890 Then John and I met at RubyConf. He mentioned a library he had been working on earlier in the year, and I nearly dropped my drink: he was talking about the exact piece I was missing, a new resolver that could use the gem catalogs I was downloading and produce an answer. It had two big advantages: it was often faster than Bundler's existing resolver at finding a solution, and when it couldn't find one, it could give a clearer description of which gems were causing the conflict.

00:16:45.040 I won't try to describe how pub-grub actually works; if the problem space is interesting to you, I highly recommend Natalie's article. The oversimplified description is that it does a backtracking search like most other attempts at dependency resolution, but it does a much better job of learning and propagating dead ends, so it does much less work reconsidering similar options that it knows will eventually fail.
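For contrast, here is the kind of plain chronological backtracking that pub-grub improves on, as a hedged Ruby sketch; it is neither pub-grub nor Bundler's resolver. It picks the newest version that fits the constraints gathered so far and backs up on failure, but it learns nothing from a conflict, so it rediscovers the same dead end repeatedly: in the example, every rails 5.x release is tried against every quiet_assets release before it settles on 4.x. The index data is hypothetical.

```ruby
require "rubygems" # Gem::Version and Gem::Requirement

def satisfies_all?(requirements, version)
  v = Gem::Version.new(version)
  requirements.all? { |r| Gem::Requirement.new(r).satisfied_by?(v) }
end

# `index` maps gem => { version => { dep => requirement string } };
# `wanted` maps each top-level gem to a requirement string.
def backtrack_resolve(index, wanted, picked = {})
  # Gather every requirement from the top-level request and from the
  # versions picked so far.
  constraints = Hash.new { |h, k| h[k] = [] }
  wanted.each { |name, req| constraints[name] << req }
  picked.each do |name, version|
    index.fetch(name).fetch(version).each { |dep, req| constraints[dep] << req }
  end

  # Dead end: an earlier pick conflicts with a requirement added since.
  return nil unless picked.all? { |name, version| satisfies_all?(constraints[name], version) }

  # Find the next required gem that has no version yet.
  next_gem = constraints.keys.find { |n| !picked.key?(n) }
  return picked if next_gem.nil? # every required gem has a version: success

  # Try versions newest first, backtracking when a branch fails.
  candidates = index.fetch(next_gem, {}).keys.sort_by { |v| Gem::Version.new(v) }.reverse
  candidates.each do |version|
    next unless satisfies_all?(constraints[next_gem], version)

    result = backtrack_resolve(index, wanted, picked.merge(next_gem => version))
    return result if result
  end
  nil
end

# Hypothetical index: every quiet_assets release requires rails < 5.
index = {
  "rails" => { "4.2.11" => {}, "5.0.7" => {}, "5.2.3" => {} },
  "quiet_assets" => {
    "1.0.3" => { "rails" => "< 5" },
    "1.1.0" => { "rails" => "< 5" },
  },
}
solution = backtrack_resolve(index, { "rails" => ">= 0", "quiet_assets" => ">= 0" })
p solution
```

A conflict-learning resolver would instead record something like "any rails >= 5 is incompatible with any quiet_assets" after the first failure and skip the remaining 5.x branches.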

00:17:25.400 So the recommended technique here is to meet someone who recently implemented a solution to your exact problem. That may not be quite so easy, but switching to a better or more specialized algorithm is probably the number one option for big performance wins.

00:17:41.400 After that lucky meeting, John helped me integrate his pub-grub gem with my earlier experiments. Now that I had implementations for gem installation, dependency resolution, and runtime gem loading, it started to seem like it might be interesting to pull those experiments together and make a thing.

00:18:02.390 The result is Gel. It's a drop-in substitute for Bundler and RubyGems with restricted functionality, and you can use it locally without any changes to your project, your co-workers' environments, or production.

00:18:17.340 It's not a replacement, though. It supports 'bundle install,' 'bundle update,' 'bundle lock,' and 'bundle exec.' It doesn't support any of the gem authoring tools that come with Bundler, nor does it support local directory gems, and it doesn't support that other feature you're thinking of.

00:18:44.789 But it doesn't have to support those things because it's not intended as a replacement. It's an alternative that you can use if it meets your needs. You always have the option to switch to Bundler if you need something more.

00:19:09.500 It's also a very early-stage piece of software. The documentation needs work, the UI and error messages need work, and the platform compatibility needs work.

00:19:51.440 So 'bundle install' becomes 'gel install.' It currently doesn't support any options. We'll add some in time, but others, like binstubs and deployment mode, are probably out of scope. Next up is 'gel lock.'

00:20:01.120 'Lock' is a less common Bundler command: it generates a lock file from your Gemfile without installing any gems. After that, 'gel update' is equivalent to 'bundle update'; for example, 'gel update rails.'

00:20:30.620 There's a theme here: the simple commands work the same, just without the options. Finally, 'bundle exec rake.' This one is a bit different. We have 'gel exec rake,' but you can also just run 'rake,' and it will do what you mean.

00:21:07.440 Here, we run 'rake' outside of our project, and it uses the latest installed version. Inside our project, 'bundle exec rake' uses the version selected by the lock file, and 'gel exec rake' does the same thing. But if we just run 'rake' here in the project, RubyGems will use the newer one and ignore the lock file. Using Gel, plain 'rake' gets the lock file's version inside our project too.
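This "do what you mean" behavior boils down to two decisions: is there a lock file governing this directory, and if so, which version does it pin? A minimal Ruby sketch under stated assumptions, not Gel's implementation: the hypothetical `find_lockfile` walks up parent directories (similar in spirit to how Bundler searches upward for a Gemfile), and the hypothetical `select_version` takes the installed versions plus the locked version, if any.

```ruby
require "rubygems" # Gem::Version
require "fileutils"
require "tmpdir"

# Walk up from `start_dir` looking for a lock file; nil outside any project.
def find_lockfile(start_dir)
  dir = File.expand_path(start_dir)
  loop do
    candidate = File.join(dir, "Gemfile.lock")
    return candidate if File.file?(candidate)

    parent = File.dirname(dir)
    return nil if parent == dir # reached the file system root
    dir = parent
  end
end

# Inside a project, honor the locked version; outside, use the newest one.
# Gem::Version compares numerically, so "12.3.2" beats "9.2.2".
def select_version(installed, locked)
  if locked
    return locked if installed.include?(locked)
    raise "#{locked} is locked but not installed; run install first"
  end
  installed.max_by { |v| Gem::Version.new(v) }
end

# Usage with hypothetical installed versions.
installed = ["9.2.2", "12.3.2"]
p select_version(installed, nil)     # outside a project: "12.3.2"
p select_version(installed, "9.2.2") # locked in a project: "9.2.2"

found = nil
Dir.mktmpdir do |project|
  FileUtils.mkdir_p(File.join(project, "app", "models"))
  File.write(File.join(project, "Gemfile.lock"), "")
  found = find_lockfile(File.join(project, "app", "models")) == File.join(project, "Gemfile.lock")
end
p found # prints true
```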

00:21:34.530 Now, let's talk about the important performance numbers. We looked at 'rake version' earlier, so let's start there. Outside a project, we know that 'rake version' through RubyGems takes 300 milliseconds. With Gel, it's 200 milliseconds, just under 35 milliseconds slower than when we specified everything manually.

00:22:06.220 Switching into a project directory, we know 'bundle exec rake version' is around 700 milliseconds. With Gel, it's 330 milliseconds: about the same time as a non-Bundler invocation through RubyGems, and about half the time it takes when using Bundler to select the right version.

00:22:56.340 But 'rake' is boring; 'rake' is easy. We need a more complicated command. We need an over-complicated command. We need Rails: 'bundle exec rails version' takes 1.3 seconds. With Gel, it's 500 milliseconds.

00:23:35.190 We can't spend all day asking things for their versions, though. So to benchmark the Rails application, we use 'rails runner' to boot it, do nothing else, and just return nil. We also note that this is without Spring and without Bootsnap.

00:24:26.790 We want a direct comparison between the two libraries, so it's simplest to just remove those completely. With Bundler, we boot it in 9 seconds. This is booting Buildkite's main Rails application: not a neglected legacy app, but a well-maintained Rails 5.2 application. Gel does it in 5 seconds.

00:24:56.180 That's still not super fast, but we've trimmed a lot of fat. Most of what remains is time spent in Rails itself. Next up is install from an existing lock file. A fresh 'bundle install' on the same Buildkite Rails application is a bit under 6 minutes.

00:25:34.250 There are a bit over 200 gems, and packages like nokogiri and sassc take just over a minute each to compile. 'Gel install' does more work concurrently, getting the time down to 2.5 minutes. That's over twice as fast, but it's hard to compare.

00:26:01.490 We're talking about writing faster Ruby code, but a lot of that time is spent running C compilers. For a different example, consider a Gemfile containing a single line for the 'tty' gem. TTY is a modular toolkit for writing terminal applications; for us, it's an easy way to install a large number of pure Ruby gems.

00:26:40.559 We're again using a pre-calculated lock file, so this is equivalent to running 'install' after an initial clone on a new machine or perhaps in your Docker image. 'Bundle install' is 4.3 seconds. There are fewer gems, but still, those C compilers are slow.

00:27:10.879 Maybe they need a JIT. With Gel, it's 1.3 seconds. Even with a small gem list and no extensions to compile, Gel gets the job done in less time. The last action we'll measure is locking: reading a Gemfile, choosing the set of versions and dependencies that satisfy it, and writing out the lock file, without the download and install steps.

00:27:46.439 You probably don't run this much yourself, but here we want to measure it in isolation. We've already seen that the install step is faster on its own; for that same TTY Gemfile, 'bundle lock' takes 1.5 seconds.

00:28:19.370 Gel lock is 1 second. That's the easy case. Lock resolution has two steps, so there are two parts that can make it slow. One is having lots of gems from lots of sources. That's what the Buildkite app looks like; it needs to fetch data from all our sources.

00:28:55.160 That means the main RubyGems repository, some smaller Pro and Enterprise-type ones, and a few GitHub repositories mixed in as well.

00:29:26.740 After that, we move on to step two, considering combinations of gems and versions that might meet our needs. We can't see CPU usage here, but in this case, it's not actually working as hard as it seems. This is slow because it's still downloading more things.

00:29:55.210 So that took 90 seconds, and we can see it averaged just 18% CPU. The rest was waiting on network activity. Gel needs to do the same things: refreshing sources and then choosing versions takes it 4 seconds. Caching makes things faster.

00:30:02.960 Here, the better the pre-warmed cache, the faster it goes. Without any caches, Bundler takes 2 minutes, and Gel takes 30 seconds. But there is another way that lock resolution can get slowed down.

00:30:51.220 Sometimes we can have a very small Gemfile that is just hard to solve: two gems, where all the recent versions of one are incompatible with the other. In this example, Quiet Assets is only compatible with Rails 3 and 4. The correct solution is to use the latest available version of Quiet Assets and the last 4.x release of Active Record, because we always prefer the newest version possible.

00:31:25.709 To find this solution, all 5.x versions of Active Record and Active Support must be considered first. We won't watch this one run, but despite being a much shorter Gemfile, this takes 2 minutes with Bundler, and that time is all CPU. Our improved caching isn't going to help here, so it's down to the pub-grub algorithm. Gel does it in 4.5 seconds.

00:32:19.000 So if you'd like to give Gel a try, install it with 'gem install gel.' Everyone has RubyGems, so that's the easiest option for now. Soon, I hope to support 'brew install gel' as well.

00:32:49.660 To activate it in your shell, you can use a shell initializer. This just exports some environment variables. This one adds Gel's bin stubs to your path, so when you run 'rake,' it will work. And this is the important part: it takes over from RubyGems, so Gel can handle the requires when booting your app.
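The initializer's job can be sketched as a Ruby function that prints `export` lines for the shell to `eval`. This is a hedged illustration, not Gel's actual initializer: both directory names are hypothetical, and prepending to `RUBYLIB` is shown only as one general mechanism by which a tool can get its own library files loaded ahead of the standard ones.

```ruby
require "shellwords"

# Build the lines a shell initializer might print for `eval`. Both
# directories are hypothetical; Gel's real layout may differ.
def shell_setup(stub_dir, lib_dir)
  [
    # Bin stubs first on PATH, so plain `rake` finds a stub that can
    # consult the lock file before loading anything.
    %(export PATH=#{Shellwords.escape(stub_dir)}:"$PATH"),
    # RUBYLIB entries are placed ahead of the default load path, so files
    # in lib_dir take precedence over same-named library files.
    %(export RUBYLIB=#{Shellwords.escape(lib_dir)}${RUBYLIB:+:$RUBYLIB}),
  ].join("\n")
end

script = shell_setup("/home/me/.local/gel/bin", "/opt/gel/lib")
puts script
```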

00:33:19.270 I recommend adding this to your bash profile. Now, either way, once it's activated, you're ready to 'gel install' in your project. After that, you can just run 'rake' or 'rails,' and they should just work.

00:33:51.260 As for what’s next, in the short term, I'm going to see what projects Gel doesn't work for and expand support as people report issues. There are some things I already know need to be fixed. For example, it doesn't know how to compile extension gems from git sources. Fixing that only requires some programming; I just haven't had time yet.

00:34:29.230 In the long term, I think some of these ideas can be implemented in Bundler and RubyGems themselves. It's harder because they are mature projects with solidified architectural choices, but it’d be great to make things speedy for everyone.

00:35:01.140 So in conclusion, consider a rewrite. It's well-known that rewrites can be a very bad idea, but treated as contained experiments that you're willing to throw away, there's a lot to learn. It's freeing to be able to just test ideas without fitting them into the architecture of an established project.

00:35:50.010 Question the tools you use as developers. We have many awesome tools; maybe they can still be improved. Don't be afraid to have a go, and finally, try Gel. It'll probably break in some way, but report any problems, and I can fix them. Thank you! Questions, please.

00:36:37.790 Please.

00:36:50.030 Great job, Matthew! That was—I think you undersold how much faster some of that is.

00:37:05.250 So my question is, if I'm using a Ruby version manager like RVM, and I have a similar sort of shell expansion thing in my bash rc or whatever, how does Gel handle gem stubs? How does it handle me having multiple versions installed with something like RVM or rbenv?

00:37:32.900 If you have Gel installed as a gem, then it gets a little more complicated because, when you switch, Gel might be gone even though the shell expansion is still there.

00:37:45.210 That's one of the main reasons that I'm looking to switch to a binary-based installation so that you can have a single Gel version across all of your Ruby versions. Right now, I think you would need your version manager to cooperate in switching to the appropriate Gel shell expansion.

00:38:01.400 Thanks! I was wondering, can you use Gel and Bundler kind of at the same time to do different things? Could you use Gel, you know, to make a speedy lock file, for example, which you then install with Bundler or vice versa?

00:38:17.300 If you wanted to do it slowly, yes. Gel uses the same files as Bundler. It writes out a lock file that should be Bundler-compatible, and if it's not, that's a bug.

00:38:33.590 Great, thank you! Yeah, that actually gets to exactly what I was going to ask: is Gel 100% behavior-compatible with Bundler, or are there known differences or exceptions to that?

00:38:50.560 It has a limited compatibility layer for the internal API, so if you start talking to the actual Bundler or Gem constants, there are a few things there to make most projects work. But if you really start exploring, it'll break down pretty quickly. However, the commands it has should be 100% behavior-compatible.

00:39:07.510 It's behavior-compatible in the files it reads and writes, but the commands don't have the same options. Bundler has a plugin system, so it is theoretically possible to use Gel as a Bundler plugin. I haven't looked into that yet, and it's definitely a possibility, but I'm not sure how much of the performance you would sacrifice by having Bundler involved.

00:39:39.230 Okay! Thank you! Hey, great talk!

00:39:52.450 So can I use my current Gemfile format? Yes, I think so! It reads the same Gemfile as Bundler 1.

00:40:01.180 If Bundler introduces changes to the format, keeping up will be more difficult, but for now, it works great.

00:40:13.950 Yeah, thank you!

00:40:19.830 Thank you!

00:40:25.660 Other questions? Okay, thank you!