Performance Optimization

Summarized using AI

Building a Compacting GC for MRI

Aaron Patterson • November 29, 2017 • New Orleans, LA

In Aaron Patterson's talk, "Building a Compacting GC for MRI" at RubyConf 2017, he delves into the intricate process of implementing a compacting garbage collector (GC) for Ruby's MRI (Matz's Ruby Interpreter). The talk provides a comprehensive overview of memory management in Ruby, emphasizing the importance of compaction for optimizing memory usage and enhancing performance.

Key Points Discussed:

- Introduction to the Speaker: Aaron Patterson, a prominent member of the Ruby core and Rails core teams, discusses his role at GitHub and his passion for Ruby.

- Importance of Ruby Development Support: Patterson highlights the significance of community support and sponsorship from companies that aid Ruby development.

- Copy-on-Write Optimizations: He explains the concept of copy-on-write (COW) which delays copying data until it is modified, helping optimize memory use in Ruby applications.

- Mark-and-Compact GC Fundamentals: The talk introduces the mark-and-compact garbage collector, detailing how it operates and its benefits over traditional garbage collection methods.

- Challenges in Forking Web Servers: Patterson discusses how forking processes can lead to increased memory usage due to multiple copies of Rails being loaded and how COW typically mitigates this issue.

- Compaction Algorithms: He elaborates on the two-finger compaction algorithm, explaining its simplicity and effectiveness in reducing free space fragmentation in memory.

- Implementation Details: Patterson describes the modifications made to Ruby's garbage collector, including the addition of functions for moving objects and updating references to ensure memory integrity.

- Memory Inspection Tools: He provides insights into tools like object_space.dump for analyzing memory usage and the benefits of understanding memory layouts for GC developers.

- Statistics from Analysis: The concluding part of the talk presents findings on memory savings achieved through compaction, encouraging developers to question existing assumptions regarding Ruby's capabilities.

The overall message of the presentation stresses the necessity for innovation in memory management within Ruby, challenging developers to rethink what is achievable through advancements like GC compaction.

In closing, Patterson shares a brief, humorous anecdote about how he gained temporary fame in Japan through a viral tweet, adding a personal touch to the technical discussion.

Building a Compacting GC for MRI
Aaron Patterson • November 29, 2017 • New Orleans, LA

Building a Compacting GC for MRI by Aaron Patterson

We will talk about implementing a compacting GC for MRI. This talk will cover compaction algorithms, along with implementation details, and challenges specific to MRI. Well cover low level and high level memory measurement tools, how to use them, and what they're telling us. We'll cover copy-on-write optimizations and how our GC impacts them. After this talk you should have a better understanding of how Ruby's memory management interacts with the system memory, how you can measure it, and how to adjust your application to work in concert with the system's memory.

RubyConf 2017

00:00:00.000 Oh God, the music faded. I guess that means I have to entertain you now. Hold on, I need a photo. Everyone say thank you, okay?
00:00:14.969 So, every time I give a presentation, I get extremely nervous. When I get nervous, I keep adding slides to my deck, so I'm up to 217 slides. That gives me about 11 seconds per slide, which means I think I’m actually over time right now.
00:00:31.500 First off, I want to tell you that I am right, and I will tell you how I know that I am right. It's because I have the microphone, and you do not. So there is an easy way for you to tell whether or not you're right.
00:00:55.170 This talk is about implementing a compacting garbage collector for MRI. I wanted to keep in theme with the keynote talks. So, this talk is about a rather morbid subject. Actually, it’s not. What I really want to say is that we need a lettuce emoji. The reason we need a lettuce emoji is so I can type ‘lettuce pair program.’ Now I have to type ‘salad pair program,’ and people just say that doesn’t make any sense.
00:01:21.520 Anyway, I really love pairing; I think I go particularly well with a nice Merlot. Today, we're going to talk about exploring memory in Ruby and building a compacting garbage collector.
00:01:45.610 My name is Aaron Patterson, but I go by the name of Tender Love on the Internet. You may not recognize me in person; you might actually recognize my icon. I'm on the Ruby core team and the Rails core team, and I like to talk about Ruby and Rails stuff. So if you want to discuss those topics, please come and say hi.
00:02:21.430 I work for a small startup called GitHub, which is an online code hosting website. Our mission is to make collaboration easier. I really enjoy using Git.
00:02:34.870 But I do not want to force push that on you. Yes, I am still on 11 seconds per slide. I wanted to mention that we are hiring. Usually, I don’t like to put up hiring slides because we are always hiring. If we’re always hiring, does that mean we actually are hiring? Yes, we are!
00:02:57.220 But the important thing about this position is that it is extremely special. This position is to be my boss. I am NOT joking.
00:03:16.019 This is not a joke. Please write down that URL or type it into your computer. I was going to pause longer and have you fill out the application now.
00:03:35.190 So, I have two cats who help me write code. This is Gorbachev Puff-Puff Thunderhorse, and this is my other cat, SeaTac Airport facebook YouTube Instagram Snapchat. Please come say hello to me.
00:03:49.859 I also have stickers of my cat. If you don't know what to say to me, just ask for a sticker of my cat. I will give you one. Thank you for being here. I couldn't be here without you, and I sincerely mean that.
00:04:07.290 I mean that in two senses: first, I’m glad that you came to my talk. Second, I don’t think Ruby could exist without you. I don’t mean this in a vague way like ‘Oh, we’re all Ruby programmers.’ I mean this in a much more concrete way.
00:04:31.560 I want to call out some of the hidden sponsors of Ruby development. These companies don’t get enough praise or publicity in our community. I want to contrast them with non-hidden sponsors of Ruby development.
00:04:50.550 For example, all these companies support me: GitHub sponsors, Kuroko's, and other core team members. These companies are directly sponsoring developers working on Ruby or Rails, and they are the ones we think about when we talk about developer sponsorship.
00:05:10.770 I feel that the large sponsors of open-source development make us feel guilty because we think, ‘What am I doing for Ruby? What is my company doing for Ruby?’ This guilty feeling can be dangerous because it leads to a cycle of feeling bad about ourselves.
00:05:42.990 I want to introduce the hidden sponsors of Ruby so that you don’t feel bad about yourselves. The way I want to demonstrate this is by describing Ruby Central. Ruby Central controls rubygems.org and pays for all its hosting costs. This is how all of your gems are funded.
00:06:09.380 Now, how does Ruby Central get funding? They put on conferences like RubyConf and RailsConf. These conferences gain money from attendees like you, who pay to come. So I want to do something Matsuda-san does at RubyKaya, which is asking how many of you are here on your company's dime? Raise your hands.
00:06:39.910 That is a lot of you! Because your company is sponsoring Ruby development and infrastructure directly by attending this conference. So give yourselves a round of applause!
00:07:14.480 You should thank your boss. Tweet about it online and get some sponsorship going for your company! So if you’re ever wondering how to help Ruby, the easiest way is to be here, to be present, and to spread the word.
00:07:37.490 Let’s talk about some technical stuff. I want to discuss a mark-and-compact garbage collector. I’ve been working on GC at GitHub for a while and want to focus on improving copy-on-write performance with Ruby itself.
00:08:03.600 We’ll look at memory and Ruby and I hope to get through all these slides because I have a nice treat for you at the end—a lightning talk that I could not give at the lightning talks yesterday.
00:08:29.780 I want to cover three different topics today: copy-on-write, building a compacting garbage collector in MRI, and memory inspection tools. This is a pretty low-level talk, and I realize that there are many new people in the audience today, so I want to call out specific things for new people to learn.
00:08:56.420 New people, focus on understanding what copy-on-write is and what compaction is. You don’t need to know exactly how these things work, but keep them in the back of your head.
00:09:07.920 One day, when you hear these terms or you encounter related problems, you can remember them and know the keywords to Google later. Any of you attendees can understand what’s in my presentation; it may just take some googling.
00:09:31.470 Experienced people, if you already know these terms, I want you to focus on the algorithms and implementation details.
00:09:40.390 First, let’s talk about copy-on-write optimizations, or as we shorten it to COW. I always think about a cow when we talk about this. Copy-on-write means we don’t copy something until we write to it. Here’s an example using Ruby's strings.
00:10:11.930 We first allocate an initial string, and if we look at the size of that string, it’s slightly over 9,000. If we duplicate the string and look at the size, however, it’s actually 40, because the memory was not actually copied. But after we write to the string, you can see that the size of the second string is now over 9,000.
00:10:47.288 So we copied that string on write. We can do the same thing with arrays, though, unfortunately, we don’t do this with hashes. I’m not sure why it’s implemented this way, but that's how it works today.
00:11:06.640 Why can we get away with this particular optimization? When we duplicate something, if we copy it, there’s no observable difference between the copy and the original. Since there’s no difference, we don’t need to copy it until something changes.
00:11:32.639 This optimization isn’t just limited to Ruby; it’s used in many places, one being your operating system. The operating system takes advantage of this when you create multiple processes using the fork system call.
00:11:44.809 In Ruby, when we allocate a very large string and then create a new child process by calling the fork method, the fork method copies the current process to the child process. However, since nothing has changed between the parent and child processes, it doesn’t actually copy anything.
00:12:14.550 When we write to the string, the operating system copies the memory that changed to that child process, implementing copy-on-write.
00:12:37.550 If we imagine this string laid out in the operating system’s memory, the parent process contains this large string across many pages.
00:12:57.690 Typically, each page is about 4k, depending on your operating system. After the program forks, the child process will create pages that point back to the parent’s process.
00:13:23.480 When the child process writes to a part of the memory, only that page being written to is actually copied, rather than copying the entire memory.
00:13:59.460 When this happens, it’s called a copy-on-write page fault. We can use operating system instrumentation to find and measure these faults. To improve copy-on-write performance, we want to decrease the number of page faults.
00:14:19.720 Why is copy-on-write important to us? At GitHub, we use a forking web server, which means it creates a parent process with many children. The parent process handles multiple web requests.
00:14:49.790 If each child process loads a copy of Rails, we end up with three copies of Rails in memory. This uses a lot of memory and means that we spend time loading Rails into each child process.
00:15:23.100 If we load Rails in the parent process and then have all the children fork after that, all the child processes will point back to the parent’s copy of Rails.
00:15:43.540 This actually reduces our boot time because we only load Rails once and decreases memory usage since we have only one copy in memory.
00:15:59.360 This is how it works today if you’re using Unicorn in production, but this isn’t the type of optimization I’m interested in. I'm focused on reducing page faults, which happen when something writes to a shared memory page.
00:16:22.220 And, what mutates shared memory? The garbage collector. The collector changes shared memory, and during the mark phase, it would go through and cause your entire program to be copied to the child.
00:16:43.780 For example, if we have a parent process with objects in memory and a child process pointing at these objects, during the mark phase, the garbage collector goes through and writes a flag on each object.
00:17:17.370 Once the flag is written, the object could be copied to the child process, resulting in all objects being duplicated.
00:17:42.880 This was worsened when the operating system copied an entire page. This was solved with a technique called bitmap marking.
00:18:01.180 With bitmap marking, instead of storing that mark bit on each object, it actually stores those bits in a table off to the side. This table does get copied between processes, but it’s much smaller than all objects in memory.
00:18:23.560 Another unhandy element is object generation. Ruby has a generational garbage collector, meaning as objects get older, they age. We keep track of the generation, increasing the age each time an object survives a collection.
00:18:51.280 The oldest an object can get is three GCs old if I remember correctly, and this age is stored on the object itself. This results in a similar problem to the mark bit. Every time we do a garbage collection, if we increase an object's age, it’s written to and copied to the child process.
00:19:17.670 A workaround for this is a gem called Nakayoshi Fork. If you're using a forking web server in production like we do, check out this gem. It works on the principle that generations are bounded: when an object reaches age three, it does not go over.
00:19:41.410 Another principle is that objects created before calling fork are likely to get old. The solution is to run the garbage collector a few times before forking.
00:20:06.550 If you check the source of Nakayoshi Fork, that's essentially what happens: we run GC a few times and then call fork.
00:20:33.060 The garbage collector also writes to memory during object allocation. Allocating an object can cause memory to be copied from one process to another.
00:20:54.730 For example, if we have a parent and child process, the child allocates a new object. However, a Ruby object is smaller than a system page. Since we write to that page, the operating system copies the entire page.
00:21:22.370 To fix this problem, we can reduce the space used, which can be done through garbage collector compaction. Suppose we have a process with two pages; if we fork the process, both pages can potentially be copied.
00:21:45.440 If we move some of those objects around, only one of those pages with free space will be copied to the child process.
00:22:04.840 So, GC compaction simplifies the task of moving objects around until there is no empty space left. All space should be grouped in one area, and we aim for the program to work after moving these objects.
00:22:25.290 One reason for compacting is to reduce memory usage and another is that someone told me this was impossible last year at RubyConf.
00:22:49.180 I presented a garbage collection technique last year, emphasizing that moving objects was impossible. Koichi approached me afterward and inquired why I said that. I hadn’t tried it yet.
00:23:16.670 So, I had to try it, and it turns out that it is not impossible to move objects.
00:23:37.050 Let’s discuss a compaction algorithm called two-finger compaction. This algorithm has some disadvantages; it’s slow and moves objects to random places throughout memory.
00:23:58.500 However, it has one significant advantage: it’s easy to understand and to implement. The general idea of garbage compaction algorithms is to move objects and then update all the references.
00:24:18.870 Moving looks something like this: we have a heap layout with objects sprinkled with free spaces. Two-finger compaction uses two pointers called fingers.
00:24:39.490 The free pointer moves right until it finds free space, while the scan pointer moves left until it finds a filled space. When they find a matching pair, they swap them.
00:24:56.540 In this case, neither pointer needs to move as they’re already at a matching location. We then leave a forwarding address in the place where the object used to reside.
00:25:13.600 We continue this process until the two fingers meet. This part marks the end of the compaction process.
00:25:31.910 After compaction, the heap might look like this, with the string and symbol moved to different locations. We may need to update the references.
00:25:55.390 We start at the beginning of the heap, checking if objects have references. If they do, we find and update them. Once all references have been updated, we change the forwarding addresses back to free spaces and finish compacting.
00:26:14.090 All this code is open source; you can find it at this fork on GitHub. This fork introduces a new method, GC.compact, allowing you to manually compact your heap.
00:26:31.520 In the pseudocode, we load all of Rails and its dependencies. Then, we compact all that’s in memory and fork off our child processes.
00:26:53.240 I had to make several key changes to Ruby in order to implement this in GC. I introduced a function called GC.move that swaps two locations, and another type called T_MOVED_OBJECT for marking forwarding addresses.
00:27:19.370 I also created a function called GC.update_object_references, which updates multiple references. This function is the most complicated part of the implementation.
00:27:38.580 It has to consider the update of references for every type of object within the system, hence it works as a giant switch statement.
00:27:58.930 Now, not every object in the system can move. I introduced a table called a pin bits table. This was a crucial addition.
00:28:16.130 Until now, we stated that every object can move. However, when finding references to objects, we have to ensure that we can actually find and update them.
00:28:29.950 If we have Ruby and C objects involved, we must mark them properly to prevent move-related crashes.
00:28:51.160 The way we handle that is by implementing a function where a C object calls a Ruby object to mark it. This ensures we know the reference cannot move.
00:29:13.330 Now, we also need to reference global variables properly. The GC does not know how to update global variables in C when they reference Ruby objects.
00:29:34.520 In this case, we started applying heuristics to pin these objects as they were generated using certain Ruby functions.
00:29:55.920 Now, we have a situation where string literals also cannot move. Ruby compiles code into instruction sequences, turning literals into objects.
00:30:18.220 If the literal moves, the byte code needs to update its reference as well. This makes the situation difficult, as updating byte code is a complex task.
00:30:41.380 As such, apart from the objects marked with our BGC mark, hash keys, dual references, objects created with RB_define_class, and string literals, most objects can indeed move.
00:31:00.920 However, there is good news! Most of these cases can be corrected, except for the RB_GC mark case. In a Rails application, 46% of objects in memory can indeed move.
00:31:15.440 Before compaction, this represents a heap where red indicates unmovable objects, green indicates movable objects, and white is free space.
00:31:44.250 After compaction, the heap shows how the potentially movable objects were grouped together.
00:32:01.350 Now, let’s talk about inspecting memory. I’ll go through this quickly because I really want to show you my lightning talk at the end.
00:32:27.420 One favorite way to inspect memory is using object_space.dump. This dumps all your memory, is the easiest way, and the fastest way to understand how your Ruby program uses memory.
00:32:54.920 You can use this function to dump your heap into a JSON file, measuring things such as Rails boot time. The output will consist of one line per object with a JSON hash representation.
00:33:21.390 This format allows you to view the memory address of each object, their referenced addresses, and their size. This information is crucial for GC developers to understand the object’s memory locations.
00:33:45.540 Understanding this layout can help generate a graph of what objects look like in memory. Dumping the heap allows you to depict how it appears before and after compaction.
00:34:03.100 You can find this on GitHub as well, and there’s code you can use to create these graphs.
00:34:20.870 In the proc file system, you can find a lot of useful information, such as address ranges that calculate RSS and PSS of your program.
00:34:40.350 TL;DR; use PSS instead of RSS when we're using forking programs like Unicorn.
00:35:05.600 Let’s look at the GC compaction savings. After performing this analysis, we saw only about 150k bytes of savings, but that's on a small program.
00:35:25.750 Our testing needs to include much larger applications to give us accurate figures.
00:35:44.430 In conclusion, Ruby needs compaction for several reasons, one being that I previously believed it impossible. As Ruby developers, we must question the limitations and assumptions of what we can achieve.
00:36:15.520 Most things are possible if we try hard enough—specifically, if I try harder.
00:36:34.870 Thank you all for being here, and let’s give it up for Ruby!
00:37:06.520 Okay, I have two more minutes for my lightning talk! Let me share how I became extremely famous in Japan for one day...
00:37:28.040 I made a tweet with one character that received fifty-three thousand retweets and eighty-three thousand likes. I want to share how this unfolded.
00:37:54.990 My love for studying Japanese began because, back then, Ruby blogs were mostly in Japanese. I wanted to understand those contents.
00:38:21.250 Ten years later, I learned how to read those posts and share that experience. Japanese consists of three writing systems: kanji, hiragana, and katakana, plus romaji.
00:38:47.360 Interestingly, katakana is typically used for borrowed words, while the go-to phonetic sounds consist of 46 characters. Sounds can be ambiguous, as illustrated with characters in pronunciation.
00:39:16.130 Japanese sounds closely resemble English, making it easier for English speakers to pronounce them compared to German.
00:39:43.570 The unique point of the language is the evolution of its characters from pictographs to more defined scripts.
00:40:09.600 So, about my tweet: It refers to something simple—'to laugh,' translated as ‘w’ for laughter. It quickly escalated to fifty-three thousand retweets, blowing up unexpectedly.
00:40:30.660 While in Tokyo, my friend even asked the bartender if she had seen my tweet, which she actually had!
00:40:57.570 I humorously recommend people to avoid using this simplistic expression to improve their Japanese.
00:41:17.020 Moreover, this expression amusingly corresponds to an offensive term in Chinese culture.
00:41:32.960 Look up ‘grass mud horse’ on Wikipedia, and you will discover more about it.
00:41:51.970 In summary, learning foreign languages provides exceptional insights. If you want to start studying, I recommend Japanese.
00:42:06.780 Thank you for this engaging talk!
Explore all talks recorded at RubyConf 2017
+83