Ruby Internals

Compacting GC for MRI

Balkan Ruby 2019

00:00:16.460 Calm down! Alright, we did it! We did it! Okay, okay. This talk is called 'Compacting GC for MRI.'
00:00:22.170 Hello everybody! My name is Aaron Patterson, also known as Tenderlove. If you don't recognize me in person, this is what I look like on the internet; that is my icon.
00:00:35.160 So this is my online self, but this is me in real life. Now, I have two cats, and this is one of them: Gorbachev, or Gorbachev Puff-Puff Thunderhorse the Third.
00:00:43.980 I think that some of you may not know him because I was airdropping his picture to people, and some people declined.
00:00:49.440 Maybe now you know where this is coming from. This is my other cat, SeaTac, or 'Oops Sea-Tac Airport Facebook YouTube Instagram Snapchat' as her full name. But we just call her Choo Choo.
00:01:06.720 I have stickers of my cat, so if you would like a sticker of my cat, come say hello to me! I'm a pretty, I don’t know, shy person and I know other people are too, so I have a hard time making conversations. But if you want to come say hi to me, I'll have stickers and we can talk about cats. That is something we can do together.
00:01:29.820 I work for a small startup company called GitHub. I don't know if you've heard of GitHub, but it's a place where you can host code online. You could say that working at GitHub is my active job.
00:01:41.909 Alright, okay, I'm really excited to be here in Bulgaria. It's my very first time here, and it's amazing! I'm having a great time. The food is amazing, the people are so friendly, and I just want to say thank you so much for having me here.
00:02:16.230 Whenever I go somewhere, I like to support local culture, try out the local foods, and do things that local people do. So I tried out some local drinks.
00:02:30.120 I can’t read this drink, but I had it and it was interesting. I also tried out Rakia, and there are photos missing of this, but it was very good; I enjoyed that.
00:02:57.239 I also wanted to check out local computers because I thought it would be cool to try local computers. So I tried one of these. As mentioned when I was being introduced, I also wanted to get some local cultures, so I brought home Bulgarian yogurt culture. Thank you so much for that!
00:03:30.180 I didn't know how famous Bulgarian yogurt was until I told a friend of mine that I was going to Bulgaria. She is in Japan and said, 'Oh, yogurt, yogurt!' I asked her what she was talking about, and she said, 'They're famous for yogurt.' I had no idea!
00:04:00.299 But the most famous yogurt in Japan is called Bulgarian yogurt; it literally says Bulgarian yogurt on it. That's how I learned about it.
00:04:14.040 Anyway, I’m on the Ruby core team, which is responsible for Ruby development. I actually just found out that I've been on the team for almost ten years; this October will mark my 10th year on the team.
00:04:57.270 This is my very first commit! I'm also on the Rails core team. I looked up my oldest commits on Ruby, and I found my very first commit on Rails was in 2009; it was merged by Jeremy, but I wasn't on the Rails core team until 2011.
00:05:21.570 I've been on the Rails core team for eight years. I’m not bringing this up just to say I've been doing this for a long time, but I actually wanted to discuss why I got involved in the community and why I’ve been here for so long.
00:06:04.010 The reason is that I love Ruby. I love the Ruby programming language and I’ve been doing development work since 1999. I got into Ruby programming around 2005 when someone at work attended the 'No Fluff, Just Stuff' conference where Dave Thomas gave a presentation about Ruby.
00:06:55.170 They brought it back to work and told me about it. I tried it out and thought it was amazing! At the time, I was actually a reluctant Java programmer; I previously was a Perl programmer.
00:07:16.149 I was forced to switch to Java, thinking that once Perl 6 came out, I wouldn’t have to deal with Java anymore. However, we all know how that ended up. I learned about Ruby, and it felt like a breath of fresh air as a Java developer.
00:08:34.389 Everything was so easy for me. Everything just worked the way I thought it would, and when I made a mistake, it was easy to identify why. The language was very easy to pick up, and there was no boilerplate code.
00:09:15.850 I want to show you what I mean by boilerplate code. Back then, I was doing Java 1.3, so I looked up how to do a map in Java 1.3, and I will provide an example.
00:10:56.220 Here we have a list and we're iterating over a list of strings, trying to convert them into integers. On the left is the Java example, similar to what I would have written at the time, and on the right is our Ruby example.
00:11:16.030 What is actually annoying about this example is that the Java example doesn't actually work because it returns an int, which is not an object, and you can't just put that onto an ArrayList.
00:12:21.320 This doesn't work; you need to think about primitive types versus object types. In Ruby, we don't have to think about that stuff because everything is just an object, which lowers the amount of mental overhead required for programming.
00:13:19.740 Since I am a very forgetful person, lowering that mental overhead is very important. I want to talk about my very first serious Ruby program.
00:14:32.880 In 2005, the third Lord of the Rings movie was released, and I wanted to go see it. It was going to be released on my birthday, December 16th, 2005.
00:15:05.569 All three movies were going to be played on the same day, and I wanted to go see this. I tried to buy the tickets online, but the website kept crashing.
00:15:36.710 I was at work, and our application took ten minutes to compile. Every time I made a change, I'd have to wait ten minutes. So, I decided to write a Ruby program that would automatically attempt to purchase the tickets for me.
00:16:39.550 I even put my credit card information into the Ruby program that submitted to the website. However, the problem was that while it could submit the form, I was aware of what the error page looked like, but I didn't know what the success page looked like.
00:17:29.370 So, I made it retry if there was an error. If the page changed, the program would log a message and then retry. I let the program run and forgot about it. Later, I noticed that the output in the terminal had changed, and I wondered what was happening.
00:18:48.220 When I went through the process, I discovered that it was actually succeeding over and over again! I had a mild panic attack.
00:19:02.220 I called my credit card company and asked a very shady question: 'Hi, how many times have I charged my credit card today?' They said, 'Oh, you charged your credit card once.' I felt relieved.
00:19:40.590 Then, I called the ticketing company because I wasn’t sure if they had processed my ticket purchase. I gave them my details, and they said, 'Oh yeah, you purchased two tickets, but it looks like we tried to charge your credit card hundreds of times.'
00:20:14.930 So the moral of the story is: never write infinite loops. You should always have an exit case. At the same time, I really love Rails. Around this time, DHH published the 'Whoops' video where I discovered Rails, and it had all the same features as Ruby.
00:21:20.170 It was so easy to use; everything just worked the way I thought it would work. I didn’t have to write any boilerplate code, but beyond that, what I really enjoyed was that the community valued these particular things.
00:22:04.490 The values of the community truly resonated with me. We didn't want to endure writing a myriad of XML configuration files or waiting long times for compilation. We wanted the burden to be on the computer, not on us, the programmers.
00:23:09.500 This made me feel really good, probably for the first time I genuinely enjoyed programming and thought, 'I need to do this as my day job.' This is in 2005-2006.
00:23:52.360 I thought I must write Rails! This is a picture of me from 2006. Unfortunately at that time, nobody was hiring Rails developers because it was so new. There were no companies doing this.
00:24:36.440 But I really wanted to pursue it. I ended up joining a startup where some of my coworkers quit their jobs to start a startup using Rails, and I took the leap to join them.
00:25:36.990 However, since there were no hiring opportunities for Rails developers, I had to take a salary cut, which was about 25%. That was a significant loss for me.
00:26:48.410 At that time, however, it motivated me to become involved with the Ruby and Rails communities. I wanted to improve the language and the framework, increasing the number of people developing Rails applications.
00:27:41.790 My goal was to enhance the community so that I could work somewhere else, possibly receiving a raise. I also wanted to ensure that more people could feel the productivity boost and that sense of, 'Oh, the machine should be doing the work for me, not me doing all the labor.' I wanted everyone to experience that.
00:28:34.370 This is a long-winded way of saying thank you to each of you for being Ruby and Rails developers. You are the ones creating companies, jobs, and writing Ruby code so that I can also write Ruby code during my day job.
00:29:13.210 If it weren't for you writing Ruby and Rails code, I don't think I would have the job I have today, or have the opportunity to write in the language I love so much. So thank you all.
00:30:10.200 Now, we will move on to the actual technical part of my presentation. I want to talk to you today about compacting GC for MRI, a feature I wrote for Ruby that will be shipping with Ruby 2.7.
00:30:35.960 This feature took me three years to develop. To be honest, this presentation focuses on the most difficult code I have ever written in my entire life, and I feel strange summarizing it here for you in a forty-minute presentation.
00:32:24.460 However, I am going to explain this to you, and I hope that I will do a good job.
00:32:32.220 First off, let's address the question: what is compaction? Compaction essentially refers to rearranging allocated memory.
00:33:38.080 Imagine we have memory in our computer, and we are rearranging it so that free and allocated memory are next to each other. All we're doing is taking the free chunks and the allocated chunks and trying to group them together.
00:34:24.480 You may remember the process of defragmenting your hard drive; we are doing something similar but to the memory inside of your computer.
00:35:19.820 So why bother with compaction? What are its benefits? Some benefits include efficient memory usage. For instance, if we have a memory layout that looks something like this and need to allocate a new item that is the same size as the existing free slots, it won't fit.
00:36:04.640 However, rearranging the objects can allow this new allocation to fit, enabling more efficient memory usage.
00:36:51.360 Another benefit of compaction is efficient usage of CPU caches. When your program accesses memory, it reads that memory in chunks.
00:37:28.960 If our memory is compacted, we can leverage that cache more efficiently, improving performance and locality.
00:38:04.780 Compaction also provides advantages for techniques like copy-on-write, which can increase the efficiency of our memory usage system. These advantages are primarily a solution to the problem of fragmentation.
00:38:53.900 Fragmented memory means we have chunks of free space interspersed throughout our memory. By compacting, we eliminate that fragmentation by ensuring that similar types of data are grouped together.
00:39:35.510 Before getting into the algorithms, I want to explain that our Ruby programs have two heaps: the malloc heap and Ruby's object heap.
00:40:06.920 We can visualize our system memory as a malloc heap, which allocates memory from the operating system. Inside that malloc heap is Ruby's object heap, where all Ruby objects are allocated.
00:40:55.400 Each Ruby object requires a fixed amount of memory, typically 40 bytes, which we can refer to as a slot. Each slot can be either empty or filled. Memory allocation and fragmentation can occur in both heaps, the malloc heap and Ruby's object heap.
00:41:38.600 At work, we utilize je malloc for the malloc heap, and for dealing with fragmentation within the Ruby object heap, we use GC compact, which we will discuss further.
00:42:24.050 Let's delve into Ruby's heap, which is stored inside that malloc heap. Ruby's heap consists of slots, each of which can either contain an object or be empty.
00:43:18.890 Now, let’s talk about the algorithm implemented for compaction: it’s called two-finger compaction. This method originated in Lisp and involves two main steps: moving the objects and updating their references.
00:44:05.530 Let’s say we have a heap layout with six objects, each with fixed addresses. The algorithm places two pointers on opposite sides of the heap: one (the free pointer) moves right until it finds a free slot, and the scan pointer moves left until it finds a filled slot.
00:44:55.890 When it discovers these two locations, it swaps them. For example, if a filled slot is found on the left and a free slot on the right, those two would be swapped.
00:45:40.830 After moving, a forwarding address is left stating where the object moved from, and this process continues until the two pointers meet. Once they meet, all objects have been moved.
00:46:31.800 The next step is updating the references. With an object relationship such as A pointing at D and B pointing at F, once the objects are moved, the references are outdated and need updating.
00:47:33.540 The process checks each object, and if it detects that B points at an object that has moved, it updates that reference to the new location.
00:48:03.100 After completing the updates, those new locations transform back into free slots, finishing the compaction process.
00:48:45.960 Now let’s explore the reference updating step in detail. The process involves iterating over each object within the heap and adjusting their respective references to their new locations.
00:49:29.520 Understanding how objects hold references is key. We need to know how to update references for all Ruby classes, such as hashes and arrays.
00:50:03.860 The core challenge is how C extensions interact with this process. If C objects point at Ruby objects, and those Ruby objects move, the C objects will crash.
00:50:50.480 To prevent this, I developed a scheme to identify where references are stored. For known types, we have a straightforward way of updating references.
00:51:25.610 For example, if an array is accessing a Ruby object, the GC knows how to update that reference. However, unknown types, like objects in C without Ruby visibility, pose a challenge.
00:52:10.260 In scenarios where we want to ensure C extension references do not cause crashes post-compaction, I introduced a mechanism of pinning bits.
00:52:54.160 The pin bits essentially flag whether an object can move. This way, if a C extension holds a reference to an object, and we call our mark function, we ensure it doesn't move.
00:53:30.130 This ensures that every object that has been marked with a pin bit remains stationary throughout compaction.
00:54:10.640 However, there was a limitation: if a C extension didn't mark an object, it might unintentionally hold a reference to a moved object, resulting in a crash.
00:54:55.790 To ensure that GC doesn’t lose track of hidden references, we established a rule: if a C extension references a Ruby object, it must mark that reference.
00:55:38.540 This simplicity emphasizes the rule: holding a reference requires marking.
00:56:27.330 Another challenge arises with the object ID method, which also necessitates direct memory access. Typically, the object ID corresponds to its memory location.
00:57:08.250 However, if the object moves, we must determine what value to cache. If an object ID is created and compacted, we return the location it was first assigned.
00:57:58.450 This involved creating a map of seen object IDs, ensuring that anytime the GC runs, the object IDs are updated accordingly.
00:58:40.720 I hope you all can see the multifaceted nature of garbage collection and compaction. The original purpose of this was to improve Ruby's memory efficiency, which it undoubtedly has.
00:59:21.050 To see the actual impact of these changes, I will show you a basic Rails application. The graph reflects the heap before and after compaction.
00:59:58.600 After running compaction, you can see the decline in the number of slots.
01:00:45.270 Ultimately, this compaction process reduces fragmentation significantly. Looking ahead, I seek to refine this approach further to improve performance.
01:01:32.520 This transition will include reducing the number of required steps under garbage collection operations.
01:02:10.320 I want to adopt a sliding compactor that organizes memory more effectively and promotes better locality for CPU caches.
01:02:53.169 In the future, I hope to implement a variable width allocation process, giving Ruby full control over memory, which in turn would enhance performance.
01:03:45.270 Thank you for your time. If you have any questions, I would love to engage during the Q&A.
01:04:34.360 And remember, come say hi to me later; I will have stickers!