The Bug that Forced Me to Understand Memory Compaction

Summarized using AI

Emily Giurleo • December 18, 2020 • Online

In the talk titled "The Bug that Forced Me to Understand Memory Compaction," Emily Giurleo, a software engineer and Rubyist, discusses her journey of learning about memory management in Ruby after encountering a bug related to the BSON gem she maintained. The presentation outlines several key concepts concerning Ruby's memory allocation, garbage collection, and the newly introduced memory compaction feature in Ruby 2.7.

Key Points Discussed:

  • Starting Point: Giurleo's knowledge of Ruby's garbage collection was limited until a user reported a segfault triggered by calling GC.compact with the BSON gem, a gem with a C extension that serializes data to and from BSON, the format MongoDB uses.
  • Understanding Memory and the Heap: She explains that memory is stored in RAM, specifically within what Ruby calls the Ruby heap, which is organized into slots on heap pages. Each object in Ruby occupies a slot in this heap while the program runs.
  • Garbage Collection (GC): Ruby employs a mark-and-sweep algorithm for garbage collection: objects still in use are marked, and unmarked objects are swept away to free up memory. Even so, freed slots scattered across partially used heap pages can lead to memory bloat, because those pages cannot be reallocated until they are completely empty.
  • Memory Compaction: The GC.compact method introduced in Ruby 2.7 allows for memory compaction. This process moves used memory slots to the start of the heap, thereby freeing up space at the end for future allocations. Giurleo likens effective memory management to organizing clutter, much like Marie Kondo.
  • Interplay with C Extensions: C extensions hold pointers (VALUEs) to the addresses of Ruby objects on the heap, while compaction moves objects to new addresses. Giurleo explains that if the Ruby garbage collector does not know about those pointers and the relationships between objects, memory errors, including segmentation faults, can occur.

Resolving the Segfault:

  • She discovered that her BSON gem failed to mark a long-lived object, so the object was not pinned and could be moved during compaction, which caused the segfault. The solution involved using rb_gc_register_mark_object to mark the long-lived variable as a root object, preventing it from being relocated during compaction.
  • Giurleo emphasizes two main lessons drawn from this experience: the importance of understanding systems design in addition to coding skills, and the value of knowledge sharing in the tech community for overcoming challenges.

Conclusion:

  • Giurleo concludes with essential takeaways regarding memory management in Ruby 2.7 and offers advice on implementing C extensions for better compatibility and function. She encourages developers to explore and share knowledge, underscoring the collective growth of the developer community.

Did you know that Ruby 2.7 introduces a new method for manual memory compaction?

Neither did I.

Then a user reported a bug on a gem I maintain, and well...

In this talk, I’ll tell you a story about how one bug forced me to learn all about memory management in Ruby. By the end of this talk, you should understand how memory is allocated on the heap, how Ruby implements garbage collection, and what memory compaction is all about!

Emily Giurleo
Emily Giurleo is a software engineer and avid Rubyist. This December, she'll start working at Numero, where she'll help build the next generation of campaign finance tools. In her spare time, she enjoys creating tech for good causes, reading fantasy novels, and hanging out with her pets.

RubyConf 2020

00:00:04.319 Well, here we are, my friends! It is that time of the day when we are about to jump into the next live talk. I have a couple of announcements that I want to make sure are out there for all of you who are just wondering about some important things. First, I just announced the RubyConf 5K or 30-minute exercise challenge. Check that out in the open chat if you have not yet; this is very important, and I hope to see you participating.
00:00:17.199 The other thing is that there is an ever-growing list of amazing Slack channels around various topics. One of which I want to highlight, just because I think it's fun, is the Great Ruby Coffe Sarnich Make Off. If you're interested or want to know what that’s about, don’t forget to check out that channel too.
00:00:32.079 Also, we had heard from the facility manager that there's been some crowding in the kitchen. The current facility has a lot of crowding, hallway traffic, and even blanket couch crowding, and there’s a slew of furry friends that are making their way into the conference without a ticket. Not to say that this is bad—we welcome everybody, of course—but it’s really important to be nice to each other in your facility. If you're going to make a sandwich, breakfast, or coffee, that’s fine. The talks are recorded, so you can take your time. So, without further ado, here we go! I would like to introduce our next speaker, Emily! If you want to join me on stage, here we are!
00:01:24.960 Alright, good, good, good! Without further ado, to all my conference friends out there, Emily, it is all yours. Hello, everyone! Thank you so much for coming. My name is Emily Giurleo, and this talk is called 'The Bug That Forced Me to Understand Memory Compaction.' In this talk, I'm going to tell you a story, and it starts with a job that I had until recently working at MongoDB.
00:02:07.280 At MongoDB, I helped maintain three fairly popular gems: the MongoDB Ruby driver, Mongoid, and the BSON gem. The BSON gem, which is the resident troublemaker of this talk, is a Ruby gem with a C extension. This gem basically just serializes data to and from BSON, which is the data format that MongoDB uses to send information. That’s not super relevant, but it will come back later.
00:02:25.280 One day, I was going about my business when I received a user-submitted ticket. The user said that the BSON gem was segfaulting whenever they called GC.compact. Now, as a fantastic gem maintainer, my first reaction to this ticket was, 'What the heck is GC.compact?' Clearly, I had a lot to learn before I was ready to fix this bug. This talk is the story of how I gained the knowledge that I needed to understand how Ruby manages memory.
00:03:10.720 In this talk, I'll teach you everything I learned about how Ruby manages memory, starting with what memory actually is, to how Ruby implements garbage collection, and what memory compaction is all about. Then, we're going to talk about how this gets more complicated in C extensions and, finally, how I managed to fix the bug. There should be about five minutes at the end of this session if anybody has any questions, but I will also be answering questions in the Slack channel for the rest of the conference, so feel free to join me in there.
00:03:39.920 Let’s get started! So, back to my original question: what the heck is GC.compact? GC.compact is a new method introduced in Ruby 2.7, and it does something called memory compaction. If you are a fantastic gem maintainer like I was, you will ask, 'But what is memory, and why would you want to compact it?' So clearly, I had to go back to the basics if I wanted to get anywhere with this bug.
00:04:03.280 Memory is where your computer stores information, and there are many different types. For the purpose of this talk, we're going to talk about RAM, or Random Access Memory. RAM is like the short-term storage of your computer; it’s where your computer keeps all the information that it’s going to need for the next little while in order to run the programs that you are currently using. Ruby uses a section of RAM called the Ruby heap to store the data it creates while it’s running a program.
00:04:38.639 The heap is made up of many slots, and each slot is about 40 bytes, organized into heap pages, which are about 16 kilobytes. You can think of the Ruby heap as being made up of many of these heap pages. Every time your Ruby program creates an object, it takes up one slot in the heap where it stores the data for that object. That’s not entirely true, but for the purposes of this talk, it is true. It’s also important to know that every slot has an address, and this is going to be really important later.
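As a rough illustration of the heap described above, the following Ruby sketch reads a few counters from GC.stat before and after allocating objects. The helper name heap_snapshot is hypothetical, and the exact set of GC.stat keys is an assumption that can vary between Ruby versions, so missing keys simply show up as nil.

```ruby
# Snapshot a few heap counters that Ruby exposes through GC.stat.
# Key names are assumed here and may differ across Ruby versions;
# a missing key is reported as nil rather than raising.
def heap_snapshot
  stats = GC.stat
  {
    pages: stats[:heap_allocated_pages],
    live_slots: stats[:heap_live_slots],
    free_slots: stats[:heap_free_slots]
  }
end

before = heap_snapshot
objects = Array.new(50_000) { Object.new } # each new object needs a slot
after = heap_snapshot

puts "before: #{before}"
puts "after:  #{after}"
```

The numbers printed depend on the Ruby version and on whatever else the process has allocated, so they are best read as a trend rather than exact values.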
00:05:05.919 As your Ruby program runs, it’s going to use up more and more memory in the Ruby heap. Once it uses up all the allocated memory, it’s going to create more heap pages that it can fill up to continue running the program. However, memory is a physical component of your computer, which means that it’s not infinite. Eventually, it’s going to run out. So, Ruby needs a way to reuse the same memory to make your program as efficient as possible, and this process is called garbage collection.
00:05:40.400 You can think of garbage collection as the Marie Kondo of the Ruby language. Ruby keeps track of all of the objects that you’ve created while running a program. Once an object is no longer being used, meaning it has gone out of scope and is not referenced by any other object in your program, Ruby will destroy it and free up the memory that the object was previously using. The Ruby garbage collection algorithm is called mark and sweep. There are a lot of variations of the mark and sweep algorithm, so I’m going to go over it at the highest level.
00:06:31.120 Essentially, this is what's called a tracing garbage collector because it uses the traces between object references in order to properly garbage collect the objects in your program. On the left, I have a representation of the heap, and on the right, I have a representation of objects in Ruby and their relationships. The first part of mark and sweep, I’m betting you could guess, is called the mark phase. During the mark phase, the Ruby garbage collector marks all of the Ruby objects that are still in use by the program. This just means essentially flipping a bit so that each object has that mark indicating it's still in use. It starts by marking root objects. These are objects that will be used throughout your program.
00:07:31.760 The Ruby garbage collector knows that they are still in use. Then, the Ruby garbage collector will find all of the objects referenced by those root objects and mark them as well. In this example, it would also mark the blue object referenced by the yellow object, and then it would mark the gray object, which is also referenced by the yellow object. If the blue and gray objects referenced other objects, those would be marked as well. You'll notice that the green object isn’t a root and it’s not referenced by any other object, so it doesn't get marked. Now we move to the next phase of mark and sweep, which is you guessed it: sweep.
00:08:07.680 During this phase, the Ruby garbage collector goes through all these objects again and finds the ones that aren’t marked, meaning they’re no longer in use in the program. Since these objects are no longer in use, the Ruby garbage collector will destroy them and free up their space in the heap so that it can reuse them in the future. Once you perform garbage collection, your heap is going to have some slots that are full with objects and some free slots.
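The mark and sweep phases described above can be sketched as a toy model in Ruby. This is only an illustration of the idea, not how Ruby's collector is implemented: the "heap" is a plain hash from an object's name to the names it references, and the function names mark and sweep are chosen for this example.

```ruby
require "set"

# Mark phase: start from the roots and follow references,
# recording every object that is reachable.
def mark(heap, roots)
  marked = Set.new
  stack = roots.dup
  until stack.empty?
    name = stack.pop
    next if marked.include?(name)
    marked << name
    stack.concat(heap.fetch(name))
  end
  marked
end

# Sweep phase: keep only the objects that were marked.
def sweep(heap, marked)
  heap.select { |name, _refs| marked.include?(name) }
end

heap = {
  yellow: [:blue, :gray], # root object that references two others
  blue: [],
  gray: [],
  green: []               # not a root and not referenced by anything
}

marked = mark(heap, [:yellow])
survivors = sweep(heap, marked)
puts survivors.keys.inspect # → [:yellow, :blue, :gray]
```

The green object is never reached from a root, so it is the only one swept, matching the example in the talk.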
00:08:44.880 You would think that once this happens, Ruby can reuse those free slots to allocate more objects and keep running your program. However, it isn’t that simple. Ruby can only reallocate used heap pages once they are completely empty. The empty slots that exist on heap pages that still have some data in them are not reusable. This can create a problem known as memory bloat.
00:09:06.320 Memory bloat is when a program has allocated way more memory than it’s actually filling up with data. According to your computer, your program may be using pages and pages of memory on the heap, but those pages might only have a few slots filled. In the absolute worst case, it could look like this example, where you have heap pages upon heap pages allocated but only one slot on each of those heap pages is actually being used.
00:09:45.120 This is where memory compaction comes in. Using memory compaction, you can compact all of the used memory to the start of the heap, which frees up space at the end of the heap so that that memory can be reallocated. Imagine if you have one heap page’s worth of memory scattered across multiple heap pages, and then you condense that into one heap page. This frees up those heap pages at the end of the heap to be reused, which is fantastic!
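To make the bloat and compaction picture concrete, here is a toy Ruby sketch. It assumes a made-up page size of four slots (real heap pages hold far more) and models each page as an array where nil is a free slot; it is not Ruby's actual compaction algorithm.

```ruby
# Toy compaction: gather every live object and pack them into the
# first pages, leaving the remaining pages completely empty.
def compact(pages, page_size)
  live = pages.flatten.compact
  total_slots = pages.length * page_size
  padded = live + [nil] * (total_slots - live.length)
  padded.each_slice(page_size).to_a
end

# A page can be handed back for reuse only when every slot is free.
def empty_pages(pages)
  pages.count { |page| page.all?(&:nil?) }
end

# Worst case from the talk: one live object on every page.
bloated = [
  [:a, nil, nil, nil],
  [nil, :b, nil, nil],
  [nil, nil, :c, nil],
  [nil, nil, nil, :d]
]

compacted = compact(bloated, 4)
puts empty_pages(bloated)   # → 0
puts empty_pages(compacted) # → 3
```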
00:10:35.920 So this is what GC.compact does. GC.compact is a method introduced in Ruby 2.7 that implements memory compaction. This method has been called manual memory compaction because, unlike garbage collection which just runs by itself in the background, you actually have to call this method yourself.
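Calling the method from plain Ruby might look like the sketch below. The helper compact_if_supported is hypothetical; it exists because GC.compact is not present before Ruby 2.7 and, as far as I know, is unavailable on some platforms, so the sketch checks before calling.

```ruby
# Hypothetical helper: run manual compaction only where it is available.
# Returns nil when compaction is not supported on this Ruby or platform.
def compact_if_supported
  return nil unless GC.respond_to?(:compact)
  GC.compact
rescue NotImplementedError
  nil
end

greeting = "hello"
result = compact_if_supported

puts(result.nil? ? "compaction not available" : "compaction ran")
puts greeting # Ruby-level references stay valid even if the object moved
```

In pure Ruby code this is safe because Ruby updates its own references; the trouble discussed in the rest of the talk comes from C code holding raw addresses.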
00:11:01.680 To recap what I’ve learned so far in this process: Ruby objects take up space in memory, specifically in the Ruby heap. Garbage collection automatically deletes objects that are no longer being used to free up memory; however, free memory can’t always be reused if it’s not all contiguous slots at the end of the heap. Memory compaction is a process by which in-use memory is grouped into contiguous chunks at the start of the heap, freeing up the end of the heap to be reallocated for use later in the program.
00:11:48.720 All of this is well and good, but it did not answer my question: why was GC.compact causing a segfault? I can confirm this is what I look like when I’m angry. You do not want to see it; it is not good. Luckily, a different user came to my rescue and left a comment on the ticket along the lines of, 'Hey, have you read this page about using GC.compact with C extensions?' Needless to say, I had not read that page, and so I did.
00:12:10.880 As you remember, I mentioned at the beginning of the talk that the BSON gem has a C extension. It turns out that this is really important when it comes to compatibility with memory compaction. For some background, a C extension is C code that's integrated into a Ruby gem. You can use a C extension to include a pre-built C library into your Ruby gem if you don’t want to rebuild some functionality that’s already out there. Or you could implement some feature from your Ruby gem in C for increased performance or to take advantage of some feature of the C language that you really want to use.
00:13:09.760 C extensions are cool because you can create and manipulate Ruby objects in C code using the C extension API provided by the Ruby maintainers. This is an example of creating a string called 'greeting' with the value 'hello' in both Ruby and C. The first line is plain Ruby, while the second line is the C equivalent, where you start by saying the type of the variable you're creating; in this case, it’s 'VALUE'. The name of the variable is 'greeting', and then you use a Ruby C extension function, 'rb_str_new2', that creates a new Ruby string, and you pass it the value that you want the string to have, which is 'hello'.
00:14:18.720 So that’s how you create a new string from a C extension. As I said, we’re going to use a variable type called 'VALUE'. A 'VALUE' is a pointer to a Ruby object. Remember earlier how I said that every slot in the heap has an address? Well, this is where that point comes back. A pointer is a variable that contains the address of some slot in the heap. So when you create the 'greeting' variable, what you're doing is essentially creating a variable that stores the address of a slot in the heap.
00:15:06.720 It’s saying, 'I know that there is a string at the heap address 2.' Then, anytime you use the 'greeting' variable in your C code, you're essentially saying, 'Hey, go get me that thing that lives at heap address 2.' This is how a lot of C code and code in other languages that use pointer references work. You could even use this same logic to declare new Ruby types, which is why C extensions are so cool. This is an example that I took from a great article by Josh Haberman. I will make sure to post these slides in the Slack channel at the end so you can click through these links and check out this article.
00:16:03.440 In this article, we create a new type called a Ruby pair. A Ruby pair has two references to two other Ruby objects, which we call 'first' and 'second'. Those references are values, which means they are pointers to two locations on the heap. You could declare a Ruby pair with the pointer to the first object at heap slot one and the pointer to the second object at heap slot four. This is awesome, right? It’s really flexible, and you can extend the Ruby language in a powerful way; however, it creates some complications when it comes to memory compaction.
00:16:52.320 If you remember, memory compaction has the effect of moving things around in the heap. If you think a little more closely, this could be a major problem when you’re writing code with pointers, where you’re basically just referencing addresses in the heap and hoping they contain the right objects. Let’s go back to our Ruby pair example. If you remember, a Ruby pair contains two objects called 'first' and 'second', which are at positions one and five in the heap right now. Let’s say that we run memory compaction and our objects move around in the heap. Suddenly, the reference in our Ruby pair is wrong.
00:17:38.640 The object that we called 'second' used to be at position five, now it’s at position four. Whenever the Ruby pair goes to reference that object, one of two things can happen. The best-case scenario is that your program crashes either because there's nothing in the heap at that location or whatever is in the heap at that location is so incompatible with whatever you’re trying to do that your program segfaults and stops running. The worst-case scenario is if you have another object in that location in the heap, and it doesn’t crash your program, and your program keeps running with this silent bug that eventually returns wrong data to your users or otherwise messes up your gem.
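Both failure modes can be shown with a toy model in Ruby, where an array index stands in for a heap address and a hash of stored indexes stands in for the C-level pointers. The layout and the compaction step are assumptions for illustration and do not match the exact slot numbers on the slides.

```ruby
# Toy heap: the array index plays the role of a slot address.
heap = [nil, "first", nil, nil, nil, "second"]

# C-style "pointers": the pair only remembers addresses.
pair = { first: 1, second: 5 }

# Toy compaction: slide live objects to the start of the heap
# without telling `pair` that anything moved.
compacted = heap.compact
compacted += [nil] * (heap.length - compacted.length)

puts compacted[pair[:first]].inspect  # → "second" (wrong object, silent bug)
puts compacted[pair[:second]].inspect # → nil (nothing there, crash territory)
```

The first lookup is the dangerous one: it succeeds and quietly returns the wrong object.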
00:18:23.439 This is the absolute worst-case scenario. This could be really, really bad. I realized this must be what was happening in my BSON gem. Some object somewhere was moving around during memory compaction, and future references to it were being broken because it wasn’t where the program expected it to be in memory, causing a segmentation fault.
00:19:02.880 So how can I figure out what object is causing this problem? To do this, I first had to learn more about how Ruby manages memory from the perspective of a C extension. The major thing you need to know is that when you create a C extension, there are two scenarios where you have to tell the Ruby garbage collector how to properly garbage collect your objects. The first scenario is when you create a new Ruby type, like our Ruby pair example.
00:19:51.680 So, our Ruby pair references two objects called 'first' and 'second'. At the C level, the C code understands these references, so it knows that a Ruby pair references these two objects. However, at the Ruby level, the Ruby garbage collector does not know about these references, and that can cause some very strange behavior. Let’s say the Ruby garbage collector is running garbage collection and is about to mark the Ruby pair object because it’s still in use. If it doesn’t know about the references to the 'first' and 'second' objects from that Ruby pair, it could sweep them thinking they’re no longer being used by your program.
00:20:56.080 Then, when you go to reference them, there’s nothing left in memory, or like I said earlier, there’s something else there that could cause a silent bug and really mess up your program. This is the kind of thing that could cause a segmentation fault. Luckily, the Ruby maintainers anticipated this because they're very smart and they provided a way to prevent this from happening called mark callbacks.
00:21:38.880 A mark callback is a method that gets called every single time your new Ruby type gets marked during garbage collection. It tells the Ruby garbage collector how to mark related objects in order to not break your program. This is an example of a mark callback for our Ruby pair. You’ll see that the last two lines of the method call the method 'rb_gc_mark' on the first reference and then the second reference. This method is important and will come back later.
00:22:30.720 Once we’ve implemented this mark callback, Ruby knows about the relationship between the Ruby pair and the 'first' and 'second' objects. So after it marks the Ruby pair object, it knows how to mark its two references as well, using the 'rb_gc_mark' method. It's a great day to be a Ruby programmer! Another scenario where you want to make sure you are properly marking an object in a C extension is if you have a long-lived object. What I mean by that is, let's say you create an object in your C extension initializer—this is a method that gets called when your C extension is loaded for the first time.
00:23:37.440 If Ruby doesn’t know about this, it could garbage collect this object, and then when you try to reference it later, your program breaks. This is why you want to use the 'rb_gc_register_mark_object' method to mark a long-lived variable as a root object during garbage collection. This way, Ruby knows not to garbage collect it, and once again, it’s a great day to be a Ruby programmer because your program is not going to break.
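The two marking scenarios can be modeled in plain Ruby as a toy collector that only follows references an object reports through a mark callback. This is a conceptual sketch, not the C extension API: the names live_objects and build_heap and the :registry object are made up for the example, and adding an object to the roots list stands in for registering it as a root.

```ruby
require "set"

# Toy collector: it only learns about references through mark callbacks.
def live_objects(objects, roots)
  marked = Set.new
  stack = roots.dup
  until stack.empty?
    name = stack.pop
    next if marked.include?(name)
    marked << name
    callback = objects.fetch(name)[:mark_callback]
    stack.concat(callback.call) if callback
  end
  marked
end

def build_heap(with_callback:)
  pair_refs = [:first, :second] # references known only at the "C level"
  {
    pair: { mark_callback: with_callback ? -> { pair_refs } : nil },
    first: { mark_callback: nil },
    second: { mark_callback: nil },
    registry: { mark_callback: nil } # long-lived object created at load time
  }
end

# No mark callback, registry not registered as a root: only the pair survives.
unmarked_run = live_objects(build_heap(with_callback: false), [:pair])

# Mark callback present, registry treated as a root: everything survives.
marked_run = live_objects(build_heap(with_callback: true), [:pair, :registry])

puts unmarked_run.to_a.inspect
puts marked_run.to_a.sort.inspect
```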
00:24:45.760 As it turns out, how you mark your objects in a C extension is really important to memory compaction. To see why, let’s look at how GC.compact is actually implemented in Ruby 2.7. There are four steps. First is garbage collection, and I’m going to give you a hint that something happens here that helps prevent the kind of issue we saw earlier where memory moves around and you end up with a segmentation fault.
00:25:34.760 The second step is to move objects and collect them at the start of the heap. The third step is to update any references to those moved objects, so you’re not referencing objects that are in a different location. Finally, the last step involves another round of garbage collection to clean up anything that’s left behind. As I mentioned, something special happens during the first round of garbage collection, and that’s because the Ruby maintainers did something important in Ruby 2.7 to make it compatible with existing C extensions.
00:26:52.640 They realized that objects would move around in memory and break C extensions, so it was important to safeguard against that happening. To handle this, they changed the behavior of the existing C extension API so that marking an object during garbage collection also pins it in memory. You can think of pinning as pinning a piece of paper onto a board; you know the paper is not going anywhere. An object that’s pinned in memory will not go anywhere, and this is what happens when you mark an object in Ruby 2.7.
00:27:57.440 If you remember our friend rb_gc_mark, this is the method that was modified to also pin objects in addition to marking them. Let’s go through our garbage collection process with the addition of pinning. If we take an example of a Ruby pair with two references—'first' and 'second'—the Ruby pair object will first be marked and also pinned in memory. Then it will mark its first reference and pin that in memory as well, and do the same for its second reference.
00:28:30.880 When all is said and done, those pinned objects in memory are not going to move, so when the Ruby pair goes to reference them in the future, they will still be there, and memory compaction will not have broken your program.
00:29:05.600 Going back to the bigger picture, the implementation of GC.compact starts with a first round of garbage collection at the beginning of the compaction process, and in that process, you are pinning the objects in memory that are still in use so that they do not move around and they do not break your C extensions. This gave me all the information I needed to figure out what was going on in my gem.
00:29:44.799 Ruby garbage collection pins all marked objects in Ruby 2.7. We know that any object that is pinned does not move around in memory. Some object in my gem is moving around in memory, which means that it must not be getting pinned, and thus it must not be getting marked. As I mentioned earlier, there are two cases in a C extension where you want to ensure you mark an object: if an object is long-lived, or if an object is referenced from a custom Ruby type.
00:30:31.919 As it turns out, a long-lived object that was not properly marked was the culprit in the BSON gem. If we look at the initializer of the BSON C extension, there is a line that declares a variable called rb_bson_registry, which is just a module defined in the gem's Ruby code. We never marked this variable or this object, and historically that was okay, because I don’t think that modules get garbage collected.
00:31:30.000 But once memory compaction was introduced, it started to be a problem: if the object is not being marked, it is not pinned, so it isn’t going to stay in the same place, and that is going to cause segfaults. So, in order to fix it, I used 'rb_gc_register_mark_object' to mark it as a root object so that it is pinned in memory and cannot move around, which fixed the gem.
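The effect of pinning can be sketched with a toy compactor in Ruby that fills free slots at the start of the heap with movable objects taken from the end, and skips anything pinned. The function compact_with_pins and the heap layout are assumptions for illustration; the real implementation works on heap pages and does more bookkeeping.

```ruby
# Toy compaction that never moves pinned objects.
# `free` scans forward for an empty slot, `scan` scans backward
# for an object that is allowed to move.
def compact_with_pins(heap, pinned)
  heap = heap.dup
  free = 0
  scan = heap.length - 1
  loop do
    free += 1 while free < scan && !heap[free].nil?
    scan -= 1 while scan > free && (heap[scan].nil? || pinned.include?(heap[scan]))
    break if free >= scan
    heap[free], heap[scan] = heap[scan], nil
  end
  heap
end

heap = [nil, :first, nil, :other, nil, :second]

# :first and :second were marked through the pinning mark function.
result = compact_with_pins(heap, [:first, :second])

puts result.inspect
# → [:other, :first, nil, nil, nil, :second]
```

The pinned objects keep their addresses (1 and 5), so stored pointers to them stay valid, while the unpinned object is free to move toward the start.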
00:32:09.120 Everything I’ve covered so far has been about maintaining compatibility with an older C extension as you transition to Ruby 2.7, but this doesn’t take full advantage of memory compaction. If you’re pinning objects in memory, it means you cannot move them to the start of the heap, resulting in objects hanging around in your heap pages. Those heap pages can’t be reused.
00:32:40.800 If you’re creating a gem with a C extension and plan to use it just with Ruby 2.7 and newer versions of Ruby, you can take some different steps to allow you to take full advantage of memory compaction. The first step is to use 'rb_gc_mark_no_pin'. I believe this method name has changed to 'rb_gc_mark_movable' since Ruby 2.7.
00:33:19.680 As we said earlier, 'rb_gc_mark' was modified to pin objects in memory, so we’re going to use a different version of this method that only marks them and does not pin them. Then, much like our mark callback, which gets called every single time an object is marked, we’re going to implement a new callback: a compaction callback that gets run every time an object is going to be compacted.
00:33:45.600 What that means is you want to make sure you can update the references in your object so you know what their new location is in case they’ve moved. To do that, you'll use the 'rb_gc_new_location' method in a very similar way to how you would have used 'rb_gc_mark' in your mark callback.
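The idea behind a compaction callback can be sketched in Ruby as a toy model: compaction records where each object moved, and the callback rewrites stored addresses using that record. The names compact_with_forwarding and update_references are hypothetical, and the forwarding hash is a stand-in for asking the garbage collector for an object's new location.

```ruby
# Toy compaction that records old address => new address for every move.
def compact_with_forwarding(heap)
  forwarding = {}
  compacted = []
  heap.each_with_index do |object, old_address|
    next if object.nil?
    forwarding[old_address] = compacted.length
    compacted << object
  end
  compacted += [nil] * (heap.length - compacted.length)
  [compacted, forwarding]
end

# Toy "compaction callback": replace each stored address with the
# object's new location, leaving it unchanged if it did not move.
def update_references(pair, forwarding)
  pair.transform_values { |address| forwarding.fetch(address, address) }
end

heap = [nil, "first", nil, nil, nil, "second"]
pair = { first: 1, second: 5 }

compacted, forwarding = compact_with_forwarding(heap)
pair = update_references(pair, forwarding)

puts compacted[pair[:first]]  # → first
puts compacted[pair[:second]] # → second
```

Because nothing is pinned, every live object ends up at the start of the heap, and the references remain correct because they were updated after the move.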
00:34:26.320 I learned two main things from this process. The first is that systems design and understanding system design can be just as important as coding. In this case, I was a perfectly adequate coder, but because I didn’t understand how Ruby memory management was implemented and the high-level design thinking behind that, I was having a hard time fixing this bug.
00:35:27.040 The second thing I learned is that knowledge sharing is absolutely crucial. If people hadn't written the blog posts or made the videos they did, I wouldn’t have been able to read them, nor would I have been able to fix this bug. You never know when the knowledge you put into the world will get someone else out of a big jam. So if you're thinking about writing a blog post or making an informative video and have any doubts about it, I say always do it. The more knowledge we share with each other, the more we empower one another, and the better developers we all become.
00:36:16.320 These are my sources. Like I said earlier, I will post these slides in the Slack channel so that you can click through these links and read these awesome articles. To summarize, Ruby 2.7 includes a manual memory compaction method. To maintain compatibility with older C extensions, make sure you're properly marking long-lived objects and the references from any custom Ruby types that you create.
00:36:54.560 To implement a new C extension, you can use the 'rb_gc_mark_no_pin' method and make sure to add compaction callbacks to your custom Ruby types. Lastly, nothing is impossible if you believe in yourself and your thousand browser tabs! Thank you very much.
00:37:30.560 There might be a couple of minutes for Q&A right now, but I will be taking questions in the Slack channel, and you can always reach me on Twitter if you would like. Awesome! So, Rose asks, 'Do pins need to be removed or managed later on?'
00:37:59.360 I believe that they are automatically handled in the garbage collection process. They are removed at some point, so you, as a user, don’t need to do anything, and the Ruby maintainers have built that into the garbage collection process.
00:38:14.560 Alright, we have time for one more question! Awesome! Angela asks, 'How long did this entire investigation take?' I don’t know if I’m embarrassed by this, but it definitely took a couple of weeks. I think I wasted a lot of time flailing about uselessly, which is just not a helpful way to debug anything.
00:38:41.760 I was really down on myself, thinking, 'I don't know how to fix this bug. I don't know anything about C extensions. I don’t know anything about memory management.' Instead, I should have stepped back and said, 'Okay, I don't know anything about memory management; let's learn about that.' Once I got to that step, I was finally able to make some headway. Then it took maybe a week from that point of just reading and trying to piece everything together.
00:39:08.000 Awesome! I think that's all we have time for, but please ask your questions in the chat, and I’ll see you later. Thank you so much for coming, everyone!
00:39:35.360 Alright, thank you, Emily! So, we are now moving into our next big break for the day. Coffee service is available right down the hallway for each and every one of you.
00:39:59.280 Service-relatedly, we’ve actually done a really good job with catering services this year. Every single person has exactly what’s in their own fridge, so thank your local organizer! Let me tell you, thank your local organizer!
00:40:29.680 Alright, with that being said, we are heading into another break. Talks are going to resume here at 1:50 Central Time, which from about now is... if I got my time right, 1:50 Central Time. I am not doing the math today, but with that being said, check out Slack, participate in the communities, head over to Emily's talk channel if you’ve got any additional questions—she’s going to be there to answer those. See you soon!