00:00:23.720
I'm really excited to be here today. I have a 40-minute Ignite-style presentation, which was the slide I planned to put up, but my tethering is too slow. So instead of putting the joke in slide form, just imagine that there was an Ignite slide here.
00:00:37.499
I'm wearing this hamburger hat today because I heard I needed to dress up in my formal attire. However, I’m going to take this off now because I feel it might be bad for the recording. I don't want my face to get all dark. So, hello everybody! Hamburger hat off, and let's get started!
00:00:58.289
Today, we're going to discuss a compacting garbage collector for MRI in Ruby 2.7. I apologize if this presentation feels a bit technical, as it is the end of the third day of the conference. My name is Aaron Patterson, but you can also call me Tenderlove.
00:01:20.600
If you don’t recognize me, let me introduce myself properly. I have two cats. One is named Gorbachev Puff-Puff Thunderhorse, the most famous of my cats. I have stickers of him—please come say hello if you want one! My other cat’s name is SeaTac Airport Facebook YouTube Instagram.
00:01:43.470
I have to admit, I'm extremely nervous in front of all of you today. I have been a committer on the Rails Core team since 2009, and my first commit was in 2009. I have been on the Ruby core team since 2011, and my very first commit to the Ruby core was ten years ago in October. So, this is my ten-year anniversary on the Ruby core team!
00:02:21.640
This presentation does not mean that I know everything about this topic, so please lower your expectations a little, then lower them just a bit more. I got onto the Ruby core team because I was able to crash Ruby many times, which led to segmentation faults and produced Ruby core files.
00:02:40.840
Today, I will be talking about compacting GC for MRI. However, before that, I want to touch on some controversial features in Ruby 2.7 that Matz mentioned. The first one is the pipeline operator, which, unfortunately, got removed. But I want to discuss it a little anyway. This operator caused quite a bit of controversy, as many people opposed this implementation.
00:03:10.330
In my own work, I proposed changing the pipeline operator into a smile operator because Ruby is designed for developer happiness, as Matz says. Unfortunately, that pull request was rejected.
00:03:40.840
Another feature I want to bring up is typed Ruby. Many of you seem excited about getting types integrated into Ruby, but I must admit, I thought we already had types in Ruby. I understand that some might copy and paste from Stack Overflow, but typically we do type things.
00:04:01.780
Now let’s discuss the real topic of today’s presentation: manual heap compaction for MRI. This work took me about three years to complete, and I must confess that I've never worked on a project so challenging in my life.
00:04:40.010
However, during this journey, I experienced a level of difficulty that I did not expect. I usually don't come up with many new ideas, but I finally had one for this work. Chris Seaton from the Truffle team called it a novel solution for not moving objects' references from C.
00:05:00.850
Before we dive into more technical aspects, I want to express my gratitude to a few people, specifically Cochise and Alison McMillan. When I first considered implementing compaction in MRI, I thought it was impossible. I shared my concerns with Cochise three years ago, and he encouraged me to rethink my stance.
00:05:44.659
Additionally, when I was feeling stalled and somewhat discouraged, Alison reached out to me and offered to work with me on this project. Every week, we would pair up, making the process more manageable for me.
00:06:10.300
Today, I will cover several key topics: Ruby's heap, the compaction algorithm, implementation details, results, and, if we have time, some debugging techniques. Not many people discuss debugging garbage collectors, but I think it’s an essential topic.
00:06:40.670
So, what is compaction? Compaction is essentially rearranging allocated and free memory. Imagine memory with allocated and free spaces interspersed. We can rearrange it so that all free spaces are contiguously combined, similar to defragmenting a hard drive, but this is done for data in memory instead.
00:07:15.650
You may wonder why we would want to do this. One benefit is efficient memory usage. For example, when you attempt to allocate a new chunk of memory, it might not fit due to fragmentation. However, if we compact the heap, we would have enough contiguous space to allocate the new chunk. This makes memory usage more efficient.
00:07:53.360
Another reason to compact is to ensure better utilization of CPU caches. When reading memory, data has to go through the CPU, and the CPU will read chunks of memory at a time. Reading from RAM is relatively slow, and the CPU has to access RAM, which involves cache.
00:08:40.080
Good locality refers to keeping related data close together, increasing the chance that all needed data will be processed from cache without needing to hop over to RAM. Another compelling reason for compaction is copy-on-write friendliness. For example, Unicorn, which we use as our web server at work, saves memory with a copy-on-write technique.
00:09:38.309
When a parent process creates child processes, unless modified, the memory stays linked to the parent process. However, when data is written to this shared memory, that connection gets broken, and the operating system copies the necessary data, potentially leading to memory waste by copying unnecessary data.
00:10:02.910
By eliminating fragmentation, the efficiency of memory allocation and overall memory usage is improved. Essentially, compaction solves the problem of fragmentation. In Ruby, we deal with two different kinds of heaps: the malloc heap and Ruby’s object heap.
00:10:52.030
Normally, we allocate memory from the system using malloc. However, when we allocate objects in Ruby using commands like `Object.new`, we’re drawing from Ruby’s object heap, which is where fragmentation can occur. Malloc can also lead to fragmentation.
00:11:39.700
To tackle this issue, we can use a new 2-finger compaction algorithm—a simple yet effective solution. The algorithm involves two pointers: a free pointer that moves to the right until it finds a free slot, and a scan pointer that moves to the left until it finds a filled slot.
00:12:08.000
When the free pointer finds a filled slot and the scan pointer finds a free slot, they swap the two locations and leave a forwarding address in the previous location of the moved object.
00:12:44.279
This process continues until the two pointers meet. Once they do, we update the references of each object in the heap, checking if they point to a forwarding address and modifying them accordingly. After all references are updated, we convert moved addresses into free slots and have successfully compacted the heap.
00:13:32.040
If I were to provide a code example, keep in mind that this is a Ruby conference, not a C conference. Here's how the algorithm would look: we loop until the two pointers meet, swapping locations and updating the references as necessary.
00:14:16.520
However, implementing the reference updating portion is where things become complicated as it involves determining how references are stored for various objects.
00:14:55.200
Efficiently updating these references across all Ruby data types can be a complex task, so it’s one of the more challenging aspects to implement. In fact, probably around 80% of the patch was just dedicated to reference updates.
00:15:43.660
Thus, this was the most complicated part of the patch; we must update references while also considering C extensions. This leads to needing a scheme that manages reference updates effectively.
00:16:08.790
In short, any C extension created needs to provide a marking function, which ensures that references are maintained during garbage collection. Every time something is marked, we make it impossible for it to be moved.
00:16:36.500
The idea is to have a separate bit table known as pin bits that mark which objects cannot be moved. When the scan pointer identifies a pinned object, it skips it during compaction. This way, the algorithm maintains references and prevents any moving of critical objects.
00:17:43.480
During the updating phase of references, if issues occur with C extensions, marking must be done leveraging the BGC mark function. Using this function ensures that the references will stay put and not lead to crashes.
00:18:01.320
I found this marking to be the most critical aspect of the project, as certain behaviors in libraries like the JSON parser needed to be considered to ensure stable reference management. We want to ensure pure Ruby code does not inadvertently cause crashes.
00:18:47.460
Object ID handling is another interesting issue to consider when implementing compaction. In older Ruby versions, the Object ID was based on the memory address of the object. However, this introduces complications when objects move during compaction.
00:19:10.740
So instead of replicating object IDs based on memory addresses, we shifted toward providing unique identifiers that do not rely on memory locations. This change drastically improved our management of object IDs and safety.
00:19:32.480
With the implementation of Ruby 2.7, Object IDs become monotonic; every call to Object ID ensures that it provides a unique, continuously increasing number, eliminating previous issues and enhancing reliability.
00:20:29.000
In terms of practical performance, implementing the compaction algorithm has resulted in reducing the heap size by around 10%, providing efficiency gains in memory allocation and utilization.
00:21:27.110
Looking ahead, my future plans for Ruby include addressing performance improvements in the compaction process, possibly transitioning to automatic compaction in Ruby 3.0. This means that users may not need to call for a manual compact at all.
00:22:14.190
The proposal includes adopting sliding compaction techniques for better memory locality and implementing variable width allocation to enhance memory efficiency.
00:23:00.960
In summary, I hope everyone recognizes the effort the Ruby Core team puts into improving Ruby. We are excited about the future of Ruby and what it has to offer! Thank you all for attending!