Performance Optimization

Summarized using AI

Automatic GC Compaction in MRI

Aaron Patterson • December 18, 2020 • Online

This presentation by Aaron Patterson, known online as Tenderlove, at RubyConf 2020 focuses on the topic of automatic GC (Garbage Collection) compaction in Ruby, particularly as it will be implemented in Ruby 3.0. The primary goal is to automate the memory compaction process, which was introduced in Ruby 2.7 through manual calls to GC.compact, allowing developers to save memory by eliminating fragmentation in the application’s heap.

Key Points Discussed:

- Introduction to GC Compaction: Aaron explains how Ruby 2.7's feature, GC.compact, operates similarly to the defrag utility in MS-DOS by reorganizing memory to reduce fragmentation.

- Compaction Process: The process involves moving live objects to fill empty slots in the heap and requires updating references to moved objects afterward.

- Goals for Ruby 3.0: The focus is on three main aims:

- Memory Savings: Achieving a narrower heap representation without changing the number of objects, leading to lower memory usage.

- Automation: Developers should not need to manually invoke compaction; it should happen seamlessly in the background.

- Concurrency: Enabling the compaction process to run concurrently with other processes to enhance speed and reduce program pauses.

- Challenges: The complexity arises from needing to ensure that object references remain valid during the compaction while the garbage collector must work in a strict order.

- Proposed Solutions: Aaron shares his plan for concurrent execution of the sweeping and moving phases of the garbage collection process. This includes the introduction of read barriers and considerations like using the mprotect call to handle memory access violations after compaction.

- Real-World Testing: Aaron discusses the rigorous testing and benchmarks conducted which illustrated improved memory utilization with new patches and their implications for overall program performance.

Conclusion: The implementation of automatic GC compaction in Ruby 3.0 has notable potential to improve memory management, reduce the need for manual intervention by developers, and enhance application performance. Aaron encourages the audience to explore these new enhancements in the upcoming Ruby 3.0 release, emphasizing the importance of understanding memory management mechanisms to foster improvements in programming practices.

Automatic GC Compaction in MRI
Aaron Patterson • December 18, 2020 • Online

Ruby 2.7 introduced GC compaction, a way to reduce memory memory usage in your application. Unfortunately the compaction process is completely manual, requiring developers to determine the best time to compact their heap. This talk is about making the compaction process automatic and the challenges in doing so. We'll discuss various implementation techniques including read barriers and running the GC concurrently with the main program. This talk will have read barriers, but it won't have watch barriers, so come watch the talk!

Aaron Patterson
Aaron is a full time Ruby developer working at Shopify, whose interests include: Keyboards, Cats, and Code. Aaron is known online as tenderlove, but he also answers by "Aaron".

RubyConf 2020

00:00:00.160 Why hello! I didn't see you come in. Welcome to RubyConf 2020.
00:00:05.440 Happy Friday! I know that maybe today isn’t actually Friday, but the thing is, it’s always Friday somewhere.
00:00:12.240 So we'll have that going for us, at least. I'm going to be talking to you about automatic GC compaction in Ruby 3.0.
00:00:19.359 Now, this talk is a 45-minute presentation, but I've done my best to compact it into a 30-minute talk.
00:00:24.880 I really wanted to call this presentation 'Reduce Memory by Using More,' and I will discuss that as it is an interesting phenomenon.
00:00:30.160 However, the main topic of this presentation is about automatic compaction in Ruby 3.0. So that will be the bulk of what I discuss today.
00:00:41.440 My name is Aaron Patterson, and I'm also known online as Tenderlove. This year I started a new job.
00:00:47.760 I work for a small startup company called Shopify, where I am a full-time Ruby and Rails developer. I mean that literally, as I'm working on both Ruby and Rails itself.
00:01:01.840 I feel really lucky to have this position, and I want to share what I’ve been developing in my job so far, which is mainly the topic of my talk today.
00:01:14.560 I’m on the Ruby and Rails core team and I'm also the inventor of the analog terminal bell, which if we have time at the end, I will tell you about.
00:01:26.479 Now let's talk about Ruby 2.7. In Ruby 2.7, I introduced a feature called GC.compact, which allows us to save memory by filling in spaces in our heap and eliminating fragmentation.
00:01:34.640 We can compare this to an old MS-DOS utility called defrag, but instead of cleaning up your hard drive, it cleans up your memory. Instead of rearranging fragmentation on your physical media, it works in memory.
00:01:46.399 This program demonstrates the effectiveness of compaction. Essentially, it loads a Rails application, runs GC to remove any garbage in memory, and then compacts the heap to remove any spaces. Using this approach, we can analyze the effectiveness of compaction.
00:02:34.640 Here’s a graph showing the memory layout of your application before compaction. The green and black dots represent objects, while the white dots indicate free space. As we can see, this is a fairly fragmented heap.
00:02:41.680 Now, here’s the layout of the heap after compaction. The green and black dots are regular objects, while the red dots are objects that couldn't be moved. You can see in this graph that the objects are much closer together, indicating less memory consumption.
00:02:54.080 Before we dive into my current work for Ruby 3.0, I want to cover the existing algorithm for compaction, as it will help us understand the new implementation.
00:03:11.680 The compaction process has two steps: first, it moves objects around, and then it updates any references. Imagine a heap layout similar to what I describe next.
00:03:25.519 In this example, blue squares represent live Ruby objects, white squares represent empty slots, and yellow squares represent a forwarding address. The compaction process requires two pointers: one at the beginning of the heap, called the free pointer, and the other at the end, known as the scan pointer. The free pointer scans for free slots, while the scan pointer scans for objects to be moved.
00:04:01.280 Initially, the two pointers begin at either end of the heap. The free pointer scans right until it finds a free slot, while the scan pointer scans left until it locates a filled slot. When they meet the necessary conditions, they swap the two and leave a forwarding address.
00:04:26.000 This process continues, with the free pointer moving to find free slots and the scan pointer finding filled slots, leaving behind forwarding addresses after each swap. We repeat this until the two pointers meet, at which point we have finished moving all the objects.
00:04:49.440 Here’s the same layout as before, but this is the state before compaction. Some objects have references to each other. After moving the objects, the references become invalid, pointing at forwarding addresses.
00:05:01.039 Since these references are broken, we need to update them. To do this, we iterate over each object in the heap and update the references accordingly.
00:05:19.199 Once we've updated the references, we can clear out any forwarding addresses, and the compaction process will be complete. This describes the existing compaction algorithm.
00:05:38.560 Now, I want to discuss my goals for compaction. I want it to save memory, which is a significant initial goal.
00:05:52.080 It should also be automatic, so developers shouldn’t need to run GC compaction themselves; it should occur automatically.
00:06:04.639 Additionally, the compaction process should be concurrent, meaning it can occur simultaneously with other processes. The goal is to be fast, so I will discuss each of these goals.
00:06:18.160 First, let’s talk about saving memory. We can see what the heap looks like before and after compaction. The number of objects remains the same, but after compaction, the heap becomes more narrow, indicating lower memory usage.
00:06:36.000 Since objects are packed together more closely, we benefit from improved copy-on-write performance.
00:06:47.360 I want the compactor to be automatic, so developers can focus on their code without worrying about compaction. Unfortunately, as it stands, you have to manually call GC.compact in order to execute it.
00:07:06.000 Typically, we would load our entire Rails application and then call GC.compact before forking any processes in our web server. This requirement means it isn’t automatic.
00:07:24.639 Now let’s explore concurrency. Since we have to call GC.compact ourselves, we already know that the compactor is not concurrent.
00:07:42.160 When objects are moved, the destination slot must be empty. This doesn’t initially seem like a problem, but it creates specific requirements for the GC steps, which need to occur in a precise order.
00:08:03.200 We must perform a full GC to clear the slots before running the compaction phase. This means executing the sweep phase before compaction can commence.
00:08:26.080 Since these GC steps have strict order requirements, we cannot run them concurrently with other operations. Hence, we cannot say that the process is concurrent.
00:08:50.960 Now, the final goal is speed. While the compactor should be fast, we cannot discuss speed until it can run concurrently with other processes.
00:09:01.839 Today, we simply run GC.compact and it takes however long it takes, prior to serving any requests. Thus, speed is not a priority at this stage.
00:09:21.760 In order for the compactor to be automatic and fast, we need to make it concurrent. This is the primary challenge I want to discuss today: how we can achieve concurrency in this process.
00:09:39.360 Ideally, the compactor would run concurrently with Ruby code, which would minimize program pauses. However, achieving this is difficult at first, so I aimed for something easier.
00:09:56.640 Instead of making it concurrent with Ruby code, I decided to make it concurrent with the free step, or sweeping phase, in the garbage collector.
00:10:15.680 For the initial implementation, I aim to adjust the GC steps slightly so that moving objects can happen concurrently with sweeping. Instead of a linear process, we’ll have sweeping and moving occur simultaneously.
00:10:32.079 Now let’s look at the sweep process and how we can integrate object movement into this. I won’t explain the marking step, but after this step, any objects that will remain alive will be marked.
00:10:54.240 The sweep cursor can be combined with our free cursor, which looks very similar. First, we had a scanning cursor at one end of the heap, as in the original compaction algorithm.
00:11:19.760 As we sweep, instead of freeing an object, we know that it’s empty, and we can tell the scan cursor to find an object to move into that swept slot. After swapping, we leave a forwarding address and repeat this process for the whole heap.
00:11:37.839 When the sweep cursor and scan cursor meet, we know that we have moved all the objects in the heap. At this point, we pause the sweep phase while we update references, just like before.
00:11:54.240 This allows us to update the references correctly. After reference updating finishes, we can resume sweeping the heap as usual. One nice aspect of this is that the sweep step will automatically remove any forwarding addresses.
00:12:06.399 With this, compaction is complete. I implemented this algorithm, and it seemed to work fine. The pull request is available online, and everything appeared smooth, but after I committed the patch, the build began to fail, leading to a revert.
00:12:24.960 What was the issue? The algorithm appeared fine, but the build failure was puzzling. I spent a long time diagnosing the problem. In Ruby, we work with classes, and classes are objects.
00:12:50.880 Ruby internally tracks the parent-child hierarchy among classes. If a class gets garbage collected while still being referenced, the references could become invalid after compaction.
00:13:11.680 During the GC sweep phase, the cursor must access the parent to remove itself from the child’s list, but if the class has moved, it points to an incorrect location, resulting in a crash.
00:13:35.600 To address this, I introduced read barriers, which are functions called whenever something is read. For example, if a class attempts to read another class’ memory, the read barrier can redirect it to the right location.
00:13:52.240 Yet, I sought a simpler solution with the use of a system call called 'mprotect.' This function allows protection of a memory region, automatically handling exceptions when memory access violations occur.
00:14:05.680 By applying 'mprotect', if class C attempts to read an address that’s been moved through compaction, an exception happens, leading to a handler that corrects the access and speeds up the process.
00:14:35.760 However, using 'mprotect' introduces several issues, such as the requirement that the protected region must align with 4KB. Ruby allocates memory in those chunk sizes, meeting this requirement.
00:14:56.799 Emerging complications arose, as the page size must also be multiple of 4KB. The page size, however, couldn’t always be regulated in Ruby’s use. With careful management, I expanded the heap size to allow for the exact memory requirements.
00:15:14.240 Upon testing, I discovered the optimal performance of every allocated page through effective reformulation, resulting in the generation of less memory over usage during document generation.
00:15:32.320 Tests and benchmarks indicated better memory utilization when the patches were applied, showing that memory management can be optimized through extensive research and development.
00:15:59.040 As we examine the interactions between compaction and process management, it’s clear by analyzing the statistical data that the maintenance of efficient memory usage leads to overall improved program performance.
00:16:23.520 As I explored the implications of GC compaction, I reinforced that implementing a read barrier within the garbage collector enhances the ability to manage memory effectively.
00:16:49.600 Through extensive research, I hope to further refine this process of automatic compaction within Ruby 3.0, significantly minimizing the need for manual intervention, thus allowing for increased efficiency in applications.
00:17:17.920 I detailed how this implementation could improve overall performance by introducing statistics around compaction and workload handling. It is crucial to continue refining and enhancing these methods to provide a seamless experience to developers.
00:19:03.400 As we conclude this discussion today, I emphasize the importance of understanding the underlying mechanisms of memory management. Ensuring we apply techniques effectively will lead us to robust solutions aiding optimal performance.
00:19:46.560 I implore you all to explore Ruby 3.0 with these new enhancements at your disposal. Together, we’ll refine and develop this vital aspect of programming; incorporating these elements will enrich the Ruby experience.
00:20:36.560 Thank you for your attention today! I’m looking forward to seeing you all in person next year.
Explore all talks recorded at RubyConf 2020
+17