Garbage Collection

Don't @ me! Instance Variable Performance in Ruby

Japanese version: https://youtu.be/iDW93fAp2I8

How do instance variables work? We've all used instance variables in our programs, but how do they actually work? In this presentation we'll look at how instance variables are implemented in Ruby. We'll start with a very strange benchmark, then dive into Ruby internals to understand why the code behaves the way it does. Once we've figured out this strange behavior, we'll follow up with a patch to increase instance variable performance. Be prepared for a deep dive into a weird area of Ruby, and remember: don't @ me!

RubyKaigi Takeout 2020

00:00:00.880 Hi, this is Aaron. I originally gave this talk in Japanese, but the organizers wanted it to be in English. So today, I'm going to be interpreting myself, and I've never done that before, so let's see how it goes.
00:00:29.279 Hi everyone. Today, I'm going to be talking about reducing memory usage by using more memory. Most of the talk will be about concurrent compaction development in Ruby, but I thought this title was a better fit, so I went with it. I promise that I'll discuss reducing memory usage as well.
00:01:08.479 On the internet, I am known as Tenderlove. This year, I was really lucky to change jobs, and I started working at a company called Shopify. Currently, I'm working as a full-time Ruby developer at Shopify. In other words, every day, I'm doing Ruby development. Today, I want to tell you what I've been working on so far.
00:01:44.159 So far, in Ruby 2.7, I introduced a new feature called GC.compact. This feature allows us to save memory by filling in empty space in our heap and eliminating fragmentation.
00:02:03.920 This is similar to the defrag utility in DOS, which cleans up a hard drive by eliminating fragmentation. The compaction feature works in essentially the same way, except that it cleans up memory instead of space on your hard drive.
00:02:33.280 This program demonstrates the effectiveness of compaction. It loads a Rails application, then runs garbage collection to remove garbage, and compacts the heap to remove empty space. If we make a graph of the heap, it will look something like this.
00:02:51.440 Before the compactor runs, the graph shows green and black dots representing live objects, while white dots indicate free space. After compaction, the heap looks different: green and black dots are still ordinary objects, while red dots represent objects that could not be moved. The compacted graph is also narrower, which means we are actually consuming less memory.
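The script driving this demo isn't shown in the transcript, so here is a minimal sketch of it in Ruby. It assumes Ruby 2.7 or later, the require line is only a hypothetical stand-in for booting a real Rails application, and the exact return value of GC.compact may differ between Ruby versions.

    # Minimal sketch of the demo described above (assumes Ruby 2.7+).
    # The require is a hypothetical stand-in for loading a real Rails app.
    require_relative "config/environment"

    GC.start            # collect garbage first, so only live objects remain
    stats = GC.compact  # then slide live objects together to remove empty space
    p stats             # e.g. counts of objects considered and moved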
00:04:01.840 But before we dive into my current work, I want to cover how the existing algorithm operates. First, it moves the objects around and then updates all the references. Let's take a closer look at the compaction process.
00:04:20.880 Imagine a memory layout where the blue squares represent live Ruby objects, the white squares represent empty slots, and the yellow squares represent forwarding addresses. The compaction process uses two pointers: one at the beginning of the heap, called the free pointer, and one at the end, called the scan pointer.
00:04:58.400 The free pointer scans right until it finds an empty slot, while the scan pointer moves left until it finds a filled slot. The object under the scan pointer is then moved into the empty slot, and a forwarding address is left behind in its place. When the two pointers meet, the moving phase is finished, and all references are updated using those forwarding addresses so that every object can still be reached.
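To make the two-pointer pass concrete, here is a toy model in plain Ruby; it is only an illustration of the idea, not Ruby's actual C implementation. nil stands in for an empty slot, and a hash records the forwarding addresses that the real collector writes into the vacated slots.

    # Toy model of two-finger compaction (illustration only, not CRuby's code).
    # heap is an array; nil marks an empty slot.
    def compact(heap)
      forwarding = {}                 # old index => new index
      free, scan = 0, heap.length - 1

      while free < scan
        free += 1 until heap[free].nil? || free >= scan  # find the next empty slot
        scan -= 1 until heap[scan] || scan <= free       # find the next live object
        break if free >= scan

        heap[free] = heap[scan]       # move the object down into the hole
        heap[scan] = nil
        forwarding[scan] = free       # record the forwarding address
      end

      forwarding                      # used afterwards to update references
    end

    heap = ["a", nil, "b", nil, nil, "c", "d", nil]
    p compact(heap)                   # => {6=>1, 5=>3}
    p heap                            # => ["a", "d", "b", "c", nil, nil, nil, nil]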
00:06:19.520 I want to discuss my goals for compaction. First and foremost, I want to save memory, and I want it to be automatic. Developers shouldn't need to know that it's running. It should also be concurrent, meaning it can happen simultaneously with any other processing, and it should be fast.
00:07:29.520 When we look at the heap before and after compaction, we see that the number of objects remains the same, but the compacted heap is narrower, which means it uses less memory. With all the objects packed together, our copy-on-write performance improves, thus meeting our goal of saving memory.
00:08:15.050 However, developers shouldn't need to know that GC.compact is running. Right now, they have to manually call GC.compact after loading the Rails application and before forking the web server in order to compact the heap. Therefore, this goal is not yet met.
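For reference, the manual workaround usually looks something like the following. This sketch assumes a Puma-style configuration file with a before_fork hook; other preforking servers expose similar hooks, and the file name is only illustrative.

    # config/puma.rb (hypothetical): compact once, right before workers fork,
    # so the packed heap is shared between worker processes via copy-on-write.
    before_fork do
      GC.start
      GC.compact
    end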
00:09:42.000 Another issue is concurrency; since developers have to manually call GC.compact and the program pauses while it runs, it is not concurrent. The empty slot problem comes into play here. When objects are moved, the destination slots must be empty, which means the sweep step has to run before objects can be moved.
00:10:54.240 This empty slot requirement is one reason compaction cannot run concurrently with other work: because the GC steps must run in a fixed order, we cannot call the process concurrent. Speed is relative, too; the compactor itself may run quickly, but as long as it cannot run concurrently, the program still has to pause while it works, so it is hard to call it fast.
00:12:05.920 Both automatic operation and speed depend on making compaction concurrent. When we discuss concurrency, we refer to reducing the amount of time that the program pauses. However, I wanted to start with something simpler: making the sweep step concurrent.
00:13:01.120 For the first implementation, we'll change the move step to be concurrent with the sweep step, allowing us to combine these processes. After the marking step, any objects that we keep will be marked, while any objects we collect will not.
00:14:06.639 Similar to the previous process, the sweep step examines each object, moving on if the object is marked and freeing it if it is not; each freed slot can then be filled right away with an object moved down from the scan pointer. When the sweep pointer and the scan pointer meet, we temporarily stop the sweep step and update all the references, just like in the normal compaction process.
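Continuing the earlier toy model, and again only as a sketch of the idea rather than the real C code, combining the sweep and move steps might look like this: unmarked slots are freed as the sweep pointer passes them, and each freed slot is immediately refilled with a live object taken from the scan pointer.

    # Toy model of the combined sweep-and-move pass (illustration only).
    # marked lists the indexes of live objects; everything else is garbage.
    def sweep_and_compact(heap, marked)
      forwarding = {}                  # old index => new index
      sweep, scan = 0, heap.length - 1

      while sweep < scan
        unless marked.include?(sweep)
          heap[sweep] = nil            # free the dead object under the sweep pointer
          scan -= 1 until marked.include?(scan) || scan <= sweep
          break if scan <= sweep
          heap[sweep] = heap[scan]     # refill the hole with a live object from the end
          heap[scan]  = nil
          forwarding[scan] = sweep
          scan -= 1
        end
        sweep += 1
      end

      forwarding
    end

    heap   = ["a", "b", "c", "d", "e", "f"]
    marked = [0, 2, 4, 5]              # "b" and "d" are garbage
    p sweep_and_compact(heap, marked)  # => {5=>1, 4=>3}
    p heap                             # => ["a", "f", "c", "e", nil, nil]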
00:16:10.720 After implementing these changes, I ran all the CI tests, and everything appeared to be working fine. However, upon merging the patch, the build started failing, requiring me to revert it.
00:17:11.439 The problem arose because Ruby internally tracks the class hierarchy: when a class is freed, it has to remove itself from that hierarchy, and if a related class has already been moved, those references point at the wrong place. Stale references like that can lead to crashes and unexpected behavior during garbage collection.
00:18:40.720 To address this problem, I decided to introduce read barriers—functions that are executed when an object is about to be read. In our case, if class C tries to read class A, the read barrier will trigger and can move A back into place, ensuring that references remain valid.
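As a conceptual sketch only (the real barrier lives inside Ruby's C implementation), a read barrier can be pictured like this: every read goes through a small function that notices a forwarding marker left behind by the compactor and repairs the access before returning the object.

    # Toy model of a read barrier (illustration only). Forwarded is a
    # hypothetical marker the compactor leaves behind when it moves an object.
    Forwarded = Struct.new(:new_index)

    def read_slot(heap, index)
      slot = heap[index]
      if slot.is_a?(Forwarded)   # the barrier fires: this object has been moved
        index = slot.new_index   # follow the forwarding address (or undo the move)
        slot  = heap[index]
      end
      slot
    end

    heap = ["A", Forwarded.new(3), nil, "B"]
    p read_slot(heap, 1)         # => "B"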
00:20:40.320 However, implementing read barriers is complicated. Inserting explicit checks at every place an object might be read would be invasive, and it would be hard to keep the performance impact small. So rather than adding checks throughout the code, I turned to the mprotect system call and let the operating system's memory protection detect the reads for us.
00:24:06.400 Using mprotect, I change the memory protection on a region when objects are moved out of it, so that any access to the region is trapped until the move is completed. However, mprotect has its own challenges: the regions it protects must be aligned to the operating system's page size and sized in multiples of it.
00:26:06.400 To deal with these size and alignment requirements, I sent a patch to Ruby that allocates heap pages with the correct alignment, which lets us use mprotect on them reliably.
00:28:20.720 I ran benchmarks using RDoc documentation generation and observed a reduction in peak memory usage with the patch. The change showed that we can use memory more efficiently by making sure the blocks we allocate are fully utilized.
00:29:55.840 Through these optimizations, we've confirmed that with careful memory management and read barriers we can reduce memory usage overall. That is why I titled this talk 'Reducing Memory Usage by Using More Memory': it captures the essence of the approach.
00:30:56.240 In summary, while this work may still be slow at the moment, it is a substantial step forward for Ruby's memory handling, because it introduces the read barriers needed for concurrent compaction. Thank you for your attention, and I hope to see you all in person at RubyKaigi 2021!