Under the Hood of Ruby's Generational Garbage Collector

by Hemant Kumar

In the talk titled 'Under the Hood of Ruby's Generational Garbage Collector,' Hemant Kumar explores the advancements in Ruby's memory management, specifically focusing on the generational garbage collection (GC) introduced in Ruby 2.1. The presentation covers the evolution of GC, its current functioning, and the implications for both developers and users. Key points discussed include:

  • Historical Context: The transition from the traditional mark-and-sweep approach used in Ruby 1.8 and 1.9, which required traversing the entire heap, to the more efficient generational GC in Ruby 2.1.
  • Generational Garbage Collection: The concept that most objects in Ruby die young, surviving only a few collections, which leads to faster garbage collection as minor GC skips older objects. Issues arise when references in old generation objects point to new generation objects, risking accidental deletion during GC.
  • Solution with Remembered Sets: To prevent unintended deletions, Ruby uses a remembered set that includes modified old objects, thus ensuring proper garbage collection during minor collections.
  • Challenges with C Extensions: The danger posed by C extensions that can bypass Ruby's GC awareness, leading to potential memory issues if not managed properly.
  • Write Barriers Implementation: To manage the interaction between Ruby objects and C extensions, write barriers classify objects into 'shady' and 'sunny' to facilitate correct GC behavior.
  • Rbkit Introduction: The presentation introduces Rbkit, a low-overhead Ruby profiler that allows developers to monitor memory utilization easily in production, offering insights into object allocation and garbage collection performance.
  • GC Tuning: Hemant discusses how to tune the garbage collector using environment variables, demonstrating how changes to heap slots and growth factors affect application performance.
  • Live Demo: A live demo showcases the Rbkit profiler's capabilities, including real-time monitoring of GC stats and memory size, reinforcing the importance of insightful profiling in Rails applications.

In conclusion, Kumar emphasizes the importance of using Ruby's provided APIs for custom data structures to avoid complications with memory management. He encourages the audience to engage with the tools available for profiling and refining their applications effectively. The overall goal of the talk is to assist developers in navigating Ruby's garbage collection landscape and utilize available profiling tools for better performance.

00:00:27.439 All right, so I'm going to talk about the curious case of Ruby's memory. It's two talks rolled into one because that's what I wanted to do. Who here has used Ruby at some point in production? Ruby 1.8, perhaps? Ah, so you remember the pain, right?
00:00:57.039 Ruby has garbage collection, and it always has since 1.8. Up to 1.9, it used to work with a mark-and-sweep approach. The garbage collection had two phases: the marking phase and the sweeping phase. In the marking phase, all the objects that have references are marked, and whatever is not marked is swept away as deleted. This was pretty good; it was efficient, simple, and easy to reason about.
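The two phases described here can be illustrated with a toy model. This is only a sketch in plain Ruby, not MRI's actual collector: `Node`, `mark`, and `mark_and_sweep` are hypothetical names invented for the illustration.

```ruby
# Toy model of mark-and-sweep. Not MRI's implementation; all names are illustrative.
class Node
  attr_reader :name, :refs
  attr_accessor :marked

  def initialize(name)
    @name = name
    @refs = []        # outgoing references to other nodes
    @marked = false
  end
end

# Marking phase: flag every node reachable from the given node.
def mark(node)
  return if node.marked
  node.marked = true
  node.refs.each { |child| mark(child) }
end

# Runs both phases and returns the nodes that survive the sweep.
def mark_and_sweep(heap, roots)
  heap.each { |node| node.marked = false }
  roots.each { |root| mark(root) }
  heap.select(&:marked)   # sweeping phase: unmarked nodes are dropped
end

a = Node.new("a")
b = Node.new("b")
c = Node.new("c")
a.refs << b               # a -> b; nothing refers to c

survivors = mark_and_sweep([a, b, c], [a])
puts survivors.map(&:name).inspect
```

Note that the whole heap is visited on every run, which is the cost the next part of the talk addresses.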
00:01:29.119 Except that if you're running a Rails application that takes up 400 MB and you have probably a million objects on your heap, traversing this entire heap each time is very costly. That's why, in Ruby 2.1, ko1 (Koichi Sasada) brought generational garbage collection. So, three cheers for generational GC!
00:01:49.920 Inside the generational GC, the concept is that most objects die young. They barely survive one or two garbage collections. If an object does survive a predefined number of garbage collections, it is moved to a generation called the old generation. When more memory is required, a minor GC is performed, and these old generation objects are not touched at all. As you can imagine, garbage collection becomes a lot faster.
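On MRI 2.1 or later, the split between minor and major collections can be observed through `GC.stat`. The sketch below assumes the `:minor_gc_count` and `:major_gc_count` keys and the `full_mark:` option of `GC.start`. Ruby may still decide to run a major GC, so the exact numbers will vary.

```ruby
# Observe minor vs. major GC runs (assumes MRI 2.1+).
before = GC.stat

100_000.times { Object.new }     # short-lived garbage that should die young
GC.start(full_mark: false)       # request a minor GC; Ruby may still escalate

after = GC.stat
minor = after[:minor_gc_count] - before[:minor_gc_count]
major = after[:major_gc_count] - before[:major_gc_count]

puts "minor GCs: #{minor}, major GCs: #{major}"
```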
00:02:24.480 However, let's say you have a hash called active users, which stores a reference to a user object that is considered new. In this scenario, an object in the old generation refers to an object in the new generation. The problem arises during a minor GC because Ruby does not check the old generation for references at all. This means your user object might be deleted even though someone is holding a reference to it—this could lead to a really bad bug.
00:02:53.599 Generational garbage collectors typically implement a trick called the remembered set. What they do is, whenever an old object is modified, it is placed into this different set called the remembered set. During a minor GC, not only is the young generation traversed, but the remembered set is traversed as well. Now you can see that the user object will not be garbage collected erroneously, which is pretty good and is what the JVM and other VMs do.
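A toy simulation can show why the remembered set matters. This is an illustrative model only: `Obj` and `minor_gc` are hypothetical names, and the real collector works on heap pages and bitmaps rather than Ruby arrays.

```ruby
# Toy model of a minor GC with a remembered set. Not MRI internals.
class Obj
  attr_reader :name, :refs, :old

  def initialize(name, old:)
    @name = name
    @refs = []
    @old = old        # true once the object has been promoted
  end
end

# Returns the young objects that survive a minor GC.
# Old objects are never traversed unless they are in the remembered set.
def minor_gc(young, roots, remembered_set)
  stack = roots.reject(&:old) + remembered_set.flat_map(&:refs)
  live = []
  until stack.empty?
    obj = stack.pop
    next if obj.old || live.include?(obj)
    live << obj
    stack.concat(obj.refs)
  end
  young.select { |obj| live.include?(obj) }
end

active_users = Obj.new("active_users", old: true)   # old generation
user = Obj.new("user", old: false)                  # new generation
active_users.refs << user                           # old -> new reference

without_rs = minor_gc([user], [active_users], [])
with_rs    = minor_gc([user], [active_users], [active_users])

puts "without remembered set: #{without_rs.map(&:name).inspect}"
puts "with remembered set:    #{with_rs.map(&:name).inspect}"
```

Without the remembered set, the `user` object is collected even though `active_users` still refers to it. With it, the object survives.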
00:03:33.599 The issue in Ruby is that if you're running a Rails app, it probably has at least 10 or 20 C extensions loaded. These C extensions can behave unpredictably. For example, they might take a raw pointer to a Ruby array and add something to it in C, and Ruby won't be aware that this old-generation object has just gained a reference to a new object. The old object never makes it into the remembered set, so the new object it refers to could be deleted during a minor GC. This presents a significant problem.
00:04:14.000 Unlike the JVM, Ruby does not fully control the heap space of a process. To work around this problem, ko1 introduced the concept of write barriers. What this does is categorize objects into two categories: shady objects and sunny objects. When you touch an object and modify it in C using any of the macros, that object is marked as a shady object. The majority of objects in Ruby are typically shady, while only a few, such as hashes and array proxies, are sunny.
00:05:04.639 On a minor GC, not only is the young generation traversed, but all the shady objects and all the remembered set objects are traversed as well. The downside is that this somewhat negates the benefits of generational GC because it means traversing more objects, even during a minor GC. However, the advantage is that it's 100% compatible with almost all the C extensions out there, meaning objects not accessed from C and only modified from Ruby will remain untouched.
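To see how many old, remembered, or unprotected (shady) objects a running process has, one option is to look through `GC.stat`. The counter names have changed between Ruby versions, so this sketch filters keys by pattern instead of assuming specific names.

```ruby
# List generational-GC counters exposed by this Ruby.
# Key names vary by version, so match on a pattern instead of hard-coding them.
counters = GC.stat.select do |key, _value|
  key.to_s =~ /old|remembered|shady|unprotected/
end

counters.each { |key, value| puts "#{key}: #{value}" }
```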
00:06:00.320 So, that's how Ruby's generational GC works in version 2.1. The next major version, 2.2, introduces something called incremental GC. There will also be more than two generations, and trace points will be hooked. Additionally, a symbol GC is being introduced, which will further reduce the heap size.
00:06:39.840 A key takeaway from this talk is to avoid using low-level access for Ruby data structures, like raw array pointers. Ruby actually provides APIs (functions) you can use if you're writing a custom data structure and want to expose it to Ruby. That way you can ensure these structures are write-barrier protected.
00:07:12.319 Now, I’m going to talk about how to tune the garbage collector via environment variables. But before that, let’s take a trip down memory lane and discuss memory profiling of Ruby applications. It was horrible back in the day! For instance, Twitter had to use DTrace, and experts had to be brought in. Typically, profiling would look like a graphics dump, leaving you to guess what was going on.
00:07:45.920 Thanks to Ruby 2.1, which introduced trace points and great instrumentation support, we decided to create a Ruby profiler that's as easy to use as tools like YourKit. It can run in production with very low overhead without causing issues for your application. I'm excited to introduce Rbkit for the first time here at Rocky Mountain Ruby. It's up on GitHub; we just open-sourced it a few days ago.
00:08:34.160 Rbkit is a low-overhead Ruby profiler built for MRI, written almost entirely in C. It comprises two parts: a desktop application and a Ruby gem. The gem doesn't do much other than gather data, such as which objects have been created or deleted, and where references are held. It sends all this data via ZeroMQ to the desktop client.
00:09:19.200 One of the beautiful things about ZeroMQ is that it has its own IO threads. This means that when we send the data, we won't block the Ruby thread. I can send a million messages without any issues. The beauty of the profiler is that it can be used in production. Simply add the Rbkit gem to your Rails app, require it in your boot.rb file, and it will listen for incoming desktop client connections.
00:10:10.639 The desktop app is cross-platform, written in Qt C++, which allows us to render parts of the application using web technologies like D3.js and use SQLite for storage. All the heavy lifting is performed client-side. We benchmarked various GUI libraries to ensure we didn't just create an OS X-only application, which wouldn't align with the Ruby culture.
00:11:03.920 To ensure the Rbkit memory profiler itself doesn't leak, we utilize tools like Valgrind and a simple command-line program available on OS X for tracking memory leaks. We are building CPU profiling capabilities as well, but our current focus lies on memory profiling.
00:11:44.680 Let’s get to a demo! This is the client interface. I will connect it to a running Ruby application. One great feature of ZeroMQ is that the server does not need to be running for clients to start connecting. This is already profiling live data. You can see GC stats, object count growth, memory size, and heap size, all real-time. I can trigger a manual garbage collection for better analysis.
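The kind of numbers shown in the demo can be sampled from inside a process with Ruby's own APIs. The following is a rough sketch, not how the profiler itself collects data: `sample_stats` is a hypothetical helper, and it relies on `GC.count`, `ObjectSpace.count_objects`, and `ObjectSpace.memsize_of_all` from the standard `objspace` library.

```ruby
require "objspace"

# Hypothetical helper: one snapshot of basic GC and heap numbers.
def sample_stats
  slots = ObjectSpace.count_objects
  {
    gc_runs:       GC.count,                      # total GC runs so far
    total_slots:   slots[:TOTAL],                 # heap slots allocated
    free_slots:    slots[:FREE],                  # heap slots currently unused
    memsize_bytes: ObjectSpace.memsize_of_all     # approximate bytes held by objects
  }
end

first = sample_stats
GC.start                                          # manual GC, as in the demo
second = sample_stats

puts "before: #{first.inspect}"
puts "after:  #{second.inspect}"
```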
00:13:05.440 Profiling a Rails app always requires you to initiate a GC before taking a heap dump. As I take the heap snapshot, you can witness the current status of objects in memory. It's critical to identify potential memory leaks, and I can compare heap snapshots to highlight which objects were present in one but not the other.
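Comparing two snapshots can be approximated in plain Ruby by counting live objects per class before and after a piece of work. This is a simplified sketch and not the profiler's heap dump format. `class_counts`, `snapshot_diff`, `LeakyUser`, and `RETAINED` are hypothetical names.

```ruby
# Hypothetical class standing in for something an application leaks.
class LeakyUser; end

# Count live objects per class, running a GC first as the talk recommends.
def class_counts
  GC.start
  counts = Hash.new(0)
  ObjectSpace.each_object(Object) { |obj| counts[obj.class] += 1 }
  counts
end

# Classes whose instance count grew between two snapshots.
def snapshot_diff(before, after)
  after.each_with_object({}) do |(klass, count), grown|
    delta = count - before[klass]
    grown[klass] = delta if delta > 0
  end
end

before = class_counts
RETAINED = Array.new(1_000) { LeakyUser.new }   # stays referenced, so it survives GC
after = class_counts

grown = snapshot_diff(before, after)
puts "LeakyUser instances retained: #{grown[LeakyUser]}"
```

Walking the whole heap like this is slow, so it suits a console session more than a hot code path.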
00:14:13.760 Next, let’s discuss GC tuning visualizations. Once we have all the data coming in via the ZeroMQ socket, there is incredible potential to visualize how this affects application performance. The first point to consider is the number of heap slots that Ruby initializes with. The default value is typically around 60,000 or 600,000.
00:15:03.840 You don't want the garbage collector to trigger too often since that could detrimentally affect your application's performance—especially during unit tests. All I need to do is adjust the initialized slots to higher values to observe differences in performance.
00:15:53.920 As I change the initialized slots, you can see a notable reduction in the frequency of GC triggers. Next, let's examine the growth factor—the rate at which Ruby's heap grows when more memory is needed. The default growth factor is set at 1.8, and we can adjust that to see how it influences performance.
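The effect of the growth factor can be estimated with simple arithmetic. The sketch below is a simplified model that multiplies the slot count by the factor on every expansion. The real allocator applies further limits, and the starting size of 10,000 slots is an assumption chosen for illustration.

```ruby
# Simplified model: slots after n heap expansions = initial * factor^n.
def projected_slots(init_slots, growth_factor, expansions)
  (init_slots * growth_factor**expansions).round
end

[1.2, 1.8, 3.0].each do |factor|
  sizes = (0..4).map { |n| projected_slots(10_000, factor, n) }
  puts "factor #{factor}: #{sizes.join(' -> ')}"
end
```

A larger factor reaches a given heap size in fewer expansions, which means fewer collections while the heap is growing, at the price of memory that may go unused.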
00:16:38.960 When Ruby needs more memory, it uses the growth factor, which can be set through the RUBY_GC_HEAP_GROWTH_FACTOR environment variable, to decide how much to allocate. This adjustment will have a clear impact on our application performance. Observing this in real-time allows us to draw valuable insights.
00:17:34.480 Importantly, we also consider the Ruby GC malloc limit (RUBY_GC_MALLOC_LIMIT), which controls how frequently the GC runs based on memory allocation. Demonstrating these variables together helps us understand their collective influence on GC frequency.
00:18:47.920 The initial heap slots and growth factor are not the only parameters that impact garbage collection. The GC malloc limit is crucial: if the application allocates more than this limit, which defaults to 16 MB, then GC will immediately trigger. As I adjust this limit to higher values, the frequency of garbage collections significantly decreases.
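One way to try these settings without a profiler is to run the same workload in child Ruby processes with different environment variables and compare `GC.count`. The sketch assumes MRI and the variables `RUBY_GC_HEAP_INIT_SLOTS`, `RUBY_GC_HEAP_GROWTH_FACTOR`, and `RUBY_GC_MALLOC_LIMIT`. Newer Ruby versions may rename or ignore some of them. The workload and the values are arbitrary, and `gc_count_with` is a hypothetical helper.

```ruby
require "open3"
require "rbconfig"

# Arbitrary allocation-heavy workload run inside each child process.
WORKLOAD = 'Array.new(200_000) { |i| "user-#{i}" }; puts GC.count'

# Run the workload in a fresh Ruby with the given environment and
# return how many times the GC ran there.
def gc_count_with(env)
  out, _err, status = Open3.capture3(env, RbConfig.ruby, "-e", WORKLOAD)
  raise "child Ruby failed" unless status.success?
  out.to_i
end

default_runs = gc_count_with({})
tuned_runs = gc_count_with(
  "RUBY_GC_HEAP_INIT_SLOTS"    => "600000",     # start with a bigger heap
  "RUBY_GC_HEAP_GROWTH_FACTOR" => "3.0",        # grow the heap more aggressively
  "RUBY_GC_MALLOC_LIMIT"       => "64000000"    # allow more malloc'd bytes before a GC
)

puts "GC runs with defaults: #{default_runs}"
puts "GC runs when tuned:    #{tuned_runs}"
```

The results depend on the Ruby version and platform, so treat them as a way to compare settings rather than as absolute figures.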
00:20:45.760 So, that's all I have for today. The source code is available on GitHub, and feel free to reach out if you have questions. My name is Hemant Kumar, and I'm happy to share knowledge.
00:21:10.159 I have time for one or two small questions, so feel free to ask. Regarding overhead when running Rbkit in production, it can be run in two ways. One way is to start the server without installing trace point hooks, resulting in zero performance penalty.
00:22:21.840 When you connect via the desktop client and manually trigger profiling, there might be some performance cost because Ruby has to run code for each allocated object. Still, it is minimal since all we're doing is capturing data and sending it back in intervals.
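The per-allocation cost mentioned here comes from running a hook each time an object is created. Ruby's standard `objspace` library exposes a similar mechanism, shown below as an illustration. It is not the profiler's own C-level hook, only an example of allocation tracing that exists since Ruby 2.1.

```ruby
require "objspace"

tracked = nil

# While the block runs, Ruby records where each new object was allocated.
ObjectSpace.trace_object_allocations do
  tracked = Object.new
end

file = ObjectSpace.allocation_sourcefile(tracked)
line = ObjectSpace.allocation_sourceline(tracked)

puts "allocated at #{file}:#{line}"
```

Tracing only inside a block, or only while a client is connected, keeps the cost away from the rest of the program.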
00:23:14.360 I have been working with clients on Rails memory issues and was inspired by the instrumentation support introduced in 2.1 to create something accessible for all developers. I appreciate your attention, and I’ll be posting the slides online for further reference.
00:24:00.640 Thank you once again for your time!
00:24:30.760 Any additional questions?