RailsConf 2021

A day in the life of a Ruby object

by Jemma Issroff

In the talk titled "A Day in the Life of a Ruby Object" presented by Jemma Issroff at RailsConf 2021, the presenter explores the lifecycle of Ruby objects from creation to garbage collection. The discussion sheds light on how Ruby manages memory through its garbage collection (GC) mechanisms, focusing specifically on the CRuby (MRI) implementation.

Key Points Discussed:
  • Object Creation and Memory Management:

    The talk opens with an experiment in which the speaker creates one billion Ruby objects on a machine whose physical memory is smaller than the combined size of those objects. The garbage collector reclaims the memory automatically because no references to the objects are kept.

  • Understanding Garbage Collection:

    The fundamental concept introduced is the tri-color mark and sweep algorithm used by Ruby's garbage collector. This process involves marking objects in three stages—white, gray, and black—to determine which objects can be safely eliminated.

  • Memory Layout and Object Representation:

    The Ruby heap’s structure as composed of pages and slots is explained. Each page accommodates several slots for Ruby objects, emphasizing how objects are stored and how pointers are addressed dynamically during execution.

  • Generational Garbage Collection:

    The talk delves into generational GC, which is based on the observation that most objects in Ruby applications tend to become obsolete quickly (die young). This understanding allows Ruby to focus its garbage collection procedures more efficiently.

  • Compact Memory Management:

    At the conclusion of the talk, the concept of memory compaction is covered, detailing how Ruby can reorganize its memory space to eliminate fragmentation. This involves moving live objects to create contiguous blocks of free memory, enhancing overall memory efficiency.

Significant Examples:

- Through interactive demonstrations in an IRB console, the speaker shows how object lifecycles are managed, measures the time taken to create objects, and shows how objects' memory addresses change after compaction.

Conclusions and Takeaways:

- Understanding Ruby’s garbage collection mechanisms is crucial for developers aiming to write performant Ruby applications.

- The ability to visualize and dissect object handling and memory management can help developers minimize memory leaks and improve application performance.

- Developers are encouraged to actively engage with Ruby's APIs to monitor and control GC behavior and to familiarize themselves with Ruby’s memory management terminology.

00:00:05.600 The other day, I opened up an IRB console and ran a command that demonstrated Ruby's number underscore notation. It's amazing how we can visualize just how big this number is: one billion times `Object.new`. Personally, I am a little impatient, so I was multitasking on my computer while this was running. To my surprise, after about two minutes, it completed without any issues, which is quite impressive if you think about it. Each of these objects I created is a 40-byte object, and there were one billion of them. That's 40 gigabytes of memory. For context, I was running this on a machine with a total memory of only 16 gigabytes. How did this happen? Well, Ruby has a garbage collector.
00:00:38.460 We know we can run commands like this, no matter how outrageous it sounds. If you’re a bit more patient than I am, you could even create 10 billion objects, and the garbage collector will handle it. Just to clarify, in this talk, when I refer to Ruby's garbage collector, I mean MRI (CRuby), not JRuby or TruffleRuby, which have their own distinct garbage collection algorithms. Don't worry if this doesn't make sense to you! For now, back to our billion objects. The garbage collector took care of this because we didn't keep a reference to them. We didn’t assign them to a variable or ask Ruby to remember them, which allowed Ruby to collect them as they were created without exhausting my memory. I could continue working on other tasks while it was all happening.
00:01:14.520 Later in this talk, I will discuss the `GC.stat` command, which confirmed that the garbage collector ran over 50,000 times while I created those billion objects. It's impressive to see and verify exactly what was happening, enabling me to generate these objects without consuming excessive memory. Additionally, we'll delve into what 'pages' mean later on, but for now, just think of them as a proxy for the memory we were utilizing. Interestingly, the memory usage didn’t really increase from before to after creating the billion objects. As our friend Oscar the Grouch from Sesame Street would say, garbage collection helps us keep our space clean.
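A scaled-down version of this experiment can be run in any Ruby session. The sketch below is not the speaker's exact code: it uses one million objects instead of one billion so that it finishes quickly, and the variable names are illustrative. It reads `GC.count` and the `:heap_allocated_pages` entry of `GC.stat` before and after the allocations.

```ruby
# Snapshot collector activity before allocating anything.
gc_runs_before = GC.count
pages_before = GC.stat[:heap_allocated_pages]

# Create one million throwaway objects. Nothing keeps a reference to them,
# so the garbage collector is free to reclaim their slots as we go.
1_000_000.times { Object.new }

gc_runs = GC.count - gc_runs_before
pages_after = GC.stat[:heap_allocated_pages]

puts "GC runs during allocation: #{gc_runs}"
puts "Heap pages before: #{pages_before}, after: #{pages_after}"
```

The exact numbers depend on the Ruby version and on what else the process has loaded, so treat the output as something to observe rather than something to match.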
00:01:40.320 Ruby does garbage collection for us, helping to manage memory automatically; the whole point of having a garbage-collected language is that we, as developers, shouldn’t have to worry about it. However, I know this is a big slide faux pas; there is so much text here! You don’t need to read it now, but this information is taken from the Ruby 3 release notes that were published last Christmas. There was a section discussing garbage collection, and it concluded with a simple request: 'Please test first.' This is great advice, but to test effectively, we must first understand what's happening under the hood.
00:02:06.180 Thank you so much for joining this talk titled 'A Day in the Life of a Ruby Object,' or perhaps more fittingly, 'A Day in the Life of a Billion Ruby Objects.' I hope we can achieve many things together in this session, and I'll share more goals shortly. One critical point to note is I encourage you to work along with me as we go. I will be sharing many examples and commands from the IRB console, and I hope you'll open one up and experiment as I do—this is a great way to learn!
00:02:43.320 My primary goal during this talk is to ensure that by the end, we can all conceptualize garbage collection-related issues in a way similar to how we associate certain problems with network issues. I want you to recognize a scaling or tuning issue and connect it back to garbage collection. More specifically, I hope we'll gain a broad understanding of how garbage collection works and what it actually does. To get there, we’ll explore the arc of object creation—what happens when we call `Object.new`, and what the garbage collector is doing during this process.
00:03:10.260 A bit about myself: I sometimes wear a colorful bucket hat when it rains. More relevant to this talk, I'm currently writing a book about Ruby's garbage collection. I enjoy writing, especially about Ruby. I'm also a Ruby blogger and contribute to the tip of the week in Ruby Weekly. I have experience as a backend Ruby developer and, while I have your attention, I’d like to mention that I am a co-organizer of a new Ruby group aimed at women and non-binary individuals. We have two speakers every month, and we would love for anyone who identifies as a woman or as non-binary to join us. You can find the latest meetup details pinned on our Twitter page.
00:03:44.820 To return to the release notes, they mentioned to 'please test first.' If we read a bit more, they ask us to look into compaction and what transpires during that process. If we refer back to the Ruby 2.7 release notes on compaction, we’ll notice several terms that could confuse those unfamiliar with garbage collection. Terms like memory space, copy-on-write, memory fragmentation, major collections, compaction, pages, and the heap are all significant. We will explore each of these throughout the talk.
00:04:06.680 Instead of focusing on a billion objects for now, let's simplify by concentrating on creating one specific object. If you're following along in your console, feel free to pause and rewind me as needed; it will be more impactful for you to learn alongside me. When we create a new object, IRB prints the object along with a memory address, which will look slightly different from mine if you're experimenting. We’ll discuss what that memory address signifies, but first, let’s address how we can access the object. Since we haven’t assigned the object to a variable, we can’t access it.
00:04:39.240 This was similar to what happened with our billion objects, which Ruby simply collected. If we wanted to have access to our new object, we could assign it to a variable like this: `object = Object.new`, which would let us hold on to the reference. Many of you might be familiar with the `object_id`, a unique identifier Ruby assigns to each object. If we want to delve deeper into what the memory address indicates, we must broaden our view of memory management in general.
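As a small illustration of holding on to a reference, the snippet below assigns the new object to a variable and prints both its inspect string (which contains the address shown in the console) and its `object_id`. The variable names are illustrative only.

```ruby
# Keep a reference so the object stays reachable.
object = Object.new

# The inspect string contains the class name and a hexadecimal address.
description = object.inspect
puts description

# object_id is an identifier that stays the same for the object's lifetime.
puts object.object_id
same_id = object.object_id == object.object_id
```

The address and the id will differ from run to run and from machine to machine.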
00:05:07.800 Your machine has a heap that makes up most of its memory, and any machine running Ruby code will similarly have a large memory store. Within this general heap lies the Ruby heap, which can sometimes be linked to the operating system heap and can also shift in size. It may grow and request more memory from the operating system, explaining how you might observe Heroku dynos consuming increased memory as your heap expands. If we investigate the Ruby heap, we will find it consists of memory units called pages. When Ruby requests more memory from the OS, it does so in increments of pages.
00:05:50.700 Each page contains around 409 slots, which amounts to approximately 16 kilobytes per page. These slots are where the actual objects are stored. A slot is either empty (unallocated) or holds a Ruby value. Each slot has a fixed size of 40 bytes, which is the size I mentioned previously. Additionally, each page has a header providing information about the page itself and the slots contained within it. Some slots hold Ruby's internal C representation of an object, known as an RVALUE (an "R value"); these are effectively where the objects themselves reside.
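The figures quoted in the talk can be checked with a little arithmetic. The constants below are taken from the talk (40-byte slots, about 409 slots per page); the lookup in `GC::INTERNAL_CONSTANTS` is only a convenience for comparing them with the running interpreter, and the key names it reports vary between Ruby versions, so the code prints whatever is available instead of assuming particular keys.

```ruby
# Figures quoted in the talk.
SLOT_SIZE_BYTES = 40
SLOTS_PER_PAGE = 409

# Bytes of slots in one page, excluding the page header.
page_slot_bytes = SLOT_SIZE_BYTES * SLOTS_PER_PAGE
puts "Slots in one page: #{page_slot_bytes} bytes (about #{(page_slot_bytes / 1024.0).round(2)} KiB)"

# Total size of one billion 40-byte objects if none were ever collected.
billion_objects_bytes = 1_000_000_000 * SLOT_SIZE_BYTES
puts "One billion objects: #{billion_objects_bytes / 1_000_000_000} GB"

# What the running interpreter reports, if it exposes these constants.
if GC.const_defined?(:INTERNAL_CONSTANTS)
  GC::INTERNAL_CONSTANTS.each { |name, value| puts "#{name}: #{value}" }
end
```

The slots add up to 16,360 bytes, which leaves a little room in a 16-kilobyte page for the header mentioned above.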
00:06:26.580 When we create our new object, we can conceptualize it as existing within this R value. R values can be any types of objects. In some instances, such as when using a long string, we observe that if the string doesn’t fit in the 40 bytes, the R value will point to an external memory address within the operating system's heap. This means, at times, our Ruby heap may occupy less memory than the total memory consumed by our program. Such discrepancies can occur, so it’s crucial to be aware that the size of the Ruby heap does not always equate to the memory consumption you might observe in systems like Heroku or AWS.
00:06:59.600 With that, I'd like to touch on strings briefly. To demonstrate when strings become too large for R values, we'll create strings of varying sizes to see the effects. In the console, we can build a string of a specific length with the percent-dot-d format syntax (a format such as `%.23d`), which converts an integer into a zero-padded string. For instance, we can create a string of 23 characters in length, and I’ll use a red underline in our diagram to denote this. We can generate such strings for different lengths, specifically 23 and 24 characters, and interestingly, these two lengths sit on either side of the threshold where R values shift from internal storage to external storage.
00:07:36.900 We will observe that creating one million strings of length 24 takes nearly ten times longer than creating one million strings of length 23. While this is an interesting observation, keep in mind there's no need to adjust your strings; it’s more of an educational insight into garbage collection. As you experiment along, try creating strings of lengths 22, 21, and 25 to further investigate this boundary for yourself—it's quite fascinating! While on this note, I gathered those processing time measurements by using the 'measure' command in IRB.
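The comparison can be reproduced outside IRB with the standard `benchmark` library. This is a sketch under a few assumptions: the helper `string_of_length` is hypothetical, the iteration count is reduced to keep the run short, and the 23/24 boundary reflects the Ruby version used in the talk. Newer Ruby versions can place larger strings directly in bigger slots, so the timing gap may not appear there.

```ruby
require 'benchmark'
require 'objspace'

# Hypothetical helper: builds a zero-padded string with exactly `length` digits.
def string_of_length(length)
  format("%.#{length}d", 0)
end

short = string_of_length(23)
long = string_of_length(24)

# memsize_of reports how many bytes Ruby attributes to each string.
[short, long].each do |string|
  puts "length #{string.length}: memsize #{ObjectSpace.memsize_of(string)} bytes"
end

# Time the creation of many strings on each side of the boundary.
iterations = 100_000
[23, 24].each do |length|
  seconds = Benchmark.realtime { iterations.times { string_of_length(length) } }
  puts format('length %d: %.4f seconds for %d strings', length, seconds, iterations)
end
```

Trying neighbouring lengths such as 21, 22, and 25 in the same loop shows where the boundary sits on a given Ruby version.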
00:08:09.260 Returning to R values, they can reference other R values. For example, if our object is contained within an array, the array's R value will point to the R value where our object resides. This leads us to the concept of object space, which is synonymous with Ruby memory. By requiring the `objspace` library and calling `ObjectSpace.dump`, we can retrieve valuable data about our object in JSON format. This representation lets us see the addresses of directly referenced R values.
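A minimal sketch of that check follows. It assumes the dump of an array includes a `references` list of addresses, which is how CRuby's `objspace` extension reports outgoing references; the variable names are illustrative.

```ruby
require 'objspace'
require 'json'

object = Object.new
array = [object]

# Address of the object on its own.
object_address = JSON.parse(ObjectSpace.dump(object)).fetch('address')

# Addresses the array refers to directly.
array_dump = JSON.parse(ObjectSpace.dump(array))
references = array_dump.fetch('references', [])

puts "object address:   #{object_address}"
puts "array references: #{references.inspect}"
puts "array points at object: #{references.include?(object_address)}"
```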
00:08:39.240 Scrutinizing the first element in an array lets us confirm it indeed corresponds to the exact address of our object outside the array. At this stage, let’s zoom back out and reevaluate the billion objects. We’ve established that to create objects, we needed pages in our Ruby heap with available slots to hold our R values for each of the billion objects. Notably, we didn’t exhaust memory—how is that possible? Ruby’s garbage collector employs a tri-color mark and sweep algorithm for cleaning up memory.
00:09:07.680 This algorithm involves several steps, which I will outline. First, we mark all our R values white; then we mark all root R values gray. But what are root R values? These are R values that the program will always be aware of, essential for running the program. We start our algorithm here, knowing that anything not reachable from this point will eventually be collected. Root R values include the VM, finalizers, the machine context, and the global list, among others. We can explore these examples more during the Q&A section if needed.
00:09:48.900 To illustrate the mark and sweep algorithm, I created several root R values, which we'll position at the top of the screen to make space for many non-root R values beneath them. The arrows in our visual representation indicate the references—like we saw with the array referencing another R value. Some root R values reference other R values, while non-root R values may reference R values or remain without connections. This setup allows us to understand how the algorithm functions.
00:10:20.460 To summarize the algorithm steps: we start off by marking all R values white. Next, we move on to mark all root R values gray. The next move involves running a little while loop where we pick a gray R value to mark all values it references gray and then mark the original value black. This process continues, and when there are no more gray R values left, we conclude that all remaining white R values can have their slots reallocated. This means that any R value without a reference from the root can be discarded.
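The steps just summarized can be simulated on a toy object graph. This is a plain-Ruby model of the idea, not Ruby's actual C implementation: the graph, the symbol names, and the `mark_and_sweep` method are all made up for illustration.

```ruby
# Toy model of tri-color marking. `references` maps each value to the values
# it points at; `roots` lists the values the program always knows about.
def mark_and_sweep(references, roots)
  # Step 1: everything starts white.
  color = references.keys.to_h { |name| [name, :white] }

  # Step 2: roots become gray.
  roots.each { |root| color[root] = :gray }

  # Step 3: pick a gray value, gray its white references, then blacken it.
  while (current = color.key(:gray))
    references.fetch(current).each do |child|
      # Only white values are grayed, so cycles cannot loop forever.
      color[child] = :gray if color[child] == :white
    end
    color[current] = :black
  end

  # Step 4: whatever is still white was never reached and can be reclaimed.
  color.select { |_name, value_color| value_color == :white }.keys
end

graph = {
  root_a: [:a, :b],
  root_b: [:c],
  a: [:d],
  b: [],
  c: [:a],
  d: [:c],
  e: [:f], # e and f reference each other but are unreachable from any root
  f: [:e],
  g: []
}

collectable = mark_and_sweep(graph, [:root_a, :root_b])
puts "Slots that can be reused: #{collectable.inspect}"
```

Note that `e` and `f` are collected even though they reference each other, because reachability from a root is what matters, not whether something points at the value.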
00:11:07.620 When I created those billion objects, their R values filled up the pages of our heap. The garbage collector would then run through the algorithm, confirm that none of those objects were reachable from a root, and conclude that we could safely discard them. Accordingly, all those slots became available for new R values, creating a repeating cycle in which memory was continuously reused as new objects were created. As we discussed earlier, this cycle occurred over 50,000 times.
00:11:45.960 It's important to note that Ruby’s garbage collection operates as a tri-color system. You might wonder why we don't simply use black and white. The gray color makes possible a more efficient approach known as incremental garbage collection. Instead of pausing the program for one long collection, incremental collection interleaves short bursts of marking with program execution, so the collector has to keep an intermediate state. The gray marking records where the previous step left off; a system limited to black and white would have no way to track that.
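To make the role of gray concrete, here is a toy marker that does a bounded amount of work per step and can be resumed later. The `IncrementalMarker` class and its `step` budget are hypothetical, and the sketch leaves out the write barriers a real collector needs when the program changes references between steps.

```ruby
# Toy incremental marker: the gray worklist is the saved state between steps.
class IncrementalMarker
  def initialize(references, roots)
    @references = references
    @color = references.keys.to_h { |name| [name, :white] }
    @gray = roots.dup
    roots.each { |root| @color[root] = :gray }
  end

  # Process at most `budget` gray values, then hand control back.
  # Returns true once no gray values remain.
  def step(budget)
    budget.times do
      current = @gray.shift
      break if current.nil?

      @references.fetch(current).each do |child|
        next unless @color[child] == :white

        @color[child] = :gray
        @gray << child
      end
      @color[current] = :black
    end
    @gray.empty?
  end

  def colors
    @color.dup
  end
end

graph = { root_a: [:a, :b], root_b: [:c], a: [:d], b: [], c: [:a], d: [:c], e: [:f], f: [:e], g: [] }
marker = IncrementalMarker.new(graph, [:root_a, :root_b])

finished = marker.step(2)   # first burst of marking
partial = marker.colors     # some values are gray: marking is paused, not done
steps = 1
until finished
  finished = marker.step(2) # the "program" would run between these bursts
  steps += 1
end

unreached = marker.colors.select { |_name, color| color == :white }.keys
puts "Finished in #{steps} steps; still white: #{unreached.inspect}"
```

With only black and white there would be no way to tell, after the first burst, which reached values still had unvisited references.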
00:12:23.760 Returning to our object, we understand it won't be collected until we clear that reference. This makes us consider an intriguing scenario: what happens if we create our object and then call `Object.new` a billion times? Would each of those garbage collections be reviewing our object repeatedly? This seems inefficient, introducing us to a concept known as generational garbage collection, which builds on the weak generational hypothesis—the idea that most objects die young. In our Rails applications, we see many transient objects that don't persist.
00:13:16.500 Generally, when we create a web page, a multitude of objects are generated but become obsolete once the page is served to a client. Certain objects, however, are expected to persist, like users or sessions during the program. Generational garbage collection utilizes that knowledge—while garbage collection will execute autonomously, we can also prompt it manually using `GC.start`. This command can accept an optional `full_mark` parameter, which dictates whether all objects will be scanned.
00:13:52.620 If `full_mark` is false, we run a minor garbage collection that only considers young objects. If it is true, we perform a major garbage collection encompassing all objects. Major collections generally start when minor collections haven't reclaimed enough space for the program to keep running effectively. As we investigate our billion-object creation, let me take a little aside about location. I previously lived in Boston, where they tend to drop the R sound.
00:14:32.520 If we say `GC.stat` (dropping the R from `start`, as if we were going full Boston), we receive data about the garbage collector's activity. Feel free to run this in your console; the output contains various helpful statistics. Of particular interest are the counts for minor and major garbage collections, which indicate the number of times each has executed throughout our application's lifecycle.
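The two calls can be combined to watch the counters move. The sketch below requests one minor and one major collection with `GC.start` and compares the `:minor_gc_count`, `:major_gc_count`, and `:count` entries of `GC.stat` before and after. Ruby may decide to run a major collection even when a minor one is requested, so only the major and total counts are predictable.

```ruby
before = GC.stat

GC.start(full_mark: false) # ask for a minor collection (young objects only)
GC.start(full_mark: true)  # ask for a major collection (all objects)

after = GC.stat

minor_runs = after[:minor_gc_count] - before[:minor_gc_count]
major_runs = after[:major_gc_count] - before[:major_gc_count]
total_runs = after[:count] - before[:count]

puts "minor collections: #{minor_runs}"
puts "major collections: #{major_runs}"
puts "total collections: #{total_runs}"
```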
00:15:05.520 Returning to our billion-object experiment, if we split this information into minor and major GC counts, we discover that the vast majority of garbage collections were minor. In fact, out of all the garbage collections that ran, only four were major collections. All four major GC runs occurred during the initialization of IRB, and none occurred during the creation of our billion objects. Consequently, it becomes apparent that objects age much more rapidly than we might expect, and that we didn't recheck our object 50,000 times.
00:15:48.960 In fact, we can gauge the age of objects. We can do this with `ObjectSpace.dump`, which provides an extensive range of information, including the object's flags. An object's `old` flag is set to true once it has survived three garbage collections; before that, it is a young object without the flag. We can write a simple method that checks whether our object is old based on the presence of this flag.
00:16:20.280 Moreover, we could also design a `count_gc_until_old` method, initiating a new object, counting the garbage collection cycles until the object's age is confirmed. Through experimentation, it becomes evident an object requires only three garbage collections to become considered old. The age is recorded within the R value's binary representation, initially set to zero at creation, which increments through garbage collection cycles until at last reaching the threshold that prompts subsequent evaluations only during major garbage collection.
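One possible shape for these two helpers is sketched below. The method names follow the talk, but the bodies are a reconstruction rather than the speaker's code, and the `max_collections` cap is an added safeguard so the loop always terminates. It relies on the `flags` entry that `ObjectSpace.dump` emits in CRuby.

```ruby
require 'objspace'
require 'json'

# True when the dump of `object` carries an "old" flag.
def old?(object)
  flags = JSON.parse(ObjectSpace.dump(object)).fetch('flags', {})
  flags['old'] == true
end

# Create an object, then count how many collections it takes to become old.
# `max_collections` is an assumption added here to bound the loop.
def count_gc_until_old(max_collections = 10)
  object = Object.new
  collections = 0
  until old?(object) || collections >= max_collections
    GC.start
    collections += 1
  end
  collections
end

puts "A new object is old? #{old?(Object.new)}"
puts "Collections until old: #{count_gc_until_old}"
```

On the Ruby version used in the talk this reports three collections; other versions may behave differently, so it is worth running rather than assuming.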
00:17:05.340 Earlier, I mentioned the topic of compaction in relation to the Ruby 3 release notes and would like to delve into it more comprehensively. We discussed the structure of the Ruby heap, which includes pages comprising approximately 409 slots. In a hypothetical scenario, R values are distributed sparsely, leading to a state known as memory fragmentation—an issue that can elevate memory consumption and lower performance. Applying compaction remedies this, consolidating memory use by reducing the number of pages required and thus improving performance.
00:17:40.680 The compacting process also makes the heap more 'copy-on-write friendly', which helps forking, multi-process applications share memory, and it improves CPU cache efficiency. With this understanding, we can return to our console because we've enjoyed our experimentation! If we create a set number of objects, say 10,000, and subsequently clear the array holding them, we can execute `GC.compact`. Note that compaction must be triggered manually unless auto-compaction is enabled.
00:18:17.040 Manual compaction with `GC.compact` has been available since Ruby 2.7, and Ruby 3.0 added the option to turn on auto-compaction. Either way, comparing the objects' memory addresses before and after compaction makes the effect apparent: compaction moves live objects so that memory is used more densely. Looking back at the release notes, we can now fully grasp what they are saying: memory space refers to the Ruby heap we've been discussing, and compaction is designed to defragment that memory space during garbage collection.
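The address comparison can be scripted as follows. This is a sketch, not the demo from the talk: it keeps one object in ten alive so that the heap is fragmented, and the helpers `address_of` and `compact_if_supported` are hypothetical. Compaction is not available on every platform, so the code checks for it and reports when nothing was compacted.

```ruby
require 'objspace'
require 'json'

# Current address of an object, as reported by ObjectSpace.dump.
def address_of(object)
  JSON.parse(ObjectSpace.dump(object)).fetch('address')
end

# Runs GC.compact when the platform supports it; returns whether it ran.
def compact_if_supported
  return false unless GC.respond_to?(:compact)

  GC.compact
  true
rescue NotImplementedError
  false
end

# Allocate 10,000 objects, keep every tenth one, and drop the rest so the
# survivors are scattered across sparsely used pages.
objects = Array.new(10_000) { Object.new }
survivors = objects.each_slice(10).map(&:first)
objects = nil
GC.start

addresses_before = survivors.map { |object| address_of(object) }
compacted = compact_if_supported
addresses_after = survivors.map { |object| address_of(object) }

moved = addresses_before.zip(addresses_after).count { |before, after| before != after }
puts "compaction ran: #{compacted}"
puts "#{moved} of #{survivors.size} surviving objects changed address"
```

On Ruby 3.0 and later, setting `GC.auto_compact = true` asks Ruby to compact during major collections instead of waiting for an explicit `GC.compact` call, where the platform supports it.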
00:18:54.480 In summary, the updates in Ruby 3 included enhancements to compaction as well as control over whether compaction runs automatically. We now have a strong foundation to test and interpret garbage collection behavior as we observe it in our applications. To recap, we've journeyed through the Ruby heap structure, dissected the tri-color mark and sweep algorithm, explored incremental and generational garbage collection, and covered how compaction fits into memory management.
00:19:33.780 While I hoped to share a substantial amount of information, I also wanted to express my gratitude for your engagement in this talk. I sincerely appreciate the opportunity to present these ideas. If you've stayed with me thus far, I invite you to sign up for updates regarding the book I’m currently writing about Ruby garbage collection. You can do this at buttondown.email/jemmaissroff to receive notifications about my writing.
00:20:02.640 Additionally, I am producing a blog post series about garbage collection, which you can find at jemma.dev or follow along via my Twitter handle @jemmaissroff. I would be thrilled to converse with anyone about this subject or engage in deeper discussions. Feel free to reach out to me on Twitter, Discord, or any means you prefer. Thank you once more for participating in this talk!