Chris Kelly
Down the rb_newobj() Rabbit Hole: Garbage Collection in Ruby

Summarized using AI


Chris Kelly • February 20, 2013 • Earth

The talk "Down the rb_newobj() Rabbit Hole: Garbage Collection in Ruby" presented by Chris Kelly at RubyConf AU 2013, explores the enhancements made to Ruby's garbage collection during the transition from Ruby 1.8 to Ruby 1.9. The discussion provides insight into the underlying mechanisms of garbage collection, showcasing how these changes led to significant performance improvements for applications, as experienced by the New Relic team.

Key Points Discussed:
- Introduction to Garbage Collection (GC):
- Garbage collection is defined as the process of identifying and reclaiming unused memory for reuse.
- Transition and Experience:
- New Relic upgraded from Ruby 1.8 to 1.9 to address performance issues, resulting in decreased garbage collection time from 80 milliseconds to 42 milliseconds.
- Understanding Object Lifecycle:
- Objects that are created eventually need to be collected; a major aim of GC is to collect objects that are unreachable from the root.
- Navigating C Ruby Source Code:
- Emphasis was placed on understanding C Ruby's structure through its source code to better grasp how garbage collection works.
- Key focus was on header files and object architecture necessary for comprehension of GC processes.
- Detailed GC Process Breakdown:
- The talk elaborated on the mark and sweep phases that manage object allocation and collection, and described the 'Stop the World' pause that occurs while garbage collection runs.
- Architectural Innovations:
- Introduction of techniques like ‘lazy sweep’ for optimizing garbage collection and performance in Ruby 1.9, highlighting a shift towards efficiency.
- Real-World Implications:
- Insights were shared on how memory management and garbage collection techniques are critical, especially for embedded systems where resources are limited.
- Community Acknowledgment:
- Acknowledgment of contributions from individuals like Nari, whose work underpins advancements in memory management in Ruby.
- Recommendations for Further Reading:
- Suggestions for exploring resources such as "Ruby Under a Microscope" for a deeper understanding of Ruby's interaction with C.

Conclusion:
Through this deep dive, attendees were encouraged to appreciate the complexities of Ruby's garbage collection and to understand the improvements that facilitate optimized application performance in the Ruby ecosystem. Moreover, the talk highlighted the importance of community-driven enhancements which continue to shape Ruby's development.


RubyConf AU 2013: http://www.rubyconf.org.au

(apologies for the poor audio)
New Relic recently made the big move to Ruby 1.9.3, which showed meaningful improvements over 1.8, particularly in garbage collection. So this talk takes a look at what changed in Ruby's garbage collection to drive much of that improvement. We will start with the fundamentals of garbage collection but work down to the nitty-gritty C code to get to the details of what's going on, starting with rb_newobj(). You should walk away with an understanding of how garbage collection works in MRI and a nice appreciation for the overall lifecycle of Ruby objects.

00:00:06.740 Well, good morning everybody! Thank you for choosing this talk. I know the talk downstairs is awesome; I've seen a version of that before and you should watch it recorded. I really appreciate you coming to this one. My name is Chris Kelly, as mentioned.
00:00:12.420 On the internet, I go by Chris from New Relic, so that’s like my Twitter handle. Feel free to harass me on any of those platforms at any time. I work at a company called New Relic; for those who don't know, we specialize in application performance monitoring. That's really where this talk started, something we encountered while doing an upgrade. I'll go over that a little bit, but New Relic really loves Australia!
00:00:29.160 We actually have three Australians on staff, some of the most important people I've met. Two of them work in the US while one still lives here. So much so that we featured one of them on our homepage—Um, that’s Julian; you can see the back of his head! I'm also having a great time in Australia. This koala’s name is actually Ruby! I walked into this exhibit and thought, are you kidding me? I'm here forever! This couldn't be more important. That was at the Sydney Zoo earlier this week.
00:00:58.620 Thank you for having me, it's been a great time so far. I'm definitely coming back. This is my first time, so let's dive into what we're talking about today. There are four parts to this talk. The first part is really about understanding what garbage collection is and how we got here with this talk.
00:01:18.900 Next, we'll look at how to navigate the C Ruby source code. What we're focusing on here is C Ruby; we’re not talking about Rubinius, JRuby, or any other implementations; we are specifically talking about MRI. So, how many people here have ever opened a C or header file in the Ruby source? Okay, how many of you closed it immediately? The idea of part two is to hopefully get you comfortable with looking at the source code and understanding its structure.
00:01:47.880 In part three, we're going to take a deep dive into what happens once an object gets created and how it moves through to garbage collection. There are probably ten lines of Ruby in this talk, and a couple hundred lines of C. The Ruby code is interesting; it's mostly just fake code designed to help us, so just be prepared for that. Finally, we'll conclude by discussing garbage collection in general: what Ruby 1.8 did, the changes made in Ruby 1.9, and how we've moved forward.
00:02:35.099 Once you create objects, you then have to get rid of them, so that’s sort of the path we're going to follow. I hope this sounds good to everyone! So, as I mentioned, we’re focusing on C Ruby. There are great other implementations of Ruby, but in this case, we’re building on MRI.
00:02:55.260 Charles is here, who leads the JRuby team, and that's awesome. In the JVM architecture, they have the benefit of a garbage collector, but C Ruby does not have that luxury. We have to figure out how to manage garbage collection ourselves.
00:03:06.239 This transition from Ruby 1.8 to 1.9 has been particularly interesting. The core team shifted from handling garbage collection manually to adopting ideas from Unix processes that manage garbage collection for them. So, if you don't know what garbage collection is, hopefully you do, otherwise you might be quite confused by this topic! Essentially, garbage collection is the process of identifying unused memory and reclaiming it, so it can be reused by other processes.
00:03:54.060 This journey began back in September when New Relic was running Ruby 1.8, at which point we were experiencing about 80 milliseconds of garbage collection time for the main application. In the middle section, you have the Ruby process itself running along, and at the top is a database call. In September, we realized we needed to upgrade to Ruby 1.9 because we were timing out. As a performance company, we think we should keep up with the latest advancements.
00:04:18.540 So, in September of last year, we transitioned to Ruby 1.9, and we saw a dramatic decrease in garbage collection time. Our average garbage collection time dropped to 42 milliseconds! This prompted our team to explore what changed between Ruby 1.8 and 1.9 regarding garbage collection.
00:05:15.780 This exploration started with one of the engineers on my team, Tom Lee, who proposed the topic. I thought it was an awesome idea and asked if I could build on it, and so here we are.
00:05:31.560 If you want to see the performance changes, we saw some improvements in the Ruby process. The gap we're seeing isn't entirely due to garbage collection savings, but we did move the needle. All right, so that's how we got here.
00:05:52.260 Now let’s talk about how Ruby is built around objects—everything is an object. Garbage collection pertains to these objects. Objects are created and then need to be collected again. The major function of garbage collection is to gather all unused objects, traverse through them in a tree structure, and bring them back for reuse.
00:06:05.699 An object is considered garbage if it is unreachable from the root. We'll discuss this further. There's a module called ObjectSpace in Ruby that gives you access to every live object that exists. It allows you to traverse all created objects and find things like class instances.
00:06:30.780 It's very handy! This is where garbage collection happens; you can invoke the garbage collector through this module. You may have also come across calls like GC.start, which perform very similar functions. I created a little program using ObjectSpace to gather all live objects.
00:06:47.760 When Ruby starts up, I find that we already have about 14,000 objects created. This isn’t even Rails; this is just Ruby! Imagine how many objects are created when Rails boots up—it has to load every variable and piece of code that goes into your object space.
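A program along these lines can be sketched with ObjectSpace.count_objects. This is not the speaker's actual script, only a minimal illustration; the totals depend on the Ruby version and on which libraries are loaded, so the 14,000 figure should not be expected to reproduce.

```ruby
# Snapshot of the heap before this script has created much of anything.
counts = ObjectSpace.count_objects

total = counts[:TOTAL]   # every heap slot, used or free
free  = counts[:FREE]    # slots not currently holding an object
live  = total - free     # may include garbage that has not been swept yet

puts "heap slots: #{total}, free: #{free}, in use: #{live}"

# Break the in-use slots down by internal type (T_STRING, T_CLASS, ...).
counts.each do |type, n|
  next if [:TOTAL, :FREE].include?(type)
  puts format("%-12s %d", type, n)
end
```

Even an empty script reports thousands of slots in use, because the interpreter itself is built out of Ruby objects.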
00:07:08.160 Thus, your object space can expand significantly very quickly. Let's take a quick look at what happens when you create a class. Out of the box, you have 478 classes ready. If I define a custom class, Foo, that somehow yields two more classes. So, creating classes contributes to this growth.
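The class count can be checked with a short sketch like the one below. It assumes MRI and uses an anonymous class as a stand-in for the talk's class; the likely source of the second class is the metaclass (singleton class) that MRI allocates alongside every new class, though the exact delta may differ between versions.

```ruby
GC.disable                                    # keep the counts stable while measuring
before = ObjectSpace.count_objects[:T_CLASS]

foo = Class.new                               # stand-in for defining a class Foo

after = ObjectSpace.count_objects[:T_CLASS]
GC.enable

added = after - before
puts "T_CLASS before: #{before}, after: #{after}, added: #{added}"

# The hidden companion object: the class's own singleton class.
puts "singleton class: #{foo.singleton_class.inspect}"
```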
00:07:41.220 Next, let's create an empty array, which is just another object. Let's throw 10,000 objects into that array. So, I began with eight objects, and after adding them I reached 10,008 objects. This gives you an idea of how usage and memory can balloon.
00:08:01.800 Then, when I attempt to collect garbage, nothing happens, because for all intents and purposes these objects still exist: they are live and reachable.
00:08:16.799 However, if I reset everything back and run garbage collection, I see those 10,000 objects disappear, becoming available for reuse once again. If we check our classes again, we notice we're back to 478.
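The create, collect, release, collect sequence can be reproduced roughly as follows. This is a sketch, not the code shown in the talk: the anonymous class and the lambda wrapper are assumptions made so the result is easy to measure. MRI scans the machine stack conservatively, so a handful of objects may occasionally survive the second collection.

```ruby
foo  = Class.new                               # hypothetical class to count instances of
live = -> { ObjectSpace.each_object(foo).count }

# Build the array inside a lambda so no local variable outlives the call.
held = lambda do
  list = Array.new(10_000) { foo.new }
  GC.start                                     # nothing is freed: list still references them
  live.call
end.call

GC.start                                       # the array is unreachable now
released = live.call

puts "live while referenced: #{held}"
puts "live after release:    #{released}"
```

While the array holds the objects they are reachable from a root, so the first GC.start cannot free them; once the last reference is gone, the next collection can.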
00:08:32.040 This brings us to the concept of garbage collection within a small, simple program. Create and destroy: this encapsulates what happens. Now imagine the difference between the few lines of code I just covered and everything that Rails generates. That creates a great deal of work for garbage collection, which has to reclaim memory before the machine runs out.
00:08:59.160 This discussion is vital, especially in embedded systems where memory is often limited. Now let's navigate into C Ruby—what's going on under the hood? As you might know, Ruby is written in C. We'll focus on header files, as they contain most of what we need, and then we'll talk about the VM, objects, and the garbage collector.
00:09:36.540 The first thing you need to understand about C Ruby is navigating through it. For example, VALUE is usually an unsigned long. It acts as a pointer that leads to the Ruby object in memory. Everything in Ruby is accessed through this pointer.
00:10:02.040 In C Ruby, they employ macros extensively, and if you aren't familiar with them, you might find yourself confused. A capitalized name usually indicates it's a macro that retrieves a pointer. Additionally, certain things, such as true, false, and nil, are used so frequently that they are encoded directly in the VALUE itself rather than being reached through a pointer.
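The effect of these embedded values is visible from Ruby through object_id. The sketch below assumes MRI, where a small integer n is stored as a tagged VALUE and reports an object_id of 2n + 1; the ids printed for nil, true, and false are fixed constants whose exact values have differed between Ruby versions.

```ruby
# Small integers live in the VALUE itself, tagged with a low bit,
# so on MRI their object_id follows 2n + 1 instead of naming a heap slot.
(0..3).each { |n| puts "#{n}.object_id = #{n.object_id}" }

# nil, true and false are likewise fixed bit patterns, not heap objects.
[nil, true, false].each { |v| puts "#{v.inspect}.object_id = #{v.object_id}" }

# A heap-allocated object gets its own identity with every allocation.
a = String.new("ruby")
b = String.new("ruby")
puts "equal contents, same object? #{a.equal?(b)}"
```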
00:10:40.140 Let's look at object architecture briefly. We can create a basic object in Ruby 1.9 and 2, and from there, explore how instances are created. When dealing with Ruby objects, there's a struct that includes some flags and a pointer to the class.
00:11:03.600 The flags are particularly critical, as they track things like whether an object has been marked during collection or is frozen. Understanding this structure can help visualize how Ruby manages its objects and memory.
00:11:38.160 Now let’s discuss macros because they are integral to Ruby's source. A macro is essentially a substitution, taking values and inserting code chunks. They’ll appear frequently, so familiarizing yourself with their usage is crucial.
00:12:19.020 For instance, we have our basic Ruby string, which translates to a structure that also carries a performance optimization: for strings less than or equal to 23 characters, the structure embeds the string data directly. Otherwise, it holds a pointer to the string data.
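The difference between embedded and pointed-to string data can be observed with ObjectSpace.memsize_of from the objspace extension. This is only a sketch: the reported sizes and the exact embed limit vary by Ruby version and platform, so the 23-character threshold may not hold on a current interpreter.

```ruby
require 'objspace'

short = "a" * 23       # at the embed limit mentioned in the talk
long  = "a" * 10_000   # far too large to fit inside the object slot

# A string that fits in its slot needs no separate buffer, so it reports
# far less memory than one whose data lives in a separately allocated buffer.
puts "short: #{ObjectSpace.memsize_of(short)} bytes"
puts "long:  #{ObjectSpace.memsize_of(long)} bytes"
```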
00:13:05.760 So, let's discuss a macro magic example. The RSTRING_PTR macro gives you back a pointer to a string's data, whether it is embedded or not, hiding the complexity of pointer management from the developer. This dramatically improves efficiency.
00:13:44.760 Thus, we enjoy reduced overhead in garbage collection because we don’t have to traverse unnecessary pointers repeatedly, improving the performance of the process.
00:14:38.760 Heaps also come into play during garbage collection in Ruby. Each heap is divided into slots, and objects are assigned to those slots, which keeps the available memory efficiently organized for object storage and retrieval.
00:15:42.480 When an allocation finds no free slot, either garbage is collected within the existing heaps or a new heap is created. So, let's delve into the actual garbage collection methods.
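The slot and free-list idea can be modelled in a few lines of Ruby. This is a toy, not MRI's allocator: ToyHeap, SLOTS_PER_HEAP, and the method names are all hypothetical, and where this sketch simply grows the heap, a real VM would first try a garbage collection.

```ruby
# Toy model: a heap is a fixed-size array of slots, and reusable slots
# are tracked on a free list.
class ToyHeap
  SLOTS_PER_HEAP = 4   # hypothetical; real heaps hold far more slots

  attr_reader :heaps

  def initialize
    @heaps = []
    @free_list = []
    add_heap
  end

  # Returns [heap_index, slot_index] for the new object.
  def allocate(value)
    add_heap if @free_list.empty?   # a real VM would try a GC run first
    heap_i, slot_i = @free_list.shift
    @heaps[heap_i][slot_i] = value
    [heap_i, slot_i]
  end

  # Hands a slot back to the front of the free list, as a sweep would.
  def free(heap_i, slot_i)
    @heaps[heap_i][slot_i] = nil
    @free_list.unshift([heap_i, slot_i])
  end

  private

  def add_heap
    @heaps << Array.new(SLOTS_PER_HEAP)
    heap_i = @heaps.size - 1
    SLOTS_PER_HEAP.times { |slot_i| @free_list.push([heap_i, slot_i]) }
  end
end

heap = ToyHeap.new
refs = 5.times.map { |i| heap.allocate("obj#{i}") }
puts "heaps after 5 allocations: #{heap.heaps.size}"   # 2

heap.free(*refs.first)
reused = heap.allocate("obj5")
puts "freed slot reused: #{reused == refs.first}"       # true
```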
00:16:17.760 The first task during garbage collection is marking, where we traverse the objects and follow connections stemming from the root. Unmarked items are the objects that are no longer reachable.
00:17:04.680 The mark phase reveals which objects are considered in use, marking accessible paths while the sweeping phase disposes of what is deemed unnecessary. Each object is touched throughout this process which can culminate in a notable performance hit.
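The two phases can be illustrated with a toy object graph. This is a minimal sketch of the general mark and sweep technique, not MRI's C implementation; ToyObject and the graph below are made up for the example.

```ruby
# Toy mark-and-sweep over a hand-built object graph.
ToyObject = Struct.new(:name, :refs, :marked)

def mark(obj)
  return if obj.marked          # already visited; also guards against cycles
  obj.marked = true
  obj.refs.each { |child| mark(child) }
end

def sweep(heap)
  live, dead = heap.partition(&:marked)
  live.each { |obj| obj.marked = false }   # reset flags for the next cycle
  [live, dead]
end

a = ToyObject.new("a", [], false)
b = ToyObject.new("b", [a], false)
c = ToyObject.new("c", [], false)
d = ToyObject.new("d", [c], false)
c.refs << d                      # c and d form a cycle nobody else references

heap  = [a, b, c, d]
roots = [b]                      # only b is directly reachable

roots.each { |root| mark(root) }
live, dead = sweep(heap)

puts "live: #{live.map(&:name).join(', ')}"   # live: a, b
puts "dead: #{dead.map(&:name).join(', ')}"   # dead: c, d
```

The cycle between c and d is collected because reachability is judged from the roots, not from whether something still points at an object. The sweep walks the whole heap, which is the cost the talk refers to.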
00:17:45.060 During mark and sweep, we need to ensure every object is visited, so nothing else runs while it happens and object allocation halts. This halting is what we term 'Stop the World', as the application does no work until collection finishes.
00:18:29.640 As we explore together, it relates back to the cyclical functioning of garbage collection. Each new object creation hinges on the previous state of copied structures, thereby potentially hindering performance.
00:19:43.260 Ultimately, to optimize performance as we advanced from Ruby 1.8 to 1.9, new techniques such as ‘lazy sweep’ were introduced, striking a balance between thorough garbage collection and efficient application performance.
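Lazy sweeping can be sketched as follows: marking still happens in one go, but pages are swept one at a time, only when an allocation needs a free slot. This is a toy under stated assumptions; Slot, LazyHeap, and the page layout are hypothetical and far simpler than what Ruby 1.9 does in C.

```ruby
Slot = Struct.new(:value, :marked)

class LazyHeap
  attr_reader :pages_swept

  def initialize(pages)
    @pages = pages          # array of pages, each an array of Slot
    @unswept = []           # pages still waiting for their sweep
    @free = []              # slots known to be reusable
    @pages_swept = 0
  end

  # Called once marking has finished: queue the pages, sweep nothing yet.
  def finish_mark
    @unswept = @pages.dup
  end

  def allocate(value)
    sweep_one_page while @free.empty? && !@unswept.empty?
    slot = @free.shift
    raise "heap exhausted" if slot.nil?
    slot.value = value
    slot
  end

  private

  def sweep_one_page
    page = @unswept.shift
    page.each do |slot|
      if slot.marked
        slot.marked = false      # survivor: clear the flag for the next cycle
      else
        slot.value = nil         # garbage: reclaim the slot
        @free << slot
      end
    end
    @pages_swept += 1
  end
end

# Three pages of two slots; the first slot of each page is marked (live).
pages = Array.new(3) { [Slot.new(:live, true), Slot.new(:garbage, false)] }
heap = LazyHeap.new(pages)
heap.finish_mark

heap.allocate(:new_object)
puts "pages swept after one allocation: #{heap.pages_swept}"   # 1
```

The total sweeping work is unchanged, but it is spread across many allocations, so each individual pause is shorter.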
00:20:41.340 One noteworthy shift in Ruby performance is moving the mark flags out of the objects and into bitmaps kept with the heaps themselves. This shift allows one process to mark an object without modifying memory that it shares with another process. The upcoming versions of Ruby will hence leverage optimizations derived from collective experiences and contributions, enriching future performance capabilities.
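The bitmap idea can be shown with a small sketch in which the mark bits live in a separate integer rather than inside the objects. The slot layout is hypothetical; freezing the slots simply demonstrates that marking never writes to them, which, as I understand it, is what keeps copy-on-write memory shared between forked processes.

```ruby
# Objects are frozen so that any attempt to write a mark into them would fail.
slots = %w[a b c d].map { |name| { name: name }.freeze }

mark_bitmap = 0                      # one bit per slot, stored apart from the objects

mark   = ->(bitmap, index) { bitmap | (1 << index) }
marked = ->(bitmap, index) { bitmap[index] == 1 }

mark_bitmap = mark.call(mark_bitmap, 0)   # mark slot "a"
mark_bitmap = mark.call(mark_bitmap, 2)   # mark slot "c"

slots.each_with_index do |slot, i|
  state = marked.call(mark_bitmap, i) ? "marked" : "unmarked"
  puts "#{slot[:name]}: #{state}"
end
```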
00:21:52.560 In encouraging you to delve into garbage collection, it’s critical to understand that the complexity often reveals interesting and effective methodologies ingrained within Ruby's structure.
00:22:23.880 As I mentioned earlier, there is an excellent paper discussing garbage collection techniques even for experimental methods that may arise in the future. The slides from this talk will be available on SpeakerDeck for your reference.
00:23:02.760 Thanks especially to those who solidified the foundation on which this talk is built; for instance, Nari led many incredible initiatives on memory management and garbage collection that contribute to Ruby's evolutionary journey. Resources such as 'Ruby Under a Microscope' are beneficial as they elucidate how Ruby interacts with C.
00:24:00.660 Finally, if you wish to refresh your memory on C concepts like pointers, I would advise referencing classic C literature. Thank you all for your time today, and I’d be happy to take questions for the remaining moments!