00:00:06.740
Well, good morning everybody! Thank you for choosing this talk. I know the talk downstairs is awesome; I've seen a version of that before and you should watch it recorded. I really appreciate you coming to this one. My name is Chris Kelly, as mentioned.
00:00:12.420
On the internet, I go by Chris from New Relic, so that’s like my Twitter handle. Feel free to harass me on any of those platforms at any time. I work at a company called New Relic; for those who don't know, we specialize in application performance monitoring. That's really where this talk started, something we encountered while doing an upgrade. I'll go over that a little bit, but New Relic really loves Australia!
00:00:29.160
We actually have three Australians on staff, some of the most important people I've met. Two of them work in the US while one still lives here. So much so that we featured one of them on our homepage—Um, that’s Julian; you can see the back of his head! I'm also having a great time in Australia. This koala’s name is actually Ruby! I walked into this exhibit and thought, are you kidding me? I'm here forever! This couldn't be more important. That was at the Sydney Zoo earlier this week.
00:00:58.620
Thank you for having me, it's been a great time so far. I'm definitely coming back. This is my first time, so let's dive into what we're talking about today. There are four parts to this talk. The first part is really about understanding what garbage collection is and how we got here with this talk.
00:01:18.900
Next, we'll look at how to navigate the C Ruby source code. What we're focusing on here is C Ruby; we’re not talking about Rubinius, JRuby, or any other implementations; we are specifically talking about MRI. So, how many people here have ever opened a C or header file in the Ruby source? Okay, how many of you closed it immediately? The idea of part two is to hopefully get you comfortable with looking at the source code and understanding its structure.
00:01:47.880
In part three, we're going to take a deep dive into what happens once an object gets created and how it moves through to garbage collection. There are probably ten lines of Ruby in this talk, and a couple hundred lines of C. The Ruby code is interesting; it’s mostly just fake code designed to help us, so just be prepared for that. Finally, we'll conclude by discussing garbage collection in general, looking at what Ruby 1.8 did, the changes made in Ruby 1.9, and how we've moved forward.
00:02:35.099
Once you create objects, you then have to get rid of them, so that’s sort of the path we're going to follow. I hope this sounds good to everyone! So, as I mentioned, we’re focusing on C Ruby. There are great other implementations of Ruby, but in this case, we’re building on MRI.
00:02:55.260
Charles is here, who leads the JRuby team, and that's awesome. In the JVM architecture, they have the benefit of a garbage collector, but C Ruby does not have that luxury. We have to figure out how to manage garbage collection ourselves.
00:03:06.239
This transition from Ruby 1.8 to 1.9 has been particularly interesting. The core team shifted from handling garbage collection manually to adopting ideas from Unix processes that manage garbage collection for them. So, if you don't know what garbage collection is, hopefully you do, otherwise you might be quite confused by this topic! Essentially, garbage collection is the process of identifying unused memory and reclaiming it, so it can be reused by other processes.
00:03:54.060
This journey began back in September when New Relic was running Ruby 1.8, at which point we were experiencing about 80 milliseconds of garbage collection time for the main application. In the middle section, you have the Ruby process itself running along, and at the top is a database call. In September, we realized we needed to upgrade to Ruby 1.9 because we were timing out. As a performance company, we think we should keep up with the latest advancements.
00:04:18.540
So, in September of last year, we transitioned to Ruby 1.9, and we saw a dramatic decrease in garbage collection time. Our average garbage collection time dropped to 42 milliseconds! This prompted our team to explore what changed between Ruby 1.8 and 1.9 regarding garbage collection.
00:05:15.780
This exploration started with one of the engineers on my team, Tom Lee, who proposed the topic. I thought it was an awesome idea and asked if I could build on it, and so here we are.
00:05:31.560
If you want to see the performance changes, we saw some improvements in the Ruby process. The gap we're seeing isn't entirely due to garbage collection savings, but we did move the needle. All right, so that's how we got here.
00:05:52.260
Now let’s talk about how Ruby is built around objects—everything is an object. Garbage collection pertains to these objects. Objects are created and then need to be collected again. The major function of garbage collection is to gather all unused objects, traverse through them in a tree structure, and bring them back for reuse.
00:06:05.699
An object is considered garbage if it is unreachable from the root. We'll discuss this further. There’s a module in Ruby called ObjectSpace that gives you access to every live object that exists. It allows you to traverse all created objects and find things like class instances.
00:06:30.780
It's very handy! This is where garbage collection happens; you can call the garbage collector off of this module. You may have also come across methods like GC.start, which perform very similar functions. I created a little program using ObjectSpace to gather all live objects.
00:06:47.760
When Ruby starts up, I find that we already have about 14,000 objects created. This isn’t even Rails; this is just Ruby! Imagine how many objects are created when Rails boots up—it has to load every variable and piece of code that goes into your object space.
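A minimal sketch of the kind of counting program described here, assuming a stock MRI interpreter. The exact numbers depend on the Ruby version and on what has been loaded, so no particular figure should be expected.

```ruby
# Count every live object the interpreter currently knows about.
live = ObjectSpace.each_object.count

# count_objects breaks the heap slots down by internal type
# (T_STRING, T_CLASS, and so on).
by_type = ObjectSpace.count_objects

puts "live objects: #{live}"
puts "strings:      #{by_type[:T_STRING]}"
puts "classes:      #{by_type[:T_CLASS]}"
```

Running this in a bare `ruby` process and again inside a booted Rails console gives a feel for how much larger the object space becomes.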
00:07:08.160
Thus, your object space can expand significantly very quickly. Let's take a quick look at what happens when you create a class. Out of the box, you have 478 classes ready. If I define a custom class, say Foo, it yields two more classes somehow. So, creating classes contributes to this growth.
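The class count can be checked with a sketch like the one below. `Foo` is a hypothetical class used only for illustration. A likely explanation for the extra class is that MRI also creates a metaclass (singleton class) alongside the class itself, but that is an assumption here, and the difference you see may vary by Ruby version.

```ruby
before = ObjectSpace.each_object(Class).count

# Foo is a hypothetical class used only for this illustration.
class Foo; end

after = ObjectSpace.each_object(Class).count

puts "classes before: #{before}"
puts "classes after:  #{after}"
puts "added:          #{after - before}"
```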
00:07:41.220
Next, let’s create an empty array, which is just another object. Let's throw 10,000 objects into that array. So, I began with eight objects, and by adding, I reached 10,008 objects. This gives you an idea of how object counts and memory usage can balloon.
00:08:01.800
Then, when I attempt to collect garbage, nothing happens. For all intents and purposes, these objects still exist; they are live and reachable.
00:08:16.799
However, if I reset everything back and run garbage collection, I see those 10,000 objects disappear, becoming available for reuse once again. If we check our classes again, we notice we're back to 478.
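The create, hold, and release sequence can be reproduced roughly as follows. `Foo` and `live_foos` are hypothetical names for this sketch. MRI scans the machine stack conservatively, so a few instances may survive the final collection; the count after release is therefore printed rather than promised.

```ruby
# Foo is a hypothetical class used only for this illustration.
class Foo; end

# Helper (also hypothetical): how many Foo instances are currently live.
def live_foos
  ObjectSpace.each_object(Foo).count
end

baseline = live_foos

objects = Array.new(10_000) { Foo.new }
GC.start
held = live_foos       # still reachable through `objects`, so nothing is freed

objects = nil          # drop the only reference to the array
GC.start
released = live_foos   # most or all instances are now eligible for collection

puts "baseline: #{baseline}, held: #{held}, after release: #{released}"
```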
00:08:32.040
This brings us to the concept of garbage collection within a small, simple program. Create and destroy: this encapsulates what happens. Now compare the few lines of code I just covered with the bytecode that Rails generates; it puts far more pressure on the garbage collector, long before a machine runs out of memory.
00:08:59.160
This discussion is vital, especially in embedded systems where memory is often limited. Now let's navigate into C Ruby—what's going on under the hood? As you might know, Ruby is written in C. We'll focus on header files, as they contain most of what we need, and then we'll talk about the VM, objects, and the garbage collector.
00:09:36.540
The first thing you need to understand about C Ruby is how to navigate through it. For example, VALUE is usually an unsigned long. It acts as a pointer that leads to the Ruby object in memory. Everything in Ruby is accessed through this pointer.
00:10:02.040
In C Ruby, they employ macros extensively, and if you aren't familiar with them, you might find yourself confused. A name in all capitals usually indicates a macro, often one that retrieves a pointer. Additionally, certain things, such as true, false, and nil, are used so frequently that they are encoded directly in the VALUE itself rather than being reached through a pointer.
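The difference between immediate values and pointer-backed objects can be glimpsed from Ruby itself. This is a small sketch assuming MRI, where `object_id` for a small integer n is derived from its encoded VALUE and reports 2n + 1; other implementations may behave differently.

```ruby
# Small integers, true, false, and nil are "immediate" values in MRI:
# the value is encoded in the VALUE word itself, so no heap slot is used.
puts 1.object_id      # on MRI a small integer n reports 2n + 1
puts 2.object_id
puts nil.object_id
puts true.object_id

# An ordinary object lives in a heap slot and is reached through a pointer,
# so two separately created objects have distinct identities.
a = Object.new
b = Object.new
puts a.object_id == b.object_id
```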
00:10:40.140
Let’s look at object architecture briefly. We can create a basic object in Ruby 1.9 and 2.0, and from there, explore how instances are created. When dealing with Ruby objects, there’s a struct that holds some flags and a pointer to the class.
00:11:03.600
The flags are particularly critical, as they track things like whether an object has been marked by the garbage collector or is frozen. Understanding this structure can help visualize how Ruby manages its objects and memory.
00:11:38.160
Now let’s discuss macros because they are integral to Ruby's source. A macro is essentially a substitution, taking values and inserting code chunks. They’ll appear frequently, so familiarizing yourself with their usage is crucial.
00:12:19.020
For instance, we have our basic Ruby string, which translates to a structure that also encodes a performance improvement: for strings of 23 characters or fewer, the structure embeds the string data directly. Otherwise, it holds a pointer to the string data.
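One way to observe the effect of embedding from Ruby is the `objspace` extension's `memsize_of`. This is a rough sketch: the reported sizes and the exact embed limit differ between Ruby versions and platforms, so only the relative difference is meaningful.

```ruby
require 'objspace'

short = "a" * 23      # at the embed limit described in the talk
long  = "a" * 1_000   # far past any embed limit, so the data lives elsewhere

# memsize_of reports the memory attributed to the object; a string whose
# data is stored outside its slot reports a larger figure.
puts "short string: #{ObjectSpace.memsize_of(short)} bytes"
puts "long string:  #{ObjectSpace.memsize_of(long)} bytes"
```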
00:13:05.760
So, let's discuss a macro magic example. The RSTRING_PTR macro hands back a pointer to a string's data whether it is embedded or not, abstracting the complexity of pointer management from the developer. This dramatically improves efficiency.
00:13:44.760
Thus, we enjoy reduced overhead in garbage collection because we don’t have to traverse unnecessary pointers repeatedly, improving the performance of the process.
00:14:38.760
The dynamics of heaps also come into play during garbage collection in Ruby. Heaps are allocated in defined slots, contributing to the assignment of slots and ensuring the available memory is efficiently organized for object storage and retrieval.
00:15:42.480
The allocation process could denote whether a new heap needs to be created or garbage should be collected inside the heap. So, let's delve into the actual garbage collection methods.
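Slot usage can be inspected from Ruby with `ObjectSpace.count_objects`, whose `:TOTAL` and `:FREE` entries describe the heap slots. A minimal sketch, with figures that will differ on every run:

```ruby
slots = ObjectSpace.count_objects
total = slots[:TOTAL]   # every slot across all heaps
free  = slots[:FREE]    # slots not holding an object right now

puts "total slots:    #{total}"
puts "free slots:     #{free}"
puts "in use:         #{total - free}"
puts "GC runs so far: #{GC.count}"
```

When the free count reaches zero, the interpreter has the choice described above: collect garbage to recover slots, or allocate another heap.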
00:16:17.760
The first task during garbage collection is the mark phase, where we traverse the objects and follow connections stemming from the root. Unmarked items are objects that are no longer reachable.
00:17:04.680
The mark phase reveals which objects are considered in use, marking accessible paths while the sweeping phase disposes of what is deemed unnecessary. Each object is touched throughout this process which can culminate in a notable performance hit.
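The two phases can be sketched as a toy model in Ruby. This is not MRI's implementation, which is written in C; `Node`, `mark`, and `sweep` are hypothetical names chosen for the illustration.

```ruby
# A toy object: a name, outgoing references, and a mark bit.
Node = Struct.new(:name, :refs, :marked)

# Mark phase: flag everything reachable from the given object.
def mark(node)
  return if node.marked
  node.marked = true
  node.refs.each { |child| mark(child) }
end

# Sweep phase: visit every slot, separate the unmarked ones as garbage,
# and clear the marks on survivors ready for the next cycle.
def sweep(heap)
  live, garbage = heap.partition(&:marked)
  live.each { |node| node.marked = false }
  [live, garbage]
end

a = Node.new("a", [], false)
b = Node.new("b", [], false)
c = Node.new("c", [], false)   # nothing points at c
a.refs << b

heap  = [a, b, c]
roots = [a]

roots.each { |root| mark(root) }
live, garbage = sweep(heap)

puts "live:    #{live.map(&:name).inspect}"     # ["a", "b"]
puts "garbage: #{garbage.map(&:name).inspect}"  # ["c"]
```

Note that `sweep` touches every slot in the heap, reachable or not, which is where the cost mentioned above comes from.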
00:17:45.060
During a mark and sweep, we need to ensure every object is addressed, so nothing else runs in the meantime and object allocation halts. This halting is what we term 'Stop the World', as it pauses the application while collection runs.
00:18:29.640
As we explore together, it relates back to the cyclical functioning of garbage collection. Each new object creation hinges on the previous state of copied structures, thereby potentially hindering performance.
00:19:43.260
Ultimately, to optimize performance as we advanced from Ruby 1.8 to 1.9, new techniques such as ‘lazy sweep’ were introduced, striking a balance between thorough garbage collection and efficient application performance.
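The idea behind lazy sweeping can be sketched as follows: rather than sweeping the whole heap in one pause, sweep only until a free slot turns up, then let the program continue and resume from the same place next time. `Slot` and `LazyHeap` are hypothetical names, and this is a simplified model rather than the interpreter's actual code.

```ruby
# Toy heap slot: an occupant plus a mark bit set by an earlier mark phase.
Slot = Struct.new(:object, :marked)

class LazyHeap
  attr_reader :cursor

  def initialize(slots)
    @slots = slots
    @cursor = 0   # where the previous sweep step stopped
  end

  # Sweep only until one free slot turns up, then hand it back.
  # Returns nil once the whole heap has been swept without finding one.
  def next_free_slot
    while @cursor < @slots.size
      slot = @slots[@cursor]
      @cursor += 1
      if slot.marked
        slot.marked = false   # survivor: clear the mark for the next cycle
      else
        slot.object = nil     # unmarked: reclaim it
        return slot
      end
    end
    nil
  end
end

heap = LazyHeap.new([
  Slot.new(:a, true),
  Slot.new(:b, false),   # garbage
  Slot.new(:c, true),
  Slot.new(:d, false)    # garbage
])

heap.next_free_slot
puts "swept #{heap.cursor} of 4 slots before resuming the program"
```

The total sweeping work is the same; it is spread across many short pauses instead of one long one.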
00:20:41.340
One noteworthy shift in Ruby performance is moving the mark flags out of the objects and into the heaps themselves, as bitmap representations. This shift allows one process to mark an object without touching memory it shares with another process. The upcoming versions of Ruby will hence leverage optimizations derived from collective experiences and contributions, enriching future performance capabilities.
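A toy model of bitmap marking, assuming the motivation is the usual one: when forked processes share memory pages copy-on-write, writing a mark bit inside each object forces those pages to be copied, whereas a separate bitmap leaves the object memory untouched. `BitmapHeap` is a hypothetical class for this sketch, and freezing the objects merely stands in for "never written during marking".

```ruby
class BitmapHeap
  attr_reader :slots, :mark_bits

  def initialize(objects)
    @slots = objects.map(&:freeze)   # object memory is never written while marking
    @mark_bits = 0                   # one bit per slot, kept apart from the objects
  end

  # Set the bit for a slot without touching the object stored there.
  def mark(index)
    @mark_bits |= (1 << index)
  end

  def marked?(index)
    @mark_bits[index] == 1
  end
end

heap = BitmapHeap.new(["alpha", "beta", "gamma"])
heap.mark(0)
heap.mark(2)

heap.slots.each_index do |i|
  puts "slot #{i}: #{heap.marked?(i) ? 'marked' : 'unmarked'}"
end
```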
00:21:52.560
In encouraging you to delve into garbage collection, it’s critical to understand that the complexity often reveals interesting and effective methodologies ingrained within Ruby's structure.
00:22:23.880
As I mentioned earlier, there is an excellent paper discussing garbage collection techniques even for experimental methods that may arise in the future. The slides from this talk will be available on SpeakerDeck for your reference.
00:23:02.760
Thanks especially to those who solidified the foundation on which this talk is built; for instance, Nari led many incredible initiatives on memory management and garbage collection that contribute to Ruby's evolutionary journey. Resources such as 'Ruby Under a Microscope' are beneficial as they elucidate how Ruby interacts with C.
00:24:00.660
Finally, if you wish to refresh your memory on C concepts like pointers, I would advise referencing classic C literature. Thank you all for your time today, and I’d be happy to take questions for the remaining moments!