00:00:48.480
Well then, let's get started. The last talk is by our pretty renowned speaker, Aaron. The topic is going to be... Hello, Aaron, how are you? I'm doing well. How are you? I'm doing well, thank you. No cat butts on the screen, not from my side. Same here. Okay, cool. I'm enjoying the sun and still hanging in there until tomorrow. Just another day of summer, yes! We can do it! Nice! I was about to say that your talk is going to be about the garbage collector. This reminds me of a Marvel franchise, because I've been seeing talks on garbage collectors from you for three or four years now, I think, and they're generally great.
00:01:20.320
Because, as I was saying before to someone else in the hallway chat, seldom do you get a chance in your daily job to look at the internals of something that you use all the time. I've always appreciated the insights I get from your talks, besides the fun, of course. Oh, thank you! I like talking about the garbage collector and looking at it because it's this thing that's always there, doing work for us. But we don't ever have to look at it, nor do we have to mess with it too much, until sometimes we do, and then it's like, oh boy! It's perfect. I think everybody shares that sentiment when looking at it.
00:01:52.960
Fun fact: The only technical question I got in my interview when I interviewed for my current company was about how Ruby handles memory. I was like, 'What kind of question is that?' It seemed strange. I was like, are you really sure you’re a Ruby developer? I guess so. Anyway, this is going to be a pre-recorded talk, and the people in the chat know that already. They’ll post questions there if there are any, and I’ll read them to you afterwards. Just be funny, people. Come on, the bar is pretty high here. I leave the stage to you then.
00:04:09.360
Hi everyone! Today I'm going to be talking about reducing memory by using more. Actually, I'm mostly going to be talking about concurrent and automatic compaction in Ruby, but I thought that the other name was much better. So I wanted to use that name instead. I am going to be talking about reducing memory by using more, but the main focus will be on automatic concurrent compaction today.
00:04:21.519
Hi everyone, welcome to Ruby Day! I really wish that I could have come in person, but I guess the good news is that since I can film from home, you can actually see my cat. Hey buddy, say hello to everyone! Ah, oh, hello! Okay, I think he’s done. My name is Aaron Patterson, and I'm known on the internet as Tenderlove, but you can just call me Aaron too; it’s really fine.
00:04:54.320
This year, I started a new job at a company called Shopify, and I'm a full-time Ruby and Rails contributor for the company. I feel really lucky to have this job, and I just wanted to say that I'm really happy I get to work on Ruby and Rails, improving the language and the framework all day. So I just want to say thanks to Shopify; it’s really a dream come true for me.
00:05:28.960
In this presentation today, I want to talk about the development work I’ve been doing on Ruby, as well as the progress I’ve made so far. As I said, I'm on the Ruby and Rails core team, and I'm employed to be a Ruby and Rails contributor. I’m also the inventor of the Analog Terminal Bell, and I don’t know if you have seen the ad for this, but if we have time, I may play it.
00:06:03.600
Anyway, let's get into the presentation. In Ruby 2.7, I introduced a feature called GC.compact, and this feature allows us to save memory by filling in spaces in our heap. So if we have extra unused areas in the heap that are fragmented, we can fill in those spaces and use them. This technique is known as eliminating fragmentation, or defragmentation.
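To make this concrete, here is a minimal sketch of triggering compaction by hand on Ruby 2.7 or later. The exact keys in the returned statistics hash vary between Ruby versions, so treat the output shape as an assumption:

```ruby
# Run a full GC first so obvious garbage is collected before we compact.
GC.start

# GC.compact (Ruby 2.7+) compacts the heap and returns a statistics hash;
# in 2.7 it reports how many objects were considered and moved, by type.
stats = GC.compact
puts stats.keys.inspect
```

In a Rails app this would typically be called once, right after boot, which is exactly the manual step the rest of the talk tries to eliminate.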
00:06:28.000
If you recall, there was a DOS utility program from a long time ago that would defrag your hard drive—it would fill in any unused spaces on your hard drive. This is basically the same concept, except instead of your hard drive, it's filling in spaces in your memory. Compaction allows us to reclaim memory that's not being used. This sample program demonstrates the effectiveness of compaction.
00:06:59.199
So first off, it loads a Rails application into memory. Then it runs the GC to remove any garbage that was created while loading all the code, and then it compacts the heap to remove any unused spaces in memory. Here’s an image showing what the memory layout looks like before compaction runs. The green and black dots represent objects, while the white dots represent free space. After the compactor runs, it looks something like this: the layout changes, and the red dots indicate objects that couldn’t be moved. In this new graph, the objects are much closer together, so you don’t see as many white spaces between them.
00:07:36.800
And it's kind of hard to see, but the graph is narrower than the previous one, indicating that we're consuming less memory. Before we dive into my current work, I want to cover the existing compaction algorithm in Ruby 2.7. The compaction process has two steps: first, moving the objects closer together, and then updating all the references.
00:08:11.120
Imagine that there’s a memory layout that looks like this: the blue squares represent live Ruby objects, and the white squares represent empty slots where we could store a Ruby object. Yellow represents a forwarding address. The compaction process in Ruby 2.7 requires two pointers, one at either side of the heap: one at the beginning and one at the end. One of them is called a free pointer, and the other one is called a scan pointer.
00:08:50.280
The free pointer scans right until it finds a free slot, and the scan pointer moves left until it finds a filled slot. When it finds a filled slot, the two are swapped, and a forwarding address is written to the old location. For example, the object in slot 9 is moved to slot 3, so we write a forwarding address of 3. We repeat this process of looking for empty slots, finding filled slots, swapping them, and writing forwarding addresses until the two pointers meet. Then we know we're done with the movement process.
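The two-pointer move step can be sketched in plain Ruby. This is a toy model, not MRI's actual C implementation: the heap is an array, `nil` stands for an empty slot, and a hash with a made-up `:forwarded_to` key stands in for a forwarding address.

```ruby
# Toy model of Ruby 2.7's "two finger" move step (plain Ruby, not MRI's C).
def move_objects(heap)
  free = 0                 # free pointer: scans right looking for empty slots
  scan = heap.length - 1   # scan pointer: scans left looking for live objects
  loop do
    free += 1 while free < scan && !heap[free].nil?
    scan -= 1 while free < scan && heap[scan].nil?
    break if free >= scan  # the pointers met, so movement is done
    heap[free] = heap[scan]              # move the object toward the front
    heap[scan] = { forwarded_to: free }  # leave a forwarding address behind
    scan -= 1
  end
  heap
end

p move_objects(["a", nil, "b", nil, nil, "c", "d"])
# "d" moves into slot 1 and "c" into slot 3; slots 5 and 6 now hold
# forwarding addresses pointing at the objects' new locations.
```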
00:09:17.920
Here’s the same heap layout before compaction. Some of these objects have references to each other, and it might look something like this graph. Now, when the objects get moved around, the references can go bad. For example, some references have gone bad, so we need to update all the references.
00:09:53.280
To do this, we simply iterate over the heap, checking the references and updating them. In this case, object one references object seven, but the object that was at seven is now at four, so we update that reference to point to four. The next object, two, is pointing at six; we need to update that to point to five. We repeat this process for every single object in the heap, and once we reach the end, we’re done with the reference update process.
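The reference-update pass can be sketched against the same toy model (again, plain Ruby rather than MRI's C code): each live object holds an array of heap indexes, and any index that now lands on a forwarding address gets rewritten to the object's new home.

```ruby
# Toy version of the reference-update pass. Live objects are Hashes whose
# :refs array holds heap indexes; Hashes with :forwarded_to are the
# forwarding addresses left behind by the move step.
def update_references(heap)
  heap.each do |obj|
    next if obj.nil? || obj.key?(:forwarded_to)  # skip holes and tombstones
    obj[:refs].map! do |idx|
      target = heap[idx]
      # If the slot now holds a forwarding address, follow it.
      if target.is_a?(Hash) && target.key?(:forwarded_to)
        target[:forwarded_to]
      else
        idx
      end
    end
  end
  heap
end

heap = [
  { refs: [7] },        # points at the object that used to live in slot 7
  { refs: [6] },        # points at the object that used to live in slot 6
  { refs: [] }, { refs: [] }, { refs: [] }, { refs: [] },
  { forwarded_to: 5 },  # old home of the object now in slot 5
  { forwarded_to: 4 },  # old home of the object now in slot 4
]
update_references(heap)
p heap[0][:refs]  # => [4]
p heap[1][:refs]  # => [5]
```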
00:10:31.440
Finally, when we're done, we can clear out all the forwarding addresses, and those become just empty slots again. Now we're done compacting the heap. This is the existing algorithm in Ruby 2.7. As far as compaction itself is concerned, I want to talk about my personal goals for that feature.
00:11:05.760
First off, I want to save memory, which is probably the most important goal. We want to save memory, and that’s essential. I also want it to be automatic, so developers don't have to know that it's happening or explicitly call for heap compaction; it should just happen automatically in the background and just work. Additionally, I would like it to be concurrent, which means your program can run while we compact the heap without pausing it too much.
00:11:47.680
And finally, I want it to be fast. I will discuss each of these goals and where they stand. First, let's talk about saving memory. In this slide, we see two graphs: one is the layout of a Rails application before we've compacted the heap, and the other one is after we've compacted the heap.
00:12:06.400
By comparing these two graphs, we can see that we've saved memory because the compacted heap is narrower. We were able to free up some space, and the compacted heap has all the objects much closer together, so there’s no free space in between them. This helps improve copy-on-write performance. Thus, we can confidently say that the first goal of saving memory is met.
00:12:46.240
Now, let’s look at the next goal of making the compactor automatic. I want to make the compactor automatic because I don't think developers should have to know that it's running or invoke it on their own. It should just happen behind the scenes for you. But today, we have to call GC.compact in order for the compactor to run, which clearly does not meet this goal.
00:13:14.160
Typically, we load our entire Rails application and then manually compact the heap, because the code will remain loaded until we kill the process. However, the need to call GC.compact ourselves is a clear indicator that this goal is not met.
00:13:46.720
As for concurrency, we must also call GC.compact explicitly, confirming that this goal is not met either. I want to take this opportunity to discuss an implementation problem: when objects are moved, the destination slot has to be empty, which might not sound like an issue. However, it means that the GC steps must happen in a certain order.
00:14:04.720
As previously mentioned, the first step is moving the objects, followed by updating the references. Since we can only move objects into empty slots, the sweep step must occur before the compact step, and the mark step must happen before the sweep step. Therefore, we need to call GC.compact ourselves, adhering to this specific order of operations.
00:14:41.760
Finally, the last goal of speed presents a subjective issue. The question arises: how fast is compacting the heap? It depends on what you are comparing it against. In my view, discussing speed is moot unless we have concurrency.
00:15:06.720
If you have to call GC.compact when booting your application, the time it takes becomes irrelevant because you won’t be serving any requests until the process is fully initialized. Therefore, we can't accurately assess whether the current solution is fast. However, making the compactor automatic and fast both hinge on implementing concurrency.
00:15:48.960
When we mention concurrency, we must consider running concurrently with something. Ideally, it would be best if we were executing concurrently with Ruby code. This way, we can reduce the pause times of your program, allowing it to run while compacting. However, for my first implementation, I thought it would be easier to start by making it concurrent with the sweeping step.
00:16:31.840
So, for the first implementation, we will adjust the move and sweep steps to execute at the same time. Now, let's talk about the sweeping process and how we can integrate the move process. After the marking step of the GC, any objects that will be kept will be marked, indicated by the marked bits along the top.
00:17:09.440
We keep a bit table that indicates whether or not an object is marked. Anything that is marked (shown in red here) will be kept and cannot be released, while unmarked objects can be swept and freed. So we look at each object; if the object is marked, we ignore it and move to the next one. If it isn't marked, we free the object and mark that slot as reusable. We continue this process, iterating through the heap.
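In the same toy model, the sweep step might look like this, with a parallel array of booleans standing in for the mark bit table:

```ruby
# Toy sweep step (plain Ruby, not MRI's C code): `marked` is a parallel
# array of booleans standing in for the mark bit table. Unmarked objects
# are freed by turning their slots back into nil.
def sweep(heap, marked)
  heap.each_index do |i|
    heap[i] = nil unless marked[i]
  end
  heap
end

p sweep(["a", "b", nil, "c", "d"], [true, false, false, true, false])
# => ["a", nil, nil, "c", nil]
```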
00:17:48.720
Let’s now take a look at how we can combine the sweep and move steps. The sweep cursor resembles the move cursor, so we should be able to combine these two processes, moving across and examining each object.
00:18:02.960
The first step is to add a scan cursor analogous to the one in the original algorithm. The sweep step, in this case, remains unchanged. It just needs to look at each object and its marked bit. However, after freeing an object, instead of proceeding to the next, we utilize the scan cursor to search backward for an object that we can move into that slot.
00:18:46.640
Using the scan cursor, we can write a forwarding address exactly as in the previous compaction algorithm; the movement of objects and the freeing of objects are now merged into one pass. When the two cursors converge, we know that the heap has been compacted, just as before.
00:19:14.560
Once they meet, we pause the sweep step to update all the references. Reference updating is the one non-concurrent step, so we must halt the movement and sweeping at that time. After we update the references in the same manner as the earlier compaction algorithm, we can continue the sweep step as normal.
00:19:59.120
The sweeping mechanism automatically frees any forwarding addresses. Thus, the sweep will clear out any forwarding addresses that aren’t marked, returning them to the free list. After the sweep step finishes, compaction is complete.
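Putting the pieces together, a toy version of the combined sweep-and-move pass (heavily simplified compared to MRI) could look like this. Note how the forwarding addresses are simply left behind for the later reference-update step, and then reclaimed as the sweep continues.

```ruby
# Toy sketch of the combined sweep-and-move pass. heap[i] is nil (empty),
# a String (a live object), or a Hash standing in for a forwarding
# address; `marked` is the bit table produced by the mark phase.
def sweep_and_move(heap, marked)
  scan = heap.length - 1                    # scan cursor starts at the far end
  heap.each_index do |sweep|
    break if sweep >= scan                  # cursors met: movement is finished
    heap[sweep] = nil unless marked[sweep]  # the ordinary sweep: free garbage
    next unless heap[sweep].nil?
    # The slot is empty, so pull a marked object in from the right.
    scan -= 1 while sweep < scan && !(marked[scan] && heap[scan].is_a?(String))
    break if sweep >= scan
    heap[sweep] = heap[scan]
    heap[scan] = { forwarded_to: sweep }    # left for the update/sweep steps
  end
  heap
end

p sweep_and_move(["a", "b", "c", nil, "d", "e"],
                 [true, false, true, false, true, true])
# "e" fills slot 1 and "d" fills slot 3; slots 4 and 5 are left holding
# forwarding addresses until references are updated and sweeping resumes.
```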
00:20:37.200
After implementing this algorithm, it seemed to work well. I sent a PR, and you can see it at this URL; the CI passed, and everything appeared fine. However, after merging, CI started failing, prompting us to revert the patch.
00:21:04.880
I set out to discover what was wrong. The algorithm itself seemed sound, but it took time to pinpoint the bug. I will describe what it was. Let’s say in Ruby we have maybe three classes: A, B, and C, where B and C are subclasses of A.
00:21:41.440
Internally, Ruby tracks the class parents and children. So, class A knows its children, B and C, while B and C know their parent A. If class C gets deleted, the GC frees C, but C must also be removed from class A's child list.
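The parent side of this bookkeeping is visible from plain Ruby; the child list itself lives in MRI's C internals, though `Class#subclasses` (Ruby 3.1+) exposes a view of it:

```ruby
# Ruby tracks the class hierarchy in both directions. Superclasses are
# public API; Class#subclasses needs Ruby 3.1+, so we guard the call.
class A; end
class B < A; end
class C < A; end

p B.superclass  # => A
p C.superclass  # => A
p A.subclasses.map(&:name).sort if A.respond_to?(:subclasses)
# On Ruby 3.1+ this prints ["B", "C"]
```

When C becomes garbage, MRI has to walk from C back to A at sweep time to unlink it from that internal child list, and that is exactly the access that collides with compaction in the next slides.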
00:22:19.520
So due to this, class C has to access class A during the GC sweep phase. Let’s illustrate what happens during a normal sweep, without compaction. The sweep cursor moves along until it finds class C, which is unmarked; since it is about to be freed, it must remove itself from class A's child list.
00:22:59.760
After that, class C is deleted. Now, let’s examine the same process with the compaction algorithm in play. The sweep moves along and spots an empty slot, so the scan pointer finds class A, moves it into that slot, and leaves a forwarding address behind.
00:23:41.120
But when the sweep reaches class C during the sweep, class C needs to remove itself from class A's child list. It attempts to access its parent to perform this task, but class C is now pointing at a forwarding address, not the actual class A.
00:24:13.920
Therefore, when C tries to remove itself from class A's list, it encounters a forwarding address it cannot process, resulting in a program crash.
00:24:36.960
In the debug logs from the revert commit, if we examine the crash, we can see that this child class is in fact pointing at a moved address. The root cause was a significant assumption I had made: that when an object was freed, it would only touch its own memory.
00:25:10.560
Accordingly, I wanted to implement a restriction on objects, ensuring they could only interact with their internals and not affect external objects. However, I realized I could not enforce this, as existing code might rely on such behavior.
00:25:43.200
Thus, I needed a solution. I opted to introduce read barriers. What are read barriers? Read barriers are functions triggered when something is read. For instance, a simple read barrier could intercept attempts to access an instance variable.
00:26:15.920
When an object is read, we have a chance to execute an action. In this particular GC bug scenario where C tries to read A, we could intercept and reposition A back into its original place.
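The idea can be illustrated in plain Ruby with a wrapper that funnels every access through a single `read` method. This is only a software sketch of the concept; MRI does not work this way, and the names here are made up for the example (the final approach in the talk uses memory protection instead):

```ruby
# Illustration only: a software read barrier written in plain Ruby. Every
# access goes through `read`, which gets a chance to notice a forwarding
# address and fix up a stale reference before anyone uses it.
class BarrieredHeap
  def initialize(slots)
    @slots = slots
  end

  def read(index)
    slot = @slots[index]
    if slot.is_a?(Hash) && slot.key?(:forwarded_to)
      # The barrier fires: transparently follow the forwarding address.
      @slots[slot[:forwarded_to]]
    else
      slot
    end
  end
end

heap = BarrieredHeap.new(["class A", { forwarded_to: 0 }])
p heap.read(1)  # => "class A" — the barrier redirected the stale reference
```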
00:26:52.960
Unavoidably, read barriers present challenges. They require placement in numerous locations, which complicates implementation because every significant point in the code might require a barrier.
00:27:50.080
Also, if we have to incorporate these barriers everywhere, it might slow down program execution, since we would have to check whether a read barrier needs to run on every single read. This approach essentially treats our program's objects as a graph whose integrity we maintain at every read.
00:28:27.360
Should any instance variable move, the barrier could execute upon reading and rectify the location. While this holds great potential, it is indeed challenging to implement and may hinder performance.
00:29:08.000
I, being quite lazy, decided to use mprotect. This system call allows us to protect a section of memory from being accessed. If there’s an access attempt, a fault occurs, allowing us to take appropriate action.
00:29:49.600
To see how this resolves the bug, picture the same process as before. After moving an object, we protect the associated page, so that when C attempts to read the old address, it triggers a fault, since that region of memory is now guarded.
00:30:26.640
After handling the exception, we remove the protection from the page and reposition all objects back to their original locations, allowing the sweep to proceed.
00:30:44.320
At this point, A is back in the right location, enabling C to remove itself from A's child list.
00:31:16.000
However, this implementation runs into one hurdle: the regions protected by mprotect must be aligned to four kilobytes. Both the starting address and the size of the protected region must be multiples of four kilobytes.
00:31:52.880
Ruby's heap page addresses already satisfy the four-kilobyte alignment requirement, but the total size we need to protect is a problem: the heap page size is not a multiple of four kilobytes, which creates complications.
00:32:35.440
Instead, I proposed allocating exactly 16 kilobytes per page, making the size compatible with mprotect. An interesting observation is that the space that had been reserved for malloc bookkeeping was exactly the size of one Ruby object.
00:33:05.680
Ultimately, if we allocate precisely 16 kilobytes, each Ruby page accommodates an additional object. To test this patch, I computed memory usage during RDoc generation as a benchmark.
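A rough back-of-envelope calculation shows where the extra slot comes from. The numbers here are assumptions for illustration: 16 KiB pages and 40-byte object slots match common 64-bit MRI builds, and the 40-byte malloc reservation is the observation from the previous slide.

```ruby
PAGE_ALIGN      = 16 * 1024  # bytes: heap pages are aligned to 16 KiB
SLOT_SIZE       = 40         # bytes per Ruby object slot (RVALUE) on 64-bit
MALLOC_RESERVED = 40         # assumed reservation for malloc bookkeeping

# Old scheme: request slightly less than 16 KiB so the allocation plus
# malloc's own overhead fits inside one aligned chunk.
old_slots = (PAGE_ALIGN - MALLOC_RESERVED) / SLOT_SIZE

# New scheme: allocate exactly 16 KiB, aligned, so mprotect can guard it.
new_slots = PAGE_ALIGN / SLOT_SIZE

puts old_slots  # => 408
puts new_slots  # => 409 — one extra object slot per heap page
```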
00:33:44.720
The graph showcases memory usage over time. The x-axis represents time while the y-axis denotes memory consumption by the process. The red lines represent the master branch while the blue lines demonstrate the branch containing the applied patches from my changes. After running the benchmark 50 times, we can observe that the blue lines are typically lower.
00:34:35.440
In fact, our analysis shows reduced memory overall when applying these changes. This surprising outcome stems from how malloc implementations such as jemalloc organize memory allocation using fixed-size buckets, or size classes.
00:35:33.040
In essence, if an allocation fits one of those buckets exactly, no space is wasted, which means that requesting more memory can actually help reduce overall usage.
00:36:00.480
This is why I titled my talk 'Reducing Memory by Using More,' because it encapsulates a discovery regarding how our memory management works.
00:36:44.080
The pivotal part of the patch is that the heap page size is now equal to the heap page alignment. As a result, both the address and the size of each page respect the four-kilobyte boundary.
00:37:21.760
I made a pull request that implements a read barrier while combining compact and sweeps, using mprotect effectively. Unfortunately, the read barrier's performance turned out to be very slow.
00:38:04.400
Comparing the read barrier with the original compactor shows that the read barrier implementation is 4.5 times slower than before. That slowdown brings us back to the question of speed.
00:39:02.560
Before answering that question, one lesson I learned from this project: when doing performance work, always check that your compiler optimizations are enabled before you benchmark.
00:39:46.400
Reflecting on this project, I initially felt dissatisfied because it appeared to be a slowdown. But truly, the project isn’t solely about enhancing speed; it’s about proving that we could implement a read barrier and thereby enable concurrent compaction.
00:40:20.560
With that objective achieved, I realize the primary takeaway is about functional implementation rather than sheer speed. One essential point I want to convey is this: ensure that your code works successfully before attempting optimization.
00:40:59.520
This brings us to the conclusion of today’s talk; I see this issue in many different projects, and I admit to being a victim of it myself. It’s vital to establish the functionality of your code first, and once that is certain, focus on speeding it up. Thank you so much for having me! I wish I could have attended in person, but next time, I'll definitely be there.
00:41:38.080
Thank you all again! Hi, everybody!
00:42:09.760
Oh, you are not happy, are you?
00:42:56.569
Okay.
00:43:07.440
Has this ever happened to you? You change directories, run a Git command, and then wonder whether you're in the right directory.
00:43:21.680
If only I had the Analog Terminal Bell! You’ll never miss a terminal bell notification again.
00:43:40.080
The Analog Terminal Bell is a device that rings anytime a bell character is displayed on your terminal.
00:43:56.440
Simply plug it into USB, then enable the Analog Terminal Bell in your terminal settings.
00:44:09.360
It works with your shell, it works with Vim, it even works with Devview.
00:44:16.720
Thanks to the Analog Terminal Bell, I'll never miss a terminal bell notification again. Go online and get yours today at AnalogTerminalBell.com.
00:44:29.680
Please note: the Analog Terminal Bell is not actually for sale and requires a custom build of iTerm2. Other restrictions may apply; see the website for details.
00:44:52.960
Hello, hello! Just brilliant.
00:45:02.280
Thank you! I was totally not expecting that. Hello, there she is!
00:45:11.360
Unfortunately, not yet, but we’re gearing up to wrap up the day.
00:45:24.720
Let’s see if there are any questions.
00:45:51.120
The jokes in the chat have been delightful already! My question is, was that really a microwave plate?
00:46:05.040
That’s what I’ve been wondering throughout this talk! Months of work, and my real question is: was that a microwave plate?
00:46:29.120
It was indeed! I wanted something to spin on, and so I thought—oh, it's the microphone!
00:46:36.960
I was filming it, and Ebby was like, 'What are you doing?'
00:46:44.960
Now I have another question. That’s actually my question too!
00:46:51.680
For the audience, we asked all our speakers to send us a bio just to get to know them better.
00:47:01.760
In Aaron’s bio, he mentioned his interests: keyboards, cats, and code. These all start with a hard "c" sound.
00:47:14.880
But I noticed, why K and not C, like for cheese? Your Twitter feed is full of cheese!
00:47:28.320
That's a good point. Yes, yes! Cheese should have been another one of my interests. What has it done to be left out?
00:47:41.760
Now, moving to the serious question in the final version.
00:48:04.960
Matteo wonders: is the concurrency only with respect to sweeping and moving, or does it also run concurrently with other Ruby code?
00:48:20.960
Actually, I must admit that when I started this project, I initially aimed to make it concurrent with freeing.
00:48:31.680
But thanks to being able to implement a read barrier, it is now concurrent with Ruby code as well. It runs alongside your Ruby code.
00:48:42.240
The only thing is that reference updating is not concurrent; there’s definitely a pause required for that segment.
00:49:04.640
Gregorio is asking: How does the cat feel about the Analog Terminal Bell?
00:49:24.000
The cat doesn’t mind it much; however, when the bell rings, my wife inevitably asks, 'What are you doing?'
00:49:35.040
Usually, the bell sound is loud! My cat, Choo Choo, resides on my desk and often sleeps on it.
00:49:55.680
In that case, I really cannot ring it while she’s on it! I could, but it might not be the greatest idea.
00:50:08.480
As we approach the conclusion, we’ve run out of questions, serious and otherwise.
00:50:26.240
If you had one piece of advice to offer the audience interested in diving deeper into internals, where should they start?
00:50:40.480
I would recommend starting by writing a C extension if you want to begin hacking on MRI. Afterward, you can delve into Ruby internals.
00:51:05.680
A good book to read is 'The Garbage Collection Handbook,' or just the GC handbook. It covers all the algorithms I've discussed, as well as those used in Ruby.
00:51:21.680
I believe what we’ve implemented in MRI isn’t novel technology; rather, it’s generally well-known, and you can find it documented in that book.
00:51:45.680
The final question from Thiago is whether there’s any benchmark on large web projects.
00:51:56.480
We’ve conducted benchmarks, noting that memory savings hover around 10% for certain objects. However, the performance impacts of automatic compaction remain uninvestigated.
00:52:16.800
This warrants further exploration. Additionally, we must understand that this feature only engages during major GC runs, which should ideally arrive infrequently with proper tuning.
00:52:34.440
Thus, even if the performance dips slightly, it should not matter significantly in a larger context. Thank you profoundly for your time and participation today!
00:52:57.440
Of course! Thank you for having me!