00:00:05.520
We'd like to introduce you to Hongli Lai. Hongli is the creator of Passenger, which has enabled many possibilities for Ruby developers, for which I am quite thankful. Recently, he gathered input from the community about a memory-optimized Ruby distribution, and the responses on Hacker News suggested looking into Ruby Enterprise Edition, which hasn't seen updates since version 1.8.7. So he reminded the audience that he has cared about memory optimization for quite some time.
00:00:31.439
He also knows some Japanese, which he learned through anime, so when it comes to pirates versus ninjas, he is probably more on the ninja side. Now, let's take a journey into memory bloat.
00:01:10.560
Thank you. It’s a great pleasure to be here. What causes memory bloat?
00:01:12.720
As the author of Passenger, I’ve had to deal with this issue for a considerable amount of time. This talk will be a little bit different. Some people may have already seen my previous presentations or read my blogs on this subject. However, this is not just a reiteration; I've made some surprising new discoveries lately, which I will discuss during this presentation.
00:01:58.799
I also had a deadline to submit this talk to EuRuKo so they could live caption it. Due to this, I've been working like crazy over the past week. At the end of the talk, I will share an announcement that is worth discussing.
00:02:22.080
At Phusion, we operate a proxy server written in Ruby to serve Debian and RPM packages. It's a very simple application: it proxies requests to another server and returns the modified results. However, I noticed that over time its memory usage would balloon to 1.3 gigabytes. That seemed excessive for such a simple Ruby application of less than a thousand lines of code.
00:02:48.000
That excessive memory usage led me to question why a simple Ruby application would require so much memory. I started hearing rumors about Ruby memory bloat caused by memory fragmentation, and I found that people in the Ruby community, including Nate Berkopec, have discussed this issue. They suggested either setting the MALLOC_ARENA_MAX environment variable or using an alternative memory allocator called jemalloc.
00:03:11.600
After implementing their suggestions, I was able to reduce the memory usage by more than half. But if only it were that simple! Their solutions do work, but something seemed off: the explanations of why they work appeared incomplete.
00:03:32.680
In this talk, I will focus on three main points: first, how memory allocation works at a basic level; second, why this memory bloating occurs; and third, what we can do about it. Importantly, the problem I'm discussing manifests mainly in multi-threaded Ruby applications running on Linux. It doesn't typically happen on macOS or FreeBSD, nor does it occur on Linux if threads are not used.
00:04:00.239
It's ironic because multi-threading should ideally provide more concurrency with minimal added memory usage. Additionally, it's worth mentioning that Ruby is not the only application facing this issue; Redis and other servers encounter similar challenges. The problem is not isolated to Ruby alone.
00:04:39.680
Let's start with Memory Allocation 101. The system provides a library called the memory allocator, which keeps track of which memory is used and which is available. The malloc function allocates a piece of memory of a requested size and returns its address. The free function deallocates the memory at a given address.
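As a small illustration, added here and not part of the original talk, this is one way to watch malloc and free at work from Ruby, using the standard Fiddle library, which forwards the calls straight to the system allocator:

```ruby
require 'fiddle'

# Minimal sketch: ask the system memory allocator for 1024 bytes, then give
# the memory back. Fiddle.malloc returns the address as an integer.
addr = Fiddle.malloc(1024)
puts format('allocated 1024 bytes at 0x%x', addr)
Fiddle.free(addr)  # deallocate the memory at that address
```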
00:04:58.560
Memory allocation consists of two parts: one for the application, known as user space, and the other managed by the OS kernel. The kernel allocates memory in blocks of 4 kilobytes, referred to as pages. It’s important to know that the kernel can only allocate or free memory in these entire OS pages, which is a fundamental property of all modern kernels.
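As a quick aside (my addition, not the speaker's), you can query the kernel's page size from Ruby instead of assuming it, on platforms where Etc::SC_PAGESIZE is available:

```ruby
require 'etc'

# Ask the OS for its page size; on most Linux systems this prints 4096.
puts Etc.sysconf(Etc::SC_PAGESIZE)
```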
00:05:38.080
Meanwhile, the user space memory allocator can allocate memory in smaller pieces. To achieve this, the allocator requests large blocks from the kernel and then divides these blocks into smaller pieces for specific uses. This process creates an OS heap, from which the allocator carves out pieces of the requested size, yielding the addresses to the calling application.
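To make the carving idea concrete, here is a deliberately simplified toy sketch, not how glibc actually works: the allocator obtains one large block (an OS heap) from the kernel and hands out smaller pieces from it on demand.

```ruby
# Toy illustration only: one "OS heap" obtained from the kernel, carved into pieces.
class ToyHeap
  PAGE = 4096

  def initialize(pages)
    @size   = pages * PAGE  # total size of this OS heap
    @offset = 0             # next free position inside the heap
  end

  # Return the offset (an "address") of a fresh piece, or nil if the heap is full.
  def carve(bytes)
    return nil if @offset + bytes > @size
    addr = @offset
    @offset += bytes
    addr
  end
end

heap = ToyHeap.new(16)  # one 64 KB OS heap
p heap.carve(100)       # => 0
p heap.carve(250)       # => 100
```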
00:06:06.060
As long as there is space in an OS heap, the memory allocator will continue to allocate from it. Only when the OS heap is full will it ask the kernel for a new one. On the Ruby side, however, memory management works a little differently: Ruby does not ask the memory allocator for every single object separately.
00:06:41.520
Ruby uses a strategy similar to the memory allocator. It requests memory in large chunks, referred to as Ruby heap pages. These pages consist of slots where different Ruby objects can reside—which could be strings, arrays, classes, or regex patterns.
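You can observe these Ruby heap pages and object slots yourself through GC.stat; a small example of my own, with key names that exist in current CRuby but may vary by version:

```ruby
stats = GC.stat
puts "Ruby heap pages allocated: #{stats[:heap_allocated_pages]}"
puts "Live object slots:         #{stats[:heap_live_slots]}"
puts "Free object slots:         #{stats[:heap_free_slots]}"
```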
00:07:06.880
In this system, when Ruby encounters larger objects, like lengthy strings or arrays, it allocates memory separately, pointing back to this separately allocated data from the base object slot. When the garbage collector runs, not only is the slot marked free, but any external data it points to is freed as well.
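A way to see this split for yourself, added here as an illustration, is ObjectSpace.memsize_of, which reports the slot plus any externally allocated data (exact numbers vary by Ruby version):

```ruby
require 'objspace'

short = 'hello'
long  = 'x' * 100_000
puts ObjectSpace.memsize_of(short)  # roughly one slot, around 40 bytes
puts ObjectSpace.memsize_of(long)   # the slot plus ~100 KB of external string data
```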
00:07:31.200
To summarize, two categories of memory exist in a Ruby process: Ruby heap pages with their associated slots for Ruby objects, and memory allocated outside the Ruby heap pages. The latter is managed by the typical memory allocator library.
00:07:56.320
Now I’d like to pose a question. I created a memory benchmarking application that allocates memory in a loop. After some time, memory usage rises to 230 megabytes. This application is multi-threaded. Here’s my question: how much of this memory is attributed to the Ruby side versus the OS heap?
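This is not the speaker's exact benchmark, but a rough sketch of the pattern it describes: many threads churning through short-lived allocations, after which the process size can be inspected.

```ruby
# Rough sketch of a multi-threaded allocation benchmark (not the original one).
threads = 10.times.map do
  Thread.new do
    1_000.times do
      buffers = Array.new(1_000) { 'x' * 1_024 }  # allocate ~1 MB of short-lived strings
      buffers.clear                               # drop them so the GC can reclaim the objects
    end
  end
end
threads.each(&:join)

puts "RSS: #{`ps -o rss= -p #{Process.pid}`.strip} KB"  # process size as the OS sees it
```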
00:08:26.880
How many people believe that the majority of that memory belongs to the Ruby side? Raise your hand. How many think it's about 50/50? And who thinks it mostly sits in the OS heaps? Most hands went up for the OS heaps, but that intuition turns out to be misleading.
00:09:05.600
In reality, the distribution tells a different story. The Ruby objects take up roughly 7 megabytes of memory. Even though the OS reports 230 megabytes for the process, only 7 of those megabytes are Ruby objects and their associated external data. That discrepancy means the bulk of the memory is not necessarily Ruby's fault.
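One way to approximate this split yourself, assuming Linux with /proc available, is to compare ObjectSpace.memsize_of_all against the process RSS:

```ruby
require 'objspace'

GC.start  # collect garbage first so we only count live objects
ruby_side = ObjectSpace.memsize_of_all                              # slots plus external data
rss_kb    = File.read('/proc/self/status')[/VmRSS:\s+(\d+)/, 1].to_i

puts "Ruby objects + external data: #{ruby_side / 1024 / 1024} MB"
puts "Process RSS:                  #{rss_kb / 1024} MB"
```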
00:09:33.920
In many cases, the bloat stems from how the memory allocator itself behaves. Looking at the suggested environment variable, MALLOC_ARENA_MAX=2, I found that limiting the number of allocation arenas leads to significant improvements: the memory allocator handles requests with far less waste.
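For reference, MALLOC_ARENA_MAX has to be set before the Ruby process starts, so it normally goes in the shell or the process manager's configuration; from Ruby you could also launch a child process with it set (app.rb below is just a placeholder):

```ruby
require 'rbconfig'

# Launch a child Ruby with the glibc arena limit applied from the start.
system({ 'MALLOC_ARENA_MAX' => '2' }, RbConfig.ruby, 'app.rb')
```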
00:10:10.560
When multiple threads try to allocate memory from the same OS heap at the same time, contention occurs, because only one thread can use that heap at a time in order to maintain thread safety. To combat this, memory allocators create multiple OS heaps (glibc calls these arenas), so that each thread can allocate from its own OS heap and avoid contention.
00:10:43.120
However, this approach leads to more fragmented memory usage. In the visualization, most OS heaps consist of either gray pages (memory that is free but has not been released back to the kernel) or red pages (memory in use), which shows how inefficiently the memory is utilized.
00:11:12.920
There is also a simple but effective API call in the Linux glibc memory allocator: malloc_trim(), which forces the allocator to release that gray, freed memory back to the kernel.
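As a hedged sketch of my own (not from the talk itself), malloc_trim can even be invoked from Ruby through Fiddle, assuming Linux with glibc; other allocators may not export the symbol at all:

```ruby
require 'fiddle'

# Bind glibc's malloc_trim: int malloc_trim(size_t pad)
malloc_trim = Fiddle::Function.new(
  Fiddle::Handle::DEFAULT['malloc_trim'],
  [Fiddle::TYPE_SIZE_T],
  Fiddle::TYPE_INT
)
malloc_trim.call(0)  # ask the allocator to return freed heap memory to the kernel
```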
00:11:47.840
The visualization shows a stark difference in memory usage after invoking this function to release free memory back to the OS. A well-managed memory allocation approach leads to both lower memory usage and less fragmentation overall.
00:12:30.880
In conclusion, fragmentation does impact memory bloat, but the memory allocator's inefficiency in releasing unneeded memory compounds the issue. Although there's still optimization potential in addressing fragmentation, the underlying cause lies significantly in how the allocator treats memory.
00:13:10.160
The solution we came across is a straightforward change: have the Ruby garbage collector call malloc_trim. Tests by third parties have shown as much as a 17 to 20 percent decrease in memory usage, but with some performance drop, so weigh it carefully against your specific workload.
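The change described here lives inside Ruby's garbage collector itself, in C code; as a rough approximation from plain Ruby, again assuming glibc, one could trim right after a full GC:

```ruby
require 'fiddle'

MALLOC_TRIM = Fiddle::Function.new(
  Fiddle::Handle::DEFAULT['malloc_trim'],
  [Fiddle::TYPE_SIZE_T],
  Fiddle::TYPE_INT
)

GC.start(full_mark: true, immediate_sweep: true)  # major GC frees slots and external data
MALLOC_TRIM.call(0)                               # then release the freed heap memory to the kernel
```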
00:13:52.480
There are also further options for improvement, such as calling the trim function more selectively, or switching to a more efficient memory allocator like jemalloc, which can yield better memory management overall.
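As a hypothetical example of trying jemalloc without rebuilding Ruby, it can be preloaded for a child process; the library path below is an assumption and varies by distribution:

```ruby
require 'rbconfig'

# Run an app with jemalloc preloaded in place of the default glibc allocator.
system(
  { 'LD_PRELOAD' => '/usr/lib/x86_64-linux-gnu/libjemalloc.so.2' },
  RbConfig.ruby, 'app.rb'
)
```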
00:14:30.240
My work is ongoing, and I recently came across unexpected findings combining jemalloc with malloc_trim. It’s essential to determine how to best utilize these discoveries while avoiding mismatches in memory allocator versions.
00:15:07.840
Now, let's move on to the exciting part of my talk. Together with Fullstaq, we are developing a new Ruby distribution that packages all of this memory reduction work in a user-friendly format. This distribution, called Fullstaq Ruby, incorporates jemalloc and malloc_trim.
00:15:45.440
We plan to release binaries rather than requiring you to compile from source, to keep usage simple. These will be distributed as Debian and RPM packages that integrate with the OS package manager, so security updates are easy to receive.
00:16:17.960
This project is truly community-driven and open-source, encouraging contributions without any profit incentive. Our intent is to bring recognized optimizations into mainstream Ruby usage.
00:16:58.160
In summary, I'm reaching out to the Ruby community to explore interest in these optimizations. If you want to stay informed, please follow my blog and check out our repository for updates.
00:17:30.080
Lastly, I’d like to express my sincere gratitude to the audience and my team. Thank you for listening, and I hope this discussion sparks further interest in improving Ruby memory management. Thank you!