RubyKaigi 2024

Finding Memory Leaks in the Ruby Ecosystem

00:00:05.960 Hello RubyKaigi! I'm Peter, and I'm Adam. Today we're talking about finding memory leaks in the Ruby ecosystem. You can find a PDF version of these slides by scanning this QR code or by following the link at the bottom.
00:00:30.439 Hi everyone, I'm Peter. I'm currently based in Toronto, Canada. I'm on the Ruby core team and a senior developer on the Ruby infrastructure team at Shopify, where I work on performance and memory management in Ruby. I am the co-author of the variable width allocation feature in Ruby, which improves the performance and memory efficiency of Ruby's garbage collector. I'm also the author of the Ruby Memcheck and Autotuner gems, and I'll be introducing the Ruby Memcheck gem in this talk. In my free time, I like to travel and take photos, which I post on Instagram.
00:01:10.960 I'm Adam Hess, based in Seattle, Washington. I am a staff software engineer on the Ruby architecture team at GitHub. My team's goal is to enhance GitHub by collaborating with the Ruby community to improve the Ruby ecosystem for everyone. I was an early Prism contributor and have a general passion for parsers and compilers.
00:01:30.840 Before we discuss fixing memory leaks, let's quickly go through an example of a memory leak. Ruby is a garbage-collected language, which means that Ruby will eventually free allocations made by your program automatically. In contrast, C is a manually memory-managed language, meaning that memory in your C program needs to be freed manually, or it will live for the lifetime of your program. In this talk, we will focus on C programs and the effects of memory leaks on them, as C programs can impact your Ruby applications since Ruby itself is written in C. Additionally, you may interact with Ruby C extensions.
00:02:16.200 Let's examine a quick example of a memory leak. Consider a program that reads a line from standard input, stores it into a buffer, and then prints it back out to the user. However, this program has a problem: it has a memory leak. For every line of input, it allocates a new piece of memory, meaning that as this program runs, it uses more and more memory over time. We could fix this leak by slightly modifying our program. The modified program still reads a line of standard input from the user, stores it in a buffer, and prints it back out, but this version does not have a memory leak because it allocates memory only once at the start and reuses the buffer. As a result, this program will not increase its memory usage over time.
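The two programs described above might look roughly like the following C sketch. The function names and the 256-byte buffer size are assumptions made for illustration, and the loops are written as functions over a FILE pointer so they can be called with stdin and stdout from a main function.

```c
#include <stdio.h>
#include <stdlib.h>

#define LINE_BUF_SIZE 256 /* assumed buffer size for this sketch */

/* Leaky version: allocates a new buffer for every read and never frees it.
 * Returns the number of chunks echoed (one per line for short lines). */
size_t echo_lines_leaky(FILE *in, FILE *out)
{
    size_t echoed = 0;
    for (;;) {
        char *buf = malloc(LINE_BUF_SIZE); /* fresh allocation each iteration */
        if (buf == NULL || fgets(buf, LINE_BUF_SIZE, in) == NULL) {
            break; /* buf is not freed on this path either */
        }
        fputs(buf, out);
        echoed++;
        /* no free(buf): the pointer is lost when the loop iterates */
    }
    return echoed;
}

/* Fixed version: allocates once up front and reuses the same buffer,
 * so memory usage stays flat no matter how many lines are read. */
size_t echo_lines_reuse(FILE *in, FILE *out)
{
    size_t echoed = 0;
    char *buf = malloc(LINE_BUF_SIZE); /* single allocation */
    if (buf == NULL) {
        return 0;
    }
    while (fgets(buf, LINE_BUF_SIZE, in) != NULL) {
        fputs(buf, out);
        echoed++;
    }
    return echoed; /* buf is deliberately kept until the process exits */
}
```

Calling echo_lines_leaky(stdin, stdout) grows the heap by one buffer per line, while echo_lines_reuse(stdin, stdout) holds a single buffer for the whole run.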
00:03:05.599 If we run these two programs, we can visualize the differences. The red line here represents the program with the memory leak, showing that memory grows linearly as allocations are made in a loop. In contrast, the blue line, representing memory usage for the program that allocates memory only once, remains stable throughout the program's lifetime. Memory leaks are generally detrimental, making it difficult to run a program for an extended period. Eventually, your operating system may kill the process if it uses more memory than it should, potentially causing allocation failures and leading to crashes. Moreover, memory leaks reduce the available capacity on your system, as less memory will be free for other applications.
00:03:27.480 We just walked through fixing a simple memory leak, but what if your program is more complex? Memory errors can be very difficult to resolve, and developers often struggle with managing memory. Therefore, even experienced developers should use tools to help identify memory leaks. One such tool is called Valgrind, which is a memory analyzer that helps you find leaks in your program. If we run Valgrind on the two programs we discussed earlier, it will identify the memory leakage in the first program. However, it will also find a memory leak in the second program because Valgrind does not discern that we intend for that memory to persist for the lifetime of the program.
00:04:05.320 Both programs will report a memory leak of 256 bytes from one block, but if we run this over multiple lines of input, the number of blocks and bytes reported will change while the leak information remains relatively unchanged. This can be problematic for complex programs, especially if you are unaware of the exact number of blocks or bytes being leaked, complicating the evaluation of whether it constitutes a legitimate memory leak or is intended to last for the lifetime of the program. Consequently, the objective of this project was to assist Valgrind in identifying only meaningful memory leaks in our programs.
00:04:45.840 I will now discuss the prior art that inspired the Ruby 'free at exit' feature. At RubyKaigi 2022, I presented a talk titled 'Automatically Find Memory Leaks in Native Gems,' where I introduced a tool I developed called Ruby Memcheck. Ruby Memcheck is a tool designed to find memory leaks in native gems. Adam briefly showed how Valgrind Memcheck, a memory tool that runs on Linux, can be used to identify memory leaks by reporting blocks of memory that were allocated but never freed.
00:06:07.159 Valgrind can also detect invalid memory usage, such as the use of memory after it has been freed, commonly known as 'use after free,' or accessing memory that is out of bounds. Ruby Memcheck wraps around Valgrind Memcheck to find memory leaks and memory errors.
00:06:21.640 Let’s try Valgrind Memcheck on an empty Ruby script. After scrolling to the end of this output, we find nearly 70,000 lines and over 10,000 reported leaks. Note that the output shown here has been truncated to about 300 lines due to Keynote's limitations; the real output is around 200 times longer, and it is longer still for a non-trivial Ruby program. This demonstrates that Valgrind Memcheck, while an excellent tool, is impractical to use directly on Ruby applications.
00:06:37.360 This is because Ruby does not free its resources upon shutdown. That is a deliberate design decision, as the system reclaims all memory anyway when the process exits, so freeing memory at shutdown would only slow the shutdown process. To filter the thousands of output lines from Valgrind Memcheck, Ruby Memcheck uses heuristics to eliminate noisy false positives.
00:07:07.200 Ruby Memcheck works by inspecting the stack frames of each memory leak to determine whether the leak originates from the gem or from the Ruby interpreter. If a memory leak originates in Ruby, it is filtered out as a false positive. However, this heuristic can create false negatives; for instance, it fails to identify memory leaks where memory initially allocated in Ruby is leaked by a native gem. Suppose a string is allocated in Ruby and passed into your native gem, and the gem then changes the pointer of the string's content buffer without freeing the original buffer. That is a memory leak, but Ruby Memcheck would miss it because the original allocation occurred within Ruby.
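The stack-frame heuristic can be illustrated with a small C sketch. This is not Ruby Memcheck's actual implementation: the function names, the idea of matching on the binary path of each frame, and the rule of walking outward from the allocation site until the first gem or Ruby frame are all assumptions chosen to show the idea and its false-negative case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum demo_origin { DEMO_ORIGIN_GEM, DEMO_ORIGIN_RUBY, DEMO_ORIGIN_UNKNOWN };

/* frame_objects[0] is the innermost frame (closest to the allocation).
 * Each entry is the path of the binary that the frame belongs to.
 * The first frame that matches either binary decides the origin. */
enum demo_origin demo_leak_origin(const char *const *frame_objects,
                                  size_t frame_count,
                                  const char *gem_binary,
                                  const char *ruby_binary)
{
    for (size_t i = 0; i < frame_count; i++) {
        if (strstr(frame_objects[i], gem_binary) != NULL) {
            return DEMO_ORIGIN_GEM;
        }
        if (strstr(frame_objects[i], ruby_binary) != NULL) {
            return DEMO_ORIGIN_RUBY;
        }
    }
    return DEMO_ORIGIN_UNKNOWN;
}

/* Only leaks attributed to the gem are reported; everything else is
 * treated as noise from the interpreter and filtered out. */
bool demo_should_report(const char *const *frame_objects, size_t frame_count,
                        const char *gem_binary, const char *ruby_binary)
{
    return demo_leak_origin(frame_objects, frame_count,
                            gem_binary, ruby_binary) == DEMO_ORIGIN_GEM;
}
```

Under this rule, a buffer allocated inside the interpreter and later lost by the gem is classified as originating in Ruby and dropped, which is the kind of false negative described above.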
00:07:51.920 With these heuristics and limitations, you might be wondering if Ruby Memcheck is effective at all. Does it actually filter out the noisy false positives while maintaining the detection of real memory leaks? It turns out that it successfully identifies many memory leaks in popular and commonly used native gems, such as Nokogiri, liquid-c, gRPC, and Protobuf. It has even identified a memory leak within Ruby itself.
00:08:49.760 Here are just some of the memory leaks that were found with Ruby Memcheck and subsequently fixed. While Ruby Memcheck functions effectively, I felt uncomfortable with it, as it was largely a hack to work around the fact that Ruby does not free its memory during shutdown.
00:09:10.040 The real solution is to build a feature that frees all memory within Ruby upon shutdown. Unfortunately, I did not have enough time to work on this feature, given that there were numerous locations within Ruby that leaked memory upon shutdown.
00:09:27.440 Another limitation of Ruby Memcheck arises from its reliance on Valgrind APIs. As Valgrind only operates on Linux, Ruby Memcheck is unable to run on other platforms. This restriction prevents the use of other memory leak tools, such as the Leaks tool on macOS, or faster memory leak checkers like Google's sanitizers. Last year, when Adam reached out to me to contribute to this feature, I eagerly accepted the opportunity to collaborate with him to make it a reality. Adam will soon explain how we implemented this feature and how you can utilize it.
00:10:25.760 That was a brief overview of how Ruby Memcheck works without delving into specific details. For an in-depth explanation of Ruby Memcheck's inner workings, you can read my blog post linked on the right, or watch my RubyKaigi 2022 talk linked on the left. I will now hand off to Adam, who will discuss how the Ruby 'free at exit' feature was implemented.
00:11:14.320 We can indeed do more! As Peter mentioned, Ruby Memcheck was an incredibly useful and powerful tool but has limitations due to the heuristics it employs. Let's revisit our example program quickly. This program allocates memory once at the top, but Valgrind perceives it as leaking memory. By slightly modifying the program to free memory before exiting, we change Valgrind's output to indicate that no blocks were leaked and no leaks are possible. The question now is, can we achieve similar results for Ruby?
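As a rough illustration of that modification, here is a minimal C sketch of the allocate-once program with the buffer released before it finishes. The function name and the 256-byte buffer size are assumptions for this example.

```c
#include <stdio.h>
#include <stdlib.h>

#define LINE_BUF_SIZE 256 /* assumed buffer size for this sketch */

/* Echoes input until EOF using a single buffer, then frees that buffer
 * so nothing allocated here is still live when the caller exits.
 * Returns the number of chunks echoed (one per line for short lines). */
size_t echo_lines_and_free(FILE *in, FILE *out)
{
    size_t echoed = 0;
    char *buf = malloc(LINE_BUF_SIZE); /* allocated once at the top */
    if (buf == NULL) {
        return 0;
    }
    while (fgets(buf, LINE_BUF_SIZE, in) != NULL) {
        fputs(buf, out);
        echoed++;
    }
    free(buf); /* the only change: release the buffer before exiting */
    return echoed;
}
```

A main function that only calls echo_lines_and_free(stdin, stdout) and returns leaves no heap block of its own for a leak checker to report.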
00:11:52.440 The answer is yes! Spoiler alert: I wouldn’t be giving this talk if it weren’t possible. I collaborated with Peter in early September, and we merged this feature in November in anticipation of the Ruby 3.3 release. It is an experimental feature in Ruby 3.3 and will receive further improvements in Ruby 3.4. The feature primarily revolves around the environment variable called 'RUBY_FREE_AT_EXIT'. When this variable is set for your Ruby program, it instructs Ruby to free the memory allocated at the end of your program.
00:12:27.760 This setup means you do not incur any additional shutdown cost unless you have enabled this variable. The implementation follows a straightforward approach: we check if 'free at exit' is enabled, and if so, we go through and free every allocation we have identified.
00:13:01.640 For most objects in Ruby, this approach works effectively. However, there are circular dependencies to address. In some cases, cleanup steps that run late in Ruby's shutdown process still need objects that were freed earlier. Let me illustrate one of those scenarios.
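The shape of that approach can be sketched in C with a toy runtime. Everything named here is hypothetical: the demo_runtime structure stands in for an interpreter's internal tables, and the rule that a variable counts as enabled whenever it is set is an assumption of this sketch, not a description of how Ruby parses 'RUBY_FREE_AT_EXIT'.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical runtime state standing in for an interpreter's tables. */
struct demo_runtime {
    char **entries;
    size_t count;
};

/* Assumed rule for this sketch: the flag is on if the variable is set. */
bool demo_env_flag(const char *name)
{
    return getenv(name) != NULL;
}

/* Allocates `count` entries plus the table itself. Returns false (with
 * nothing left allocated) if any allocation fails or count is zero. */
bool demo_runtime_init(struct demo_runtime *rt, size_t count)
{
    rt->entries = NULL;
    rt->count = 0;
    if (count == 0) {
        return false;
    }
    rt->entries = calloc(count, sizeof *rt->entries);
    if (rt->entries == NULL) {
        return false;
    }
    for (size_t i = 0; i < count; i++) {
        rt->entries[i] = malloc(32);
        if (rt->entries[i] == NULL) {
            while (i > 0) {
                free(rt->entries[--i]);
            }
            free(rt->entries);
            rt->entries = NULL;
            return false;
        }
        snprintf(rt->entries[i], 32, "entry-%zu", i);
    }
    rt->count = count;
    return true;
}

/* Returns the number of blocks freed. When free_at_exit is false this is
 * the fast path: nothing is freed and the OS reclaims memory at exit. */
size_t demo_runtime_shutdown(struct demo_runtime *rt, bool free_at_exit)
{
    if (!free_at_exit) {
        return 0;
    }
    size_t freed = 0;
    for (size_t i = 0; i < rt->count; i++) {
        free(rt->entries[i]);
        freed++;
    }
    free(rt->entries);
    freed++;
    rt->entries = NULL;
    rt->count = 0;
    return freed;
}
```

A caller would typically end with demo_runtime_shutdown(&rt, demo_env_flag("DEMO_FREE_AT_EXIT")), where DEMO_FREE_AT_EXIT is a made-up variable name, so that normal runs keep the fast shutdown and only leak-checking runs pay for the cleanup.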
00:13:26.360 During Ruby's shutdown, most cleanup occurs in the execution context cleanup step, which calls finalizers. With 'RUBY_FREE_AT_EXIT' enabled, all living objects are freed at this point. However, later in this cleanup stage, Ruby's VM destruct function (ruby_vm_destruct) is called, which frees the internal structures Ruby uses to track the program’s state. One of these structures is the Ruby global table, which references Ruby arrays. If 'RUBY_FREE_AT_EXIT' is enabled, this table cannot be freed safely because the Ruby arrays it references have already been released.
00:14:06.560 To resolve this cycle, we delay freeing the arrays until after the VM destruct step has cleaned up structures like the Ruby global table. You may be wondering what the experience is like when using 'RUBY_FREE_AT_EXIT' with your program. With this variable disabled, running an empty Ruby script reveals over 100,000 reported leaks, with numerous lines attributed to definitely lost or possibly lost memory.
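The ordering fix can be illustrated with a toy C sketch. The structures and function names are hypothetical, and the reason the table teardown reads its arrays (here, summing their lengths) is an assumption made so the dependency is visible; it is not a description of Ruby's internals.

```c
#include <stdbool.h>
#include <stdlib.h>

struct demo_array { size_t len; int *items; };
struct demo_table { struct demo_array **arrays; size_t count; };

struct demo_array *demo_array_new(size_t len)
{
    struct demo_array *a = malloc(sizeof *a);
    if (a == NULL) {
        return NULL;
    }
    a->items = calloc(len ? len : 1, sizeof *a->items);
    if (a->items == NULL) {
        free(a);
        return NULL;
    }
    a->len = len;
    return a;
}

void demo_array_free(struct demo_array *a)
{
    free(a->items);
    free(a);
}

/* The table only references the arrays; it does not own them. */
bool demo_table_init(struct demo_table *t, struct demo_array **arrays,
                     size_t count)
{
    t->arrays = malloc((count ? count : 1) * sizeof *t->arrays);
    if (t->arrays == NULL) {
        t->count = 0;
        return false;
    }
    for (size_t i = 0; i < count; i++) {
        t->arrays[i] = arrays[i];
    }
    t->count = count;
    return true;
}

/* Teardown reads every referenced array, so the arrays must still be
 * alive here. Freeing them first would make this a use after free.
 * Returns the total number of elements seen. */
size_t demo_table_destroy(struct demo_table *t)
{
    size_t total = 0;
    for (size_t i = 0; i < t->count; i++) {
        total += t->arrays[i]->len;
    }
    free(t->arrays);
    t->arrays = NULL;
    t->count = 0;
    return total;
}

/* Shutdown in the safe order: the table first, the deferred arrays last. */
size_t demo_shutdown(struct demo_table *t, struct demo_array **arrays,
                     size_t count)
{
    size_t seen = demo_table_destroy(t);   /* step 1: structures that read arrays */
    for (size_t i = 0; i < count; i++) {
        demo_array_free(arrays[i]);        /* step 2: the arrays themselves */
    }
    return seen;
}
```

Swapping the two steps in demo_shutdown would leave demo_table_destroy reading freed memory, which is the kind of ordering problem that deferring the array frees avoids.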
00:14:42.680 In contrast, enabling 'RUBY_FREE_AT_EXIT' reveals only 32 lines of potential leaks, and the items remaining in the output are those we hope to fix in future versions. Additionally, with this variable enabled, no memory is reported as definitely lost or possibly lost. Furthermore, during this work we identified existing memory leaks in Ruby and fixed them along the way, proving the utility of this feature.
00:15:39.480 This demonstration showed how the feature operates, but you may be curious about how useful it is and what memory leaks it detects. We attempted to find memory leaks using memory leak tools with 'RUBY_FREE_AT_EXIT' enabled. Specifically, we used Valgrind Memcheck on Linux and the macOS leaks tool, and shortly you'll see how each of these tools works. We ran Ruby's tests and specs one at a time to determine which of them were leaking memory. We then reduced those tests and specs to concise reproduction scripts, identifying over 20 memory leaks within Ruby.
00:16:40.680 Let's examine one of these memory leaks in detail, which occurs when a regular expression match times out. In the illustrative script, we set the regular expression timeout to a short duration of 1 millisecond. We create a regular expression and a long string to ensure that the match will time out before completion. Wrapping the match in a rescue block handles the inevitable timeout error. We perform the match in a loop 100 times to make the memory leak more visible, and we repeat that over ten iterations, recording the memory usage of the Ruby process with the ps tool after each one to observe growth.
00:17:50.080 Before rectifying this memory leak, we note that the memory usage grows linearly with each iteration, culminating at approximately 3 GB. This inefficiency indicates a severe memory leak. After addressing this memory leak, we observe a less significant increase in memory usage, registering from 39 MB to 56 MB, remaining stable afterward. This is a drastic improvement compared to the previous 3 GB observed before the memory leak fix.
00:18:51.760 In addition to this, we can visualize the differences in memory utilization through a graph. After a certain iteration, the memory consumption begins to flatten, confirming the success of the tweak. We can also demonstrate this memory leak using Valgrind Memcheck, providing valuable information regarding its source of allocation. Running the command to analyze Ruby with 'RUBY_FREE_AT_EXIT' enabled produces significant insights, enabling us to trace the memory leak and its allocation source, which is particularly beneficial for debugging.
00:19:25.040 In the output summary, Valgrind identifies three separate memory leaks in total, leaking roughly 31 MB of memory. The feature also lets us use other memory leak checkers, beyond just Valgrind Memcheck. We can apply the leaks tool on macOS, which reports similar findings. Running Ruby under the macOS leaks tool with the added --atExit flag, which checks for memory leaks when Ruby terminates, provides a summary that mirrors the previous results.
00:20:35.360 This similarity highlights the effectiveness of the Ruby 'free at exit' feature. The memory management in Ruby, as evident from the leaks tool, confirms that this approach to detecting memory leaks proves beneficial. To remediate this particular memory leak, we changed the function validating the timeout to return true instead of raising an error when it times out, allowing cleanup to occur prior to the error being raised.
00:21:05.120 This lets the cleanup code free the leaking memory first and raise the timeout error only after the memory has been cleaned up. In summary, this gives us a robust method to find and fix memory leaks not only in Ruby but also in native gems through the presented feature.
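The pattern behind this fix can be sketched in C, using setjmp and longjmp to stand in for raising an error. This is an illustrative toy, not Ruby's regular expression code: all names are hypothetical, the timeout is simulated by a boolean argument, and a counter tracks buffers that were allocated but never freed.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stdlib.h>

static jmp_buf demo_error_handler;
static size_t demo_live_buffers = 0; /* allocated but not yet freed */

/* Stand-in for raising an error: jumps back to the active handler. */
static void demo_raise_timeout(void)
{
    longjmp(demo_error_handler, 1);
}

/* Leaky shape: the timeout check raises directly, so the jump skips
 * the free() that follows it in the caller. */
static void demo_check_timeout_raising(bool timed_out)
{
    if (timed_out) {
        demo_raise_timeout();
    }
}

void demo_match_leaky(bool timed_out)
{
    char *scratch = malloc(1024);
    if (scratch == NULL) {
        return;
    }
    demo_live_buffers++;
    demo_check_timeout_raising(timed_out); /* may never return */
    free(scratch);
    demo_live_buffers--;
}

/* Fixed shape: the check only reports the timeout; the caller frees
 * its buffer first and raises afterwards. */
static bool demo_check_timeout(bool timed_out)
{
    return timed_out;
}

void demo_match_fixed(bool timed_out)
{
    char *scratch = malloc(1024);
    if (scratch == NULL) {
        return;
    }
    demo_live_buffers++;
    bool expired = demo_check_timeout(timed_out);
    free(scratch);
    demo_live_buffers--;
    if (expired) {
        demo_raise_timeout(); /* nothing left to leak at this point */
    }
}

/* Runs one match under an error handler. Returns true if it timed out. */
bool demo_run(void (*match)(bool), bool timed_out)
{
    if (setjmp(demo_error_handler) != 0) {
        return true;
    }
    match(timed_out);
    return false;
}

size_t demo_live_buffer_count(void)
{
    return demo_live_buffers;
}
```

After demo_run(demo_match_leaky, true) the live buffer count goes up by one and stays there, while demo_run(demo_match_fixed, true) still reports the timeout but leaves the count unchanged.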
00:21:57.240 You might think this feature solely identifies memory leaks in Ruby, but it can also serve to pinpoint memory leaks in native gems or help debug memory leaks in Ruby applications. If you're a maintainer of a native gem, this feature will assist you in finding leaks in your gem, utilizing this along with your test suite. It’s straightforward to run a memory leak tool with 'RUBY_FREE_AT_EXIT' enabled.
00:22:36.720 There's also a more adaptable solution in the Ruby Memcheck gem, which takes advantage of 'RUBY_FREE_AT_EXIT' on Ruby versions that support it while simplifying the use of Valgrind Memcheck on your test suite. For Ruby versions that do not support this feature, such as 3.2, it continues using the existing heuristics to filter out false positives.
00:23:00.640 If you're a maintainer of a native gem, I strongly encourage you to try out Ruby Memcheck if you haven't already. Memory management in native applications can be extremely challenging, especially in edge cases like error handling. Ruby Memcheck has successfully identified memory leaks in several well-known native gems.
00:23:36.160 This allows developers to grasp the extent of memory leaks not only within their Ruby applications but also within the native extensions they rely on, leading to a more stable and efficient Ruby environment. You can scan this QR code to access a PDF version of these slides. If you have any questions or simply want to reach out, feel free to connect with us via email, Twitter, or Mastodon. Thank you for attending our talk!