Hunting Production Memory Leaks with Heap Sampling

00:00:00.900 Welcome to our talk on hunting production memory leaks with heap sampling. My name is Ivo Anjo, and I've been working with Ruby since about 2014. I always describe myself as a bit of a weird Rubyist because I've done a lot of work around performance and performance tooling. I also completed a PhD where I focused on shared-memory concurrency. This background is how I ended up at Datadog, where I built Datadog's continuous profiler for Ruby, which ships as part of the ddtrace gem.

00:00:12.240 My name is KJ Tsanaktsidis, and I work at Zendesk on our one-million-line Rails application. I'm relatively new to Ruby, having started this work on our Rails app at the end of last year. My focus is on the performance and reliability of the monolith, which means I spend a lot of my time looking at the profilers that Ivo builds. That's how we were introduced.

00:01:11.700 In today's talk, I will first provide some background on what heap profiling is and what existing solutions are available in the Ruby ecosystem. Then we will discuss the work we did on the ruby_memprofiler_pprof gem and the backtracie gem to improve the profiling space in Ruby. Finally, we will explore where we can go next after this work.

00:01:38.420 So, what is heap profiling? My interest in this area started because people would come to me at Zendesk with questions about memory issues. They would say things like, 'Help! My application's memory usage slowly increases over time, and whenever we deploy, it goes down again. I suspect I have a memory leak.' Occasionally, they would show me a graph where memory usage might spike suddenly, leading to application failures. However, these issues often had a nasty characteristic: they only seemed to happen in production. If you attempted to replicate the issue on your development machine, where performance is not a primary concern, the problems would mysteriously vanish.

00:02:32.759 In situations like this, I would reach for a heap profiler from another language, such as Go or C. What a heap profiler does is provide a snapshot of the program's memory at a specific point in time. It collects data about every object in the program, indicating characteristics such as whether it's a data object, a class object, or a Ruby-specific type. We want to know the size of each object and which piece of code caused it to be allocated, as this information is crucial for identifying the source of any issues.

00:03:16.319 For example, imagine we have a simple Rails application that allocates memory in two ways. The first allocation is very short-lived; when a request is made, a ticket is loaded and this object will go out of scope when the request finishes. It will likely be garbage collected in the next cycle. However, there is a second type of allocation that persists indefinitely, creating a memory leak. Each time this request occurs, we add an instance of this class. Over time, as we continue to handle requests, our memory usage grows.
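
The two allocation patterns described here can be sketched in plain Ruby. This is only an illustration: `Ticket`, `AuditEntry`, `AUDIT_LOG`, and `handle_request` are hypothetical names standing in for the Rails code shown in the talk, which is not reproduced in this transcript.

```ruby
# Hypothetical stand-ins for the objects described in the talk.
Ticket = Struct.new(:id, :subject)
AuditEntry = Struct.new(:ticket_id, :seen_at)

# A constant that is never pruned: everything pushed here stays reachable.
AUDIT_LOG = []

def handle_request(ticket_id)
  # Short-lived: only referenced by a local variable, so it becomes
  # garbage as soon as this method returns.
  ticket = Ticket.new(ticket_id, "Subject #{ticket_id}")

  # Long-lived: referenced from a constant, so the GC can never reclaim it.
  AUDIT_LOG << AuditEntry.new(ticket.id, Time.now)

  ticket.subject
end

1_000.times { |i| handle_request(i) }
puts "retained entries: #{AUDIT_LOG.size}"
```

Every call leaves one more `AuditEntry` behind, which is the kind of slow, request-driven growth that only becomes visible after a process has served production traffic for a while.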

00:04:10.500 Using a heap profiler for this scenario provides valuable insights. We can take snapshots of the memory state at different times, allowing us to see how memory is changing. By comparing these snapshots, we can identify what is contributing to the growth in memory usage and track down the code responsible for allocating the objects that are never released.
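
The snapshot-and-compare idea can be approximated with a few lines of Ruby. This is a minimal sketch that counts live objects per class with `ObjectSpace.each_object`; it is not how the profiler in this talk works, and `snapshot_counts`, `growth`, `LeakedThing`, and `RETAINED` are hypothetical names used only for the illustration.

```ruby
# Count live objects per class. ObjectSpace.each_object walks the whole heap,
# so this is a development-time illustration, not something to run per request.
def snapshot_counts
  GC.start # collect garbage first so we mostly count reachable objects
  counts = Hash.new(0)
  ObjectSpace.each_object(Object) { |obj| counts[obj.class] += 1 }
  counts
end

# Classes whose live-object count grew between two snapshots.
def growth(before, after)
  after.each_with_object({}) do |(klass, count), diff|
    delta = count - before.fetch(klass, 0)
    diff[klass] = delta if delta > 0
  end
end

LeakedThing = Class.new
RETAINED = []

before = snapshot_counts
500.times { RETAINED << LeakedThing.new } # simulated leak
after = snapshot_counts

growth(before, after).sort_by { |_, delta| -delta }.first(3).each do |klass, delta|
  puts "#{klass}: +#{delta}"
end
```

A per-class diff like this says what grew but not which code allocated it, which is the extra piece of information a heap profiler attaches to each object.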

00:05:00.000 Now that we understand the concept of heap profiling, let's look at existing solutions available in Ruby. Ruby has had memory profiling tools for quite some time. For instance, there are gems such as Sam Saffron's memory_profiler and Shopify's heap-profiler, which gather information and provide lists of the top objects that might be causing issues. Other notable gems include derailed_benchmarks, which focuses on helping reproduce issues by exercising endpoints, and heapy, which helps analyze heap dumps.

00:05:51.240 All of these gems build on the ObjectSpace module and the objspace extension that ships with Ruby. For example, ObjectSpace.trace_object_allocations_start makes Ruby record the source file and line for each object allocated from then on, which can be extremely useful for debugging memory issues. The dump_all method writes out details about the entire heap as JSON, with a record for each live object.
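
For reference, a small sketch of these ObjectSpace calls is shown below. It uses the block form `ObjectSpace.trace_object_allocations` rather than the explicit start/stop pair, and the variable names and temporary file are choices made for this example rather than anything shown in the talk.

```ruby
require "objspace"
require "json"
require "tempfile"

greeting = nil
line = nil
record = nil

# Allocation tracing is only active inside this block.
ObjectSpace.trace_object_allocations do
  greeting = "hello " + "world" # allocated while tracing is enabled
  file = ObjectSpace.allocation_sourcefile(greeting)
  line = ObjectSpace.allocation_sourceline(greeting)
  puts "allocated at #{file}:#{line}"

  # ObjectSpace.dump returns the JSON record for a single object.
  record = JSON.parse(ObjectSpace.dump(greeting))
end

puts "type: #{record["type"]}"

# dump_all writes a record like the one above for every object on the heap.
dump_bytes = Tempfile.create("heap") do |io|
  ObjectSpace.dump_all(output: io)
  io.flush
  File.size(io.path)
end
puts "full heap dump: #{dump_bytes} bytes"
```

Even in a tiny script the full dump is far larger than the single record, which hints at the storage and handling concerns for a large production application.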

00:06:54.000 However, using these APIs in production comes with challenges. Tracing object allocations increases memory usage because Ruby keeps extra information for every allocated object. Internally, Ruby maintains a hash keyed by object that holds this allocation information. Every time an object is created or destroyed, this hash has to be updated, which costs both memory and CPU time.

00:07:43.500 Moreover, the dump_all method can pose security risks. The dump may inadvertently include sensitive information such as database credentials or personally identifiable information (PII), which could create compliance issues with regulations such as GDPR in Europe. If you capture a heap dump from production, you must carefully consider what data it contains and how you store and handle it. Heap dumps also tend to be large and take up significant storage, which is a particular problem in ephemeral environments.

00:08:42.960 Heap dumps also block application execution, leading to challenges in multi-threaded environments. If you have a Puma web server or Sidekiq running threads, there's uncertainty about when to safely pause your application for heap dumping. Further compounding this is the desire to compare multiple heap dumps, requiring frequent pauses and storage management for large files.

00:09:54.600 These challenges sparked the development of the ruby_memprofiler_pprof gem. We aimed to implement heap profiling without relying heavily on ObjectSpace, as we found it was often not suitable for production use.

00:10:06.300 To achieve this, we devised a method to keep track of object allocations using the TracePoint API. This approach involves creating a hash of allocated objects, where we store information about their creation and the code that created them. A separate thread runs in the background to periodically save this data to disk, resulting in a profile of memory usage over time.
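
The shape of this design (a table of live allocations plus a background flusher) can be sketched in Ruby as follows. This is an assumption-heavy illustration, not the gem's implementation: `AllocationTracker` and its methods are hypothetical, `track` and `untrack` are called by hand where a real profiler would be driven by allocation and free hooks from native code, and JSON stands in for the real profile format.

```ruby
require "json"
require "stringio"

# Hypothetical tracker illustrating the data structure only.
class AllocationTracker
  def initialize
    @live = {} # object_id => { class:, backtrace: }
    @mutex = Mutex.new
  end

  # Stand-in for the allocation hook.
  def track(obj)
    entry = { class: obj.class.name, backtrace: caller(1, 5) }
    @mutex.synchronize { @live[obj.object_id] = entry }
    obj
  end

  # Stand-in for the hook that fires when an object is freed.
  def untrack(obj)
    @mutex.synchronize { @live.delete(obj.object_id) }
  end

  # Write one profile containing everything that is still tracked.
  def flush(io)
    snapshot = @mutex.synchronize { @live.values }
    io.puts(JSON.generate({ taken_at: Time.now.to_f, live: snapshot }))
  end

  # Background thread that periodically writes a profile.
  def start_flusher(io, interval:)
    Thread.new do
      loop do
        sleep interval
        flush(io)
      end
    end
  end
end

tracker = AllocationTracker.new
kept = tracker.track(Object.new)
temp = tracker.track(Object.new)
tracker.untrack(temp) # simulate the temporary object being collected

out = StringIO.new
tracker.flush(out)
puts out.string
```

In a long-running process, `start_flusher` would be called once at boot with a file or socket; the example calls `flush` directly so that the output is deterministic.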

00:11:00.000 Initially, we chose an efficient binary protobuf format for writing this data, as it's already used by Datadog's CPU profiler. This choice allowed us to focus on information collection while delaying visualization challenges until later.

00:11:52.500 However, our initial implementation proved to be slower than anticipated, with overhead similar to the ObjectSpace approach, so we needed to improve performance significantly. We opted for sampling, where we record only a small percentage of object allocations in our hash. This reduces memory usage, and by aggregating samples across multiple servers we still get a representative average picture.
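
A minimal sketch of allocation sampling is shown below. `AllocationSampler`, the site strings, and the 1% rate are hypothetical, and scaling each kept sample by 1/rate to estimate the true count is an assumption of this example rather than something stated in the talk.

```ruby
# Hypothetical sampler: keeps roughly `rate` of the allocations it is shown
# and scales each kept sample so totals still estimate the real count.
class AllocationSampler
  attr_reader :samples

  def initialize(rate:, random: Random.new)
    @rate = rate
    @random = random
    @samples = Hash.new(0) # allocation site => number of kept samples
  end

  # Called for every allocation; most calls do almost no work.
  def record(site)
    @samples[site] += 1 if @random.rand < @rate
  end

  # Each kept sample represents about 1/rate real allocations.
  def estimated_counts
    @samples.transform_values { |count| (count / @rate).round }
  end
end

sampler = AllocationSampler.new(rate: 0.01, random: Random.new(42))
100_000.times { sampler.record("app/models/ticket.rb:12") }
10_000.times  { sampler.record("lib/cache.rb:40") }

sampler.estimated_counts.each { |site, estimate| puts "#{site}: ~#{estimate}" }
```

The table holds only about one entry per hundred allocations, yet the relative weight of each allocation site stays roughly intact, and the estimate tightens as more samples are combined.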

00:12:52.620 Our profiling tool had a challenging task, given the requirements for real-time performance. The profiler's hook runs for every allocated object, so it has to be efficient enough not to slow down the application noticeably. We noticed that a significant amount of time was spent capturing backtraces for object allocations.

00:13:10.800 To improve this, we dove deep into Ruby’s internals through C extensions. By working with Ruby’s stack more directly, we were able to capture backtraces far more cheaply, which reduced the profiling overhead.

00:14:40.800 However, it still wasn’t fast enough. We also recognized that many objects were short-lived, meaning the effort to track them could be wasted. Hence, we decided to defer the construction of backtrace strings until they were needed for profile file generation.

00:15:25.800 Instead of retaining complete backtrace strings for every allocation, we could simply store pointers to Ruby’s internal representations. Only when generating the profile file would we then construct the complete backtrace strings, dramatically improving the efficiency of our memory profiling.
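
A Ruby-level analogy of this deferral is sketched below. The talk describes storing pointers to Ruby's internal structures from C; here `caller_locations` plays that role by returning `Thread::Backtrace::Location` objects that are only formatted into strings on demand. `LazySample` and `sample_allocation` are hypothetical names for this example.

```ruby
# Keep the unformatted Location objects at allocation time and only turn
# them into strings when a profile is actually written.
class LazySample
  def initialize(locations)
    @locations = locations # no string formatting yet
  end

  # Built on first use, then cached.
  def backtrace_strings
    @backtrace_strings ||= @locations.map do |loc|
      "#{loc.path}:#{loc.lineno} in #{loc.label}"
    end
  end
end

def sample_allocation
  # Start at frame 0 so the sampling method itself appears in the trace.
  LazySample.new(caller_locations(0, 10))
end

samples = Array.new(1_000) { sample_allocation }

# Pretend only a few objects survived until the profile was flushed.
survivors = samples.first(10)
survivors.each { |sample| puts sample.backtrace_strings.first }
```

Only the ten surviving samples ever pay for string construction; the other 990 are discarded without that cost.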

00:16:30.120 While our optimization attempts significantly improved performance, we still faced delays when constructing the complete profiles under load. Our strategy was to execute as many tasks as possible outside the Global VM Lock (GVL), allowing smoother application performance.

00:17:00.180 We achieved our objectives and ran micro-benchmarks that indicated low overhead at low sampling rates. Deploying to Zendesk's pre-production environment highlighted some remaining performance bottlenecks, but we felt we were heading in a promising direction because traditional methods like ObjectSpace would introduce far more latency.

00:17:49.740 Overall, the journey of optimizing ruby_memprofiler_pprof yielded valuable insights, particularly regarding object allocations in Ruby. Furthermore, exploring general profiling strategies led us to discover tools that were effective in analyzing C extensions, which we found particularly useful in debugging.

00:18:43.920 Upon using the profiling tool, we generated flame graphs to visualize memory usage across various stacks effectively. These visualizations provided critical insights into our application's memory pressures and highlighted specific areas requiring attention.

00:19:38.880 Despite our successful strides, we sought feedback from users and highlighted the importance of developing awareness surrounding memory profiling within the Ruby community. Our goal is to create a tool that users can experiment with while gathering valuable insights to enhance the products.

00:20:21.120 To make backtraces clearer, we created a separate gem named backtracie. This gem enhances the traditional Ruby backtrace by providing additional contextual information, which is crucial for understanding where an object was allocated. For instance, it captures relevant module and class details when generating backtraces.

00:21:20.280 The backtracie gem also gives visibility into complex stack frames and allows developers to customize how they visualize this data. For our memory profiling efforts, this tool complements ruby_memprofiler_pprof and helps trace memory problems more effectively.

00:22:45.240 In the future, we aim to continue improving performance while ensuring that our profiling methodologies cater to wider use cases. We envision exploring facilities that enable tracking of additional memory sources beyond just Ruby objects, including native memory.

00:23:20.880 We also aim to provide clearer reference chains that may reveal why certain objects remain alive, aiding in debugging efforts. Additionally, we would like to see enhancements to Ruby's VM tools and accessibility to make profiling more seamless, potentially fostering the creation of built-in Ruby profilers.

00:24:00.870 Thank you for your attention, and we look forward to your feedback on ruby_memprofiler_pprof and backtracie. Both gems are available on RubyGems, and we encourage everyone to explore these resources. Your thoughts will help us refine these tools further. We're also actively hiring, so please connect with us if you're interested.