RubyMem: The Leaky Gems Database for Bundler
Summarized using AI

by Ernesto Tagwerker

In this talk titled "RubyMem: The Leaky Gems Database for Bundler," Ernesto Tagwerker discusses the challenges of memory leaks and out of memory errors in Ruby applications, particularly focusing on how dependencies can sometimes be to blame. The session emphasizes the importance of using the Bundler Leak tool to identify problematic gems known to leak memory. Key points from the presentation include:

  • Introduction to Memory Issues: Tagwerker explains the difference between memory leaks and memory bloat, highlighting how these issues can arise from both application code and third-party gems.
  • Personal Anecdote: A story is shared about a performance optimization task involving a GIF search engine, where a cache implementation inadvertently led to memory bloat and deployment issues. This experience underscores the need for careful memory management practices.
  • Debugging Process: The speaker describes various debugging tools used to identify memory issues, including Ruby's ObjectSpace, Rack Mini Profiler, and Memory Profiler, and how they can help isolate the root cause of memory problems.
  • Introduction of Bundler Leak: Tagwerker introduces Bundler Leak as a solution to quickly check for known memory leaks in dependencies. By running a simple command, developers can ascertain if any of the gems in their Gemfile are associated with memory leaks and take necessary actions.
  • Collaboration and Community Contribution: He elaborates on how Bundler Leak was developed from the community-driven project Leaky Gems and is aimed at better assisting Ruby developers in managing memory concerns effectively.
  • Future Aspirations: Tagwerker calls for contributions to the Ruby Mem Advisory DB, a database tracking gems with memory issues, highlighting the communal effort needed to keep Ruby applications running efficiently.

In conclusion, Tagwerker advocates for leveraging tools like Bundler Leak to streamline the identification and resolution of memory leak issues in Ruby projects, ultimately enabling developers to focus more on coding rather than debugging performance problems. The session combines practical advice with community ethos, encouraging developers to contribute to open-source projects and share their findings to enhance the overall Ruby ecosystem.

00:00:01.599 Hello, my name is Ernesto Tagwerker, and I am here to talk about RubyMem: the Leaky Gems Database for Bundler. This is a relatively new project that helps us identify dependencies in our applications that are known to have problems with memory usage.
00:00:21.600 If you have any questions or comments, feel free to reach out to me on Twitter. My handle is @etagwerker. I'm originally from Argentina, so if you hear any words that sound funny, it's because English is not my native language.
00:00:30.880 I've been living in Philadelphia for the past four years with my wife, daughter, and son. I really enjoy it here as there is a small Ruby community that I help organize. Due to COVID, I've been spending a lot of time at home, working and taking care of my kids. My daughter loves to draw, and I enjoy drawing with her. She really wanted me to share one particular drawing with you, which is why you will see some of my own drawings in this presentation. I apologize if they're not as good as hers.
00:01:04.160 One last thing about me is that I love open source. I wouldn't be here speaking if it weren't for open source projects like Ruby on Rails and many others. Therefore, I try to give back to the community as much as possible. I'm one of the maintainers of the Database Cleaner gem, which you might have used in the past; it helps you maintain a clean state of your database between test runs.
00:01:34.240 I also maintain next_rails, which is a toolkit for your next Rails upgrade project. It assists with tasks such as dual booting your Rails application and finding incompatibilities with the next version of Rails. Additionally, I work on Skunk, a project that helps you assess technical debt in your Ruby applications. It analyzes all the files in the application and identifies the most complex files lacking test coverage, using tools like RubyCritic and SimpleCov.
00:02:04.480 If you're interested in any of these projects, please feel free to check them out on GitHub. I'm also the founder of Ombu Labs, a small software development shop based in Philadelphia. We love Rails, Ruby, and working remotely.
00:02:25.920 If you're interested in working on challenging projects and working remotely from anywhere, feel free to check out our page. We are currently hiring, and we would love to hear from you.
00:03:03.520 A couple of years ago, we found ourselves working a lot on Rails upgrade projects and Ruby rescue projects, so we decided to launch a service called FastRuby.io. The work we do with Ruby rescue projects often involves performance optimization, which requires us to look into the memory usage of not just our clients' codebases but also their dependencies.
00:03:36.720 Many of the tools I'm going to talk about today are used in our client projects on a weekly basis. When I talk about memory issues, I'm not referring to forgetting something and then realizing it later. I mean applications using an excessive amount of memory—so much that it becomes a problem. This can happen with browsers like Chrome on your Mac, for example.
00:04:00.480 I will be discussing memory issues such as memory leaks or memory bloat. Sometimes, the root cause can be in your Ruby gems, and sometimes it might be within your application code or even C extensions within your Ruby gems. This can be a tricky issue.
00:04:20.160 I find the best way to illustrate this is by telling a story, so it's story time! A few years ago, my boss approached me and asked if I could help with his project. I answered, 'Sure, how can I help?' He explained that he wanted me to make his GIF search engine faster.
00:04:50.640 Having never heard of the project before, I asked him to show me what it looked like. He described it as a simple search box where you plug in a keyword and get five GIFs in response. I thought, 'That’s a basic feature; I wonder how I can improve it.' I decided to add a simple cache behind the scenes. I found out that it was a Roda application, not a Ruby on Rails one. Although I often work with Rails, this one was using a different framework, which is also Rack-based.
00:05:37.680 When I looked at the search action, I noticed it was using a query object that made third-party requests to the Giphy API. Given the vast collection of GIFs hosted at Giphy, it was clear this was essentially a wrapper for the API. I thought it would be straightforward to implement my cache in Ruby using a hash.
00:06:36.480 I worked for a few hours and came up with a solution. I created a query cache constant that was a hash object to store values from the Giphy API, ensuring I wouldn't contact Giphy more than once per keyword. I got it reviewed, tested, and everything looked good. Feeling satisfied, I deployed the changes, which included my caching solution. After upgrading some of the dependencies, my boss was pleased, and so were the end users.
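The cache described above can be sketched roughly as follows. This is a hypothetical reconstruction rather than the speaker's code: the names `QUERY_CACHE`, `fetch_gifs`, and `search`, and the fake URLs, are assumptions made for illustration, and the real application called the Giphy API instead of a stub.

```ruby
# Hypothetical reconstruction of an unbounded in-process cache.
# QUERY_CACHE, fetch_gifs and search are illustrative names only.
QUERY_CACHE = {}

# Stand-in for the third-party API request; returns five fake GIF URLs.
def fetch_gifs(keyword)
  Array.new(5) { |i| "https://example.test/#{keyword}/#{i}.gif" }
end

# Returns cached results for a keyword, calling the API stub only on a miss.
def search(keyword)
  QUERY_CACHE[keyword] ||= fetch_gifs(keyword)
end

# A repeated keyword is served from the cache...
search("cats")
search("cats")

# ...but every distinct keyword adds an entry that is never evicted.
1_000.times { |i| search("random-#{i}") }
puts QUERY_CACHE.size
```

Because the constant lives for the lifetime of the process, the number of retained entries equals the number of distinct keywords ever searched.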
00:07:12.320 However, the next day, I received reports from users about errors in production. They couldn't find the GIFs they were looking for, and I immediately felt responsible because I was the last to deploy changes. I checked the logs and found multiple memory usage error messages, including 'R14 memory quota exceeded.' My boss also reached out to me on Slack regarding unusual behavior in our production memory graph. As I checked the Heroku dashboard, I noticed memory usage had been rising consistently since deployment.
00:08:14.640 Twelve hours into the deployment, we had run out of memory. Although I confirmed this looked bad, I felt it couldn't possibly be my small changes that caused the issue. I thought it might be Rails, but remembered I wasn’t using Rails. Then I considered the possibility that it could be an issue with Roda, especially since I had deployed a version bump for it.
00:08:47.600 When I mentioned this concern to my boss, who has more experience, he said, 'What’s more likely, that Roda has a memory bug or that your code has a memory bug?' He suggested I check the changelog for the latest deployment.
00:09:19.760 I agreed that investigating the changelog was a good idea and told myself to start debugging the issue. I considered various debugging tools, knowing Ruby has tools in its standard library. I decided to use 'ObjectSpace' to identify how many objects were instantiated and retained in memory during the application requests.
00:10:00.799 I included some code to count all object instances in Ruby memory and report back which types were the most common at that moment. As I used the application with the new method, I observed that hashes and strings were increasing, but I still wasn't convinced my application code was the problem.
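A minimal sketch of this kind of check, using only `ObjectSpace.count_objects` and `GC.start` from Ruby core, might look like the following. The helper name `object_counts_delta` and the `RETAINED` array are assumptions for illustration; the speaker's actual instrumentation is not shown in the talk, and this variant reports growth per internal object type (such as `:T_HASH` and `:T_STRING`) rather than per class.

```ruby
# Reports which internal object types grew while the given block ran.
# object_counts_delta is a hypothetical helper, not part of Ruby itself.
def object_counts_delta
  GC.start # collect garbage first so the baseline only counts live objects
  before = ObjectSpace.count_objects
  yield
  GC.start # collect again so only retained objects show up as growth
  after = ObjectSpace.count_objects

  after
    .map { |type, count| [type, count - before.fetch(type, 0)] }
    .reject { |type, delta| %i[TOTAL FREE].include?(type) || delta <= 0 }
    .sort_by { |_type, delta| -delta }
end

# Simulate requests that keep hashes and strings reachable after they finish.
RETAINED = []
growth = object_counts_delta do
  10_000.times { |i| RETAINED << { keyword: "kw-#{i}" } }
end

growth.first(3).each { |type, delta| puts "#{type}: +#{delta}" }
```

Object types that keep growing across repeated measurements are the ones worth tracing back to the code that retains them.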
00:10:18.080 To investigate further, I opted to use Rack Mini Profiler and Memory Profiler to analyze the situation in more depth. I added these gems to my application, making sure to require the libraries right before starting my application. The good news was that Rack Mini Profiler loads as a Rack middleware, allowing me to add it easily to my Roda application.
00:10:55.680 Once integrated, I could request memory usage information through a query parameter: by appending 'pp=profile-memory' to my application requests, I could examine how memory was being allocated by gem and by file, among other details.
00:11:48.320 I needed to observe how the memory behaved across different requests and see if all the objects added were being garbage collected. After several hours of debugging, I discovered that my code was indeed the root cause of the memory issue. I revisited the changes I had deployed to pinpoint the problem.
00:12:12.599 It became clear that my cache had a limited amount of memory to work with, while the set of possible queries was effectively unbounded: users could search for arbitrary keywords, every new keyword added another entry to the cache, and nothing was ever evicted. Memory kept filling up until the application ran out of space. My solution was causing significant memory bloat.
00:12:46.479 To resolve this, I decided to revert my changes and redeploy the application. Upon doing so, I noticed a dramatic drop in memory usage on the Heroku dashboard. I shared this update with my boss, who inquired about the root cause of the memory leak. I explained that my cache could potentially use infinite memory while we only had a fixed amount of 500 megabytes available in our Heroku dynos.
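The fix in the talk was simply to revert the cache. For contrast, one common way to keep such a cache within a fixed budget is to cap the number of entries and evict the least recently used one. The sketch below is an illustration under that assumption, not something the speaker describes implementing; `BoundedCache` and its methods are hypothetical names.

```ruby
# A small least-recently-used cache built on Hash insertion order.
# BoundedCache is a hypothetical class for illustration only.
class BoundedCache
  def initialize(max_size)
    @max_size = max_size
    @store = {}
  end

  # Returns the cached value for key, computing it with the block on a miss.
  def fetch(key)
    if @store.key?(key)
      value = @store.delete(key) # remove and re-insert to mark as most recent
      @store[key] = value
    else
      value = yield(key)
      @store[key] = value
      @store.shift if @store.size > @max_size # drop the oldest entry
    end
    value
  end

  def size
    @store.size
  end
end

cache = BoundedCache.new(100)
1_000.times { |i| cache.fetch("random-#{i}") { |keyword| "results for #{keyword}" } }
puts cache.size
```

With a cap in place, memory used by the cache depends on `max_size` and the size of each entry rather than on how many distinct keywords users type.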
00:13:30.000 His follow-up question was, 'Why did you take so long to fix it?' I explained that I had shipped multiple changes and couldn't determine which one caused the issue, as I didn't know whether the leak was in my application code or in my dependencies.
00:14:05.760 And just like that, story time is over. Now let's reflect on the lessons we learned from this experience. First, anyone can introduce a memory bloat or memory leak; it could be you, a teammate, or even a well-intentioned open-source developer trying to improve their library.
00:14:25.920 These issues are easy to introduce but often challenging to identify. This task becomes even harder when you're consistently deploying changes to an application, as you can't monitor how instances behave over time. However, the good news is that there are effective profiling tools available in Ruby.
00:14:57.679 I’ve mentioned just a couple, but many others exist, such as Flamegraph and Derailed Benchmarks. After this experience, we started asking ourselves, 'If only there were a way to quickly determine whether it’s my code or one of my dependencies leaking memory.' That question is what led to Bundler Leak.
00:15:32.400 The purpose of Bundler Leak is to answer one question: are we using any Ruby gems known to leak memory? To start using it, add the bundler-leak gem to your Gemfile and install it. Once that's done, run 'bundle leak,' and it will tell you if any dependencies are known to be leaking memory.
00:15:53.920 If nothing is found, you can confidently investigate your application code as the root cause. However, if something is reported, it will specify the gem causing the issue and provide a link to the source of the memory leak report. You can then examine your Gemfile to see if you are explicitly pinning that version.
00:16:29.600 You may find yourself wondering whether you really need that specific version or if you can upgrade to a patched version. If so, you simply change the dependency declaration in your Gemfile and run 'bundle update.' After that, run 'bundle leak' again, and hopefully, it will show that there are no longer any leaky dependencies.
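The workflow described above can be summarized in a Gemfile fragment. Treat it as a sketch: placing the gem in the development group is an assumption, not something stated in the talk, and the commands in the comments are the ones the speaker mentions.

```ruby
# Gemfile (fragment)
source "https://rubygems.org"

group :development do
  # Assumption: the tool is only needed on developer machines and in CI.
  gem "bundler-leak"
end

# Then, from the project root:
#   bundle install   # install the new dependency
#   bundle leak      # check the bundle against the known leaky gems database
#
# If a gem is reported, relax or bump its version constraint in this file,
# run `bundle update` for that gem, and run `bundle leak` again.
```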
00:16:53.440 This approach can save you significant time in addressing memory leak issues. In fact, someone likely reported that memory leak to the maintainers of the problematic gem, who dedicated hours to patch it. With Bundler Leak, you can quickly leverage that information to save valuable time.
00:17:18.080 After upgrading, confirm that the application is no longer leaking memory. If it still is, focus on your application code and identify which actions might be causing the problem. Next, I would like to share a bit about how this project is implemented.
00:17:41.920 This project was inspired by Sergey Alekseev's project called Leaky Gems, which aimed to track reports of Ruby gems that had memory usage problems. Essentially, it functioned as a README that users could contribute to through pull requests.
00:18:04.480 We took that format and transformed it into a machine-readable format that could serve as the foundation for a command-line tool. This evolved into the Ruby Mem Advisory DB, which tracks every gem known to have issues with memory leaks.
00:18:30.400 Each report provides clear information, including the gem name, the date the leak was reported, the affected versions, and the patched versions, giving you a straightforward way to address a known memory leak.
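To illustrate how a machine-readable report with patched versions can drive a command-line check, here is a minimal sketch built on `Gem::Version` and `Gem::Requirement` from RubyGems. The `Advisory` struct, its field names, and the sample data (`example_gem`, the date, the URL, the version constraints) are all hypothetical; they are not taken from the actual database or from Bundler Leak's source.

```ruby
require "rubygems" # Gem::Version and Gem::Requirement ship with Ruby

# Hypothetical advisory record; field names and values are illustrative only.
Advisory = Struct.new(:gem_name, :date, :url, :patched_versions, keyword_init: true) do
  # A version counts as leaky unless it satisfies one of the patched constraints.
  def leaky?(version)
    candidate = Gem::Version.new(version)
    patched_versions.none? do |requirement|
      Gem::Requirement.new(requirement).satisfied_by?(candidate)
    end
  end
end

advisory = Advisory.new(
  gem_name: "example_gem",
  date: "2020-01-01",
  url: "https://example.test/issues/1",
  patched_versions: ["~> 1.4.2", ">= 2.0.1"]
)

%w[1.4.1 1.4.5 1.5.0 2.0.1].each do |version|
  status = advisory.leaky?(version) ? "known to leak" : "patched"
  puts "#{advisory.gem_name} #{version}: #{status}"
end
```

In this sample, 1.4.1 and 1.5.0 are flagged because neither constraint covers them, while 1.4.5 and 2.0.1 are treated as patched.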
00:18:59.840 Additionally, Bundler Leak is a fork of Bundler Audit. If you're unfamiliar with Bundler Audit, it's a tool designed to identify security vulnerabilities in your gems, informing you if you're utilizing a vulnerable version and guiding you to where you can find more information.
00:19:32.320 If we compare both tools, you'll notice they are very similar. To run Bundler Audit, you execute 'bundle audit,' while to run Bundler Leak, you simply type 'bundle leak.' Each is backed by its own advisory database: Ruby Sec Advisory DB for security vulnerabilities and Ruby Mem Advisory DB for leaky gems.
00:20:02.720 However, Ruby Sec and Ruby Mem are unrelated; Ruby Mem is an independent project that leverages known leaky gems as a database.
00:20:30.080 As such, we have several ambitions for Ruby Mem and Bundler Leak. We have observed that the more reports we gather, the greater value we can provide to the Ruby community. Many libraries are still unknown and may have memory issues that we're not tracking.
00:20:53.920 If you know of any leaky gems that caused problems some time ago, check the Ruby Mem Advisory DB to find out whether they are tracked, whether the leak has been patched, or remains unresolved. If leaked memory is a security concern, consider submitting privately to RubyMem.com.
00:21:14.080 When a gem has been patched and that information has been available for weeks or months, feel free to submit an issue to our GitHub project. Additionally, we are looking for contributors and co-maintainers. If you are keen to contribute, a good first step is to run 'bundle leak' in your own projects and report any unexpected behavior you find.
00:21:52.079 The next time you're faced with the question, 'Is it my code or is it a dependency leaking memory?' you can save a lot of time by looking into Bundler Leak first. Once you've checked it, dive into your application's code and dependencies to get to the bottom of the issue.
00:22:12.000 If you are interested in this project, it’s available on GitHub. Should you have any questions, don’t hesitate to message me on Twitter. I’d like to express my gratitude to Sergey Alekseev for creating the original project that inspired this work.
00:22:37.480 Thanks also to Nate Berkopec for suggesting this idea for such a cool tool, and to my coworker Luis Sagastume for partnering with me to implement this project. Thank you all for your time, and I hope you find this tool useful in your performance optimization efforts. If you're looking for additional resources, feel free to check out these pages.
Explore all talks recorded at RubyKaigi 2020 Takeout