Real World Ruby Performance at Scale

00:00:17.420 Hello, everyone! I'm going to talk about real-world Ruby performance. It's a topic I'm very excited about, so if I seem enthusiastic up here, it's because I truly am. And this is the last talk of the conference, so I know you're all a bit tired. It’s been a long couple of days, and maybe some of you had a couple of drinks or stayed out late talking to people. That's totally fine. My goal is to bring enough energy to keep everyone awake. If you find yourself falling asleep, that's okay, but please don’t start snoring, as that might insult me! I realize having the last talk on the last day can be a bit challenging, but we'll get through it together.

00:01:10.140 Before diving into the topic, I want to give a quick shout-out. There are three individuals who have worked on various tools and made significant contributions to the Ruby language itself, paving the way for many developments in Ruby 2.1. I’d like to acknowledge my great friend, Sam, from GitHub, whom I have only interacted with online but who is an incredibly talented developer. Additionally, there's Koichi, who recently gave an amazing talk on the incremental garbage collector that I hope everyone saw.

00:01:57.960 Let’s get into the content now. I want to emphasize that this talk was actually quite challenging to write. I've given many talks before, and I noticed that several speakers mentioned imposter syndrome recently. You see individuals like myself, Aaron Patterson, or Sandy giving talks, and you might think we don’t feel nervous. But the truth is, I get really nervous, and I'm sure most speakers do. If you ever feel like speaking, I encourage you to do so. Sharing your experiences is crucial for community growth. Over the last few years, I’ve learned a tremendous amount about performance—specifically Ruby performance—while scaling a large application.

00:03:10.610 There's so much to cover that I wasn't sure what to talk about. However, I realized that tips and tricks are like CliffsNotes for tech learning. It’s more important for me to share ideas and philosophies than mere snippets. I've been a mentor for junior developers and, while I could point out slow methods, it's far more valuable to help them understand processes and methodologies. If you take away anything from today, I hope it's the overarching process rather than just specific tools. Ruby and its community are evolving rapidly, and it’s vital to focus on ways of approaching problems that will stay with you for life.

00:03:44.030 Today, we're going to look at Ruby performance through the lens of therapy. As someone who has been in psychotherapy for a couple of years, I see parallels between the two. If you're Jewish and have interesting parents, you might relate! I want everyone to relax for a moment. Feel free to close your eyes, but not for too long, or you might doze off. Therapy, in any form, is usually a multi-step process, and I plan to illustrate how these steps not only apply to improving your code but also to personal development.

00:04:36.090 The first step in any therapy session is acceptance. In regard to Ruby performance, it's your fault! Yes, it truly is! As the philosopher says, 'it’s not you; it’s me.' Performance is all about context. Several years ago, a controversial statement circulated in the community that 'Rails doesn't scale,' and that’s something we are still grappling with. This brings emotional and intellectual turbulence, and it's just not true. How can you discuss any performance issues without considering reproducibility and the necessary context?

00:05:12.060 I like to think about this context similarly to how I discuss performance with my parents. When I tell them that we saved a few milliseconds off a page load, they ask, 'How do you even measure that?' They don’t understand what a few milliseconds mean in the realm of web performance. This confusion is common in discussions about programming languages, not just Ruby. When we say Ruby or Rails is slow, we often focus on a singular stack and miss the bigger picture. A Rails request might take, say, ten milliseconds to initiate, while databases can take longer and we might think memcached is slow.

00:06:20.260 In essence, a lot of the time consumed during a request is actually within our application code. So let’s all take a moment together and acknowledge, 'It’s my fault!' on the count of three—One, two, three! Oh, that felt good, right? Now, step two, after accepting responsibility, is diagnosing the problem. You need to ask yourself not where Ruby went wrong but where you went wrong. This diagnosis relies on what I call the 'Five M's': metrics, measurements, and numbers. Because, indeed, milliseconds matter.

00:07:03.300 Collecting metrics is vital. We all collect metrics, right? If we don't have solid statistics or justification for our beliefs about performance, we risk stumbling around blind in the dark. This is dangerous territory. The important takeaway is that there are tools available for every potential performance issue.

00:07:47.960 As Ruby 2.1 evolves, the performance tools are continually improving. We'll get to specific tools later, but it's essential to recognize that every performance problem has a corresponding tool. Step three is treatment: what steps can we take to address the identified issues? I like to think of it like playing golf: how can we achieve the lowest score with the fewest strokes? It's about methodically testing and confirming each change, following the scientific method.

00:08:40.210 When treating these issues, I visualize this process as a rectangle, or sometimes a cube—though my Keynote skills aren't quite up to drawing cubes! The idea here is we can optimize either vertically or horizontally. Vertical optimization addresses parts of a single request, while horizontal optimization looks at the broader application setup. Making targeted fixes to specific, frequently accessed code paths can often yield drastic performance improvements.

00:09:47.170 Another important aspect of this therapy is understanding that context is vital for acceptance, and introspective ability is crucial for diagnosis. We also need to have a strong familiarity with our performance tools to ensure effective treatment.

00:10:42.570 Let me introduce myself—my name is Aaron Quint, and I serve as the Chief Scientist at Paperless Post, an online invitations and stationery company based in New York. We have an office in San Diego as well, and our operations have grown tremendously over the years. Since our inception, we've seen our user base expand to about 80 million today, yet we've maintained the same Rails application since the beginning. Despite scaling rigorously, we've never conducted a major rewrite of our core application.

00:11:58.440 Our business experiences peaks and valleys like many startups, and being seasonally driven means we can predict our high-usage periods every year, such as Valentine's Day. The graph demonstrates how our card sending volume spikes around February due to Valentine's Day and again during the holidays. These seasonal peaks create significant stress for our teams. Right now, as we approach this holiday season, our traffic is increasing drastically, doubling on a week-over-week basis, which emphasizes the urgency to optimize.

00:12:57.880 I've learned that our focus often leans toward shipping features for our customers, even while keeping an eye on performance and site stability. It is critical to understand that a faster implementation of features often leads to a trade-off with how smoothly they actually perform. Thus in our experience, optimizing for speed must go hand-in-hand with stability. Although our operations team is growing, we were previously limited, which made it essential to streamline our site. This leads us to some practical case studies where I can share real examples of tools we’ve used to fix performance issues.

00:14:46.530 The first case revolves around JSON. Many of our applications rely on JSON for communication between different services and our users. One part of our application, the paper browser, where users select cards, generates extensive JSON data. Over time, we managed to optimize the performance of this JSON generation process because these pages can be cached frequently since they are not user-specific. However, we noticed that when a cache invalidated, it created a drastic impact on overall performance.

00:15:41.500 This performance degradation occurred because the pages that got stuck without cached versions were incredibly expensive to generate. This graph illustrates that although overall performance seemed great, sporadic page requests could still be very slow. The solution we implemented involved self-expiring nested cache keys. These keys are self-validating based on updates to their associated records. Instead of manually invalidating caches, we only needed to check timestamps to see if the cache needs refreshing.
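
A minimal sketch of the self-expiring key idea in Ruby, assuming each record exposes an `id` and an `updated_at` timestamp. The `Card` and `Category` structs, the in-memory `CACHE` hash, and the `fetch`, `cache_key`, and `update_card` helpers are hypothetical stand-ins for illustration, not the Paperless Post code.

```ruby
require "json"

# Hypothetical records; only `id` and `updated_at` matter for the key.
Card     = Struct.new(:id, :name, :updated_at)
Category = Struct.new(:id, :title, :cards, :updated_at)

CACHE   = {}          # stand-in for a real cache store such as memcached
RENDERS = Hash.new(0) # counts how often something is actually rebuilt

# Return the cached value for key, or build and store it.
def fetch(key)
  CACHE.fetch(key) { CACHE[key] = yield }
end

# The key embeds the timestamp, so an updated record produces a new key
# and the stale entry is simply never read again.
def cache_key(record)
  "#{record.class.name}/#{record.id}-#{record.updated_at.to_i}"
end

def card_json(card)
  fetch(cache_key(card)) do
    RENDERS[:card] += 1
    { id: card.id, name: card.name }.to_json
  end
end

# The outer entry is built from the inner cached entries (nesting).
def category_json(category)
  fetch(cache_key(category)) do
    RENDERS[:category] += 1
    cards = category.cards.map { |c| JSON.parse(card_json(c)) }
    { id: category.id, title: category.title, cards: cards }.to_json
  end
end

# Updating a card also touches its parent, so the outer key changes too.
def update_card(category, card, name, now)
  card.name = name
  card.updated_at = now
  category.updated_at = now
end
```

In this sketch, updating one card rebuilds only that card and the enclosing category; the other cards are still served from their existing entries.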

00:16:44.130 This solution was integrated effortlessly into our existing JSON methodology without burdening developers with additional complexity. We included elements that needed fetching from the cache in a way that any top-level cache would serve its purpose efficiently.

00:17:43.950 Despite our efforts, we still faced cache invalidation problems due to frequent updates to design templates. Our overall performance was notably impacted because uncached performance was painfully slow. We designed an internal tool, which I covered in detail in previous talks, that aggregates various performance metrics to make our profiling process more manageable.

00:18:43.910 This meta-tool, called PB Profiler, runs various performance checks over a piece of code by collating results from other profiling tools. It generates comprehensive outputs that help us identify how to improve metrics—as understanding script performance is crucial.

00:19:55.640 This profiling framework also gathers data on various activities, such as toggling whether records were cached or not, using benchmarks to track operations, and memory profiling per line of code. When I used this tool against our paper generation request, I determined that generating one paper averaged 162 milliseconds, which is a real problem when producing pages consisting of hundreds of elements. This amount of time was far from sustainable.
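
To make the arithmetic concrete: at 162 milliseconds per paper, an uncached page of 100 papers (an assumed page size) would cost roughly 16 seconds. Below is a minimal sketch of measuring an average per-item cost with Ruby's monotonic clock. `generate_paper` and `average_ms` are hypothetical names, and the workload is a placeholder, not the real JSON generation.

```ruby
# Hypothetical stand-in for the per-paper work being measured.
def generate_paper(id)
  (1..2_000).reduce(id) { |acc, n| acc + n * n }
end

# Average wall-clock milliseconds per call over `runs` iterations.
def average_ms(runs)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  runs.times { |i| yield i }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  elapsed * 1000.0 / runs
end

per_paper = average_ms(200) { |i| generate_paper(i) }

# A page is many papers, so the per-item cost multiplies.
papers_per_page = 100 # assumed page size, for illustration only
puts format("%.3f ms per paper, ~%.1f ms per uncached page",
            per_paper, per_paper * papers_per_page)
```

Averaging over many runs smooths out noise from a single measurement, which matters when the numbers are used to justify a change.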

00:20:54.160 Running the output through our line profiler pinpointed exactly which line of code was responsible for excessive processing times. In doing so, we realized that a significant portion of time was spent executing expensive operations repeatedly. The approach was simple: optimize the slowest lines to yield an immediate impact on overall performance.
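
One common shape of this problem is an expensive call that sits inside a loop even though its result never changes. A minimal sketch, assuming a hypothetical `PaperSerializer` whose `currency_settings` method stands in for any costly lookup; none of these names come from the talk.

```ruby
# Hypothetical serializer illustrating a repeated expensive operation.
class PaperSerializer
  attr_reader :lookups

  def initialize(papers)
    @papers = papers
    @lookups = 0
  end

  # Stand-in for an expensive call (database query, template parse, ...).
  def currency_settings
    @lookups += 1
    { symbol: "$", precision: 2 }
  end

  # Before: the expensive call runs once per paper.
  def slow
    @papers.map { |paper| format_price(paper, currency_settings) }
  end

  # After: compute once, reuse the result inside the loop.
  def fast
    settings = currency_settings
    @papers.map { |paper| format_price(paper, settings) }
  end

  private

  def format_price(paper, settings)
    amount = format("%.#{settings[:precision]}f", paper[:price])
    "#{settings[:symbol]}#{amount}"
  end
end
```

Both methods return the same output; the difference is how many times the expensive call runs, which is exactly what a line profiler makes visible.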

00:22:23.340 Taking systematic measures and iterating over time helped us reduce the processing time from 1200 milliseconds to around 30 milliseconds per package. The majority of improvements came from simple oversights in the code where redundant computations were present or inappropriately executed.

00:23:35.250 When we addressed horizontal optimizations, we focused on improving code efficiency across our whole application. Right before Valentine’s Day, which we refer to as V-day due to the pressure it puts on our operations, Ruby 2.1 came out with some tools that prompted collaboration on performance-related projects.

00:24:20.350 One key improvement was utilizing stack profiling, meaning tools that record code performance over time. stackprof uses time sampling to minimize overhead while still obtaining an accurate picture of performance across the stack. By providing an aggregated view of function calls and their frequencies, it let us isolate slow operations influencing our application’s overall performance.
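
The idea behind time sampling can be illustrated with a toy profiler in plain Ruby. This is only a sketch of the general technique: a real sampling profiler is a native extension and works differently in detail. `sample_stacks`, `slow_step`, `fast_step`, and `handle_request` are hypothetical names.

```ruby
# Toy sampling profiler: a background thread wakes up every `interval`
# seconds and records which methods are on the target thread's stack.
def sample_stacks(interval: 0.001)
  target = Thread.current
  counts = Hash.new(0)
  total = 0
  running = true

  sampler = Thread.new do
    while running
      sleep interval
      frames = target.backtrace_locations
      next unless frames
      total += 1
      # Count each method once per sample, even if it appears repeatedly.
      frames.map(&:base_label).uniq.each { |label| counts[label] += 1 }
    end
  end

  yield
  running = false
  sampler.join
  { samples: total, frames: counts }
end

def slow_step
  sleep 0.05 # stand-in for slow I/O
end

def fast_step
  100.times { |i| i * i }
end

def handle_request
  3.times { slow_step; fast_step }
end

report = sample_stacks { handle_request }
report[:frames].sort_by { |_, n| -n }.first(5).each do |label, n|
  puts format("%-20s %5.1f%%", label, 100.0 * n / report[:samples])
end
```

Because the profiler only looks at the stack at intervals, its cost does not grow with the number of method calls, and methods that appear in many samples are the ones where time is going.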

00:25:27.740 Through profiling, we discovered unexpected delays from our use of statsd, a tool we relied on for collecting metrics. The reason stemmed from DNS resolution that occurred whenever a hostname was used instead of an IP address, causing significant slowdowns. By correcting this, we dramatically improved response times.
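
A minimal sketch of one way to avoid the repeated lookup: resolve the hostname once when the client is built and send to the cached IP address afterwards. `CachedHostClient` is a hypothetical class, not the statsd client used in the talk, and the injectable `resolver` and `socket` parameters are assumptions added so the behavior can be checked without a network.

```ruby
require "socket"

# Hypothetical minimal UDP metrics client. It resolves the hostname once
# at construction instead of on every send.
class CachedHostClient
  attr_reader :address

  def initialize(host, port,
                 resolver: ->(name) { Addrinfo.ip(name).ip_address },
                 socket: UDPSocket.new)
    @address = resolver.call(host) # one DNS lookup, done up front
    @port = port
    @socket = socket
  end

  # statsd counter format: "<name>:<value>|c"
  def increment(name)
    @socket.send("#{name}:1|c", 0, @address, @port)
  end
end
```

The trade-off in this sketch is that a change to the DNS record is not picked up until a new client is created, so a long-running process may want to re-resolve periodically.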

00:26:23.620 Approaching holiday scalability is another critical factor. As we prepare for high traffic periods, like Black Friday and Christmas, we continuously reassess our resources and make predictable upgrades to keep up with traffic needs. Recently, we’ve expanded our hardware resources by adding more nodes to our clusters and upgrading capabilities that accommodated increased traffic without negatively affecting responsiveness.

00:27:51.110 Once again, it reinforces the notion that sometimes it is indeed your fault for holding back on necessary investments—upgrading your resources can lead to increased performance. Engineers and operators need to find the right balance between performance strategies, time, effort, and budget.

00:29:25.150 Finally, let's talk about shrinking the gap. It’s not just about optimizing the slowest path; it requires discerning which aspect of the application deserves the most focus based on user behavior. We've built a 'hit list' model that ranks controller actions by request frequency and response time metrics. This allows us to visualize and prioritize what needs attention most.
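
A minimal sketch of a hit list, assuming a log of `[action, milliseconds]` pairs and ranking by total time spent (request count times mean response time). The ranking formula, the `hit_list` helper, and the sample data are assumptions for illustration; the talk does not specify how its model combines the two metrics.

```ruby
# Build a ranked list of controller actions from [action, ms] pairs.
def hit_list(requests)
  rows = requests.group_by { |action, _ms| action }.map do |action, entries|
    total = entries.sum { |_action, ms| ms }
    { action: action,
      count: entries.size,
      mean_ms: total.to_f / entries.size,
      total_ms: total }
  end
  # Highest total time first: frequency and slowness both count.
  rows.sort_by { |row| -row[:total_ms] }
end

# Made-up sample data: a fast, busy action and a slow, rare one.
log = [["papers#index", 40]] * 1000 + [["reports#export", 3000]] * 5

hit_list(log).each do |row|
  puts format("%-16s count=%-5d mean=%8.1fms total=%dms",
              row[:action], row[:count], row[:mean_ms], row[:total_ms])
end
```

With this sample data the 40 millisecond action outranks the 3 second one, because 1000 requests add up to 40,000 milliseconds against 15,000 for the rare export.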

00:30:18.000 By examining these patterns, you come to realize that focusing only on the slowest actions can distract from high-impact areas. We adjust our optimization tasks according to usage, because a seemingly quick action that serves thousands of requests can outweigh sporadic slow ones.

00:31:08.690 As we attempt optimizations, we realize that while some immediate improvements might not seem impactful, they can still influence performance metrics due to cumulative effects over numerous requests. It’s essential to invest in consistent incremental updates as our application grows to maintain pace with service demands.

00:32:03.040 In conclusion, remember that performance tuning takes time—it isn’t always about major wins but rather about refining our craft through persistent practice. I draw parallels to my recent experience mastering baking bread—through mistakes and lessons learned, I believe we’re all on the path to improvement.

00:32:56.040 As Ruby operators, we're in a unique position today where tools and resources are advancing rapidly to help us scale effectively. The Ruby community is starting to take notice of the many substantial applications running on Ruby and is stepping up to improve not only speed but also the instruments we use to analyze performance.

00:34:10.870 I urge you to embrace these tools and develop a thorough understanding to elevate your work. There’s an abundance of knowledge accumulated over decades in programming and performance tuning, and utilizing additional tools for profiling and diagnostics will enhance your capabilities. Thank you for your time today! I’m Aaron Quint, find me on Twitter or GitHub. Let’s keep the Ruby community thriving!