Performance Optimization


Why I am excited about Ruby 2.1?

Sam Saffron • February 19, 2014 • Earth

In this talk from RubyConf AU 2014, Sam Saffron discusses the excitement surrounding the release of Ruby 2.1, highlighting its new features and improvements. The presentation focuses on performance enhancements, particularly in large-scale web applications, a passion of Saffron's stemming from his work at Discourse and his earlier involvement with Stack Overflow.

Key points covered include:

  • Introduction to Ruby 2.1: Saffron introduces the new version and emphasizes the improvements made, particularly in garbage collection (GC) and performance profiling.
  • Performance Tools: He discusses various open-source gems he maintains that contribute to Ruby performance, such as Rack Mini Profiler and Fast Blank, demonstrating their value in everyday development.
  • Benchmarking Discourse: Saffron explains the creation of the Discourse Bench for realistic performance testing, contrasting with traditional micro-benchmarking methods that often yield misleading results.
  • Garbage Collection Improvements: One of the most significant advancements in Ruby 2.1 is the new GC. Saffron elaborates on how the GC now runs minor collections for newer objects separately, resulting in fewer application stalls and improved performance for both median and slower request times.
  • Memory Optimization: Despite increased memory consumption, Saffron outlines how the number of objects retained has decreased, indicating effective optimization techniques used in Ruby 2.1.
  • Practical Performance Enhancements: The talk details actionable steps developers can take to fine-tune Ruby 2.1 in production, such as custom builds and utilizing out-of-band GC.
  • Memory Leak Detection: Saffron shares practical techniques for identifying memory leaks using the rbtrace gem, which aids in monitoring memory usage in production.
  • Conclusion: Ruby continues to evolve, and Saffron encourages cautious adoption of Ruby 2.1 in production environments, supporting the idea of performance tuning and utilizing new APIs for better application performance.

Overall, the presentation is rich with technical insights and practical examples aimed at developers eager to enhance their performance metrics in Ruby applications. Saffron invites questions, showcasing both the practical implications of his work and a collaborative spirit within the Ruby community.

Why I am excited about Ruby 2.1?
Sam Saffron • February 19, 2014 • Earth

RubyConf AU 2014: http://www.rubyconf.org.au

Ruby 2.1 is about to be released, it will include a generational GC, performance improvements and a much improved platform for diagnostics and profiling.
In this talk I will cover many of the new profiling and instrumentation APIs introduced in Ruby 2.1, introduce you to the new GC and show you some cool tools you can build on top of this. I will cover memory profiling, GC instrumentation, the much improved stack sampling, memory profilers, flame graphs and more.

Sam posted the slides from his talk here: speakerdeck.com/samsaffron/why-ruby-2-dot-1-excites-me

RubyConf AU 2014

00:00:09.080 Hello everyone, I'm excited to talk a little bit about Ruby 2.1 and share with you a few tricks.
00:00:15.480 First, let me introduce myself. That's my Twitter handle. I currently work for Discourse, which I co-founded, and previously I was part of Stack Overflow.
00:00:20.760 I've always been very passionate about performance and addressing performance issues in large-scale web applications.
00:00:26.080 This can be a bit surprising given Ruby's general attitude towards performance. But here is a screenshot of Discourse and a bit of the tech stack that we're using.
00:00:38.520 We run on Rails 4 and also on Rails Master. We dual boot and use PostgreSQL, Redis, Ember, and a whole bunch of open-source technologies. Our hosting solution is based on Docker.
00:00:52.399 If anyone wants to discuss that further, I would be more than happy to engage in that conversation.
00:01:02.680 In my work, I also maintain several gems as part of my open-source contributions. For instance, who's heard of Rack Mini Profiler? A few of you? That's great!
00:01:15.680 There are lesser-known gems like Message Bus, which allows for long polling in Rails—a feature that is quite tricky to implement due to the single-threaded nature of processing requests.
00:01:27.720 Fast Blank is another gem that gives a little performance bump to your Rails applications just by including it, requiring very minimal setup.
00:01:41.479 We've seen individual improvements ranging from 3% to 5%, depending on how often you use blank or present. This has led to more developers obsessing over its performance.
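What fast_blank replaces can be sketched in pure Ruby: ActiveSupport's String#blank? is essentially a whitespace-only check like the one below. The method name slow_blank? is mine, and this is an illustration of the hot path fast_blank reimplements in C, not its actual source.

```ruby
class String
  # A pure-Ruby approximation of the blank? check that fast_blank
  # speeds up: true for empty or whitespace-only strings.
  def slow_blank?
    self !~ /[^[:space:]]/
  end
end

"".slow_blank?        # => true
"  \t\n".slow_blank?  # => true
"ruby".slow_blank?    # => false
```

Because blank? is called constantly in a typical Rails request, even a small per-call saving adds up.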
00:01:53.560 LRU Redux is a handy caching gem that could be useful if you ever need caching capabilities. Here are some screenshots of what flame graphs look like.
00:02:06.719 However, this talk primarily focuses on Ruby 2.1. If time allows, I might do a little live demo of Rack Mini Profiler and flame graphs, but my main focus will be on Ruby 2.1.
00:02:18.599 Specifically, I want to answer the question: 'Is it faster?' I'll also discuss how to tune it, examine memory leak tracking tricks, review memory usage tracking methods, and talk a little bit about the future.
00:02:37.400 Before anyone jumps into installing Ruby 2.1 in production, please do not do that tomorrow! Historically, many Ruby 1.x and 2.0 releases have had various issues.
00:02:55.920 If you look at Ruby 1.9 or even 2.0, they all had problems upon their release, and unfortunately, 2.1 is no exception.
00:03:09.680 If you deploy this version, expect segmentation faults and various unexpected issues, so it’s better to test it first.
00:03:18.239 The primary issue to be aware of, which I’ll discuss soon, is related to memory usage. I have a link to my slides, which I’ll make available for everyone to view afterward.
00:03:30.480 Much of my performance analysis of Ruby 2.1 was conducted using something called the Discourse Bench. I developed a script that simulates a Discourse application.
00:03:43.519 This allows anyone to run it locally and get a realistic benchmark of how a typical Rails application would perform.
00:03:54.760 Traditionally, a lot of benchmarking is done with micro-benchmarks, which are not representative of real-world web applications.
00:04:08.920 They often do not maintain long-lived processes or allocate memory in the same way, leading to misleading performance metrics.
00:04:22.919 The Discourse Bench aims to resolve this issue by testing an actual application with real-life data.
00:04:30.039 During these tests, I examined various stacks of Ruby 2.1 with different optimizations and wanted to present meaningful results.
00:04:43.519 A common mistake in benchmarking is not accounting for variable CPU speeds caused by load. To ensure stable CPU performance for accurate benchmarking, it’s crucial to run all tests multiple times.
00:04:59.400 One way to achieve this is by pinning the CPU at a high speed, which significantly helps with consistency.
00:05:13.479 The Discourse Bench offers various options, such as using Unicorn or Thin to serve it, tracking memory stats, number of iterations run, and more.
00:05:27.199 At the end, it provides a range of data, including how long it took to load Rails, based on different percentiles.
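For context, the percentile figures this kind of bench reports can be computed with a simple nearest-rank calculation. The percentile helper below is my own illustration, not the Discourse Bench's actual code.

```ruby
# Nearest-rank percentile over a list of request timings (in ms):
# the value below which pct% of the samples fall.
def percentile(timings, pct)
  sorted = timings.sort
  index = ((pct / 100.0) * sorted.length).ceil - 1
  sorted[[index, 0].max]
end

timings = [12, 15, 11, 90, 14, 13, 300, 16, 12, 18]
percentile(timings, 50)  # => 14, the median request time
percentile(timings, 90)  # => 90, the slow tail that GC work improves
```

Reporting the 90th and 99th percentiles alongside the median is what makes GC stalls visible; they barely move the median at all.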
00:05:40.240 When comparing Ruby 2.1 with 2.0, a particularly interesting finding is that memory consumption has almost doubled.
00:05:53.080 However, the number of objects retained in memory has actually decreased due to extensive performance work aimed at reducing object lifespan.
00:06:00.479 Despite this reduction, the overall memory consumption continues to rise, which I'm going to discuss more shortly.
00:06:07.240 It's also worth noting that overall performance has improved with Ruby 2.1.
00:06:18.400 While the performance differences may seem minimal in terms of the median request times, the 90th percentile performance shows significantly better results for slower requests. This improvement primarily stems from re-engineered garbage collection (GC).
00:06:38.160 Ruby 2.1 introduces an innovative approach to GC that effectively reduces stoppages and optimizes memory handling.
00:06:45.280 Out of the box, Ruby 2.1 is more memory hungry, but it also delivers better performance for median requests and significantly improved performance for slower requests.
00:07:05.960 You will experience fewer stalls, especially in Rails applications that may have previously experienced slow requests at random intervals.
00:07:15.120 The substantial changes to the GC made by Koichi, who is present in the room, have contributed to this optimization.
00:07:24.199 The key change in Ruby 2.1 is the introduction of a new, more efficient GC that focuses on essential data rather than scanning through everything.
00:07:39.199 GC's have historically stopped the world, freezing the application until they complete. The new GC incorporates a concept of a minor GC that handles newer objects separately from older objects.
00:07:50.839 This way, only necessary objects are processed during minor GC operations, leading to much less disruption.
00:08:04.400 Additionally, there’s a global method cache that significantly speeds up method lookups.
00:08:10.440 With improved granularity, it allows for increased metaprogramming without negatively impacting performance.
00:08:21.280 We’ve also optimized object initialization times and enhanced tuning options.
00:08:37.040 Ruby 2.1 also optimizes frozen string literals (the "str".freeze pattern), which yields substantial performance benefits.
00:08:49.999 In production, there are several techniques you can apply right now to enhance your setups. The first one is to create a custom Ruby build, which I’ll discuss.
00:09:05.960 Next, you can run Unicorn with out-of-band GC and use jemalloc to optimize memory allocations. The GitHub branch of Ruby 2.1 has all the critical fixes backported.
00:09:20.360 This includes performance optimizations that will likely be part of a future 2.2 release.
00:09:35.960 The most significant improvement here is the enhanced method cache lookup, which benefits performance by around 5 to 10%.
00:09:48.360 If you apply these changes effectively, you could see notable performance gains. However, I caution against making optimizations without thoughtful consideration.
00:10:06.720 It's essential to understand the impact of various settings on your application in real-world scenarios.
00:10:24.320 For example, you can start with large heaps to prevent Ruby from needing to continuously grow the heap size.
00:10:32.680 You can also set limits on the growth rate of the heap itself and adjust Ruby's internal settings.
00:10:44.800 However, be mindful that certain settings can also slow down bench results, especially if applied indiscriminately.
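For reference, the heap-size and growth settings discussed here are controlled through environment variables that Ruby 2.1 reads at boot; the variable names below are 2.1's actual tuning knobs, and GC.stat is a quick way to check whether they are having the intended effect.

```ruby
# Set before starting the process, e.g. in your init script:
#
#   RUBY_GC_HEAP_INIT_SLOTS        - start with a large heap up front
#   RUBY_GC_HEAP_GROWTH_FACTOR     - how aggressively the heap grows
#   RUBY_GC_HEAP_GROWTH_MAX_SLOTS  - cap on slots added per expansion
#   RUBY_GC_MALLOC_LIMIT           - malloc'd bytes before a GC triggers
#
# GC.stat exposes the counters these settings influence:
stats = GC.stat
puts "GC runs so far: #{stats[:count]}"
puts stats.select { |key, _| key.to_s.include?("slot") }.inspect
```

Measuring GC.stat before and after a change, under a realistic workload like the Discourse Bench, is the safest way to confirm a setting actually helps.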
00:10:59.600 The implementation of out-of-band GC means that we can handle garbage collection between requests rather than during them.
00:11:12.360 This shift results in faster response times for requests, as the GC runs at opportune times when users aren't actively using the application.
00:11:27.440 Previously, older approaches to out-of-band GC would disable GC completely during requests, leading to memory ballooning.
00:11:37.039 In Ruby 2.1, we have a functioning out-of-band GC that really enhances performance.
00:11:49.360 I've also implemented a version of this for 2.0 at Discourse, which works reasonably well but requires more tuning depending on workload.
00:12:03.560 So, when you adopt Ruby 2.1, employing this gem in conjunction with Unicorn is a straightforward way to gain performance benefits.
00:12:18.880 Even though median request performance remains relatively unchanged, you will notice a marked improvement in 99th percentile performance.
00:12:34.440 This improvement results from not having to run GC during requests, which previously caused request latency.
00:12:45.480 One key takeaway is that it gives you more uniform application performance without any additional cost.
00:12:56.120 When it comes to forking processes like Unicorn, a common issue arises with memory reporting.
00:13:07.560 When you check memory via commands like ps aux, the numbers can be misleading since they include memory shared across forked processes.
00:13:19.640 To measure impact, we look at PSS (Proportional Set Size), which more accurately gauges a process's memory footprint.
00:13:32.760 This typically leads to memory savings of 20 to 30% when implemented effectively.
00:13:40.560 We've also added flags to further optimize GC performance, allowing for adjustments to how the GC handles old objects.
00:13:55.960 This adjustment directly impacts how often full GC runs, allowing you to potentially reduce memory usage while maintaining performance.
00:14:06.800 If you're running memory-tight applications, tuning these settings for more frequent GC could significantly save memory.
00:14:20.240 Interestingly, Ruby 2.1 also permits the new generational GC to be disabled, reverting to the previous behavior if necessary.
00:14:36.840 For long-running Rails apps, it's common to observe memory usage escalating after an extended period.
00:14:49.680 A key issue is that traditional C memory allocators become less efficient over time, while alternatives like jemalloc hold up better.
00:15:04.240 Using jemalloc can lead to significant memory savings when allocating numerous objects over time.
00:15:17.760 You have to precompile it and set environment variables for the allocation to take effect.
00:15:33.680 Another workaround is utilizing TCMalloc, which also performs better in many server scenarios compared to standard allocators.
00:15:50.840 Now, turning to the testing suite for Discourse, we see that out-of-the-box, Ruby 2.1 delivers a performance boost.
00:16:06.960 Using the GitHub 2.1 branch, we've recorded our spec tests running 42% faster.
00:16:16.640 By adjusting the Ruby GC parameters, we can influence the frequency of GC runs and improve overall performance.
00:16:29.040 This allows us to shave off significant time during our test runs without the need for introducing extraneous environment variables.
00:16:43.680 This is an immediate improvement you can implement right away. Have any of you ever experienced a memory leak in a Rails app?
00:16:56.000 It's often quite difficult to identify the cause, but I’ll show you a technique to uncover memory leaks.
00:17:07.440 The rbtrace gem, written by the folks at GitHub, can connect to a running process and run diagnostics.
00:17:22.440 We use rbtrace in production and find it to be safe since it operates without needing to open extra ports.
00:17:36.960 It's very low overhead, so it won't impact your running apps.
00:17:51.400 You can connect to your process, dump heaps to analyze for leaks, and compare the heap snapshots.
00:18:06.000 While the code may seem overwhelming, the essence is about taking two heap snapshots and finding differences.
00:18:21.160 By identifying what exists in the second snapshot and not in the first, you can pinpoint potential leaks.
00:18:35.480 In practice, this helps uncover which parts of your code are leaking objects, allowing for corrective action.
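The two-snapshot diff described here can be sketched with ObjectSpace.dump_all, which is new in Ruby 2.1. snapshot_addresses is my helper name, and real heap-diffing scripts are more involved; this just shows the essence of comparing two dumps.

```ruby
require 'objspace'
require 'json'
require 'set'
require 'tempfile'

# Dump the live heap to a temp file and collect every object address.
def snapshot_addresses
  Tempfile.create('heap') do |io|
    ObjectSpace.dump_all(output: io)
    io.rewind
    return io.each_line.each_with_object(Set.new) do |line, set|
      addr = JSON.parse(line)['address']
      set << addr if addr
    end
  end
end

before = snapshot_addresses
retained = Array.new(1000) { "simulated leak" }  # stays alive across both dumps
after = snapshot_addresses

# Anything in the second snapshot but not the first is a candidate leak.
candidates = after - before
puts "still-alive objects new since the first snapshot: #{candidates.size} " \
     "(including #{retained.size} simulated leaks)"
```

Each line of the dump is a JSON document with the object's type, size, and, if allocation tracing was on, the file and line that allocated it.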
00:18:50.080 rbtrace provides a comprehensive report with statistical data on allocated strings, allowing you to optimize your implementation.
00:19:05.600 Next, let’s discuss memory profiling. I created a memory profiling gem that utilizes the new APIs introduced in Ruby 2.1.
00:19:18.600 These APIs allow us to monitor object allocations in Ruby, providing valuable insights.
00:19:29.680 One example is examining Rails startup where we can analyze memory usage patterns.
00:19:43.840 The results distinguish retained versus allocated types of objects, which can show trends in memory growth.
00:19:56.560 Higher retained metrics could signal potential memory leaks, while high allocation numbers generally indicate inefficiencies.
00:20:10.960 For instance, we can analyze a gem report that highlights how many objects were allocated and their memory footprint.
00:20:26.600 If a gem retains a high number of objects, it indicates inefficiencies worth addressing.
00:20:39.680 Moreover, we can inspect specific lines of code to understand object allocations, pinpointing areas for improvements.
00:20:52.440 I’ve encountered several instances of unnecessary memory allocations in common libraries, and addressing these could yield significant benefits.
00:21:07.200 For example, if a method allocates strings repeatedly with the same value, we could improve that by introducing caching.
00:21:21.600 When developing for Ruby 2.1, leveraging string freezing could significantly reduce allocations.
00:21:36.320 Using internal hooks allows you to inspect GC stats and gain additional insight.
00:21:48.920 We've introduced new APIs and capabilities in Ruby 2.1 for managing jobs and tracking object allocations.
00:22:00.960 Moreover, I want to highlight a couple of those new changes, which could greatly enhance performance today.
00:22:18.680 String#scrub and Exception#cause are two new features that tackle common issues developers encounter frequently.
00:22:34.640 As Ruby evolves, we will witness continued GC improvements in upcoming versions like 2.2 that aim to resolve any remaining memory ballooning.
00:22:50.560 For my wish list, I hope to see the inclusion of long-running benchmarks for Ruby, enabling us to assess changes in performance over time.
00:23:06.320 Having this ability would allow for capturing performance regressions before their release.
00:23:20.240 In positive news, Ruby continues to improve! Just ensure to be cautious when applying optimizations to avoid unintentional performance losses.
00:23:35.840 Also, cautiously approach Ruby 2.1 installations in production, and consider the GitHub version for better performance insights.
00:23:50.960 If anyone has any further questions, feel free to reach out or engage with me on Twitter or forums. I’ve also prepared additional resources that I’ll share online.
00:24:11.280 Before concluding, I want to showcase Mini Profiler. In live production, you can observe how it integrates and offers performance insights.
00:24:24.960 As I navigate through Discourse, small indicators appear, showing performance data that help in identifying slow queries.
00:24:35.680 I can click through various categories and queries to analyze which parts are underperforming and why.
00:24:50.680 Identifying unoptimized queries like N+1 problems or unnecessary repeated queries allows for practical fixes.
00:25:05.640 Utilizing Mini Profiler gives developers critical insights into their applications and performance metrics.
00:25:21.040 I think that wraps up my talk. I’ll take a couple of minutes for questions if anyone has queries.
00:25:30.680 Yes, I see one hand raised. Go ahead!
00:25:43.960 Hi! Regarding that last chart with the flame graph you displayed, is that your own creation?
00:26:03.560 Yes, that’s Rack Mini Profiler! It’s available to install as a gem right now.
00:26:12.560 Great! Thank you! Anyone else?
00:26:25.440 I have a recommendation. I think the performance of Discourse could greatly improve with GC disabled.
00:26:37.280 That’s an interesting suggestion! I appreciate the feedback.
00:26:47.160 Alright, I hope you all found this talk informative! Thank you!