00:00:09.080
Hello everyone, I'm excited to talk a little bit about Ruby 2.1 and share with you a few tricks.
00:00:15.480
First, let me introduce myself. That's my Twitter handle. I currently work for Discourse, which I co-founded, and previously I was part of Stack Overflow.
00:00:20.760
I've always been very passionate about performance and addressing performance issues in large-scale web applications.
00:00:26.080
This can be a bit surprising given Ruby's general attitude towards performance. But here is a screenshot of Discourse and a bit of the tech stack that we're using.
00:00:38.520
We run on Rails 4 and also on Rails Master. We dual boot and use PostgreSQL, Redis, Ember, and a whole bunch of open-source technologies. Our hosting solution is based on Docker.
00:00:52.399
If anyone wants to discuss that further, I would be more than happy to engage in that conversation.
00:01:02.680
In my work, I also maintain several gems as part of my open-source contributions. For instance, who's heard of Rack Mini Profiler? A few of you? That's great!
00:01:15.680
There are lesser-known gems like Message Bus, which allows for long polling in Rails—a feature that is quite tricky to implement due to the single-threaded nature of processing requests.
00:01:27.720
Fast Blank is another gem that gives a little performance bump to your Rails applications just by including it, requiring very minimal setup.
00:01:41.479
We've seen improvements ranging from 3% to 5% on individual pages, depending on how often you call blank? or present?. This has led to quite a bit of obsessing over its performance.
00:01:53.560
LRU Redux is a handy in-memory caching gem that could be useful if you ever need a fast LRU cache. Here are some screenshots of what flame graphs look like.
00:02:06.719
However, this talk primarily focuses on Ruby 2.1. If time allows, I might do a little live demo of Rack Mini Profiler and flame graphs, but my main focus will be on Ruby 2.1.
00:02:18.599
Specifically, I want to answer the question: 'Is it faster?' I'll also discuss how to tune it, examine memory leak tracking tricks, review memory usage tracking methods, and talk a little bit about the future.
00:02:37.400
Before anyone jumps into installing Ruby 2.1 in production, please do not do that tomorrow! Historically, many Ruby 1.x and 2.0 releases have had various issues.
00:02:55.920
If you look at Ruby 1.9 or even 2.0, they all had problems upon their release, and unfortunately, 2.1 is no exception.
00:03:09.680
If you deploy this version, expect segmentation faults and various unexpected issues, so it’s better to test it first.
00:03:18.239
The primary issue to be aware of, which I’ll discuss soon, is related to memory usage. I have a link to my slides, which I’ll make available for everyone to view afterward.
00:03:30.480
Much of my performance analysis of Ruby 2.1 was conducted using something called the Discourse Bench. I developed a script that simulates a Discourse application.
00:03:43.519
This allows anyone to run it locally and get a realistic benchmark of how a typical Rails application would perform.
00:03:54.760
Traditionally, a lot of benchmarking is done with micro-benchmarks, which are not representative of real-world web applications.
00:04:08.920
They often do not maintain long-lived processes or allocate memory in the same way, leading to misleading performance metrics.
00:04:22.919
The Discourse Bench aims to resolve this issue by testing an actual application with real-life data.
00:04:30.039
During these tests, I examined various stacks of Ruby 2.1 with different optimizations and wanted to present meaningful results.
00:04:43.519
A common mistake in benchmarking is not accounting for variable CPU speeds caused by load. To ensure stable CPU performance for accurate benchmarking, it’s crucial to run all tests multiple times.
00:04:59.400
One way to achieve this is by pinning the CPU at a high speed, which significantly helps with consistency.
00:05:13.479
The Discourse Bench offers various options, such as using Unicorn or Thin to serve it, tracking memory stats, number of iterations run, and more.
00:05:27.199
At the end, it provides a range of data, including how long it took to load Rails, based on different percentiles.
00:05:40.240
When comparing Ruby 2.1 with 2.0, a particularly interesting finding is that memory consumption has almost doubled.
00:05:53.080
However, the number of objects retained in memory has actually decreased due to extensive performance work aimed at reducing object lifespan.
00:06:00.479
Despite this reduction, the overall memory consumption continues to rise, which I'm going to discuss more shortly.
00:06:07.240
It's also worth noting that overall performance has improved with Ruby 2.1.
00:06:18.400
While the performance differences may seem minimal in terms of the median request times, the 90th percentile performance shows significantly better results for slower requests. This improvement primarily stems from re-engineered garbage collection (GC).
00:06:38.160
Ruby 2.1 introduces an innovative approach to GC that effectively reduces stoppages and optimizes memory handling.
00:06:45.280
Out of the box, Ruby 2.1 is more memory hungry, but it also delivers better performance for median requests and significantly improved performance for slower requests.
00:07:05.960
You will experience fewer stalls, especially in Rails applications that may have previously experienced slow requests at random intervals.
00:07:15.120
The substantial changes to the GC made by Koichi, who is present in the room, have contributed to this optimization.
00:07:24.199
The key change in Ruby 2.1 is the introduction of a new, more efficient GC that focuses on essential data rather than scanning through everything.
00:07:39.199
GC's have historically stopped the world, freezing the application until they complete. The new GC incorporates a concept of a minor GC that handles newer objects separately from older objects.
00:07:50.839
This way, only necessary objects are processed during minor GC operations, leading to much less disruption.
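The split between minor and major collections can be observed from plain Ruby via GC.stat; here is a minimal sketch using the counter keys added in 2.1:

```ruby
# Churn short-lived objects and watch which kind of collection runs.
minor_before = GC.stat(:minor_gc_count)
major_before = GC.stat(:major_gc_count)

200_000.times { Object.new }  # garbage that dies young

minor_delta = GC.stat(:minor_gc_count) - minor_before
major_delta = GC.stat(:major_gc_count) - major_before
puts "minor GCs: #{minor_delta}, major GCs: #{major_delta}"
```

Short-lived garbage like this is reclaimed almost entirely by cheap minor collections, while full collections stay rare.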
00:08:04.400
Additionally, there’s a global method cache that significantly speeds up method lookups.
00:08:10.440
With improved granularity, it allows for increased metaprogramming without negatively impacting performance.
00:08:21.280
We’ve also optimized object initialization times and enhanced tuning options.
00:08:37.040
Ruby 2.1 introduces a frozen string cache that yields substantial performance benefits.
00:08:49.999
In production, there are several techniques you can apply right now to enhance your setups. The first one is to create a custom Ruby build, which I’ll discuss.
00:09:05.960
Next, you can run Unicorn with out-of-band GC, and use jemalloc to optimize memory allocations. The GitHub branch of Ruby 2.1 has all the critical fixes backported.
00:09:20.360
This includes performance optimizations that will likely be part of a future 2.2 release.
00:09:35.960
The most significant improvement here is the enhanced method cache lookup, which benefits performance by around 5 to 10%.
00:09:48.360
If you apply these changes carefully, you could see notable performance gains. However, I caution against applying optimizations without thoughtful consideration.
00:10:06.720
It's essential to understand the impact of various settings on your application in real-world scenarios.
00:10:24.320
For example, you can start with large heaps to prevent Ruby from needing to continuously grow the heap size.
00:10:32.680
You can also set limits on the growth rate of the heap itself and adjust Ruby's other internal settings.
00:10:44.800
However, be mindful that certain settings can also slow down benchmark results, especially if applied indiscriminately.
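A sketch of launching a process with tuned GC settings: the variable names are the real 2.1 knobs, but every value below is an illustrative assumption — benchmark against your own workload before adopting any of them.

```ruby
require "rbconfig"

# Illustrative values only; measure before using these in production.
env = {
  "RUBY_GC_HEAP_INIT_SLOTS"       => "800000",   # start with a large heap
  "RUBY_GC_HEAP_GROWTH_FACTOR"    => "1.25",     # limit how fast the heap grows
  "RUBY_GC_HEAP_GROWTH_MAX_SLOTS" => "100000",   # cap slots added per expansion
  "RUBY_GC_MALLOC_LIMIT"          => "50000000"  # malloc'd bytes before a GC
}

# Relaunch Ruby with the settings applied and confirm it boots cleanly.
ok = system(env, RbConfig.ruby, "-e", "GC.start; exit 0")
puts ok ? "booted with tuned GC settings" : "failed to boot"
```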
00:10:59.600
The implementation of out-of-band GC means that we can handle garbage collection between requests rather than during them.
00:11:12.360
This shift results in faster response times for requests, as the GC runs at opportune times when users aren't actively using the application.
00:11:27.440
Previously, older approaches to out-of-band GC would disable GC completely during requests, leading to memory ballooning.
00:11:37.039
In Ruby 2.1, we have a functioning out-of-band GC that really enhances performance.
00:11:49.360
I've also implemented a version of this for 2.0 at Discourse, which works reasonably well but requires more tuning depending on workload.
00:12:03.560
So, when you adopt Ruby 2.1, employing this gem in conjunction with Unicorn is a straightforward way to gain performance benefits.
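A minimal config.ru sketch of wiring this up, assuming the gctools gem (which ships an out-of-band GC for 2.1) and a Unicorn deployment:

```ruby
# config.ru (sketch; assumes the gctools gem is in your Gemfile)
require "gctools/oobgc"

# Run GC between requests in each Unicorn worker instead of during them.
if defined?(Unicorn)
  use GC::OOB::UnicornMiddleware
end

run Rails.application
```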
00:12:18.880
Even though median request performance remains relatively unchanged, you will notice a marked improvement in 99th percentile performance.
00:12:34.440
This improvement results from not having to run GC during requests, which previously caused request latency.
00:12:45.480
One key takeaway is that it gives you a more uniformly performing application without any additional cost.
00:12:56.120
When it comes to forking processes like Unicorn, a common issue arises with memory reporting.
00:13:07.560
When you check memory via commands like ps aux, the numbers can be misleading, since memory shared across forked processes is counted in full for every process.
00:13:19.640
To measure the real footprint, we look at PSS (Proportional Set Size), which splits shared pages proportionally among the processes that share them.
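A rough way to read PSS on Linux is summing the Pss entries from /proc; this is a sketch, and the path and file format are Linux-specific assumptions:

```ruby
# Sum the "Pss:" entries (in kB) from /proc/<pid>/smaps for a process.
def pss_kb(pid = Process.pid)
  File.foreach("/proc/#{pid}/smaps").sum do |line|
    line.start_with?("Pss:") ? line.split[1].to_i : 0
  end
end

puts "PSS: #{pss_kb} kB" if File.readable?("/proc/#{Process.pid}/smaps")
```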
00:13:32.760
This typically leads to memory savings of 20 to 30% when implemented effectively.
00:13:40.560
We've also added flags to further optimize GC performance, allowing for adjustments to how the GC handles old objects.
00:13:55.960
This adjustment directly impacts how often full GC runs, allowing you to potentially reduce memory usage while maintaining performance.
00:14:06.800
If you're running memory-tight applications, tuning these settings for more frequent GC could significantly save memory.
00:14:20.240
Interestingly, Ruby 2.1 also permits the new GC to be disabled, reverting back to the prior version if necessary.
00:14:36.840
For long-running Rails apps, it's common to observe memory usage escalating after an extended period.
00:14:49.680
A key issue is that the traditional C memory allocator becomes less efficient over time as the heap fragments, while alternatives like jemalloc hold up better.
00:15:04.240
Using jemalloc can lead to significant memory savings when allocating numerous objects over time.
00:15:17.760
You either compile Ruby against it or preload it with an environment variable for the allocator to take effect.
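One way to apply it without recompiling Ruby is preloading the library via LD_PRELOAD; this sketch is Linux-only, and the search path is an assumption about where jemalloc is installed:

```ruby
require "rbconfig"

# Look for a jemalloc shared library (path is an assumption; adjust as needed).
jemalloc = Dir.glob("/usr/lib/**/libjemalloc.so*").first

if jemalloc
  # Launch a child Ruby with jemalloc preloaded as its allocator.
  system({ "LD_PRELOAD" => jemalloc }, RbConfig.ruby, "-e", "puts :jemalloc_loaded")
else
  puts "jemalloc not found; install it or build Ruby with --with-jemalloc"
end
```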
00:15:33.680
Another workaround is utilizing TCMalloc, which also performs better in many server scenarios compared to standard allocators.
00:15:50.840
Now, turning to the testing suite for Discourse, we see that out-of-the-box, Ruby 2.1 delivers a performance boost.
00:16:06.960
Using GitHub's 2.1.0 branch, we've recorded our spec tests running 42% faster.
00:16:16.640
By adjusting the Ruby GC parameters, we can influence the frequency of GC runs and improve overall performance.
00:16:29.040
This allows us to shave off significant time during our test runs without the need for introducing extraneous environment variables.
00:16:43.680
This is an immediate improvement you can implement right away. Have any of you ever experienced a memory leak in a Rails app?
00:16:56.000
It's often quite difficult to identify the cause, but I’ll show you a technique to uncover memory leaks.
00:17:07.440
The rbtrace gem, written by the folks at GitHub, lets you connect to a running process and run diagnostics.
00:17:22.440
We use rbtrace in production and find it to be safe, since it operates without needing to open any network ports.
00:17:36.960
It's very low overhead, so it won't disrupt your running apps.
00:17:51.400
You can connect to your process, dump heaps to analyze for leaks, and compare the heap snapshots.
00:18:06.000
While the code may seem overwhelming, the essence is about taking two heap snapshots and finding differences.
00:18:21.160
By identifying what exists in the second snapshot and not in the first, you can pinpoint potential leaks.
00:18:35.480
In practice, this helps uncover which parts of your code are leaking objects, allowing for corrective action.
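The two-snapshot diff boils down to a few lines with ObjectSpace.dump_all; this sketch retains some objects between the snapshots to stand in for a leak (in production you would trigger the dumps over rbtrace instead):

```ruby
require "objspace"
require "json"
require "set"

# Collect the heap addresses from one JSON-lines heap dump.
def heap_addresses(path)
  File.foreach(path).map { |line| JSON.parse(line)["address"] }.compact.to_set
end

File.open("heap1.json", "w") { |f| ObjectSpace.dump_all(output: f) }

$leak = Array.new(50) { "leaked string #{rand}" }  # simulate retained objects

File.open("heap2.json", "w") { |f| ObjectSpace.dump_all(output: f) }

# Anything in the second snapshot but not the first is a leak candidate.
new_addresses = heap_addresses("heap2.json") - heap_addresses("heap1.json")
puts "objects present only in the second snapshot: #{new_addresses.size}"
```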
00:18:50.080
rbtrace provides a comprehensive report with statistical data on allocated strings, allowing you to optimize your implementation.
00:19:05.600
Next, let’s discuss memory profiling. I created a memory profiling gem that utilizes the new APIs introduced in Ruby 2.1.
00:19:18.600
These APIs allow us to monitor object allocations in Ruby, providing valuable insights.
00:19:29.680
One example is examining Rails startup where we can analyze memory usage patterns.
00:19:43.840
The results distinguish retained versus allocated types of objects, which can show trends in memory growth.
00:19:56.560
Higher retained metrics could signal potential memory leaks, while high allocation numbers generally indicate inefficiencies.
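A minimal run of the memory_profiler gem looks like this (the workload inside the block is an illustrative assumption); it splits the results into allocated and retained objects:

```ruby
begin
  require "memory_profiler"  # gem install memory_profiler
rescue LoadError
  warn "memory_profiler gem not installed"
  exit
end

report = MemoryProfiler.report do
  $kept = Array.new(10) { "retained #{rand}" }  # survives the block: "retained"
  100.times { "transient string" }              # dies immediately: "allocated"
end

puts "total allocated: #{report.total_allocated} objects"
puts "total retained:  #{report.total_retained} objects"
report.pretty_print  # per-gem and per-line breakdown
```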
00:20:10.960
For instance, we can analyze a gem report that highlights how many objects were allocated and their memory footprint.
00:20:26.600
If a gem retains a high number of objects, it indicates inefficiencies worth addressing.
00:20:39.680
Moreover, we can inspect specific lines of code to understand object allocations, pinpointing areas for improvements.
00:20:52.440
I’ve encountered several instances of unnecessary memory allocations in common libraries, and addressing these could yield significant benefits.
00:21:07.200
For example, if a method allocates strings repeatedly with the same value, we could improve that by introducing caching.
00:21:21.600
When developing for Ruby 2.1, leveraging string freezing could significantly reduce allocations.
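The caching idea can be sketched in a few lines: return a single frozen string instead of allocating a fresh copy on every call (the method names here are illustrative, not from any real library):

```ruby
def status_uncached
  "pending"          # allocates a new String on every call
end

PENDING = "pending".freeze
def status_cached
  PENDING            # always returns the same frozen object
end

# Count object allocations performed inside a block.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

uncached = allocations { 1_000.times { status_uncached } }
cached   = allocations { 1_000.times { status_cached } }
puts "uncached: #{uncached} allocations, cached: #{cached}"
```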
00:21:36.320
Using internal hooks allows you to inspect GC stats and gain additional insight.
00:21:48.920
We've introduced new APIs and capabilities in Ruby 2.1 for managing jobs and tracking object allocations.
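One of those new APIs, ObjectSpace.trace_object_allocations, records where each object was allocated while tracing is active:

```ruby
require "objspace"

obj = nil
ObjectSpace.trace_object_allocations do
  obj = "traced string"
end

# Where was this object born?
puts ObjectSpace.allocation_sourcefile(obj)
puts ObjectSpace.allocation_sourceline(obj)
```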
00:22:00.960
Moreover, I want to highlight a couple of those new changes, which could greatly enhance performance today.
00:22:18.680
String#scrub and Exception#cause are two new features that tackle issues developers encounter frequently.
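Tiny examples of String#scrub and Exception#cause, for reference:

```ruby
# String#scrub replaces bytes that are invalid in the string's encoding.
dirty = "caf\xE9".force_encoding("UTF-8")  # \xE9 alone is not valid UTF-8
puts dirty.valid_encoding?  # false
puts dirty.scrub("?")       # "caf?"

# Exception#cause remembers the error that was active when a new one was raised.
begin
  begin
    raise ArgumentError, "low-level failure"
  rescue
    raise RuntimeError, "higher-level wrapper"
  end
rescue => wrapped
  puts wrapped.cause.class  # ArgumentError
end
```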
00:22:34.640
As Ruby evolves, we will witness continued GC improvements in upcoming versions like 2.2 that aim to resolve any remaining memory ballooning.
00:22:50.560
For my wish list, I hope to see the inclusion of long-running benchmarks for Ruby, enabling us to assess changes in performance over time.
00:23:06.320
Having this ability would allow for capturing performance regressions before their release.
00:23:20.240
In positive news, Ruby continues to improve! Just ensure to be cautious when applying optimizations to avoid unintentional performance losses.
00:23:35.840
Also, cautiously approach Ruby 2.1 installations in production, and consider the GitHub version for better performance insights.
00:23:50.960
If anyone has any further questions, feel free to reach out or engage with me on Twitter or forums. I’ve also prepared additional resources that I’ll share online.
00:24:11.280
Before concluding, I want to showcase Mini Profiler. In live production, you can observe how it integrates and offers performance insights.
00:24:24.960
As I navigate through Discourse, small indicators appear, showing performance data that help in identifying slow queries.
00:24:35.680
I can click through various categories and queries to analyze which parts are underperforming and why.
00:24:50.680
Identifying unoptimized queries like N+1 problems or unnecessary repeated queries allows for practical fixes.
00:25:05.640
Utilizing Mini Profiler gives developers critical insights into their applications and performance metrics.
00:25:21.040
I think that wraps up my talk. I’ll take a couple of minutes for questions if anyone has queries.
00:25:30.680
Yes, I see one hand raised. Go ahead!
00:25:43.960
Hi! Regarding that last chart with the flame graph you displayed, is that your own creation?
00:26:03.560
Yes, that’s Rack Mini Profiler! It’s available to install as a gem right now.
00:26:12.560
Great! Thank you! Anyone else?
00:26:25.440
I have a recommendation. I think the performance of Discourse could greatly improve with GC disabled.
00:26:37.280
That’s an interesting suggestion! I appreciate the feedback.
00:26:47.160
Alright, I hope you all found this talk informative! Thank you!