Running Rack and Rails Faster with TruffleRuby

by Benoit Daloze

In the video titled "Running Rack and Rails Faster with TruffleRuby," Benoit Daloze, the project lead of TruffleRuby, presents benchmarks and optimizations for running Rack and Rails applications on TruffleRuby. TruffleRuby is a high-performance Ruby implementation developed by Oracle Labs. It uses the Graal JIT compiler to improve execution speed while targeting compatibility with Ruby 2.6, including support for C extensions.

Key points discussed include:
- TruffleRuby Modes: TruffleRuby can run in two modes: JVM mode, which allows seamless Java integration, and native mode, in which TruffleRuby is compiled ahead of time into a single executable for faster startup and reduced memory usage.
- Compatibility and Performance: TruffleRuby boasts a compatibility rate of 97% with Ruby specifications. It allows most existing Ruby applications to run without modifications, leveraging existing C extensions.
- Global Lock and Parallel Execution: Unlike MRI, TruffleRuby executes Ruby code without a global lock, so Ruby threads can run in parallel. Because many C extensions assume their C code runs under a single lock, TruffleRuby enables a global lock for C extensions by default. Daloze discusses plans to remove this lock as the project evolves.
- Benchmarking Setup: The presentation showcases benchmarks from Rails Simpler Bench (RSB), focusing on a simple Rack application and a simple Rails application served by Puma, a popular web server, under controlled conditions. Daloze emphasizes that results may differ in real-world scenarios with multiple concurrent requests.
- Performance Results: Benchmarks showed that:
  - Rack Application: Ruby 2.6 processed approximately 20,000 requests per second, while TruffleRuby surpassed this with over 30,000.
  - Rails Application: Ruby 2.6 achieved around 3,000 requests per second. TruffleRuby handled about 6,000 requests per second on a single thread, rising to about 18,000 with more threads.
- Future Prospects: Daloze encourages developers to benchmark their own applications and to try TruffleRuby, which can be installed through common Ruby version managers (native mode) or through GraalVM (native and JVM modes). He also discusses ongoing efforts to make parallel execution of C extensions safe.

In conclusion, the talk illuminates the considerable advancements TruffleRuby makes in optimizing Rack and Rails applications, showcasing its ability to outperform traditional Ruby implementations while fostering ease of adoption for developers. Benoit Daloze invites the audience to experiment with this innovative solution to enhance their applications' performance.

00:00:03.439 Hello and welcome to my talk, "Running Rack and Rails Faster with TruffleRuby." My name is Benoit Daloze, and I am the project lead of TruffleRuby. If you don't know yet, TruffleRuby is a high-performance Ruby implementation from Oracle Labs. It uses the Graal JIT compiler to achieve its impressive performance. TruffleRuby targets full compatibility with Ruby 2.6, including C extensions, and it is open source and available on GitHub.

00:00:14.960 There are two ways you can run TruffleRuby: in JVM mode or in native mode. In JVM mode, TruffleRuby runs on the JVM, allowing you to seamlessly interoperate with Java. In native mode, the TruffleRuby interpreter and the Graal compiler are compiled ahead of time into a native executable. The result is similar to MRI, where you have a single executable containing everything. The benefits of this approach are faster startup, faster warmup, and a smaller memory footprint.

00:00:43.600 The reason for these enhancements is that there is a virtual machine tailored specifically for TruffleRuby, which avoids loading unnecessary classes and operations found in a more generic VM. As for compatibility, TruffleRuby is progressing well. It is compatible with Ruby 2.6, and many C extensions work out of the box, including libraries such as LibRuby and geary, as well as many database drivers.

00:01:06.760 If you try a random C extension, there's a good chance it will just work on TruffleRuby. This is a significant advantage if you're looking to run existing applications on TruffleRuby, as most applications that run on MRI should work without any changes to the Gemfile, or at most with minimal adjustments.

00:01:29.440 In terms of completeness, TruffleRuby currently passes 97% of the Ruby specs, which is the highest pass rate among all alternative Ruby implementations. One of the primary goals of TruffleRuby is to execute Ruby code faster, and we have seen great performance on many CPU-intensive benchmarks and microbenchmarks. However, the focus of today's talk is on web applications.

00:02:06.240 TruffleRuby does not use a global lock for Ruby code, so Ruby code can execute in parallel, which is a significant advantage. The situation gets more complicated when it comes to C extensions. Many C extensions assume that all their C code runs under a single lock. To maximize compatibility, TruffleRuby enables a global lock for C extensions by default.
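
As a rough illustration of the point about parallel Ruby code, here is a minimal sketch that runs the same CPU-bound job on several threads and times it. The `fib` workload and the `run_in_threads` helper are hypothetical names chosen for this example, not part of the talk or of RSB. On an implementation with a global lock, the threads take turns; on one without, they can occupy several cores at once.

```ruby
require "benchmark"

# CPU-bound work: naive recursive Fibonacci (purely illustrative).
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Run the same CPU-bound job on `thread_count` threads.
# Returns [results, elapsed_seconds].
def run_in_threads(thread_count, n)
  results = nil
  elapsed = Benchmark.realtime do
    threads = Array.new(thread_count) { Thread.new { fib(n) } }
    results = threads.map(&:value) # Thread#value joins and returns the block's result
  end
  [results, elapsed]
end

results, seconds = run_in_threads(4, 20)
puts "4 threads computed #{results.inspect} in #{seconds.round(3)}s"
```

Comparing the elapsed time for 1 thread and for N threads with a larger `n` gives a quick, informal indication of whether threads are actually running in parallel on a given Ruby implementation.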

00:02:36.400 As the project evolves, we will gradually remove this lock. The project also provides tools that work across languages, such as cross-language debuggers and profilers. A simple example of this is the CPU sampler, which is a sampling profiler. Running it on TruffleRuby is straightforward. You can execute it with the command "truffleruby --cpusampler" followed by your program, and it generates a report detailing the time spent in the various parts of your code.
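
To make the profiler usage concrete, here is a small sketch of a script one might profile. The file name `profile_me.rb` and the methods `hot_path` and `cold_path` are hypothetical and exist only for this illustration. The option is spelled `--cpusampler` in GraalVM's tooling; the exact report layout depends on the version in use.

```ruby
# profile_me.rb (hypothetical file name)
# Run with: truffleruby --cpusampler profile_me.rb

# Deliberately expensive method, expected to dominate the sampler report.
def hot_path(n)
  (1..n).reduce(0) { |sum, i| sum + (i * i) % 7 }
end

# Cheap method, expected to barely show up in the report.
def cold_path
  "done"
end

total = 0
200.times { total += hot_path(10_000) }
puts "#{cold_path}: #{total}"
```

The script also runs unchanged on any other Ruby implementation, just without the sampler report.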

00:03:04.800 Today, I want to look at Rails Simpler Bench, or RSB. This project was developed by Noah Gibbs, who has done extensive benchmarking for Ruby 3x3. The goal of Ruby 3x3 is to make Ruby 3 three times faster than Ruby 2.0. RSB consists of two primary applications: a simple Rack application and a simple Rails 4 application, both tested under various loads.

00:03:34.079 The benchmarking for both applications uses wrk to measure their performance, focusing on latency and handling various configurations. For our discussion, we will concentrate on a few key aspects. Specifically, we will benchmark the Rack application with a simple ERB template, as well as the Rails 4 application serving plain responses.
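
For readers unfamiliar with what "a Rack application with a simple ERB template" looks like, here is a minimal sketch using only Ruby's standard `erb` library. The template text and the `HelloApp` class are hypothetical; RSB's actual application is not shown in the talk. A Rack application is any object that responds to `call(env)` and returns a status, a headers hash, and a body that responds to `each`.

```ruby
require "erb"

# Hypothetical template, compiled once at load time.
TEMPLATE = ERB.new("<h1>Hello, <%= name %>!</h1>\n")

class HelloApp
  # Rack entry point: takes the request environment hash and returns
  # [status, headers, body].
  def call(env)
    name = "TruffleRuby" # local variable made visible to the template via binding
    body = TEMPLATE.result(binding)
    headers = {
      "content-type" => "text/html",
      "content-length" => body.bytesize.to_s
    }
    [200, headers, [body]]
  end
end

# Calling the app directly, without a web server, to show the response shape.
status, headers, body = HelloApp.new.call({ "REQUEST_METHOD" => "GET", "PATH_INFO" => "/" })
puts status
puts body.join
```

To serve it over HTTP, one would typically hand an instance to a Rack-compatible server such as Puma through a `config.ru` file; that wiring is omitted here to keep the sketch dependency-free.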

00:04:04.080 Both applications will run on Puma, a popular web server, and we will measure both Ruby 2.6, which is the version TruffleRuby is compatible with, and the latest version of TruffleRuby in JVM mode, which gives the best peak performance. Unfortunately, I encountered a bug when using ActiveRecord in conjunction with Puma, which resulted in extremely slow response times.

00:04:36.079 Eventually, we decided to just create a new database connection for each request instead of using a persistent connection for the benchmarks. This approach significantly impacted the measurement, skewing results toward measuring kernel and network performance rather than Ruby itself.

00:05:06.800 We adjusted our concurrency settings, controlling the number of server threads and wrk request threads. To maintain clarity and consistency, we used a uniform number of threads for both settings. Each request maintains a single connection, which simplifies processing and helps focus on the response generation.

00:05:26.240 My benchmarking runs were performed on an eight-core processor that had frequency scaling and turbo boost enabled to mimic real-world server usage. However, with too many threads, my results did not scale perfectly. Monitoring indicated that increasing threads led to diminishing returns beyond a certain point.

00:05:56.919 During the measurements, I focused on how requests stabilized around a certain throughput figure, reported in requests per second. Each run lasted for ten seconds, measuring how many requests were completed in that time.

00:06:25.280 It's worth noting that these were simple Rack and Rails applications, and the conditions for measurement were idealized to a single request at a time. Real-world conditions will likely be more complex as they involve multiple concurrent requests.

00:06:55.680 Let's begin with the Rack application. For the simple ERB template, we found that Ruby 2.6 handled approximately 20,000 requests per second. The application is minimal, but that is still an impressive figure. As we increased the thread count, Ruby 2.6 became slower due to the global lock, demonstrating that threads on MRI are not effective for CPU-bound work.

00:07:00.559 In contrast, TruffleRuby achieved over 30,000 requests per second. By running it with more threads, performance improved further. With Ruby 2.6's multi-process setting, throughput did increase but not to the same extent as TruffleRuby, which remained significantly ahead.

00:07:30.400 When evaluating Rails performance, we found that Ruby 2.6 handled around 3,000 requests per second for a basic response, which is reasonably good. TruffleRuby processed about 6,000 requests per second, a 2x speedup on a single thread. Throughput improved progressively as the number of threads increased, peaking at about 18,000 requests per second.
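
The arithmetic behind these figures can be sketched as follows. The helper names `requests_per_second` and `speedup` are made up for this example, and the raw request count is hypothetical; the three throughput constants are the approximate Rails numbers quoted in the talk.

```ruby
# Throughput of a fixed-duration run, in requests per second.
def requests_per_second(completed_requests, duration_seconds)
  completed_requests.to_f / duration_seconds
end

# Speedup of a candidate over a baseline, both in requests per second.
def speedup(baseline_rps, candidate_rps)
  candidate_rps.to_f / baseline_rps
end

# Approximate Rails figures quoted in the talk (requests per second).
MRI_SINGLE_THREAD = 3_000
TRUFFLERUBY_SINGLE_THREAD = 6_000
TRUFFLERUBY_PEAK_MULTI_THREAD = 18_000

# Hypothetical raw count: 30,000 requests completed in a 10-second run.
puts requests_per_second(30_000, 10)
puts speedup(MRI_SINGLE_THREAD, TRUFFLERUBY_SINGLE_THREAD)
puts speedup(MRI_SINGLE_THREAD, TRUFFLERUBY_PEAK_MULTI_THREAD)
```

Note that the last ratio compares a multi-threaded TruffleRuby peak against a single-threaded Ruby 2.6 baseline, so it measures the combined effect of the JIT and parallel execution rather than single-thread speed alone.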

00:08:06.800 In conducting these benchmarks, it is crucial to note that I ran tests with the C extension lock disabled. The only C extension in use was the Puma HTTP parser, which is designed for safe parallel operation. The discussion surrounding C extensions and their ability to operate in parallel will also extend to Ruby 3 and the Ractor project led by Koichi Sasada.

00:08:46.560 We are keen to standardize how C extensions are marked for safety in parallel execution. One challenge we encountered was adjusting the splitting limit in Truffle to accommodate larger codebases like Rails, which exceed the default limits, potentially causing performance hits during execution.

00:09:27.600 The aim is to refine the system so the limits can be intelligently omitted without future complications. This requires an understanding of code optimization strategies, particularly as it pertains to how certain functions and methods are compiled into efficient executing code.

00:10:07.520 In conclusion, the results showcased in my talk reflected a basic benchmarking setup. However, I encourage all developers to benchmark their own applications and share their findings, whether their results are favorable or otherwise. If you are looking to try TruffleRuby, it’s quite simple to set up.

00:10:29.920 You can use your favorite Ruby version manager to install it in native mode, or use GraalVM to get both native and JVM modes. GraalVM also gives you access to other languages in a single VM, allowing efficient multi-language interoperability.

00:10:57.120 This implementation of C extensions in TruffleRuby is an exciting frontier. Thank you for listening!