Ruby on Ice 2019

Parallel and Thread-Safe Ruby at High-Speed with TruffleRuby

Array and Hash are used in every Ruby program, but current implementations either prevent using them in parallel (MRI) or lack thread-safety guarantees (JRuby raises on concurrent Array#<<). We show how to make Array and Hash thread-safe while allowing Ruby collections to scale up to tens of cores!

By Benoit Daloze https://twitter.com/@eregontp

Benoit Daloze is a PhD student in Linz, Austria, researching concurrency in Ruby with TruffleRuby for the past several years. He has contributed to many Ruby implementations, including TruffleRuby, MRI and JRuby. He is the maintainer of ruby/spec, a test suite for the behavior of the Ruby programming language.

https://rubyonice.com/speakers/benoit_daloze

00:00:12.650 Now there is a talk ahead of us. Benoit is a PhD student in Linz, and his research focuses on concurrency in Ruby using TruffleRuby. In addition to TruffleRuby, he has also contributed to many other Ruby implementations, such as MRI and JRuby. He is the maintainer of ruby/spec, a test suite for the behavior of the Ruby programming language. In his talk, "Parallel and Thread-Safe Ruby at High Speed with TruffleRuby," Benoit Daloze will show us how to make Array and Hash thread-safe in Ruby, among many other things. Please welcome him on stage.
00:00:42.690 Thank you! Hello! How are you doing? Leslie, not too tired? Do you hear me? Okay, perfect. So, my name is Benoit, and I've been doing a PhD at Johannes Kepler University in Linz, Austria. I'm actually working at Oracle Labs, focusing on TruffleRuby. I want to clarify that TruffleRuby is still a research project, so don’t take everything I say as definitive. I work in Zurich and have been doing a lot of research on concurrency in Ruby in general for more than four years. I'm also the maintainer of ruby/spec.
00:01:14.220 Today, I want to talk about two topics. First, I want to give you an introduction to TruffleRuby and its advantages. Then, I want to discuss my research, specifically on parallelism and thread-safety in Ruby. TruffleRuby is a high-performance implementation of the Ruby programming language created by Oracle Labs. To achieve high performance, it uses a JIT compiler called Graal. It targets full compatibility with MRI, the standard Ruby implementation, including C extensions.
00:01:55.830 There are two ways to run TruffleRuby: on the JVM (Java Virtual Machine) or on what we call SubstrateVM. The advantage of running it on the JVM is interoperability; if you already have a Java program or Java libraries you want to use, you can leverage that. This provides great peak performance, but the default way to run TruffleRuby is with SubstrateVM, which compiles TruffleRuby itself and the Just-In-Time (JIT) compiler ahead of time to native code. The end result is a lightweight native executable with the TruffleRuby interpreter and the JIT included. This is significantly smaller than the JVM and starts much faster. For instance, the JVM takes at least half a second to start, whereas we can start in just 25 milliseconds since everything is pre-compiled and all classes are already loaded.
00:02:57.420 This all contributes to faster warm-ups, meaning the time to reach good performance is shorter because we utilize the JIT compiler to compile the important methods. The JIT is already pre-compiled and everything is native, giving us a fast interpretation speed and a lower memory footprint because we do not have to handle all the Java edge cases. Ultimately, this results in great peak performance, regardless of what you are doing, making it a drop-in replacement in both cases.
00:03:58.859 I want to discuss the Ruby 3x3 project, which aims for Ruby 3 to be three times faster than Ruby 2.0. This ambitious goal is likely only achievable through a Just-In-Time compiler, as MRI's traditional design does not lend itself to such performance gains. The interesting question is whether TruffleRuby can indeed be more than three times faster than Ruby 2.0, or whether it can reach speeds comparable to languages like Java or C. To illustrate this, I want to run a benchmark, specifically optcarrot, which is one of the main benchmarks for Ruby. It's a NES emulator written entirely in Ruby, created by one of the MRI contributors.
00:04:39.370 First, let’s try with Ruby 2.0, as this will serve as our baseline. Running the program, we see that it operates at about 34 frames per second. This is somewhat slow, especially compared to the 60 frames per second typically expected for fluid animations. For comparison, Ruby 2.6.1 yields about 45 frames per second. We can also run it with MJIT, the JIT compiler shipped with Ruby 2.6, to avoid some of the interpreter's overhead. That improves performance to around 70 frames per second, which is promising but still not three times faster than Ruby 2.0.
00:05:22.360 Now, let’s run the benchmark on TruffleRuby. Here, I’ll be using it in JVM mode, as it tends to perform slightly better thanks to the JVM's garbage collector. At first there is a warm-up phase, shown by the slow start at 0 frames per second, but once the JIT has learned the execution pattern, performance quickly climbs to around 240 frames per second, sometimes even hitting 300, which means we can be approximately eight times faster than Ruby 2.0. Once we start the game again, we see the JIT continues optimizing as it learns new patterns.
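The speed-ups quoted in this demo can be checked with a little arithmetic. The sketch below only uses the approximate frame rates mentioned in the talk (34, 45, 70, 240 and 300 frames per second); the labels are illustrative, and the numbers are rough readings from a live demo rather than controlled measurements.

```ruby
# Approximate frame rates quoted in the talk; Ruby 2.0 is the baseline.
baseline_fps = 34.0

quoted_fps = {
  "Ruby 2.6.1"            => 45,
  "Ruby 2.6.1 with MJIT"  => 70,
  "TruffleRuby (typical)" => 240,
  "TruffleRuby (peak)"    => 300
}

# Speed-up relative to Ruby 2.0, rounded to two decimals.
speedups = quoted_fps.transform_values { |fps| (fps / baseline_fps).round(2) }

speedups.each do |name, factor|
  puts format("%-22s %.2fx", name, factor)
end
```

With these figures, MJIT lands at about 2x the baseline, while TruffleRuby lands between roughly 7x and 9x, which is where the "approximately eight times" comes from.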
00:06:58.620 This benchmark reports its results in frames per second, and at this level TruffleRuby shows that it can be significantly faster than earlier versions of Ruby, with results comparable to Java and beyond. The warm-up curve shows that TruffleRuby may be erratic at first, but once it has optimized the specific code path, it surpasses all the previous results and is in line with high-performance JIT-compiling engines such as V8.
00:07:42.470 Now, let’s look at benchmark comparisons with other implementations. We see considerable speed-ups from optimizing at the AST (Abstract Syntax Tree) level and from just-in-time compilation to native code, which is crucial for many applications. For example, many classic CPU benchmarks run up to 30 times faster with TruffleRuby when it can optimize internal method calls effectively.
00:08:57.000 The real advantages show up in areas such as template rendering, where TruffleRuby can render Ruby templates nearly ten times faster using the standard libraries. Part of this performance comes from streamlining the methods that manipulate strings and from compiling blocks without the overheads that slow other implementations down.
00:11:09.500 As we continue to improve TruffleRuby, we also hope to see more seamless interactions with existing Ruby frameworks, particularly Rails. Rails apps tend to have many dependencies, and although many of them are C extensions, we now have made significant strides in supporting these C extensions. For instance, database drivers such as SQLite and MySQL can now function similarly to standard Ruby, thus making it easier to integrate existing Ruby applications with TruffleRuby without extensive modifications.
00:12:48.920 In addition to performance enhancements, we need to address concurrent programming in Ruby. The primary issue with existing implementations is that they use a global lock, which means Ruby code does not run in parallel. This is a significant limitation, particularly on modern multi-core processors. We cannot scale web applications efficiently with this model, as it limits throughput and resource utilization. By examining the landscape of languages that allow parallel execution, we can align Ruby in this domain.
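To make the effect of a global lock concrete, here is a minimal sketch that runs the same CPU-bound work first sequentially and then in threads. The `fib` helper, the task count, and the input size are illustrative choices, not something from the talk. On an implementation with a global lock, the threaded run is expected to take about as long as the sequential one; on an implementation that runs Ruby threads in parallel it can be shorter, depending on the number of cores. No timings are shown because they vary by machine and implementation.

```ruby
require "benchmark"

# Deliberately naive Fibonacci, used only as CPU-bound work.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

input = 22
tasks = 4

# Run the tasks one after another on the main thread.
sequential_results = nil
sequential_time = Benchmark.realtime do
  sequential_results = Array.new(tasks) { fib(input) }
end

# Run the same tasks in one thread each.
threaded_results = nil
threaded_time = Benchmark.realtime do
  threads = Array.new(tasks) { Thread.new { fib(input) } }
  threaded_results = threads.map(&:value) # value joins and returns the result
end

puts format("sequential: %.3fs, threaded: %.3fs", sequential_time, threaded_time)
```

Both runs compute the same results; only the elapsed time tells you whether the threads actually ran in parallel.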
00:14:39.630 My research aims to make Ruby collections like Arrays and Hashes thread-safe while avoiding excessive synchronizations. Current implementations often block or degrade performance when accessed concurrently. My goal is to introduce fine-grained locking mechanisms and to enhance Ruby's concurrency model without sacrificing its dynamic features.
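For context, this is what application code has to do today when several threads append to one Array on an implementation without thread-safe collections: guard every access with a lock. The sketch below is a workaround written for illustration, not the approach from the research, and the variable names are made up. The goal described above is for plain `Array#<<` to be safe without this coarse lock.

```ruby
shared_list = []
lock = Mutex.new
thread_count = 4
appends_per_thread = 1_000

threads = Array.new(thread_count) do |t|
  Thread.new do
    appends_per_thread.times do |i|
      # Every append takes the same lock, so appends never interleave,
      # but they also never run in parallel.
      lock.synchronize { shared_list << [t, i] }
    end
  end
end
threads.each(&:join)

puts shared_list.size
```

With the lock in place, all 4,000 elements are present once the threads have been joined. The cost is that every append is serialized, which is exactly the kind of excessive synchronization the research tries to avoid.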
00:16:50.090 I propose implementing partial evaluation strategies, wherein we can avoid most of the overhead typically associated with thread-safety. By synchronizing only when necessary—i.e., when objects are shared across threads—we can maintain performance on single-threaded operations while still providing safe access in multi-threaded scenarios. This approach minimizes the need for extensive synchronization and promotes optimal performance in real-world applications.
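The idea of synchronizing only for shared objects can be illustrated at the Ruby level. The sketch below is a hypothetical wrapper, not how TruffleRuby implements it: the class name `SharableArray` and the explicit `share!` call are assumptions made for the example, whereas the talk describes the runtime itself deciding when an object is shared across threads.

```ruby
# Hypothetical illustration: an array wrapper that locks only once shared.
class SharableArray
  def initialize
    @items = []
    @shared = false
    @lock = Mutex.new
  end

  # Assumption of this sketch: the caller marks the object as shared
  # BEFORE making it reachable from any other thread.
  def share!
    @shared = true
    self
  end

  def shared?
    @shared
  end

  def <<(item)
    if @shared
      @lock.synchronize { @items << item } # shared: synchronized append
    else
      @items << item                       # thread-local: no locking cost
    end
    self
  end

  def size
    @shared ? @lock.synchronize { @items.size } : @items.size
  end
end

# Used by a single thread: appends never touch the lock.
local_list = SharableArray.new
3.times { |i| local_list << i }

# Marked as shared before the threads start: appends are synchronized.
shared_list = SharableArray.new.share!
threads = Array.new(4) { |t| Thread.new { 500.times { |i| shared_list << [t, i] } } }
threads.each(&:join)

puts local_list.size
puts shared_list.size
```

The design point is that single-threaded code pays only for a flag check, while the locking cost is limited to objects that more than one thread can actually reach.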
00:18:31.220 Through various studies and experiments, we continue to shape the implementation that resolves typical contention issues seen in concurrent programming, allowing more flexible designs. We have observed that performance remains on par with existing methods while safely enabling concurrent manipulation on collections, catering to Ruby’s expressive syntax without forgoing safety.
00:21:17.090 In summary, the journey towards executing high-performance Ruby code in combination with thread safety represents a substantial leap forward for the language. We can execute Ruby in parallel and match the guarantees offered by mainstream languages while maximizing the use of modern hardware capabilities.
00:24:31.850 As we strive to enhance Ruby's concurrency model, we remain committed to improving its performance and reliability. With ongoing research and the application of cutting-edge optimization techniques, TruffleRuby exhibits the capability to compete with many traditional languages, exemplifying that we can achieve high standards of performance while staying true to the core Ruby ethos.
00:27:50.540 To begin experimenting with TruffleRuby, you can install it as an alternative Ruby through your Ruby version manager. It is part of the larger GraalVM project, which not only focuses on Ruby but also allows seamless integration with languages like Java, JavaScript, and others in a unified virtual machine. The recent developments in GraalVM ensure that various programming languages interact efficiently, giving Ruby developers the tools to scale and enhance their applications further.
00:30:01.760 Thank you for your attention. If you have any questions, feel free to ask now, or catch me afterward in the corridor. I hope you enjoyed the talk!