Keynote: Parallel and Thread-Safe Ruby at High-Speed with TruffleRuby

Array and Hash are used in every Ruby program. Yet current implementations either prevent their use in parallel (the global interpreter lock in MRI) or lack thread-safety guarantees (JRuby raises an exception on concurrent Array#<<). Concurrent::Array from concurrent-ruby is thread-safe but prevents parallel access.

This talk shows a technique to make Array and Hash thread-safe while enabling parallel access, with no penalty on single-threaded performance. In short, we keep the most important thread-safety guarantees of the global lock while allowing Ruby to scale up to tens of cores!

RubyKaigi 2018 https://rubykaigi.org/2018/presentations/eregontp

00:00:00.410 Hello, everyone. I would like to talk about parallelism and performance in Ruby.
00:00:05.819 My name is Benoit Daloze, and I'm a student in Austria.
00:00:12.059 I've been researching concurrency and parallelism in Ruby. I've been working on TruffleRuby, which is another Ruby implementation.
00:00:18.480 I'm also the maintainer of RubySpec, a test suite for the Ruby programming language, and I'm a CRuby committer.
00:00:30.210 Today, I want to discuss performance first, and then I will talk about my research on parallel and thread-safe Ruby.
00:00:43.440 To begin, let me introduce what TruffleRuby is. How many of you know about TruffleRuby?
00:00:54.140 TruffleRuby is a high-performance implementation of Ruby developed by Oracle Labs. It uses the Graal Just-in-Time compiler to achieve high speed and aims for full compatibility with CRuby, including its C extensions.
00:01:10.740 There are two ways to run TruffleRuby. The first is on the Java Virtual Machine (JVM), which allows interaction with existing Java libraries, much like JRuby.
00:01:23.820 The second configuration, which is the default, compiles TruffleRuby and the Graal compiler ahead of time into a native executable.
00:01:29.369 In this mode, it starts very quickly and is faster on startup than MRI, for instance. It also has a fast warm-up time.
00:01:39.570 Because everything is already precompiled, the execution starts at a quicker pace. Moreover, it has a lower memory footprint than a typical JVM.
00:02:00.189 Now, I would like to talk about the Ruby 3x3 project.
00:02:08.720 The goal of this project is to make Ruby 3 three times faster than CRuby 2.0. The main approach for achieving this for CPU benchmarks is through the Just-in-Time compiler.
00:02:22.230 But I have two questions regarding this project. Do we need to wait until Ruby 3, which is still a few years away, around 2020?
00:02:27.590 And can we be more than three times faster than Ruby 2.0? I want to demonstrate this by running Optcarrot, a NES emulator written in Ruby, which is the main CPU benchmark for Ruby 3x3.
00:02:47.970 First, we run it with CRuby 2.0. This is the baseline. I'm playing Lan Master, which is the default game for this emulator, and as we can see, the speed is not very impressive: around 28-30 frames per second.
00:03:17.710 Next, we can try running the latest MRI with MJIT enabled and see if it gets faster.
00:03:36.160 The initial speed is about 40 frames per second, but it ramps up to about 50 frames per second.
00:03:43.570 Now, when we run this with TruffleRuby, it may initially start up slower because the JIT compiler is warming up and learning how the program behaves. However, once it understands, it begins to run faster.
00:04:06.200 Initially, the performance is around 7 frames per second, but soon after, we reach speeds of around 170-200 frames per second, which is a significant improvement.
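The warm-up effect described here can be observed by timing successive batches of work rather than the whole run. The following is only a minimal sketch: `render_frame` is a hypothetical stand-in workload (Optcarrot itself emulates NES frames), and `measure_batches` and its parameters are names invented for this illustration. On an implementation with a JIT compiler, later batches would be expected to report a higher rate than the first ones.

```ruby
# Hypothetical stand-in for rendering one frame: a small CPU-bound loop.
def render_frame(work)
  sum = 0
  work.times { |i| sum += (i * i) % 7 }
  sum
end

# Time each batch separately so that a warm-up curve becomes visible.
# Returns one frames-per-second figure per batch.
def measure_batches(batches:, frames_per_batch:, work:)
  batches.times.map do
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    frames_per_batch.times { render_frame(work) }
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    frames_per_batch / elapsed
  end
end

fps = measure_batches(batches: 5, frames_per_batch: 20, work: 10_000)
fps.each_with_index { |rate, i| puts format("batch %d: %.1f fps", i, rate) }
```

The numbers printed depend entirely on the machine and the Ruby implementation, so none are shown here.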
00:04:29.870 This shows that TruffleRuby can achieve far higher performance than both CRuby 2.0 and the latest MRI with its JIT enabled.
00:04:43.440 It's crucial to highlight that the performance of Optcarrot is exceptional with TruffleRuby compared to other benchmarks.
00:05:18.420 Running without a graphical user interface, we can achieve almost 300 frames per second, demonstrating an impressive performance gain.
00:05:38.240 While this is just one benchmark, we perform very well on many CPU-bound benchmarks, like the Computer Language Benchmarks Game.
00:06:00.990 In many cases, we are 10 to 30 times faster than CRuby 2.3 on these benchmarks.
00:06:24.440 However, there are some exceptions, such as the pidigits benchmark, which mostly does arithmetic on large numbers and doesn’t really evaluate the Ruby implementation.
00:06:47.110 In most cases, we see a significant gain in performance across the board.
00:07:01.730 For instance, when we run a micro-benchmark designed to assess speed, we see TruffleRuby running 30 times faster than CRuby 2.0.
00:07:15.410 The same trend continues for template rendering, where TruffleRuby can be around 10 times faster than other implementations.
00:07:50.390 The gains come from differences in approach, including how string representations are handled.
00:08:04.649 While we don't yet run Rails faster, we have begun supporting Rails benchmarks, although some bugs will need to be addressed.
00:08:31.900 Supporting C extensions is a significant hurdle, since a typical Rails application has more than 100 dependencies.
00:09:01.920 However, we have made progress on major extensions like OpenSSL, MySQL, and others.
00:09:23.470 Now, to focus on why TruffleRuby is fast: we utilize partial evaluation.
00:09:39.880 This means that the important parts of Ruby methods can be executed more quickly.
00:09:59.490 For example, we transform a simple piece of Ruby code involving an array and a block into a compiler graph, which represents how each Ruby operation executes.
00:10:26.880 Here, we can see how certain nodes in the graph can be simplified, because they operate directly on constant values.
00:10:58.670 This transformation helps TruffleRuby optimize memory access and reduce overhead through unrolling loops and optimizing storage.
00:11:19.950 Thus, minimizing unnecessary allocations leads to lower memory usage.
00:11:48.720 Importantly, partial evaluation allows us to track which values can remain constant, enabling further optimizations.
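To make the effect of partial evaluation more concrete, here is a hand-written illustration, not the code from the slides and not actual compiler output. `triple_all` is a hypothetical method that uses an array literal and a block. `triple_all_after_optimization` pictures what conceptually remains once `map` and the block are inlined, the two-iteration loop is unrolled, and the multiplications on constants are folded. A real compiler works on a graph rather than on Ruby source, and it also has to stay correct if methods such as `Integer#*` are redefined later.

```ruby
# Illustrative input: an array literal and a block.
def triple_all
  [1, 2].map { |e| e * 3 }
end

# Hand-written picture of the optimized result: map and the block are
# inlined, the loop over two elements is unrolled, and 1 * 3 and 2 * 3
# are folded into constants. Only the result array is still allocated.
def triple_all_after_optimization
  [3, 6]
end

puts triple_all == triple_all_after_optimization
```

Both methods return the same array, which is the property an optimizing compiler has to preserve.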
00:12:24.790 The optimization process continues with the Graal compiler refining the code, removing unnecessary checks, and efficiently allocating memory.
00:12:44.196 In summary, our ability to generate native assembly for efficient execution demonstrates how TruffleRuby can run Ruby code faster than other implementations.
00:13:30.710 But a significant challenge remains: how can we achieve both thread safety and parallel access in Ruby?
00:14:02.500 Dynamic languages, particularly Ruby, often struggle with this due to global locks preventing code execution in parallel.
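As a rough illustration of what the global lock means in practice, the sketch below splits a CPU-bound task across several threads. `count_primes` and the chosen ranges are hypothetical, picked only for this example. On an implementation with a global interpreter lock, the threads take turns executing Ruby code, so the total time is about the same as running the ranges one after another. On an implementation without such a lock, the threads can run on several cores at once.

```ruby
# Hypothetical CPU-bound task: count primes in a range by trial division.
def count_primes(range)
  range.count do |n|
    n > 1 && (2..Integer.sqrt(n)).none? { |d| (n % d).zero? }
  end
end

ranges = [1..5_000, 5_001..10_000, 10_001..15_000, 15_001..20_000]

# One thread per range. Thread#value waits for the thread to finish
# and returns the value of its block.
threads = ranges.map { |range| Thread.new { count_primes(range) } }
total = threads.sum(&:value)

puts "primes up to 20,000: #{total}"
```

The result is the same on every implementation; only the elapsed time differs, which is why no timing is shown.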
00:14:41.080 As a result, we must look towards creating collections that not only maintain thread safety, but also improve parallel execution.
00:15:07.660 One of the most significant problems is providing an interface that prevents exceptions when parallelizing collections.
00:15:30.990 This was the topic of my PhD thesis, and I believe it's crucial that collections like Array and Hash can be used in a thread-safe way while also enabling parallel access.
00:16:01.150 For instance, we want to be able to create an array and append elements to it from multiple threads without race conditions.
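The scenario of several threads appending to one array can be sketched as follows. This is not the technique from the talk, only the problem it addresses, and the method names and constants are invented for the example. `append_unsynchronized` relies on whatever guarantees the Ruby implementation gives for `Array#<<`. `append_with_mutex` is the conventional workaround: it is safe everywhere, but the lock serializes every append, so there is no parallel access to the array.

```ruby
THREADS = 4
APPENDS = 1_000

# No synchronization: whether every element arrives depends on the
# guarantees the Ruby implementation gives for Array#<<.
def append_unsynchronized
  shared = []
  THREADS.times.map { |t|
    Thread.new { APPENDS.times { |i| shared << [t, i] } }
  }.each(&:join)
  shared
end

# Conventional workaround: a Mutex makes every append safe, but only
# one thread at a time can touch the array.
def append_with_mutex
  shared = []
  lock = Mutex.new
  THREADS.times.map { |t|
    Thread.new { APPENDS.times { |i| lock.synchronize { shared << [t, i] } } }
  }.each(&:join)
  shared
end

locked = append_with_mutex
puts "with mutex: #{locked.size} elements"
```

Calling `append_unsynchronized` in the same way and comparing the sizes shows how a given implementation behaves. The goal stated in this talk is to get the safety of the locked version without giving up parallelism.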
00:16:23.480 If multiple threads interact with the same object simultaneously, we must elegantly handle state without causing errors.
00:16:50.780 Using mechanisms like Coroutine and shared mutable state can help regulate access within Ruby.
00:17:12.740 I've designed the thread-safe approach to balance performance and the Ruby programming model.
00:17:35.160 Consequently, the methods I've created provide secure access to mutable structures while allowing parallel operations.
00:18:05.420 This ensures users can still define operations over their mutable state while benefiting from the enhancements of TruffleRuby.
00:18:30.500 Therefore, our implementation produces significant performance gains without sacrificing safety for concurrency.
00:19:03.840 Overall, the design supports both the convenience of dynamic languages like Ruby and the performance of compiled languages.
00:19:31.930 Now, looking back to our previous comparisons, we see that TruffleRuby's performance in parallel execution greatly outshines traditional implementations.
00:20:01.560 Ensuring that Ruby can run in parallel while keeping its dynamic nature proves beneficial for a diverse range of applications.
00:20:24.380 The journey towards achieving this integration of speed and safety is ongoing, and we'll continue to embrace opportunities for improvement.
00:20:40.400 Ultimately, TruffleRuby aims not only to enhance performance but also to enable existing Ruby applications to utilize concurrency smoothly.
00:21:03.540 To summarize, to realize the potential of Ruby under a concurrent environment, we must maintain performance and usability.
00:21:35.400 I look forward to the continued development and integration of these concepts in the Ruby community.
00:21:57.000 I appreciate everyone's participation in today's presentation and welcome any further questions.
00:22:20.180 Thank you again for your time!
00:22:51.920 Now let's move on to the Q&A session, where you can ask your questions regarding TruffleRuby and parallelism in Ruby.
00:23:11.230 Feel free to raise your hand, and I will be glad to address your queries.
00:23:53.570 I can assure you that performance is a vital aspect of the Ruby ecosystem, and your input is crucial as we navigate these improvements.
00:24:21.760 Now, let’s discuss any specific challenges or questions you might encounter using Ruby's new features. Your concerns are worth exploring.