Keynote: Parallel and Thread-Safe Ruby at High-Speed with TruffleRuby

by Benoit Daloze

Introduction

The keynote presentation titled "Parallel and Thread-Safe Ruby at High-Speed with TruffleRuby" by Benoit Daloze at RubyKaigi 2018 discusses crucial advancements in Ruby's performance, specifically emphasizing parallelism and thread safety achieved through the TruffleRuby implementation.

Key Points

  • TruffleRuby Overview:

    • Developed by Oracle Labs, TruffleRuby is a high-performance Ruby implementation using the Graal Just-in-Time compiler.
    • It aims for full compatibility with CRuby and offers significant performance improvements over traditional Ruby implementations.
  • Performance Metrics:

    • The presenter showcases benchmarks comparing CRuby 2.0, the latest MRI with MJIT enabled, and TruffleRuby.
    • Initial speeds of 28-30 frames per second with CRuby increase to around 170-200 frames per second with TruffleRuby after JIT compiler warm-up, demonstrating a remarkable performance leap.
  • Ruby 3 Times 3 Project:

    • The project aims to make Ruby 3 three times faster than CRuby 2.0. The talk uses the project's CPU benchmarks to show what TruffleRuby already achieves.
  • Optimizations in TruffleRuby:

    • TruffleRuby utilizes partial evaluation, optimizing how Ruby methods execute by transforming them into a compiler graph for efficiency. This results in optimized memory access and reduced overhead.
  • Thread Safety and Parallel Access:

    • Addressing Ruby's historical challenges with parallel execution, Benoit Daloze discusses methods developed to create thread-safe collections (Arrays and Hashes) that maintain performance and usability.
    • The implementation allows multiple threads to safely interact with mutable structures without exceptions or race conditions.
  • Future Directions:

    • Emphasis on continuing improvements in parallel execution ensuring that Ruby can take advantage of modern multi-core processors while preserving the dynamic language characteristics.

Conclusion

The keynote concludes with a strong vision for the Ruby ecosystem, focused on enhancing performance and integrating concurrency effectively. Daloze encourages ongoing discussion and community engagement to address challenges and leverage Ruby's full potential in concurrent environments.

Overall, the session provides valuable insights into the advancements in Ruby implementation through TruffleRuby, emphasizing the need for high-speed, high-performance code that also upholds thread safety.

00:00:00.410 Hello, everyone. I would like to talk about parallelism and performance in Ruby.
00:00:05.819 My name is Benoit Daloze, and I'm a student in Austria.
00:00:12.059 I've been researching concurrency and parallelism in Ruby. I've been working on TruffleRuby, which is another Ruby implementation.
00:00:18.480 I'm also the maintainer of RubySpec, a test suite for the Ruby programming language, and I'm involved with the Ruby core committee.
00:00:30.210 Today, I want to discuss performance first, and then I will talk about my research on parallel and thread-safe Ruby.
00:00:43.440 To begin, let me introduce what TruffleRuby is. How many of you know about TruffleRuby?
00:00:54.140 TruffleRuby is a high-performance implementation of Ruby developed by Oracle Labs. It uses the Graal Just-in-Time compiler to achieve high speed and aims for full compatibility with CRuby, including its C extensions.
00:01:10.740 There are two ways to run TruffleRuby. The first is on the Java Virtual Machine (JVM), which allows interaction with existing Java libraries, much like JRuby.
00:01:23.820 The second configuration, which is the default, compiles TruffleRuby and the Graal compiler ahead of time into a native executable.
00:01:29.369 In this mode, it starts very quickly and is faster on startup than MRI, for instance. It also has a fast warm-up time.
00:01:39.570 Because everything is already precompiled, the execution starts at a quicker pace. Moreover, it has a lower memory footprint than a typical JVM.
00:02:00.189 Now, I would like to talk about the Ruby 3 Times 3 project.
00:02:08.720 The goal of this project is to make Ruby 3 three times faster than CRuby 2.0. The main approach for achieving this for CPU benchmarks is through the Just-in-Time compiler.
00:02:22.230 But I have two questions regarding this project: Do we need to wait until Ruby 3, which will be in a few years, around 2020?
00:02:27.590 And can we be faster than Ruby 2.0? I want to demonstrate this by running Optcarrot, which is the main CPU benchmark for Ruby 3.
00:02:47.970 So first, we will start by running it with CRuby 2.0. This is the baseline. I'm playing Lan Master, which is the default game for this emulator, and as we can see, the speed is not very impressive, around 28-30 frames per second.
00:03:17.710 Next, we can try running the latest MRI with MJIT enabled and see if it gets faster.
00:03:36.160 The initial speed is about 40 frames per second, but it ramps up to about 50 frames per second.
00:03:43.570 Now, when we run this with TruffleRuby, it may initially start up slower because the JIT compiler is warming up and learning how the program behaves. However, once it understands, it begins to run faster.
00:04:06.200 Initially, the performance is around 7 frames per second, but soon after, we reach speeds of around 170-200 frames per second, which is a significant improvement.
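The warm-up effect described here can be observed with a small measurement loop. The following is a minimal sketch, not the Optcarrot harness: `render_frame` is a hypothetical stand-in for one emulator frame, and the slice length and workload size are arbitrary assumptions. On an implementation with a JIT compiler, later slices would be expected to complete more frames than the first; on an interpreter the counts should stay roughly flat.

```ruby
# Hypothetical CPU-bound workload standing in for one emulator frame.
def render_frame(n = 20_000)
  sum = 0
  n.times { |i| sum += (i * i) % 7 }
  sum
end

# Count how many "frames" complete in each fixed-length time slice.
def frames_per_slice(slices: 5, slice_seconds: 0.1)
  Array.new(slices) do
    frames = 0
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + slice_seconds
    while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
      render_frame
      frames += 1
    end
    frames
  end
end

rates = frames_per_slice
rates.each_with_index do |frames, slice|
  puts "slice #{slice}: #{frames} frames"
end
```

The exact numbers depend on the machine and the Ruby implementation, so no expected output is given.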
00:04:29.870 This shows that TruffleRuby can achieve far higher performance than both CRuby 2.0 and the latest MRI with MJIT.
00:04:43.440 It's crucial to highlight that the performance of Optcarrot is exceptional with TruffleRuby compared to other benchmarks.
00:05:18.420 Running without a graphical user interface, we can achieve almost 300 frames per second, demonstrating an impressive performance gain.
00:05:38.240 While this is just one benchmark, we perform very well on many CPU-based benchmarks, like those in the Computer Language Benchmarks Game.
00:06:00.990 In many cases, we achieve performance improvements ranging from 10 to 30 times faster than CRuby 2.3 on various benchmarks.
00:06:24.440 However, there are some exceptions, such as the pidigits benchmark, which mostly exercises arithmetic on very large numbers and doesn't really evaluate the Ruby implementation.
00:06:47.110 In most cases, we see a significant gain in performance across the board.
00:07:01.730 For instance, when we run a micro-benchmark designed to assess speed, we see TruffleRuby running 30 times faster than CRuby 2.0.
00:07:15.410 The same trend continues in rendering template engines where TruffleRuby can outperform previous implementations by around 10 times.
00:07:50.390 The gains come from differences in approach, including how string representations are handled.
00:08:04.649 While we don't yet run Rails faster, we have begun supporting Rails benchmarks, although some bugs will need to be addressed.
00:08:31.900 Integrating extensions is a significant hurdle since they have more than 100 dependencies.
00:09:01.920 However, we have made progress on major extensions like OpenSSL, MySQL, and others.
00:09:23.470 Now, to focus on why TruffleRuby is fast: we utilize partial evaluation.
00:09:39.880 This means that the important parts of Ruby methods can be executed more quickly.
00:09:59.490 For example, we transform a simple Ruby code involving an array and a block into a compiler graph, representing how each Ruby operation executes.
00:10:26.880 Here, we can see how certain nodes in the graph simplify computational efficiency by allowing for direct execution with constant values.
00:10:58.670 This transformation helps TruffleRuby optimize memory access and reduce overhead through unrolling loops and optimizing storage.
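To make the idea concrete, here is a hedged illustration of what such a transformation amounts to. The method `scaled` is a hypothetical example of "simple Ruby code involving an array and a block", not necessarily the one shown on the slides. `scaled_folded` is a hand-written equivalent of what a compiler could produce once the array contents and the block body are known constants: the loop is unrolled and the multiplications are folded.

```ruby
# Hypothetical method of the kind described: an array literal and a block.
def scaled
  [1, 2].map { |e| e * 3 }
end

# Hand-written equivalent of the optimized result: no loop, no block call,
# no intermediate work, only the allocation of the final array.
def scaled_folded
  [3, 6]
end

p scaled         # prints [3, 6]
p scaled_folded  # prints [3, 6]
```

A real compiler can only do this while its assumptions hold, for example that `Array#map` and `Integer#*` have not been redefined, so the optimized code has to be discarded if one of those assumptions is invalidated.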
00:11:19.950 Thus, minimizing unnecessary allocations leads to lower memory usage.
00:11:48.720 Importantly, partial evaluation allows us to track which values can remain constant, enabling further optimizations.
00:12:24.790 The optimization process continues with the Graal compiler refining the code, removing unnecessary checks, and efficiently allocating memory.
00:12:44.196 In summary, our ability to generate native assembly for efficient execution demonstrates how TruffleRuby can run Ruby code faster than other implementations.
00:13:30.710 But the significant challenge remains: how can we achieve thread safety and parallel access in Ruby?
00:14:02.500 Dynamic languages, particularly Ruby, often struggle with this due to global locks preventing code execution in parallel.
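The effect of a global lock can be checked with a small experiment. This is a sketch under stated assumptions: `count_primes` is a hypothetical CPU-bound task, and the chunk sizes are arbitrary. On an implementation with a global lock, the threaded run is expected to take about as long as the sequential one; on an implementation that runs Ruby threads in parallel, it can be faster on a multi-core machine.

```ruby
# CPU-bound work: count primes in a range by trial division.
def count_primes(range)
  range.count do |n|
    n >= 2 && (2..Integer.sqrt(n)).none? { |d| (n % d).zero? }
  end
end

# Run the block and return its result together with the elapsed seconds.
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

chunks = [1...5_000, 5_000...10_000, 10_000...15_000, 15_000...20_000]

# Same work, first on one thread, then on one thread per chunk.
sequential, seq_time = timed { chunks.sum { |c| count_primes(c) } }
threaded, thr_time = timed do
  chunks.map { |c| Thread.new { count_primes(c) } }.sum(&:value)
end

puts "sequential: #{sequential} primes in #{seq_time.round(3)}s"
puts "threaded:   #{threaded} primes in #{thr_time.round(3)}s"
```

Both runs compute the same count; only the elapsed time is expected to differ between implementations.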
00:14:41.080 As a result, we must look towards creating collections that not only maintain thread safety, but also improve parallel execution.
00:15:07.660 One of the most significant problems is providing an interface that prevents exceptions when parallelizing collections.
00:15:30.990 This was my PhD thesis, and I believe it's crucial that we ensure collections like Arrays and Hashes are thread-safe while also enabling parallel access.
00:16:01.150 For instance, we want to be able to create an array and append elements to it from several threads safely, without race conditions.
00:16:23.480 If multiple threads interact with the same object simultaneously, we must elegantly handle state without causing errors.
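The guarantee being asked for can be illustrated with a portable baseline. Note that this is not the technique from the talk, which makes the built-in collections themselves thread-safe inside the runtime; it is a plain `Mutex` wrapper, and `SynchronizedList` is a hypothetical name used only for this sketch. It shows the expected outcome: no element is lost and no exception is raised when several threads append concurrently.

```ruby
# Portable baseline: guard a shared Array with a Mutex so appends cannot race.
class SynchronizedList
  def initialize
    @items = []
    @lock = Mutex.new
  end

  # Append under the lock; return self so calls can be chained like Array#<<.
  def <<(item)
    @lock.synchronize { @items << item }
    self
  end

  def size
    @lock.synchronize { @items.size }
  end
end

list = SynchronizedList.new
threads = 4.times.map do |t|
  Thread.new { 1_000.times { |i| list << [t, i] } }
end
threads.each(&:join)

puts list.size  # prints 4000
```

The cost of this approach is that every append takes the lock, which serializes access even when threads could safely proceed in parallel. That trade-off is the motivation for doing the synchronization inside the collection implementation instead.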
00:16:50.780 Using mechanisms like Coroutine and shared mutable state can help regulate access within Ruby.
00:17:12.740 I've designed the thread-safe approach to balance performance and the Ruby programming model.
00:17:35.160 Consequently, the methods I've created provide secure access to mutable structures while allowing parallel operations.
00:18:05.420 This ensures users can still define operations over their mutable state while benefiting from the enhancements of TruffleRuby.
00:18:30.500 Therefore, our implementation produces significant performance gains without sacrificing safety for concurrency.
00:19:03.840 Overall, the design supports both the convenience of dynamic languages like Ruby and the performance of compiled languages.
00:19:31.930 Now, looking back to our previous comparisons, we see that TruffleRuby's performance in parallel execution greatly outshines traditional implementations.
00:20:01.560 Ensuring that Ruby can run in parallel while keeping its dynamic nature proves beneficial for a diverse range of applications.
00:20:24.380 The journey towards achieving this integration of speed and safety is ongoing, and we'll continue to embrace opportunities for improvement.
00:20:40.400 Ultimately, TruffleRuby aims not only to enhance performance but also to enable existing Ruby applications to utilize concurrency smoothly.
00:21:03.540 To summarize, to realize the potential of Ruby under a concurrent environment, we must maintain performance and usability.
00:21:35.400 I look forward to the continued development and integration of these concepts in the Ruby community.
00:21:57.000 I appreciate everyone's participation in today's presentation and welcome any further questions.
00:22:20.180 Thank you again for your time!
00:22:51.920 Now let's move on to the Q&A session, where you can ask your questions regarding TruffleRuby and parallelism in Ruby.
00:23:11.230 Feel free to raise your hand, and I will be glad to address your queries.
00:23:53.570 I can assure you that performance is a vital aspect of the Ruby ecosystem, and your input is crucial as we navigate these improvements.
00:24:21.760 Now, let’s discuss any specific challenges or questions you might encounter using Ruby's new features. Your concerns are worth exploring.