Splitting: the Crucial Optimization for Ruby Blocks

00:00:00.439 Hello, everyone! If I speak too fast, please raise your hand, and I will try to slow down.

00:00:07.620 Today, I want to talk about splitting: the crucial optimization for Ruby blocks. Can you hear me well? Okay, perfect.

00:00:16.020 It's great to be in person again at RubyKaigi.

00:00:21.500 I work at Oracle Labs in Zurich and have been involved with Ruby since 2014. I have a PhD in parallelism in dynamic languages, I'm the maintainer of ruby/spec, and I'm also a committer for TruffleRuby.

00:00:28.439 If you don't know, TruffleRuby is a high-performance Ruby implementation. It uses the GraalVM Just-In-Time (JIT) compiler and aims for full compatibility with CRuby 3.1, including C extensions.

00:00:40.440 For instance, it can run applications like Mastodon and Discourse with very few changes. You can find it on GitHub, Twitter, and Mastodon.

00:00:52.620 Today, I want to discuss splitting, and to do that, I want to go back to its origins. The concept of splitting originates from the Self programming language, which is very similar to Ruby.

00:01:05.280 The Self language, created in 1986, pioneered many techniques that are still used in almost all dynamic language implementations today.

00:01:18.420 For example, Self introduced maps, now usually called shapes, to represent objects efficiently. TruffleRuby has used shapes since its inception, and CRuby has used them since version 3.2.

00:01:30.180 They also introduced many optimizations, including JIT compilation, which converts code to machine code, and deoptimization, the reverse operation, which transfers execution from machine code back to the interpreter.

00:01:41.880 Deoptimization was important for debugging. They also invented polymorphic inline caches, which I will explain in more detail later, as well as splitting, which was introduced in a 1989 paper by Craig Chambers and David Ungar at Stanford.

00:01:53.280 The remarkable aspect of that paper is that its example, written in Self, applies directly to Ruby today. Let's translate it from Self into Ruby.

00:02:06.920 In Self, they defined a method for summing numbers from a current number up to an upper bound, inclusive. This method demonstrates how splitting works, and the corresponding Ruby code is straightforward.

00:02:23.940 First, we set a variable sum to zero. We then call the step method from the current number up to the upper bound, stepping by one; the step of one is implicit. For each value, the block adds it to sum, and at the end the method returns sum.
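A possible Ruby translation of the example is sketched below. The name sum_to and the choice to define it by reopening Numeric are assumptions for illustration; the code on the slide may differ in details.

```ruby
class Numeric
  # Hypothetical helper modeled on the Self example:
  # sums every number from self up to upper_bound, inclusive.
  def sum_to(upper_bound)
    sum = 0
    # step(limit) uses a step of 1 by default, so the 1 is implicit.
    step(upper_bound) { |i| sum += i }
    sum
  end
end

p 1.sum_to(10)              # => 55
p 1.5.sum_to(4)             # => 7.5   (1.5 + 2.5 + 3.5)
p Rational(1, 2).sum_to(3)  # => (9/2) (1/2 + 3/2 + 5/2)
```

Because step is defined on Numeric, the same method body works for Integer, Float, Rational, and big Integer receivers.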

00:02:41.880 You might wonder why we don't use upto. Integer#upto only works for integers, but we want this method to work for all numbers, including floats, rationals, and big integers. That is why it uses step, which is defined on Numeric.
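The difference is easy to check in irb. This snippet only uses core Ruby methods.

```ruby
# Integer#upto exists, but Float and Rational do not define upto.
p 1.respond_to?(:upto)               # => true
p 1.5.respond_to?(:upto)             # => false
p Rational(1, 2).respond_to?(:upto)  # => false

# Numeric#step accepts any numeric receiver; without a block it
# returns an arithmetic sequence that we can turn into an array.
p 1.5.step(4).to_a                   # => [1.5, 2.5, 3.5]
```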

00:02:54.120 This leads us to ask whether we can compile this method efficiently to machine code. Analyzing it, the crucial part is inlining the call to step.

00:03:07.080 However, to inline step, we need to know which step method is being called, as there could be many implementations. Static analysis alone cannot tell us that without additional context.

00:03:31.200 To solve this, dynamic languages use inline caches: caches embedded in the representation that the virtual machine executes. In TruffleRuby, these inline caches live in the nodes of the abstract syntax tree (AST) that the interpreter executes.

00:03:50.940 For example, if we call sum_to on both integers and floats, the inline cache at the step call site will hold both cases, remembering each class of self it has seen there.

00:04:09.299 So when we look at our code, if self is known to be an Integer, we can safely determine that Numeric#step is the method being called.
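To illustrate, here is a toy model of a polymorphic inline cache for the step call site, written in Ruby. The class InlineCache and its LIMIT are hypothetical; a real VM keeps this state inside its interpreter and compiler, and also handles cases this sketch ignores, such as singleton classes and method redefinition.

```ruby
# Toy model of a polymorphic inline cache (PIC) for a single call site.
class InlineCache
  LIMIT = 8 # hypothetical polymorphism limit

  attr_reader :entries

  def initialize(method_name)
    @method_name = method_name
    @entries = {} # receiver class => method found by lookup
  end

  def call(receiver, *args, &block)
    klass = receiver.class
    meth = @entries[klass]
    if meth.nil?
      # Cache miss: do a full method lookup and remember it if there is room.
      # Once LIMIT classes are cached, further misses are looked up every time.
      meth = klass.instance_method(@method_name)
      @entries[klass] = meth if @entries.size < LIMIT
    end
    meth.bind_call(receiver, *args, &block)
  end
end

site = InlineCache.new(:step) # models the step call inside the summing method
sum = 0
site.call(1, 3) { |i| sum += i }    # receiver is an Integer
site.call(1.5, 3) { |i| sum += i }  # receiver is a Float
p site.entries.keys                 # => [Integer, Float]
p site.entries.values.map(&:owner)  # => [Numeric, Numeric]
p sum                               # => 10.0
```

Both cached entries resolve to the same method, Numeric#step, which is what lets the compiler inline it after a cheap class check on self.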

00:04:30.420 Next, the compiler analyzes Numeric#step, which is notoriously complex because it can be called in so many ways.

00:04:45.240 For instance, it can step by different amounts, step downward, or take the by: and to: keyword arguments, which adds even more complexity.

00:05:06.600 To make it fit on the slide, we will use a simplified version of Numeric#step that removes the edge cases and keeps only the core logic.

00:05:25.080 The important part of this method is the inner loop: the yield, which calls the block, and the code around it must be optimized, which means the block has to be inlined as well.
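A simplified step in the spirit of the slide might look like this. The name simple_step is hypothetical (chosen so it does not replace the real Numeric#step), and it deliberately drops negative steps, keyword arguments, the Enumerator returned without a block, and Float rounding handling.

```ruby
class Numeric
  # Hypothetical, heavily simplified stand-in for Numeric#step.
  def simple_step(limit, step = 1)
    i = self
    while i <= limit
      yield i # calls the block: the second call we want to inline
      i += step
    end
    self
  end
end

values = []
1.simple_step(7, 2) { |i| values << i }
p values # => [1, 3, 5, 7]
```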

00:05:40.719 Our goal is to inline both calls, step into sum_to and the block into step, and then simplify the result to remove as many unnecessary checks as possible.

00:06:01.320 With argument profiling, the compiler knows the types of the arguments and can remove unnecessary checks, which makes the inner loop of Numeric#step much simpler.

00:06:11.880 After these simplifications, the compiler can optimize the Ruby code much as a C compiler would optimize an equivalent loop.

00:06:20.760 When we run this optimized code, the CPU executes a tight, predictable loop, which makes the computation significantly faster.
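As a rough illustration, this is approximately what the summing method reduces to once step and the block are inlined and the profile says both bounds are Integers. It is written as Ruby for readability; a JIT compiler performs this transformation on its intermediate representation, and the name sum_to_inlined is hypothetical.

```ruby
# Hand-inlined sketch of the summing method for Integer arguments.
def sum_to_inlined(from, upper_bound)
  sum = 0
  i = from
  while i <= upper_bound # loop condition from the simplified step
    sum += i             # body of the block { |i| sum += i }
    i += 1               # the implicit step of 1, now a constant
  end
  sum
end

p sum_to_inlined(1, 10) # => 55
```

What remains is one integer comparison and two integer additions per iteration. In compiled code, the additions still need overflow checks, because Ruby Integers transparently grow into big integers.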

00:06:34.200 In essence, splitting creates multiple copies of a method, one per call site, so that each copy only handles the cases its own caller produces instead of branching over all of them.

00:06:46.200 Each copy can then be specialized for the characteristics of its caller, resulting in more efficient execution.

00:07:07.320 For example, if a copy of step only ever sees one specific block, the compiler can inline that block and go further, for instance by moving loop-invariant values out of the loop.
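Here is a small sketch of that idea, assuming a single block and Integer arguments. Both method names are hypothetical; the second one is written by hand to show what loop-invariant code motion produces, not generated by any tool.

```ruby
# Before: the block recomputes base * 2 on every iteration.
def offset_sum(from, to, base)
  sum = 0
  from.step(to) { |i| sum += i + base * 2 }
  sum
end

# After inlining step and its only block: base * 2 does not change inside
# the loop, so it is computed once before the loop. This is only valid
# behind a guard that base is an Integer, so that * has no side effects.
def offset_sum_hoisted(from, to, base)
  sum = 0
  offset = base * 2 # hoisted loop-invariant value
  i = from
  while i <= to
    sum += i + offset
    i += 1
  end
  sum
end

p offset_sum(1, 4, 3)         # => 34
p offset_sum_hoisted(1, 4, 3) # => 34
```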

00:07:24.720 In Ruby, this means we must balance complexity and performance: a method like step is called from many places, each passing its own block, and we want to inline that behavior.

00:07:37.920 We need to limit how many different blocks reach each yield. Otherwise, a single copy of the method has to handle calls to many blocks, and the compiler cannot optimize it for any of them.

00:07:49.680 For this reason, we rely on profiling and splitting to maintain efficient method calls, particularly when working with dynamic languages.

00:08:04.680 The ability to create multiple copies allows the compiler to optimize one specific version of step at a time, streamlining execution.
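The following toy model shows why splitting helps blocks. CallTarget is a hypothetical class standing for one copy of a method; it records which blocks reach its yield, identified by their source location. This is only an illustration under those assumptions; TruffleRuby's actual splitting is implemented inside Truffle, not like this.

```ruby
# Toy model: one CallTarget = one copy of a method, with its own profile.
class CallTarget
  def initialize(&body)
    @body = body
    @seen_blocks = [] # source locations of blocks that reached this copy
  end

  def call(*args, &block)
    loc = block.source_location
    @seen_blocks << loc unless @seen_blocks.include?(loc)
    @body.call(*args, &block)
  end

  # Splitting: a fresh copy of the same method, with an empty profile.
  def split
    CallTarget.new(&@body)
  end

  # A yield is easy to inline only if exactly one block reaches it.
  def monomorphic_yield?
    @seen_blocks.size == 1
  end
end

# A simplified step, shared by every caller.
shared_step = CallTarget.new do |from, to, &blk|
  i = from
  while i <= to
    blk.call(i)
    i += 1
  end
end

# Without splitting: two call sites pass different blocks to the same copy.
doubled = []
shared_step.call(1, 3) { |i| doubled << i * 2 }
squared = []
shared_step.call(1, 3) { |i| squared << i * i }
p shared_step.monomorphic_yield? # => false

# With splitting: each call site gets its own copy of step.
step_for_doubling = shared_step.split
step_for_squaring = shared_step.split
step_for_doubling.call(1, 3) { |i| doubled << i * 2 }
step_for_squaring.call(1, 3) { |i| squared << i * i }
p step_for_doubling.monomorphic_yield? # => true
p step_for_squaring.monomorphic_yield? # => true
```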

00:08:17.180 The goal is a faster Ruby implementation that optimizes each piece of Ruby code for the context in which it is used.

00:08:30.420 The benefits of splitting become clear when it removes the remaining method and block calls from the hot loop, resulting in faster compiled code overall.

00:08:42.300 By profiling arguments, we make the called methods and their calls more streamlined and efficient, much like in statically typed languages.

00:08:55.740 Properly optimized, Ruby methods can perform like equivalent C code, and that is part of our goal.

00:09:10.440 Thus, we aim to keep a competitive edge over other Ruby implementations by adopting these state-of-the-art optimizations.
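A minimal model of argument profiling might look like this. ArgumentProfile is a hypothetical name; it only records argument classes, and a compiler using such a profile would guard on those classes and deoptimize if a call ever disagrees with them.

```ruby
# Toy model of argument profiling for one method copy.
class ArgumentProfile
  def initialize
    @classes = nil   # nil until the first call is recorded
    @generic = false # true once two calls disagree
  end

  def record(*args)
    classes = args.map(&:class)
    if @classes.nil?
      @classes = classes
    elsif @classes != classes
      @generic = true
    end
  end

  # The stable argument classes, or nil if the profile is empty or generic.
  def stable_classes
    @generic ? nil : @classes
  end
end

profile = ArgumentProfile.new
profile.record(1, 10)
profile.record(2, 20)
p profile.stable_classes # => [Integer, Integer]
profile.record(1.5, 4)
p profile.stable_classes # => nil
```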

00:09:23.040 As we move forward, the goal is to leverage advanced JIT compilation techniques, ensuring we can handle even complex expressions.

00:09:36.000 Our latest benchmarks show that splitting provides a noticeable speedup, in some cases a drastic one.

00:09:48.420 Even without splitting, TruffleRuby is significantly faster than CRuby, the standard Ruby implementation.

00:10:00.300 In multiple benchmarks, Ruby code optimized effectively with splitting can perform comparably to C.

00:10:12.840 As we continue to explore these Ruby implementations, we see consistent performance increases attributed to splitting.

00:10:26.400 These profiling-based benchmarks give meaningful insight not only into peak performance but also into how consistently it is sustained.

00:10:39.660 By splitting, we get consistent behavior across different blocks and method chains and effectively reduce the overhead of method calls.

00:10:52.020 In practice, splitting is very valuable: it lets Ruby stay flexible while also being fast.

00:11:05.580 By applying profiling systematically, TruffleRuby runs some benchmarks up to 115 times faster than CRuby.

00:11:18.180 This is especially significant for benchmarks like optcarrot, a NES emulator written in Ruby, which benefit greatly from these optimizations.

00:11:32.220 TruffleRuby can also execute Ruby threads in parallel, which makes better use of the processor while these optimizations still apply.

00:11:46.140 In conclusion, running on GraalVM opens up many possibilities for Ruby applications, including interoperability with code written in other languages.

00:12:00.300 With that said, as we push technical advances in Ruby, we should recognize the potential these innovations hold for the future.

00:12:14.520 I would be happy to share additional findings, and we will keep focusing on the needs of Ruby users and developers.

00:12:26.160 So feel free to ask questions after the talk; I'd be eager to discuss these ideas and collaborate on them.

00:12:40.440 Thank you very much for your attention! I appreciate your time and engagement.