00:00:00.439
Hello, everyone! If I speak too fast, please raise your hand, and I will try to slow down.
00:00:07.620
Today, I want to talk about splitting: the crucial optimization for Ruby blocks. Can you hear me well? Okay, perfect.
00:00:16.020
It's great to be in person again at RubyKaigi.
00:00:21.500
I work at Oracle Labs in Zurich and have been involved with Ruby since 2014. I have a PhD in parallelism in dynamic languages, I'm the maintainer of ruby/spec, and I'm also a committer for TruffleRuby.
00:00:28.439
If you don't know, TruffleRuby is a high-performance Ruby implementation. It uses the GraalVM just-in-time (JIT) compiler and aims for full compatibility with CRuby 3.1, including C extensions.
00:00:40.440
For instance, it can run applications like Mastodon and Discourse with very minimal changes. You can find it on GitHub, Twitter, and Mastodon.
00:00:52.620
Today, I want to discuss splitting, and to do that, I want to go back to its origins. The concept of splitting originates from the Self programming language, which is very similar to Ruby.
00:01:05.280
The Self language, created in 1986, pioneered fundamental research that is still relevant in almost all dynamic languages today.
00:01:18.420
For example, Self introduced maps, also known as shapes, to represent objects more efficiently; TruffleRuby has used shapes since its inception, and CRuby has incorporated them since version 3.2.
00:01:30.180
They also introduced many optimizations, including just-in-time (JIT) compilation—compiling code to machine code at runtime—and the reverse, deoptimization, which transfers execution from optimized machine code back to the interpreter.
00:01:41.880
Deoptimization was important for debugging. They also invented polymorphic inline caches, which I will explain in more detail later, as well as splitting, which was introduced in a 1989 paper by Craig Chambers and David Ungar at Stanford.
00:01:53.280
The remarkable aspect of that paper is its example, which is in Self code but directly applies to Ruby today. We will translate this example from Self code into Ruby.
00:02:06.920
In Self, they defined a method for summing numbers from a current number up to an upper bound, inclusive. This method demonstrates how splitting works, and the corresponding Ruby code is straightforward.
00:02:23.940
First, we set a variable sum to zero. Then we call the step method, going from the current number to the upper bound; the step of one is implicit. In the block, we add each number to the sum, and finally we return the sum.
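A Ruby translation of the paper's example might look like the following sketch (the exact code on the slide may differ; the talk calls the method sum2):

```ruby
class Numeric
  # Sum all numbers from self up to upper_bound, inclusive.
  def sum2(upper_bound)
    sum = 0
    # step defaults to stepping by 1; defining this on Numeric
    # makes it work for Integer, Float, Rational, and big integers.
    step(upper_bound) { |n| sum += n }
    sum
  end
end

3.sum2(6)      # => 18 (3 + 4 + 5 + 6)
1.5.sum2(3.5)  # => 7.5 (1.5 + 2.5 + 3.5)
```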
00:02:41.880
However, we don't want this to work only for integers; we want it to work for all numbers, including floats, rationals, and arbitrarily large integers.
00:02:54.120
This leads us to ask whether we can compile this method to efficient machine code. Analyzing the method, the crucial part is to inline the call to step.
00:03:07.080
However, to inline the step method, we need to know which step method is being called, as there could be various implementations. From a static analysis perspective, we cannot glean that information without additional context.
00:03:31.200
To solve this, dynamic languages utilize inline caches, which are caches embedded in the representation the virtual machine uses for execution. In TruffleRuby, this inline cache exists within the bytecode representation.
00:03:50.940
For example, if we call sum2 with integers and floats, the inline cache will accommodate both cases by remembering the type of self at the call site.
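Conceptually, a polymorphic inline cache is a small per-call-site table from receiver class to looked-up method. This toy simulation in plain Ruby illustrates the idea (it is not how TruffleRuby actually stores caches; all names here are illustrative):

```ruby
# A toy polymorphic inline cache: one instance per call site.
class InlineCache
  def initialize(method_name)
    @method_name = method_name
    @entries = {} # receiver class => resolved method
  end

  def call(receiver, *args, &block)
    # Fast path: reuse the cached lookup for this receiver class.
    # Slow path: do a full method lookup, then cache the result.
    method = @entries[receiver.class] ||=
      receiver.class.instance_method(@method_name)
    method.bind_call(receiver, *args, &block)
  end
end

cache = InlineCache.new(:step)
sum = 0
cache.call(1, 3) { |n| sum += n }     # caches Integer => Numeric#step
cache.call(1.0, 2.0) { |n| sum += n } # adds a Float entry: polymorphic
sum  # => 9.0
```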
00:04:09.299
This means when we look at our code, if self is confirmed to be an integer, we can safely determine that numeric_step is the method being called.
00:04:30.420
Then, the compiler analyzes numeric_step, which is notoriously complex due to its ability to be invoked in numerous ways.
00:04:45.240
The method numeric_step can be called in various ways: stepping by different values, stepping downwards, or using keyword arguments, which introduce even more complexity.
00:05:06.600
As a simplification for our presentation, we will adapt the numeric_step method to make it fit on the slide, removing edge cases and focusing on the core logic.
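A simplified version of the core logic, stripped of keyword arguments and edge cases as on the slide, might look roughly like this (a sketch with an illustrative name, not CRuby's actual definition):

```ruby
class Numeric
  # Simplified step: positive step only, no keyword arguments,
  # no infinite or downward stepping.
  def simple_step(limit, step = 1)
    current = self
    # The inner loop: this yield is the call the compiler
    # wants to inline into the caller's block.
    while current <= limit
      yield current
      current += step
    end
    self
  end
end

sum = 0
1.simple_step(4) { |n| sum += n }
sum  # => 10
```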
00:05:25.080
The important section of this method is the inner loop, where the yield and the logic around it must be optimized. This calls for further inlining.
00:05:40.719
Our goal is to inline both calls: one to sum2 and one to step, and subsequently analyze their complexities to remove as many unnecessary checks as possible.
00:06:01.320
The execution of the numeric_step inner loop becomes clearer with these simplifications, since profiling the arguments lets us remove unnecessary checks.
00:06:11.880
This streamlining lets us apply to Ruby code the same optimizations a compiler would apply to C.
00:06:20.760
When we execute this optimized code, the CPU runs simple, predictable operations, speeding up the calculation significantly.
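As a hand-written sketch of the end result: once step and the block are fully inlined, and splitting guarantees a single block at the call site, the whole computation effectively reduces to a tight loop like this (illustrative code, not compiler output):

```ruby
# What sum2 effectively becomes for Integer receivers once
# step and the block are fully inlined (hand-written sketch).
def sum2_optimized(from, upper)
  sum = 0
  i = from
  while i <= upper
    sum += i # the block body, inlined; no call overhead
    i += 1   # the implicit step of 1, now a plain add
  end
  sum
end

sum2_optimized(3, 6)  # => 18
```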
00:06:34.200
In essence, splitting makes it possible to create multiple copies of a method for different call sites without introducing the complexity of branching.
00:06:46.200
The copies may include slight tweaks depending on caller characteristics, resulting in more efficient execution.
00:07:07.320
For example, if only one specific block is used at a call site, we can go further and hoist loop-invariant values out of the loop.
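Here is a hand-written illustration of that kind of loop-invariant hoisting (hypothetical names; a compiler that knows exactly which block runs can perform this transformation automatically):

```ruby
# Before: the invariant expression is recomputed on every iteration.
def scale_all(values, factor)
  values.map { |v| v * (factor * 2) } # factor * 2 never changes
end

# After hoisting: compute the invariant once, outside the loop.
def scale_all_hoisted(values, factor)
  doubled = factor * 2
  values.map { |v| v * doubled }
end

scale_all_hoisted([1, 2, 3], 5)  # => [10, 20, 30]
```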
00:07:24.720
In Ruby, this leaves us with the crucial need to balance complexity and performance when calling various methods and inlining their behaviors.
00:07:37.920
We need to keep the number of different blocks seen at a call site small; otherwise the call site becomes polymorphic and cannot be optimized.
00:07:49.680
For this reason, we rely on profiling and splitting to maintain efficient method calls, particularly when working with dynamic languages.
00:08:04.680
The ability to create multiple copies allows the compiler to optimize one specific version of step at a time, streamlining execution.
00:08:17.180
The goal is to develop a clearer and faster Ruby implementation through a finely tuned process, optimizing logic for Ruby code given the context.
00:08:30.420
The benefits of splitting become apparent when it further reduces the number of method calls per block invocation, leading to faster execution overall.
00:08:42.300
By profiling arguments, we allow subsequent methods and calls to become more streamlined and efficient, just as seen in statically typed languages.
00:08:55.740
In Ruby applications, methods might behave akin to C code if properly optimized, which is part of our goal.
00:09:10.440
Thus, we aim to maintain a competitive edge over traditional Ruby implementations by introducing these state-of-the-art optimizations.
00:09:23.040
As we move forward, the goal is to leverage advanced JIT compilation techniques, ensuring we can handle even complex expressions.
00:09:36.000
The benchmarks show that splitting provides a noticeable speedup, with drastic performance improvements in some cases.
00:09:48.420
TruffleRuby without splitting demonstrates a significant increase in performance over the standard Ruby implementation.
00:10:00.300
In multiple benchmarks, you might find Ruby native code to perform comparably to C when optimizing Ruby effectively using splitting.
00:10:12.840
As we continue to explore these Ruby implementations, we see consistent performance increases attributed to splitting.
00:10:26.400
The benchmarks from profiling approaches yield meaningful insights into not just strengthening performance, but persistence as well.
00:10:39.660
By addressing the splits, we ensure uniform behavior across various blocks and method chains, reducing overhead in methods effectively.
00:10:52.020
With regards to practical implementations, it is vital to maintain both flexibility and speed in Ruby while seeing immense value through splitting.
00:11:05.580
The application of a systematic approach to profiling enables TruffleRuby to achieve benchmarks reaching up to 115 times faster than CRuby.
00:11:18.180
This is especially significant for benchmarks like OptCarrot, which capitalize on these optimizations.
00:11:32.220
The introduction of parallel execution allows better processor utilization while applying these optimizations.
00:11:46.140
In conclusion, integration with GraalVM opens a vast frontier for Ruby applications, bridging language features and enhanced code interoperability.
00:12:00.300
With that said, as we push for technical advancements in Ruby, we must recognize the potential these innovations hold for the future.
00:12:14.520
I would like to share additional findings, and I am happy to discuss how this work can best serve user and developer needs.
00:12:26.160
So feel free to ask questions after the talk; I'd be eager to discuss these ideas further.
00:12:40.440
Thank you very much for your attention! I appreciate your time and engagement.