00:00:00
Hello everyone.
00:00:13.380
My name is Maple Ong.
00:00:15.240
Thank you, you’re all very kind. Today, we'll be looking into peephole optimizations.
00:00:18.619
I'm going to be saying 'peephole optimizations' a lot, so bear with me. Someone please keep a counter because I’ll be curious to know how many times I say it. Before I begin, I wanted to share that I work with Gusto, a company that provides HR and payroll services for small businesses. It's been really good so far, so if you're interested, reach out!
00:00:39.600
Previously, I worked on Truffle Ruby at Shopify. Truffle Ruby is a high-performance Ruby implementation built on top of the Graal VM and the Truffle framework. Working on this team inspired me to learn more about programming languages and how they're implemented, specifically Ruby.
00:01:12.240
Peephole optimization is an established technique; it was first discussed back in the 1960s. This is the earliest paper I could find on the topic. Now, the 1960s were a long time ago, and thank God computers have evolved a lot since then. So, you might wonder why I’m giving a talk about such a dated compiler optimization. I believe it's interesting and fun to learn about, which is personally why I do anything at work. After all, if it’s not interesting or fun, what are we doing?
00:01:42.540
But I also want to share with you another, perhaps better, reason to learn this foundational technique: it makes you a better Ruby developer! For example, learning how Ruby executes its code will help with debugging and problem-solving in your day-to-day work. If you have experience with compilers already, I hope this talk refreshes your memory on the topic and that you learn something new about how Ruby implements this particular optimization.
00:02:07.200
And if you’re a complete beginner, this talk is intended for you too. I hope you learn more about the Ruby compiler and this optimization, and I hope it sparks your interest to learn more after this talk. Before we begin, let me show you what to expect: we will discuss what peephole optimization is, when it happens in Ruby, and how Ruby implements it. Then we will move on to examples of peephole optimizations in Ruby, learn about some YARV instructions, and discuss limitations and benchmarks.
00:02:57.840
So, what is peephole optimization? It involves looking at a small set of code, usually intermediary instructions or bytecode. We will look at what this looks like in a moment. The goal is to find and change these instructions to make them simpler and more performant.
00:03:04.200
Because this optimization only looks at several instructions at once, it is called peephole optimization, or sometimes window optimization. Personally, I think it should be named 'window cleaner' because it sounds cooler than peephole optimization! Another important note is that this optimization is used in other layers of the stack and in other languages. For example, it's also used in Python and can even optimize assembly code. JIT (Just-In-Time) compilation also employs this technique.
00:03:58.159
Today, however, we will be focusing on peephole optimization in the context of Ruby, specifically in the C Ruby or MRI implementation. Another point to note is that Ruby already utilizes peephole optimization; it's enabled by default. So whenever you run Ruby files, like with 'ruby foo.rb', it's already turned on.
00:04:19.380
Now, let’s discuss when peephole optimization happens in Ruby. To give you an overview, let me briefly go through the entire Ruby execution process with you. On the far left, we have the Ruby code that we’ve all written. When you execute your Ruby code, it gets passed into a tokenizer which produces tokens. These tokens are then passed into a parser which outputs AST (Abstract Syntax Tree) nodes. If you attended Kevin Newton's talk, you will know all about parsing and AST nodes.
00:05:04.660
The AST is then fed into a compiler, which outputs bytecode for execution by the virtual machine. The optimization specifically happens between the compiler generating bytecode and the execution by the virtual machine.
00:05:36.480
To clarify, when I say bytecode, I'm referring to YARV instructions, the specific instruction sequence for Ruby. When I mention the VM or YARV, it's referring to the same thing because, in Ruby, YARV stands for Yet Another Ruby VM.
00:06:44.040
Let's now look at some examples to see what YARV instructions look like. For instance, consider the instruction 'puts 9 plus 8.' I often make the joke that I can't do mental math, so I have to type it out in Ruby! We will place this code into a variable called source code and then use the Ruby library called Ruby VM Instruction Sequence to compile and disassemble it. This will produce human-readable output.
00:07:14.580
When you run this, this is the output. I don't want to overwhelm anyone, but I want to highlight that this is what we will be examining today—YARV instructions. We will simplify this because there are 'put self' and 'put object' instructions alongside the number 9. These references serve as the arguments for the 'put object' instruction, which is similar to a method call.
00:08:33.660
During the compilation phase, Ruby sources instruction sequences as a linked list structure. This linked list eventually gets converted into an array later in the execution process. To connect this to peephole optimizations: after the compiler generates bytecode, that bytecode is then optimized before being passed to the virtual machine for execution.
00:09:54.780
Let’s dive into how Ruby implements peephole optimization. I know we all enjoy reading C code, so I want to present the 'iseek optimize' function found in compile.c. This file houses the code that translates the tree nodes into virtual machine instruction sequences. I've simplified the function to make it easier to grasp for this talk. The first thing we do is set up optimization options, as it's C programming.
00:11:00.060
Next, we loop through each element of the linked list, which contains the instruction sequences. We check whether the element is an instruction, since there are various types of instructions present. If it is, we verify whether peephole optimization is enabled. As I mentioned earlier, peephole optimization is on by default, but you can switch it off if you'd like.
00:12:03.680
During this process, various other optimizations take place, like specialized instruction optimizations. At the end of the loop, we move to the next instruction in the linked list. To recap, the 'iseek optimize' function traverses each element in the linked list and calls 'iseek peephole optimize' on them.
00:13:28.659
Now let's examine 'iseek peephole optimize' in follow.c. It's structured in a straightforward manner: when the function receives the instruction, all it does is check it against a series of if statements. For example, we check if the element is a new range—this is a type of YARV instruction. If it meets specific conditions, we optimize the linked list accordingly. Essentially, it works as a pattern matching and conditional replacement function.
00:14:26.700
I also want to share a really cool paper that discusses an approach to implementing peephole optimization using string pattern matching. Those interested in pattern matching might find this intriguing! Returning to our discussion, there are over 20 peephole optimization cases documented. If you're interested in specifics, I highly recommend checking out the function, as it's well-documented with before-and-after examples illustrating how the bytecode changes.
00:15:48.180
To summarize, the 'iseek' function traverses each element and calls 'peephole optimize' on any instruction that meets the optimization criteria, ultimately resulting in smaller and more performant instruction sequences.
00:17:06.540
Now, let's dive into some examples of peephole optimizations in Ruby. There are three types of optimizations we can perform: combining multiple instructions into a single instruction, eliminating redundant instructions, and using more efficient instructions.
00:18:36.819
The first optimization involves combining several instructions into one single instruction, which saves space. For instance, if we have a range from the string 'A' to 'Z', the double-dot indicates that it’s inclusive. If you’re unfamiliar with Ruby, calling count on it will return 26, the number of letters in the alphabet. We will do something similar to the previous example where we put this into a variable called source code, then compile and disassemble it to see what happens.
00:19:57.740
In this case, peephole optimization is turned off, and here is the output. The instructions are quite lengthy, and let’s break down what each instruction does. The first is 'put string A', which pushes the string literal onto the stack. Next, 'put string Z' does the same with 'Z'. The instruction 'new range 0' creates a range and takes one argument—zero if the end is inclusive.
00:21:30.120
By examining how YARV executes this sequence of instructions, we note that 'put string A' pushes 'A' onto the stack; 'put string Z' pushes 'Z' onto the stack. Then, 'new range 0' pops two objects off the stack, creates a new inclusive range, and pushes that back onto the stack, before we return.
00:22:40.080
Now, let’s examine what the bytecode looks like when the optimization is enabled. It is significantly shorter since it employs just one 'put object' instruction. 'Put object' pushes a known value onto the stack, in this case, the inclusive range from 'A' to 'Z'. This demonstrates that we have reduced the number of required instructions by three!
00:23:38.220
In a micro-benchmark I ran, this was 5.4 times faster. Now, moving on to the second example, we'll discuss eliminating redundant instructions.
00:24:14.660
In our Ruby code, 'hello world' is initialized into an array, which evaluates to an array containing 'hello world'. Conducting the same process as before, I turned off peephole optimization to observe the output. Here, 'put string hello' and 'put string world' are executed, along with 'new array', which initializes with two values from the stack.
00:25:21.180
Under normal circumstances, the second call to 'spot array' would pop one element of the stack and push it back after invoking '2A' on it. However, with peephole optimization turned on, we see that 'spot array' is omitted since we know that 'new array' will always return an array on the stack.
00:26:23.060
This results in a 1.3 times faster execution! Now, let's take a look again at eliminating redundant instructions.
00:27:20.140
In this example, we assign the string 'gummies are the best' to variable x. For some reason, we then want to assign x to x again! This very funny anecdote connects to a joke I shared with a friend about who would assign x to x, only to discover that he recently did that in a Rails commit.
00:28:01.440
This was to silence the warning about assigned but unused variables. So, if you're curious about the bytecode generated, it remains the same even though it's written differently.
00:28:53.880
Now, returning to my example with 'gummies', after turning optimization off, we receive the following output. The instruction 'set local at index 0' is noteworthy, as it sets the value of a single local variable.
00:29:33.480
Now, if we execute larger-scale instruction tests with peephole optimization enabled, the output will vary slightly. Again, we set local for the single local variable x, returning with a slight performance improvement.
00:30:14.639
The micro-benchmark shows a subtle difference between performance but is still relevant to note when optimizing. Lastly, let’s explore using more efficient instructions.
00:30:55.200
For illustration, we assign an array containing 1 and 2 to the variable foo, then splat foo into an array containing foo and 3. We disable peephole optimization to observe the output.
00:31:33.239
This represents a fairly lengthy process: 'dupe array' pushes values from the 1 to 2 range to the stack, followed by 'set local', which sets the value of foo and 'get local' that retrieves that value.
00:32:21.899
When peephole optimization is turned on, we see noticeable differences where 'dupe array' is replaced by 'put object.' The key distinction lies in how 'put object' does not allocate an array, whereas 'dupe array' does.
00:33:07.920
A recent micro-benchmark demonstrated a 1.26 times faster runtime. Those are all the examples; now let’s move into discussing limitations.
00:34:23.520
You've probably observed limitations inherent in the examples I've provided: first, it has a narrow scope; the optimizer can see only a limited context and visibility into the code when making optimization decisions.
00:35:10.320
Secondly, peephole optimization applies only to very specific bytecode and code-writing styles. Lastly, for Ruby, humans must find and implement peephole optimizations.
00:36:37.620
A question you might be asking is: why doesn't the compiler just generate optimized code? The reason is that it's simply how compilers are implemented. Compilers often split their optimization processes from the code generation process due to the dynamic nature of Ruby.
00:37:33.679
For instance, Ruby's dynamic language allows methods to be monkey-patched, meaning the compiler cannot predict which methods will be called until runtime. Because of these characteristics, some optimizations found in statically-typed languages, such as algebraic simplification or expression rearrangement, are considerably harder to implement in Ruby.
00:38:25.680
Notably, Ruby still does not perform inline methods, which would grant more opportunities for developers to optimize as inline methods pull all instructions into one long sequence, thus matching optimization conditions better.
00:39:31.319
Coming to the conclusion, I utilized a benchmarking library known as yjit-bench to measure the impact of peephole optimization—this process has shown modest improvements in performance when enabled.
00:40:30.120
While I haven’t gone through every single benchmark I mentioned, it's worth noting that there is a reduction in the amount of instructions emitted, which in turn means less memory is used.
00:41:29.379
I want to emphasize that I truly enjoy learning about these topics! If you’re interested in diving deeper or if I've sparked your curiosity about peephole optimizations, I encourage you to explore further.
00:42:16.020
I’ll share some resources: first, YARV, which was a hack day project at Shopify championed by Kevin Newton, who gave a talk yesterday. Essentially, we implemented YARV in Ruby, and we've also documented many of the YARV instructions comprehensively.
00:43:24.120
Another resource is a pull request by John Hawthorne, who gave a talk on the first day of this conference. He identified opportunities for optimization in his code change, so if you're ever interested in contributing to Ruby, please make a pull request!
00:44:38.020
Thank you for your time. You can DM or tweet me at Ong Maple. I would also like to thank everyone here for your gracious attention.