Talks

Looking Into Peephole Optimizations

by Maple Ong

In "Looking Into Peephole Optimizations", presented at RubyConf 2022, Maple Ong explores how the Ruby Virtual Machine applies peephole optimizations. This established technique, which dates back to the 1960s, improves the performance of bytecode by examining small sequences of instructions and rewriting them into simpler, more efficient ones.

Key Points Discussed:

  • Introduction to Peephole Optimization: The term refers to an optimization technique that examines a small window of intermediate instructions (bytecode) in Ruby, aiming to simplify them and improve performance.
  • Importance of Understanding Optimizations: Comprehending how Ruby executes code can significantly aid developers in debugging and problem-solving.
  • Execution Process of Ruby Code: The flow from Ruby code to execution involves several stages, including tokenization, parsing into an Abstract Syntax Tree (AST), and compiling into bytecode. Peephole optimization occurs after bytecode generation and before execution by the Ruby Virtual Machine (YARV); a short sketch of how to inspect this yourself appears after this list.
  • Implementation Details: Key functions within Ruby's C implementation are examined, detailing how optimization happens through traversing linked lists of instructions and applying specific criteria for simplification.
  • Examples of Optimizations:
    • Combining Instructions: An example demonstrated how multiple instructions can be merged into one, saving space and improving performance significantly (5.4x faster in a micro-benchmark).
    • Eliminating Redundant Instructions: By identifying unnecessary operations, extra instruction steps are avoided, such as splatting a value that is already an array (1.3x performance improvement).
    • Using More Efficient Instructions: Changing inefficient instructions to more direct instructions decreased overhead and increased speed (1.26x faster).
  • Limitations of Peephole Optimization: The talk addresses constraints like narrow scope and specific applicability, reinforcing that Ruby's dynamic nature complicates static code optimizations often seen in more rigid, statically-typed languages.
  • Conclusions and Future Exploration: The benchmarking results confirm modest performance improvements when peephole optimizations are applied, demonstrating the value of understanding these techniques for Ruby developers.
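
To see this in action yourself, here is a minimal sketch (not code from the talk) that compares the bytecode Ruby generates with and without peephole optimization, using the standard RubyVM::InstructionSequence API:

    # Compare YARV bytecode with peephole optimization off and on.
    # The exact disassembly differs between Ruby versions.
    source = "('A'..'Z').count"

    RubyVM::InstructionSequence.compile_option = { peephole_optimization: false }
    puts RubyVM::InstructionSequence.compile(source).disasm   # unoptimized bytecode

    RubyVM::InstructionSequence.compile_option = { peephole_optimization: true }
    puts RubyVM::InstructionSequence.compile(source).disasm   # optimized bytecode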

Main Takeaways:

  • Understanding peephole optimizations can enhance debugging and efficiency in Ruby development.
  • There are over 20 documented peephole optimization cases in CRuby that improve performance by rewriting instruction sequences for better execution efficiency.
  • Engaging with these optimizations not only benefits immediate performance but also fosters a deeper comprehension of Ruby's operational mechanics.

Maple Ong encourages developers to further explore this foundational topic to improve their Ruby programming practices and performance efficiencies.

00:00:00 Hello everyone.
00:00:13.380 My name is Maple Ong.
00:00:15.240 Thank you, you’re all very kind. Today, we'll be looking into peephole optimizations.
00:00:18.619 I'm going to be saying 'peephole optimizations' a lot, so bear with me. Someone please keep a counter because I’ll be curious to know how many times I say it. Before I begin, I wanted to share that I work with Gusto, a company that provides HR and payroll services for small businesses. It's been really good so far, so if you're interested, reach out!
00:00:39.600 Previously, I worked on TruffleRuby at Shopify. TruffleRuby is a high-performance Ruby implementation built on top of GraalVM and the Truffle framework. Working on this team inspired me to learn more about programming languages and how they're implemented, specifically Ruby.
00:01:12.240 Peephole optimization is an established technique; it was first discussed back in the 1960s. This is the earliest paper I could find on the topic. Now, the 1960s were a long time ago, and thank God computers have evolved a lot since then. So, you might wonder why I’m giving a talk about such a dated compiler optimization. I believe it's interesting and fun to learn about, which is personally why I do anything at work. After all, if it’s not interesting or fun, what are we doing?
00:01:42.540 But I also want to share with you another, perhaps better, reason to learn this foundational technique: it makes you a better Ruby developer! For example, learning how Ruby executes its code will help with debugging and problem-solving in your day-to-day work. If you have experience with compilers already, I hope this talk refreshes your memory on the topic and that you learn something new about how Ruby implements this particular optimization.
00:02:07.200 And if you’re a complete beginner, this talk is intended for you too. I hope you learn more about the Ruby compiler and this optimization, and I hope it sparks your interest to learn more after this talk. Before we begin, let me show you what to expect: we will discuss what peephole optimization is, when it happens in Ruby, and how Ruby implements it. Then we will move on to examples of peephole optimizations in Ruby, learn about some YARV instructions, and discuss limitations and benchmarks.
00:02:57.840 So, what is peephole optimization? It involves looking at a small set of code, usually intermediate instructions or bytecode. We will look at what this looks like in a moment. The goal is to find and change these instructions to make them simpler and more performant.
00:03:04.200 Because this optimization only looks at a few instructions at a time, it is called peephole optimization, or sometimes window optimization. Personally, I think it should be named 'window cleaner' because it sounds cooler than peephole optimization! Another important note is that this optimization is used in other layers of the stack and in other languages. For example, it's also used in Python and can even optimize assembly code. JIT (Just-In-Time) compilers also employ this technique.
00:03:58.159 Today, however, we will be focusing on peephole optimization in the context of Ruby, specifically in the C Ruby or MRI implementation. Another point to note is that Ruby already utilizes peephole optimization; it's enabled by default. So whenever you run Ruby files, like with 'ruby foo.rb', it's already turned on.
00:04:19.380 Now, let’s discuss when peephole optimization happens in Ruby. To give you an overview, let me briefly go through the entire Ruby execution process with you. On the far left, we have the Ruby code that we’ve all written. When you execute your Ruby code, it gets passed into a tokenizer which produces tokens. These tokens are then passed into a parser which outputs AST (Abstract Syntax Tree) nodes. If you attended Kevin Newton's talk, you will know all about parsing and AST nodes.
00:05:04.660 The AST is then fed into a compiler, which outputs bytecode for execution by the virtual machine. The optimization specifically happens between the compiler generating bytecode and the execution by the virtual machine.
00:05:36.480 To clarify, when I say bytecode, I'm referring to YARV instructions, the specific instruction sequence for Ruby. When I mention the VM or YARV, it's referring to the same thing because, in Ruby, YARV stands for Yet Another Ruby VM.
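
As an aside not shown in the talk, Ruby ships with tools that let you peek at each of these stages for a snippet of source code:

    require "ripper"

    source = "1 + 2"

    pp Ripper.lex(source)     # tokens produced by the tokenizer
    pp Ripper.sexp(source)    # AST nodes produced by the parser
    puts RubyVM::InstructionSequence.compile(source).disasm   # YARV bytecode from the compiler
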
00:06:44.040 Let's now look at some examples to see what YARV instructions look like. For instance, consider the code 'puts 9 + 8'. I often make the joke that I can't do mental math, so I have to type it out in Ruby! We will place this code into a variable called source_code and then use RubyVM::InstructionSequence to compile and disassemble it. This produces human-readable output.
00:07:14.580 When you run this, this is the output. I don't want to overwhelm anyone, but I want to highlight that this is what we will be examining today—YARV instructions. We will simplify this; notice the 'putself' and 'putobject' instructions, with the number 9 alongside 'putobject'. The value printed next to an instruction is its operand, much like an argument to a method call.
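
For reference, the disassembly of 'puts 9 + 8' looks roughly like this (simplified; exact operands and formatting vary between Ruby versions):

    putself
    putobject                9
    putobject                8
    opt_plus
    opt_send_without_block   <calldata!mid:puts, argc:1>
    leave
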
00:08:33.660 During the compilation phase, Ruby stores instruction sequences as a linked list structure. This linked list eventually gets converted into an array later in the execution process. To connect this to peephole optimizations: after the compiler generates bytecode, that bytecode is optimized before being passed to the virtual machine for execution.
00:09:54.780 Let’s dive into how Ruby implements peephole optimization. I know we all enjoy reading C code, so I want to present the 'iseq_optimize' function found in compile.c. This file houses the code that translates the AST nodes into virtual machine instruction sequences. I've simplified the function to make it easier to grasp for this talk. The first thing the function does is set up the optimization options; this is C, so there's a bit of setup.
00:11:00.060 Next, we loop through each element of the linked list, which contains the instruction sequences. We check whether the element is an instruction, since the list holds other kinds of elements too. If it is, we verify whether peephole optimization is enabled. As I mentioned earlier, peephole optimization is on by default, but you can switch it off if you'd like.
00:12:03.680 During this process, various other optimizations take place, like specialized instruction optimizations. At the end of the loop, we move to the next instruction in the linked list. To recap, the 'iseq_optimize' function traverses each element in the linked list and calls 'iseq_peephole_optimize' on them.
00:13:28.659 Now let's examine 'iseq_peephole_optimize', also in compile.c. It's structured in a straightforward manner: when the function receives an instruction, all it does is check it against a series of if statements. For example, we check if the element is a 'newrange'—this is a type of YARV instruction. If it meets specific conditions, we rewrite the linked list accordingly. Essentially, it works as a pattern-matching and conditional-replacement function.
00:14:26.700 I also want to share a really cool paper that discusses an approach to implementing peephole optimization using string pattern matching. Those interested in pattern matching might find this intriguing! Returning to our discussion, there are over 20 peephole optimization cases documented. If you're interested in specifics, I highly recommend checking out the function, as it's well-documented with before-and-after examples illustrating how the bytecode changes.
00:15:48.180 To summarize, 'iseq_optimize' traverses each element and calls 'iseq_peephole_optimize' on any instruction that meets the optimization criteria, ultimately resulting in smaller and more performant instruction sequences.
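
To make that shape concrete, here is a loose Ruby-flavoured sketch of the control flow just described. It is illustrative pseudocode with made-up names (list.head, insn.name); the real implementation is C code in compile.c:

    # Walk the linked list of iseq elements and rewrite matching patterns in place.
    def iseq_optimize(list, do_peephole:)
      element = list.head
      while element
        if element.instruction? && do_peephole
          iseq_peephole_optimize(list, element)   # pattern-match and rewrite in place
        end
        element = element.next                    # advance to the next linked-list node
      end
    end

    def iseq_peephole_optimize(list, insn)
      case insn.name
      when :newrange
        # e.g. if the preceding operands are compile-time literals, fold them
        # into a single putobject of the precomputed Range
      when :splatarray
        # e.g. if the previous instruction is newarray, the splat is redundant
      end
    end
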
00:17:06.540 Now, let's dive into some examples of peephole optimizations in Ruby. There are three types of optimizations we can perform: combining multiple instructions into a single instruction, eliminating redundant instructions, and using more efficient instructions.
00:18:36.819 The first optimization involves combining several instructions into one single instruction, which saves space. For instance, if we have a range from the string 'A' to 'Z', the double-dot indicates that it’s inclusive. If you’re unfamiliar with Ruby, calling count on it will return 26, the number of letters in the alphabet. We will do something similar to the previous example: put this code into a variable called source_code, then compile and disassemble it to see what happens.
00:19:57.740 In this case, peephole optimization is turned off, and here is the output. The instructions are quite lengthy, so let’s break down what each instruction does. The first is 'putstring "A"', which pushes the string literal onto the stack. Next, 'putstring "Z"' does the same with 'Z'. The instruction 'newrange 0' creates a range; it takes one operand, which is zero when the end of the range is inclusive.
00:21:30.120 By examining how YARV executes this sequence of instructions, we note that 'putstring "A"' pushes 'A' onto the stack and 'putstring "Z"' pushes 'Z' onto the stack. Then, 'newrange 0' pops the two objects off the stack, creates a new inclusive range, and pushes that back onto the stack, before we return.
00:22:40.080 Now, let’s examine what the bytecode looks like when the optimization is enabled. It is significantly shorter, since it employs just one 'putobject' instruction. 'putobject' pushes a known value onto the stack: in this case, the inclusive range from 'A' to 'Z'. The three instructions have been collapsed into a single one!
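
One way to set up such a micro-benchmark (a sketch using the benchmark-ips gem, not necessarily how the speaker measured it) is to compile the snippet twice and compare the two instruction sequences:

    require "benchmark/ips"

    source = "('A'..'Z').count"

    RubyVM::InstructionSequence.compile_option = { peephole_optimization: false }
    unoptimized = RubyVM::InstructionSequence.compile(source)

    RubyVM::InstructionSequence.compile_option = { peephole_optimization: true }
    optimized = RubyVM::InstructionSequence.compile(source)

    Benchmark.ips do |x|
      x.report("peephole off") { unoptimized.eval }
      x.report("peephole on")  { optimized.eval }
      x.compare!
    end
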
00:23:38.220 In a micro-benchmark I ran, this was 5.4 times faster. Now, moving on to the second example, we'll discuss eliminating redundant instructions.
00:24:14.660 In our Ruby code, the strings 'hello' and 'world' are placed into an array, and splatting that array simply evaluates to the same array of two strings. Conducting the same process as before, I turned off peephole optimization to observe the output. Here, 'putstring "hello"' and 'putstring "world"' are executed, along with 'newarray', which builds an array from two values on the stack.
00:25:21.180 Under normal circumstances, the subsequent 'splatarray' instruction would pop the array off the stack, call to_a on it, and push the result back. However, with peephole optimization turned on, we see that 'splatarray' is omitted, since we know that 'newarray' will always leave an array on the stack.
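
Put in terms of the instruction sequences being described (a simplified reconstruction from the description above, not a verbatim listing):

    Without peephole optimization:
        putstring   "hello"
        putstring   "world"
        newarray    2
        splatarray          # pops the array, calls to_a, pushes the result back
        ...

    With peephole optimization, the splatarray is dropped, because newarray already
    guarantees an Array on the stack:
        putstring   "hello"
        putstring   "world"
        newarray    2
        ...
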
00:26:23.060 This results in a 1.3 times faster execution! Now, let's look at another example of eliminating redundant instructions.
00:27:20.140 In this example, we assign the string 'gummies are the best' to the variable x. For some reason, we then assign x to x again! This connects to a joke I shared with a friend about who would ever assign x to x, only to discover that he had recently done exactly that in a Rails commit.
00:28:01.440 This was to silence the warning about assigned but unused variables. So, if you're curious about the bytecode generated, it remains the same even though it's written differently.
00:28:53.880 Now, returning to my example with 'gummies', after turning optimization off, we receive the following output. The instruction 'setlocal' at index 0 is noteworthy, as it sets the value of a single local variable.
00:29:33.480 Now, if we look at the instructions with peephole optimization enabled, the output differs slightly. Again, we simply set the single local variable x, and we end up with a slight performance improvement.
00:30:14.639 The micro-benchmark shows only a subtle performance difference, but it is still worth noting when optimizing. Lastly, let’s explore using more efficient instructions.
00:30:55.200 For illustration, we assign an array containing 1 and 2 to the variable foo, then splat foo into a new array alongside the number 3. We disable peephole optimization to observe the output.
00:31:33.239 This is a fairly lengthy sequence: 'duparray' pushes a copy of the array containing 1 and 2 onto the stack, followed by 'setlocal', which sets the value of foo, and 'getlocal', which retrieves that value.
00:32:21.899 When peephole optimization is turned on, we see a noticeable difference: 'duparray' is replaced by 'putobject'. The key distinction is that 'putobject' does not allocate a new array, whereas 'duparray' does.
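
Conceptually, the difference between the two instructions is (simplified):

    duparray  [1, 2]    # allocates and pushes a fresh copy of its array operand each time
    putobject [1, 2]    # pushes the same pre-built object with no new allocation
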
00:33:07.920 A recent micro-benchmark demonstrated a 1.26 times faster runtime. Those are all the examples; now let’s move into discussing limitations.
00:34:23.520 You've probably observed limitations inherent in the examples I've provided. First, it has a narrow scope: the optimizer sees only a small window of instructions when making optimization decisions.
00:35:10.320 Secondly, peephole optimization applies only to very specific bytecode and code-writing styles. Lastly, for Ruby, humans must find and implement peephole optimizations.
00:36:37.620 A question you might be asking is: why doesn't the compiler just generate optimized code in the first place? Partly, that's simply how compilers are implemented: code generation is kept separate from the optimization passes. And partly it's the dynamic nature of Ruby, which limits what the compiler can safely assume.
00:37:33.679 For instance, Ruby allows methods to be monkey-patched, meaning the compiler cannot know which method implementation will be called until runtime. Because of this, some optimizations found in statically-typed languages, such as algebraic simplification or expression rearrangement, are considerably harder to implement in Ruby.
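
A small example of why the compiler cannot simply fold something like 9 + 8 into 17 at compile time: core methods can be redefined at runtime. (Ruby will print a warning about redefining an optimized method, but it works.)

    class Integer
      def +(other)
        "surprise!"   # monkey-patched: integer addition no longer returns a number
      end
    end

    puts 9 + 8   # prints "surprise!", not 17
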
00:38:25.680 Notably, Ruby still does not inline methods. Inlining pulls a method's instructions into one long sequence, which would give the peephole optimizer more opportunities to match its patterns.
00:39:31.319 Coming to the conclusion, I used a benchmarking suite known as yjit-bench to measure the impact of peephole optimization; the results show modest improvements in performance when it is enabled.
00:40:30.120 While I haven’t gone through every single benchmark I mentioned, it's worth noting that there is a reduction in the number of instructions emitted, which in turn means less memory is used.
00:41:29.379 I want to emphasize that I truly enjoy learning about these topics! If you’re interested in diving deeper or if I've sparked your curiosity about peephole optimizations, I encourage you to explore further.
00:42:16.020 I’ll share some resources: first, YARV, which was a hack day project at Shopify championed by Kevin Newton, who gave a talk yesterday. Essentially, we implemented YARV in Ruby, and we've also documented many of the YARV instructions comprehensively.
00:43:24.120 Another resource is a pull request by John Hawthorn, who gave a talk on the first day of this conference. He identified opportunities for optimization in his code change, so if you're ever interested in contributing to Ruby, please make a pull request!
00:44:38.020 Thank you for your time. You can DM or tweet me at Ong Maple. I would also like to thank everyone here for your gracious attention.