Talks

Looking Into Peephole Optimizations

by Maple Ong

In the video titled "Looking Into Peephole Optimizations," Maple Ong discusses techniques for enhancing the performance of bytecode generated by the Ruby compiler through peephole optimizations. The talk emphasizes that understanding these small yet powerful optimizations does not require prior experience with bytecode.

Key points covered include:
- What is peephole optimization? An established technique that examines small windows of code (intermediate instructions or bytecode) to streamline them and improve performance. It reduces redundancy and complexity in the bytecode and is applicable to many languages and targets, including Ruby, Python, and assembly code.
- Execution Process in Ruby: The transition from Ruby code to execution involves tokenization, parsing, and bytecode generation. The optimizations occur during the bytecode compilation stage before it is sent to the Ruby Virtual Machine (YARV).
- Implementation in Ruby: The compiler runs a default optimization pass, centered on the iseq_optimize function in compile.c (presented as pseudocode in the talk), which iterates over the linked list of bytecode instructions to identify and apply peephole optimizations. A quick way to confirm that this pass is on by default is shown right after this list.
- Examples of Optimization: Maple walks through several practical examples:
  - Combining multiple instructions into fewer ones (e.g., creating a range from 'a' to 'z').
  - Removing unnecessary instructions around array handling (e.g., a splatted 'Hello World').
  - Swapping in more efficient instructions, such as replacing allocation-heavy operations with cheaper ones.
- Limitations: Not all code benefits from peephole optimizations, particularly in a dynamic language like Ruby, where much depends on runtime behavior. Monkey patching, for example, makes it impossible to know at compile time which method implementation will be invoked.
- Benchmarks: The video concludes by stressing the necessity of measuring the effectiveness of refinements made by peephole optimization through benchmarks, which reveal performance improvements and reduced instruction memory.
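
The talk notes that this optimization pass is enabled by default. A quick way to confirm that on CRuby, where the RubyVM::InstructionSequence API is available, is to inspect the compiler's default options:

    # CRuby only: print the default compile options.
    # The returned hash includes :peephole_optimization => true by default,
    # alongside flags such as :specialized_instruction.
    p RubyVM::InstructionSequence.compile_option
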

Maple encourages viewers to explore these optimizations further and shares resources for learning more about YARV and ongoing improvements in Ruby's optimizations. The insights provided during this talk empower attendees to delve deeper into compiler optimizations for Ruby programming.
This session was part of the Euruko 2022 event, with a focus on building community and collaborative learning.

00:00:12.240 Um, hi! I'm Maple. Today, we'll be looking into peephole optimizations. Before we start, I want to share something. I recently quit my job and have come to learn the importance of having a community. I'm actually here with you today because I'm sponsored by WNB.rb. I know Gemma has already said something about this, but I wanted to share that this community has been really helpful to me.
00:00:29.880 If you are a woman or a non-binary individual, please join us. We'll be meeting right after this talk at Espresso House Hakaniemi. I hope to see you there; it will be really fun! Now, let's dive into peephole optimizations.
00:00:55.739 Peephole optimization is an established technique that was first discussed back in the 1960s. This is a really long time ago. The earliest paper I could find on this topic is quite interesting, especially because it helps us understand how the system works overall and builds our foundational knowledge for future compiler optimizations. If you have compiler experience, I hope this talk will refresh your memory on the topic and that you'll learn something new about how Ruby implements peephole optimizations specifically.
00:01:29.580 If you're a total beginner, don't worry; I've written this talk just for you. I hope you'll feel like you've learned more about how Ruby is compiled and how specifically this optimization works. So, what can you expect from this talk? First, we'll discuss what peephole optimization is, when it happens in Ruby, and how Ruby implements peephole optimization.
00:01:48.299 Next, we'll go through some examples of peephole optimizations in Ruby. In doing so, we'll learn about some YARV instructions. YARV is the Ruby virtual machine, which stands for Yet Another Ruby VM. I will explain how the bytecode works, and it will be really fun. Finally, we'll conclude with a big picture overview and some benchmarks.
00:02:01.380 So, what exactly is peephole optimization? It involves looking at a small set of code, usually intermediary instructions or bytecode, to find and change the instructions to make them more performant in some way. The resulting instructions are generally less redundant and less complex. The optimization gets its name because it only looks at a few instructions at a time. I don't really know exactly why it's named that; perhaps because, through a peephole, you can only see what's directly in front of you.
00:02:49.080 This optimization can be applied to various languages and systems. For example, peephole optimization is also used in languages like Python, and it can optimize assembly code. YARV also uses it to optimize Ruby bytecode. However, today we'll focus on its implementation in Ruby's MRI (Matz's Ruby Interpreter). Interestingly, peephole optimizations are already enabled by default when you use Ruby.
00:03:31.500 Now, when does peephole optimization happen in Ruby? This graph illustrates the entire Ruby execution process, starting from your Ruby code, which is passed to a tokenizer. The tokenizer produces tokens that are then passed into a parser, which generates Abstract Syntax Tree (AST) nodes for the compiler. The compiler produces bytecode, which is executed by the virtual machine. The phases we will discuss in this talk are the compiler and bytecode phase, where the compiler generates the bytecode.
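
As a rough, CRuby-specific sketch of that pipeline, each stage can be poked at from Ruby itself (Ripper for tokens, RubyVM::AbstractSyntaxTree for the parse tree, RubyVM::InstructionSequence for the bytecode); a tiny source string is used here so the output stays small:

    require 'ripper'

    src = "1 + 2"

    # Tokenizer: source -> tokens
    p Ripper.tokenize(src)                   # ["1", " ", "+", " ", "2"]

    # Parser: tokens -> AST nodes
    p RubyVM::AbstractSyntaxTree.parse(src)  # a tree of Node objects

    # Compiler: AST -> YARV bytecode (an instruction sequence)
    iseq = RubyVM::InstructionSequence.compile(src)

    # Virtual machine: execute the bytecode
    p iseq.eval                              # => 3
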
00:04:13.740 Let's zoom in a little on this specific phase. For example, we have 'puts 9 + 8' as our source code. You can ignore the graph on the left, which essentially represents the 'puts 9 + 8' source code as AST (Abstract Syntax Tree) nodes. This gets passed into the compiler. But let's take a look at the bytecode representation. To do this, we'll use the RubyVM::InstructionSequence class.
00:04:51.840 We use a method called compile, which takes in the source code, and then we disassemble it. The compiled code is not human-readable, but when we disassemble it, it becomes readable. The output includes a bunch of code; don’t get overwhelmed. The main thing to focus on is this chunk, which contains the bytecode instructions or YARV instructions. We can conclude that the simplified output of the compiler essentially matches the representation from the previous slide.
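
A minimal version of that exercise is below; the disassembly shown in the comment is approximate, and exact instruction names and operands vary between Ruby versions:

    iseq = RubyVM::InstructionSequence.compile("puts 9 + 8")

    puts iseq.disasm
    # The interesting chunk looks roughly like:
    #   putself
    #   putobject 9
    #   putobject 8
    #   opt_plus                  <calldata!mid:+, argc:1, ...>
    #   opt_send_without_block    <calldata!mid:puts, argc:1, ...>
    #   leave
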
00:05:40.860 When the compiler outputs this bytecode, it's arranged in a linked list structure before passing it to the Ruby VM. Returning to how this relates to peephole optimization: when the compiler generates the bytecode, it is optimized into a more efficient bytecode before being passed to the virtual machine for execution.
00:06:12.960 Now, how does Ruby implement peephole optimization? To illustrate this without overwhelming you with C code, I've written some pseudocode. Essentially, there's a function named iseq_optimize in compile.c in the Ruby implementation. The first thing it does is set up some optimization options, and then it iterates over the linked list in a while loop. For each element, we check whether it is an instruction. If it is, and if peephole optimization is turned on (which it is by default), we call the function that performs the peephole optimization.
00:07:03.540 This process involves traversing the linked list to find elements that match optimization cases. The function that does this, iseq_peephole_optimize, has a rather long name, but I highly recommend checking it out after this talk if you're curious about the various peephole optimization cases. The function is well-documented, and for each case there is a comment showing what the bytecode looks like before and after the optimization.
00:08:08.460 During this traversal, if we find an instruction that matches an optimization case, we replace it with a more efficient instruction. There are roughly 20 to 30 different peephole optimization cases. Additionally, there's a paper that discusses implementing peephole optimization using pattern matching; if you're interested in this topic, definitely check it out. To summarize, iseq_optimize visits each element and, when peephole optimization is enabled, calls iseq_peephole_optimize; when that function encounters an instruction that matches a case, it replaces it with a more efficient instruction.
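
As a toy model of the mechanism just described (plain Ruby arrays standing in for CRuby's C linked list, and only one of the many cases implemented), the shape of the pass looks roughly like this:

    # Toy bytecode: each instruction is [name, *operands].
    def peephole_optimize(insns, i)
      a, b, c = insns[i, 3]
      # One example case: putstring "a"; putstring "z"; newrange flag
      # collapses into a single putobject of the precomputed range.
      if a&.first == :putstring && b&.first == :putstring && c&.first == :newrange
        range = Range.new(a[1], b[1], c[1] == 1)   # flag 1 would mean an exclusive range
        insns[i, 3] = [[:putobject, range]]        # three instructions become one
      end
    end

    def optimize(insns, peephole: true)
      i = 0
      while i < insns.length                       # visit every element once
        peephole_optimize(insns, i) if peephole
        i += 1
      end
      insns
    end

    bytecode = [[:putstring, "a"], [:putstring, "z"], [:newrange, 0], [:leave]]
    p optimize(bytecode)   # => [[:putobject, "a".."z"], [:leave]]
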
00:09:52.920 After the peephole optimization phase, the instruction sequences should be smaller and more efficient. Let's look at some examples of peephole optimizations in Ruby. The first example involves combining multiple instructions into a single instruction. The source code we'll use creates a range from 'a' to 'z', where the two dots make it an inclusive range. We can call methods like 'count' to get 26, or 'to_a' to get an array of the letters. In this example, we'll turn off peephole optimization to see what the output looks like.
00:11:20.220 The output shows multiple instructions: the two string literals are pushed onto the stack, and then a new range is created from them at runtime. In contrast, when peephole optimization is enabled, the compiler generates significantly fewer instructions: instead of three instructions (putstring, putstring, newrange), a single putobject pushes the precomputed 'a'..'z' range. This results in a performance increase of about 5.4 times.
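
To reproduce this comparison yourself (the exact source string from the slide is assumed here; the options hash is CRuby's documented way to toggle individual compile optimizations):

    src = '("a".."z")'

    unoptimized = RubyVM::InstructionSequence.compile(src, nil, nil, 1, { peephole_optimization: false })
    optimized   = RubyVM::InstructionSequence.compile(src, nil, nil, 1, { peephole_optimization: true })

    puts unoptimized.disasm   # roughly: putstring "a"; putstring "z"; newrange 0; leave
    puts optimized.disasm     # roughly: putobject "a".."z"; leave
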
00:12:00.480 Next, we have an example of optimizing by removing unnecessary instructions, specifically a 'Hello World' string in an array being splatted. Similar to the previous example, we will temporarily disable peephole optimization to see the resulting bytecode. The output reveals some redundant instructions associated with managing the array. With peephole optimization turned on, you'll see that it simplifies to just a single instruction, removing the unnecessary complexity.
00:13:36.600 Our next example also involves removing redundant instructions. The code consists of 'gummies are the best' assigned to x, followed by the redundant assignment of x to itself. When we turn off peephole optimization, we see a few extra instructions added to manage the variable assignment.
00:14:22.380 However, when peephole optimization is enabled, these extraneous instructions are simplified or removed entirely. Thus, you can see how peephole optimization effectively cleans up the code, yielding better performance without compromising functionality. Up next, we will examine how using more efficient instructions can further simplify the bytecode.
00:15:15.180 We have an example where we create a splatted array containing the numbers 1 and 2. By turning off peephole optimization, we can observe the instructions normally used here, such as duparray and concatarray. But when optimization kicks in, the complexity is greatly reduced by combining certain steps.
00:16:27.360 Finally, the last example shows how we can use more efficient instructions. In this case, we again see the difference between duparray and putobject, where choosing the cheaper instruction improves performance by avoiding an unnecessary allocation. With peephole optimization enabled, the compiler swaps in the more efficient instruction for us.
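
The exact source snippets for these last few examples aren't spelled out above, so the strings below are plausible reconstructions, useful mainly as a harness for comparing the disassembly with the optimization off and on:

    examples = [
      'x = *["Hello World"]',               # array splat (reconstruction of example 2)
      'x = "gummies are the best"; x = x',  # redundant self-assignment (example 3)
      'y = [1, 2]; x = [*y]',               # splatted array of 1 and 2 (examples 4 and 5)
    ]

    examples.each do |src|
      [false, true].each do |flag|
        iseq = RubyVM::InstructionSequence.compile(src, nil, nil, 1, { peephole_optimization: flag })
        puts "== #{src.inspect} (peephole_optimization: #{flag}) =="
        puts iseq.disasm
      end
    end
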
00:17:32.520 Throughout these examples, we’ve primarily addressed how peephole optimization can bring significant performance boosts to certain code segments. However, it’s important to understand the limitations of this technique. Peephole optimizations apply only to specific bytecode patterns; for instance, not all code will benefit from being folded together or optimized.
00:18:06.960 Additionally, dynamic languages like Ruby present challenges in optimization, since many optimization opportunities may depend on runtime behaviors rather than compile-time predictions. For example, Ruby supports monkey patching, which complicates knowing which method will be called during execution, and some optimizations may require local variables to persist even if they are seemingly redundant.
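
As a small illustration of that limitation, even a constant-looking expression like 9 + 8 can't be folded at compile time, because the method it dispatches to can be redefined at runtime (a deliberately silly, hypothetical monkey patch):

    class Integer
      # Hypothetical monkey patch: + no longer does arithmetic.
      def +(other)
        "surprise!"
      end
    end

    # The compiled bytecode must still dispatch the + call at runtime,
    # so the result depends on whatever + means by then.
    puts 9 + 8   # prints surprise! rather than 17
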
00:19:03.780 Questions often arise as to why the compiler does not generate optimized bytecode directly rather than relying on an additional optimization pass. The answer is that the compiler's job is to translate AST nodes into bytecode, not to optimize it; keeping optimization as a separate pass maintains separation of concerns within the compilation process.
00:20:12.480 We also see that this optimization approach depends heavily on human input: an optimization only lands in Ruby when someone recognizes a pattern worth handling and implements the corresponding case. In light of this, benchmarks allow us to measure the effectiveness of peephole optimization, demonstrating both performance improvements and reductions in instruction memory usage.
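
For a rough measurement at home, one option is the stdlib Benchmark module: compile the same snippet with the optimization off and on, then time repeated evaluation. This is a crude sketch; a proper harness such as benchmark-ips gives more trustworthy numbers:

    require 'benchmark'

    src = '("a".."z")'
    iseqs = {
      "peephole off" => RubyVM::InstructionSequence.compile(src, nil, nil, 1, { peephole_optimization: false }),
      "peephole on"  => RubyVM::InstructionSequence.compile(src, nil, nil, 1, { peephole_optimization: true }),
    }

    Benchmark.bm(14) do |bm|
      iseqs.each do |label, iseq|
        bm.report(label) { 500_000.times { iseq.eval } }
      end
    end

    # Instruction-sequence size can be compared as well, for example by
    # counting the lines of each iseq.disasm output.
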
00:21:01.380 As we near the end of the talk, I wanted to share some valuable resources, such as Kevin Newton's documentation on YARV, which provides insights into bytecode instructions in Ruby. Additionally, I highly encourage you to look at the recent pull request by John from GitHub. His work identified specific cases that could be optimized further, reflecting practical applications of peephole techniques in real code.
00:22:44.280 Hopefully, after this talk, you’ll feel empowered to explore peephole optimization in Ruby further. I encourage you to dive into the creative space of potential optimizations yourself. Thank you so much for listening to my talk today. It’s an honor to be here at Euruko 2022.