Talks

Compiling Ruby

RubyKaigi 2017
http://rubykaigi.org/2017/presentations/kddeisz.html

Since Ruby 2.3 and the introduction of RubyVM::InstructionSequence::load_iseq, we've been able to programmatically load ruby bytecode. By divorcing the process of running YARV byte code from the process of compiling ruby code, we can take advantage of the strengths of the ruby virtual machine while simultaneously reaping the benefits of a compiler such as macros, type checking, and instruction sequence optimizations. This can make our ruby faster and more readable! This talk demonstrates how to integrate this into your own workflows and the exciting possibilities this enables.

00:00:00 Hello, everyone! My name is Kevin Deisz, and I love music, open-source software, and craft beer. I think craft beer is the most delicious in Japan.
00:00:04 The title of my talk today is 'Compiling Ruby.'
00:00:39 Can everybody hear me? I'm going to start with a very quick story.
00:00:45 Yesterday, I flew into Tokyo, and sadly, my flight to Hiroshima was canceled because of a typhoon. I don't speak Japanese, and this is my first time in Japan, so I only know a few words.
00:01:11 When I was in Tokyo, I approached people and tried to ask them about the trains, conveying that I needed help.
00:01:24 So, this is our simple Ruby program. It's quite simple. Here we go. When you run it, we get 10, as expected. This talk will cover everything that happens between the time you write Ruby code and when you get a response.
00:01:58 There are a couple of things that were introduced in Ruby 2.3 that I will discuss, along with what you can do with that capability.
00:02:09 We will talk about the execution process. The code is first loaded with 'require' or 'load' or however it gets included.
00:02:20 The Ruby interpreter reads the source of the file and converts it into tokens. Those tokens, one for each individual element of the file, are then assembled into an abstract syntax tree (AST).
00:02:35 We will explore how we used to immediately interpret that AST.
00:02:41 However, these days, we build instruction sequences, and those instruction sequences are interpreted by the Ruby Virtual Machine (VM).
00:02:56 So, let's discuss the execution process.
00:03:02 The first step is tokenizing the input, which is relatively well understood.
00:03:11 You take your example, find each element, and perform lexical analysis.
00:03:18 However, this is not semantic analysis; we don't know if this is valid Ruby code at this stage.
00:03:24 We simply get the order in which the tokens appear.
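The tokenizing step can be observed from Ruby itself with the standard library's Ripper. This is a minimal sketch: the two-line program is an assumption standing in for the slide's example, which the transcript does not reproduce.

```ruby
require "ripper"

# Assumed stand-in for the program shown on the slide.
source = "a = 5\nb = a + 5\n"

# Ripper.lex returns one entry per token. Each entry starts with
# [line, column], then the token type, then the token text.
tokens = Ripper.lex(source)

# Drop spaces and newlines so only the meaningful tokens remain.
ignored = %i[on_sp on_nl on_ignored_nl]
significant = tokens.reject { |token| ignored.include?(token[1]) }

significant.each do |token|
  (line, column), type, text = token
  puts format("%d:%-2d %-10s %p", line, column, type, text)
end
```

Note that the lexer reports only the order and kind of tokens; it says nothing about whether they form a valid program.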
00:03:30 Different languages handle this in various ways; Ruby has its own mechanisms.
00:03:35 If I say anything outdated, it's unintentional; let’s proceed to tokenizing for this section.
00:03:48 This part is handled internally, but other tools like Flex and Bison can help with similar tasks.
00:03:56 Once we have our tokens, we can build the abstract syntax tree from those tokens.
00:04:11 As we walk down the list of tokens, we employ a recursive descent parser.
00:04:24 This parser matches patterns, helping us find the grammar structure.
00:04:32 We identify elements like assignments and local variable definitions, resulting in a tree structure.
00:04:43 At this point, we have transformed our source file into a tree structure and can interpret it.
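Ripper can also show a tree built from those tokens. A small sketch follows; the tree Ripper prints is its own S-expression format and only approximates the internal AST the interpreter uses, and the sample program is again an assumed stand-in.

```ruby
require "ripper"
require "pp"

# Assumed stand-in for the program shown on the slide.
source = "a = 5\nb = a + 5\n"

# Ripper.sexp parses the source and returns nested arrays:
# [:program, [statement, statement, ...]]
tree = Ripper.sexp(source)

# Each top-level statement here is an :assign node.
statement_types = tree[1].map(&:first)

pp tree
puts statement_types.inspect
```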
00:04:58 Interpretation requires context and a stack-based machine to traverse the AST.
00:05:06 We begin at the root of the tree, moving down as we evaluate elements. For example, we may push a value onto the stack when we encounter a literal.
00:05:22 As we continue downward, we can assign variables and gather values from the stack.
00:05:34 When we're performing operations like addition, we're sending messages, not merely calling methods directly.
00:05:41 By sending messages, we pull operands off the stack and execute calculations.
00:05:53 Finally, we've interpreted our AST and obtained our result despite its complexity.
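The tree walk described above can be sketched as a toy evaluator. Everything here is illustrative: the node shapes (`:int`, `:lvar`, `:assign`, `:send`) and the `evaluate` and `run` helpers are invented for this sketch and are not Ruby's internal node types.

```ruby
# Toy tree for: a = 5; b = a + 5
# Node shapes are hypothetical, chosen only for this sketch.
TREE = [
  [:assign, :a, [:int, 5]],
  [:assign, :b, [:send, [:lvar, :a], :+, [:int, 5]]]
].freeze

# Walk one node, using the stack for intermediate values and
# a hash as the table of local variables.
def evaluate(node, stack, locals)
  type, *rest = node
  case type
  when :int
    stack.push(rest[0])                  # literal: push its value
  when :lvar
    stack.push(locals.fetch(rest[0]))    # variable read: push current value
  when :assign
    name, value = rest
    evaluate(value, stack, locals)
    locals[name] = stack.last            # value stays on the stack as the result
  when :send
    receiver, message, argument = rest
    evaluate(receiver, stack, locals)
    evaluate(argument, stack, locals)
    arg = stack.pop                      # operands come back off the stack
    recv = stack.pop
    stack.push(recv.public_send(message, arg)) # addition is a message send
  else
    raise ArgumentError, "unknown node type: #{type.inspect}"
  end
end

# Evaluate each statement in order and return the last result.
def run(tree)
  stack = []
  locals = {}
  result = nil
  tree.each do |statement|
    evaluate(statement, stack, locals)
    result = stack.pop
  end
  result
end

puts run(TREE)
```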
00:06:08 In Ruby 1.9, we got YARV, which introduced the concept of instruction sequences.
00:06:14 It turns out that using instruction sequences is much faster than directly interpreting ASTs.
00:06:31 Creating instruction sequences is a crucial step in modern Ruby execution.
00:06:39 Instead of interpreting directly, we build lists of instruction sequences that can be executed later.
00:06:48 We push values and build up our list incrementally.
00:06:51 Now, we have a complete set of instruction sequences that can be executed.
00:06:59 We've learned to interpret these sequences in a manner akin to interpreting the AST.
00:07:07 This allows the virtual machine to optimize and execute faster.
00:07:13 Each instruction sequence points to the next, streamlining execution.
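The instruction list for a program can be inspected with `RubyVM::InstructionSequence`. A brief sketch, using an assumed stand-in for the slide's program; the exact instruction names printed vary between Ruby versions.

```ruby
# Assumed stand-in for the program shown on the slide.
source = "a = 5\nb = a + 5\n"

# Compile the source into an instruction sequence without running it.
iseq = RubyVM::InstructionSequence.compile(source)

# disasm gives a human-readable listing of the YARV instructions.
listing = iseq.disasm
puts listing

# eval runs the compiled instructions on the virtual machine and
# returns the value of the last expression.
result = iseq.eval
puts result
```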
00:07:25 We'll now discuss a new feature introduced in Ruby 2.3.
00:07:32 This feature allows us to take these instruction sequences that were previously compiled and persist them.
00:07:41 This introduces a binary format that we can read and write, allowing us to save these instruction sequences to a file.
00:07:55 This way, we can quickly load them without recompiling.
00:08:01 Let me show you how this works. When we compile a file, it returns an instance of the RubyVM::InstructionSequence class.
00:08:10 This class contains all the necessary metadata for that program.
00:08:18 We can write this binary data, which isn't human-readable, but contains all the vital information needed to execute that file.
00:08:26 For instance, one example I encountered was surprisingly large, and it contained a wealth of metadata.
00:08:38 We can also read this binary data back into a string and evaluate it.
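The round trip through the binary format can be sketched with `to_binary` and `load_from_binary`. The temporary file and its `.yarb` suffix are arbitrary choices for this example, and the program is an assumed stand-in for the slide's.

```ruby
require "tempfile"

iseq = RubyVM::InstructionSequence.compile("a = 5\nb = a + 5\n")

# Serialize the instruction sequence. The result is a binary String.
binary = iseq.to_binary

result = nil
Tempfile.create(["example", ".yarb"]) do |file|
  file.binmode
  file.write(binary)
  file.flush

  # Read the bytes back and rebuild an instruction sequence from them.
  loaded = RubyVM::InstructionSequence.load_from_binary(File.binread(file.path))

  # The reloaded sequence runs without the source being parsed again.
  result = loaded.eval
end

puts result
```

The binary is tied to the Ruby build that produced it, so it is better treated as a local cache than as a portable artifact.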
00:08:48 This is helpful, but the real advantage comes with 'load_iseq.'
00:08:56 'load_iseq' allows us to modify our flowchart, as it can load compiled instructions without recompiling them.
00:09:03 This results in a significant speed boost, roughly 30% faster.
00:09:13 Internally, this process goes through the 'rb_load_internal' function when a file is required.
00:09:19 This function checks whether instruction sequences can be loaded from our binary data.
00:09:30 If they can, execution continues without recompiling the entire file.
00:09:42 This example implementation provides insight into how we can use these techniques effectively.
00:09:53 The important point to note is that the source file must remain unchanged.
00:10:00 If the source file has not been modified since the instruction sequences were created, we can load them directly.
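A minimal sketch of such a cache, assuming a hypothetical `IseqCache` module: the module name, its `root` and `cache_dir` settings, and the cache file naming are all invented for this example. Only `load_iseq`, `compile_file`, `to_binary`, and `load_from_binary` are Ruby's own. Returning `nil` from `load_iseq` tells Ruby to compile the file the normal way.

```ruby
require "digest/sha1"
require "fileutils"
require "tmpdir"

# Hypothetical helper: caches compiled files that live under `root`.
module IseqCache
  class << self
    attr_accessor :root, :cache_dir
  end

  def self.cache_path_for(source_path)
    File.join(cache_dir, Digest::SHA1.hexdigest(source_path) + ".yarb")
  end

  # The cache is usable only if the source has not changed since it was written.
  def self.fresh?(cache_path, source_path)
    File.file?(cache_path) && File.mtime(cache_path) >= File.mtime(source_path)
  end

  def self.load(source_path)
    # Leave every file outside the configured root to the normal loader.
    return nil unless root && cache_dir && source_path.start_with?(root)

    cache_path = cache_path_for(source_path)
    if fresh?(cache_path, source_path)
      RubyVM::InstructionSequence.load_from_binary(File.binread(cache_path))
    else
      iseq = RubyVM::InstructionSequence.compile_file(source_path)
      FileUtils.mkdir_p(cache_dir)
      File.binwrite(cache_path, iseq.to_binary)
      iseq
    end
  end
end

# Ruby calls this hook, when it is defined, for each file it loads.
class << RubyVM::InstructionSequence
  def load_iseq(source_path)
    IseqCache.load(source_path)
  end
end

Dir.mktmpdir do |dir|
  IseqCache.root = File.realpath(dir)
  IseqCache.cache_dir = File.join(IseqCache.root, ".iseq-cache")
  path = File.join(IseqCache.root, "cached_example.rb")
  File.write(path, "$cached_example = 5 + 5\n")

  load path # first load compiles the file and writes the cache
  load path # second load reads the cached binary instead
  puts $cached_example

  IseqCache.root = nil # stop intercepting loads
end
```

Comparing modification times is the simplest freshness check; a content hash would be stricter at the cost of reading the source on every load.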
00:10:08 This whole process allows us to monkey-patch the compilation sequences.
00:10:18 There are major examples in the ecosystem that use similar approaches.
00:10:27 One is Shopify, which does excellent work on optimizing load paths, and others focus on compilation.
00:10:37 These methods significantly enhance performance and efficiency.
00:10:43 Returning to our discussion, 'load_iseq' allows us to break things down, opening up exciting possibilities.
00:10:57 For instance, we can modify the source string without actually altering the original file.
00:11:04 This flexibility enables powerful programmatic manipulation of Ruby's internals.
00:11:13 We've even seen that 'load_iseq' was initially introduced to save memory and speed up execution.
00:11:24 It allows for a dynamic environment where we can programmatically load our custom bytecode.
00:11:31 This means that there are all sorts of fun things we can do with Ruby.
00:11:39 For example, we could use regex to dissect strings right within Ruby.
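Rewriting the source string before compiling it might look like the sketch below. The `~d(YYYY-MM-DD)` sigil, the `DATE_SIGIL` pattern, and the `rewrite` and `compile_with_macros` helpers are all invented for illustration. A `load_iseq` hook could return the result of `compile_with_macros` so that required files get the same treatment.

```ruby
require "date"
require "tempfile"

# Hypothetical sigil: ~d(YYYY-MM-DD). It is not valid Ruby on its own.
DATE_SIGIL = /~d\((\d{4})-(\d{2})-(\d{2})\)/

# Replace each sigil with an equivalent Date.new call. Converting the
# parts with to_i avoids emitting literals like 09, which Ruby would
# reject as invalid octal.
def rewrite(source)
  source.gsub(DATE_SIGIL) do
    year, month, day = Regexp.last_match.captures.map(&:to_i)
    "Date.new(#{year}, #{month}, #{day})"
  end
end

# Compile the rewritten source while reporting the original path,
# so backtraces still point at the file on disk.
def compile_with_macros(path)
  RubyVM::InstructionSequence.compile(rewrite(File.read(path)), path, path)
end

Tempfile.create(["macro", ".rb"]) do |file|
  file.write("~d(2017-09-18).year\n")
  file.flush

  puts compile_with_macros(file.path).eval
  puts File.read(file.path) # the file on disk still contains the sigil
end
```

A regular expression knows nothing about context, so this would also rewrite a sigil that appears inside a string literal or a comment.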
00:11:46 The flexibility that Ruby provides is incredible, as we can override built-in functions, such as integer multiplication.
00:12:01 While not typical, this showcases Ruby's capacity for customization.
00:12:08 Other languages don't share such flexibility. In those, operations are more rigidly defined.
00:12:21 Looking at the macro system, we can leverage this for code optimization.
00:12:30 Instruction elimination is about optimizing computations in Ruby.
00:12:38 For example, instead of multiple method calls, we can ultimately compile down to a single instruction.
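One way to picture this is folding a product of integer literals before compilation. This is a deliberately naive sketch: `LITERAL_PRODUCT` and `fold_products` are invented names, the fold assumes `Integer#*` has not been redefined, and the regular expression ignores operator precedence and context, so a real tool would work on the syntax tree instead.

```ruby
# Matches runs like "24 * 60 * 60". Naive: it does not understand
# floats, strings, or neighbouring operators such as ** and /.
LITERAL_PRODUCT = /\b\d+(?:\s*\*\s*\d+)+\b/

# Replace each literal product with its value, computed ahead of time.
# Assumes Integer#* keeps its built-in meaning.
def fold_products(source)
  source.gsub(LITERAL_PRODUCT) do |expression|
    expression.split("*").map { |factor| Integer(factor.strip, 10) }.inject(:*).to_s
  end
end

source = "seconds = 24 * 60 * 60\n"
folded_source = fold_products(source)

plain = RubyVM::InstructionSequence.compile(source)
folded = RubyVM::InstructionSequence.compile(folded_source)

puts folded_source
puts plain.disasm  # typically shows multiplication instructions
puts folded.disasm # only has to load a single constant
```

The source stays readable as `24 * 60 * 60` while the compiled form carries only the final number.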
00:12:47 The beauty of this is maintaining readability while increasing performance.
00:12:57 We can write cleaner and more understandable code without sacrificing execution speed.
00:13:05 We can compile macros into Ruby, enabling advanced optimizations with ease.
00:13:16 This can even extend to date parsing, which can be optimized similarly.
00:13:23 The performance benefit from more straightforward date parsing is significant.
00:13:31 Moreover, I've developed a gem that allows for such optimizations to be applied.
00:13:38 It's part of a larger library called 'Vectrex,' focusing on enhancing Ruby's flexibility.
00:13:46 What this gem does is hook into the compilation chain based on 'load_iseq.'
00:13:55 This allows developers to modify their source code dynamically.
00:14:02 We can introduce more semantic richness through abstract syntax trees (ASTs).
00:14:08 Using parser gems to rewrite the source makes this even more powerful.
00:14:20 For example, you can modify method visibility and instance variables dynamically.
00:14:26 This gives room for innovation and creative coding solutions in the Ruby environment.
00:14:36 The 'parser' gem provides built-in support for rewriting source files seamlessly.
00:14:47 This gem allows you to manipulate ASTs for enhanced coding flexibility.
00:14:55 We can transform method calls based on the AST, automating some of our coding processes.
00:15:01 The result is a more efficient coding process without changing the original logic.
00:15:07 Moreover, we can wrap the method body in a begin-end structure to manage return types efficiently.
00:15:14 This wraps our logic cleanly while allowing Ruby to handle type inference smoothly.
00:15:22 And with that, we can extend Ruby's capabilities quickly and inventively.
00:15:32 With so many tools and techniques available, you can enhance Ruby's interactivity effectively.
00:15:41 Looking at the compiled structure, we can analyze Ruby bytecode as well.
00:15:53 If you examine the binary structure of Ruby, you'll find interesting headers and segments.
00:16:02 For instance, the first four bytes are always the same marker, which identifies the data as compiled Ruby bytecode.
00:16:14 What follows includes platform information and specifics on instruction sequences.
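The header can be inspected directly on the string returned by `to_binary`. A short sketch; the sample program is an assumed stand-in, and the layout beyond the leading marker is an internal detail that may differ between Ruby versions.

```ruby
iseq = RubyVM::InstructionSequence.compile("a = 5\nb = a + 5\n")
binary = iseq.to_binary

# The first four bytes are a fixed marker for the format.
magic = binary.byteslice(0, 4)
puts magic.inspect

# Even a two-line program produces a fair amount of data.
puts binary.bytesize

# Check whether the platform string appears somewhere in the data.
puts binary.b.include?(RUBY_PLATFORM.b)
```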
00:16:23 All this information helps us gain insight into how our Ruby files will execute.
00:16:30 By storing debugging info in the compiled bytecode, we unlock significant metadata for enhancing Ruby's evolution.
00:16:41 This way, 'load_iseq' becomes a tool for observing the relationships between object states in memory.
00:16:52 Thus, programming in Ruby is ultimately about exploring and pushing the boundaries of what we know.
00:17:01 The Ruby community continues to grow, allowing even more insights and optimizations to flourish.
00:17:09 In conclusion, the introduction of load_iseq has paved the way for innovative approaches to Ruby development.
00:17:18 Thank you for your attention!
00:17:34 Now, I'll open the floor for any questions.
00:17:55 All right, let's kick this off!
00:18:04 If you want to dive deeper into the ideas I've discussed, check out my GitHub repository.
00:18:12 I chose the parser gem because of its built-in rewriter support and straightforward API.
00:18:23 Being able to rewrite source efficiently has made my learning process much smoother.
00:18:31 Thank you for your time. I hope you find these concepts as intriguing as I do!