00:00:00
Hello, everyone! My name is Kevin Deisz, and I love music, open-source software, and craft beer. I think craft beer is the most delicious in Japan.
00:00:04
The title of my talk today is 'Compiling Ruby.'
00:00:39
Can everybody hear me? I'm going to start with a very quick story.
00:00:45
Yesterday, I flew into Tokyo, and sadly, my flight to Hiroshima was canceled because of a typhoon. I don't speak Japanese, and this is my first time in Japan, so I only know a few words.
00:01:11
When I was in Tokyo, I approached people and tried to ask them about the trains, conveying that I needed help.
00:01:24
So, this is our simple Ruby program. When you run it, we get 10, as expected. This talk will cover everything that happens between the time you write Ruby code and the time you get a result.
00:01:58
There are a couple of things that were introduced in Ruby 2.3 that I will discuss, along with what you can do with that capability.
00:02:09
We will talk about the execution process. The code is first loaded with 'require' or 'load' or however it gets included.
00:02:20
The Ruby interpreter reads the source of the file and converts it into tokens. Those tokens are then parsed into an abstract syntax tree (AST).
00:02:35
We will explore how we used to immediately interpret that AST.
00:02:41
However, these days, we build instruction sequences, and those instruction sequences are interpreted by the Ruby Virtual Machine (VM).
00:02:56
So, let's discuss the execution process.
00:03:02
The first step is tokenizing the input, which is relatively well understood.
00:03:11
You take your example, find each element, and perform lexical analysis.
00:03:18
However, this is not semantic analysis; we don't know if this is valid Ruby code at this stage.
00:03:24
We simply get the order in which the tokens appear.
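To make the tokenizing step concrete, here is a minimal sketch using Ripper, the lexer and parser interface that ships with Ruby. The three-line program is an assumption standing in for the one on the slide, which the transcript does not show.

```ruby
require "ripper"

# Assumed stand-in for the program on the slide.
source = "a = 5\nb = 5\na + b\n"

# Ripper.lex returns one entry per token: [[line, column], type, text, state].
tokens = Ripper.lex(source)

# Drop whitespace and newlines so only the meaningful tokens remain.
significant = tokens.reject do |_pos, type, _text|
  %i[on_sp on_nl on_ignored_nl].include?(type)
end

significant.each do |(line, column), type, text|
  puts format("%d:%-2d %-10s %s", line, column, type, text.inspect)
end

token_types = significant.map { |_pos, type, _text| type }
```

Note that the lexer only reports what it saw and where. It makes no judgment about whether the tokens form a valid program.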
00:03:30
Different languages handle this in various ways; Ruby has its own mechanisms.
00:03:35
If I say anything outdated, it's unintentional; let’s proceed to tokenizing for this section.
00:03:48
This part is handled internally, but other tools like Flex and Bison can help with similar tasks.
00:03:56
Once we have our tokens, we can build the abstract syntax tree from those tokens.
00:04:11
As we walk down the list of tokens, we employ a recursive descent parser.
00:04:24
This parser matches patterns, helping us find the grammar structure.
00:04:32
We identify elements like assignments and local variable definitions, resulting in a tree structure.
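As a rough illustration of the resulting tree, Ripper can also return the parse result as nested arrays. The source string is again an assumed stand-in for the slide's program, and Ripper's node names are its own, not necessarily the ones the interpreter uses internally.

```ruby
require "ripper"
require "pp"

# Assumed stand-in for the program on the slide.
source = "a = 5\nb = 5\na + b\n"

# Ripper.sexp returns [:program, [statement, statement, ...]].
tree = Ripper.sexp(source)
pp tree

# The first element of each statement names the kind of node.
statements = tree[1]
node_types = statements.map(&:first)
```

The two assignments and the final addition each show up as their own node under the program root.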
00:04:43
At this point, we have transformed our source file into a tree structure and can interpret it.
00:04:58
Interpretation requires context and a stack-based machine to traverse the AST.
00:05:06
We begin at the root of the tree, moving down as we evaluate elements. For example, we may push a value onto the stack when we encounter a literal.
00:05:22
As we continue downward, we can assign variables and gather values from the stack.
00:05:34
When we're performing operations like addition, we're sending messages, not merely calling methods directly.
00:05:41
By sending messages, we pull operands off the stack and execute calculations.
00:05:53
Finally, we've interpreted our AST and obtained our result despite its complexity.
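The walk just described can be sketched as a toy evaluator. Everything here is hypothetical: the node names (`:block`, `:lasgn`, `:lit`, `:lvar`, `:send`), the hand-built tree, and the `evaluate` helper are illustrations of the idea, not Ruby's internal structures.

```ruby
# Hand-built toy AST for: a = 5; b = 5; a + b (node names are hypothetical).
tree = [:block,
        [:lasgn, :a, [:lit, 5]],
        [:lasgn, :b, [:lit, 5]],
        [:send, :+, [:lvar, :a], [:lvar, :b]]]

# Walks the tree from the root, using an explicit stack for values
# and a hash for local variables.
def evaluate(node, stack, locals)
  type, *args = node
  case type
  when :block
    args.each_with_index do |child, index|
      evaluate(child, stack, locals)
      stack.pop unless index == args.size - 1 # keep only the last value
    end
  when :lit
    stack.push(args[0])
  when :lasgn
    name, value = args
    evaluate(value, stack, locals)
    locals[name] = stack.last # an assignment also leaves its value behind
  when :lvar
    stack.push(locals.fetch(args[0]))
  when :send
    message, receiver, *operands = args
    evaluate(receiver, stack, locals)
    operands.each { |operand| evaluate(operand, stack, locals) }
    argv = stack.pop(operands.size)
    target = stack.pop
    stack.push(target.public_send(message, *argv)) # send a message, e.g. :+
  else
    raise ArgumentError, "unknown node type: #{type.inspect}"
  end
end

stack = []
evaluate(tree, stack, {})
result = stack.pop
puts result
```

The addition goes through `public_send`, which mirrors the point that `a + b` is a message sent to `a` rather than a hard-wired operation.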
00:06:08
In Ruby 1.9, we got YARV, which introduced the concept of instruction sequences.
00:06:14
It turns out that using instruction sequences is much faster than directly interpreting ASTs.
00:06:31
Creating instruction sequences is a crucial step in modern Ruby execution.
00:06:39
Instead of interpreting directly, we build lists of instruction sequences that can be executed later.
00:06:48
We push values and build up our list incrementally.
00:06:51
Now, we have a complete set of instruction sequences that can be executed.
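You can look at these instruction sequences yourself. This is a small sketch using `RubyVM::InstructionSequence`, which is specific to CRuby; the source string is an assumed stand-in for the slide's program, and the exact instructions printed vary between Ruby versions.

```ruby
# Assumed stand-in for the program on the slide.
source = "a = 5\nb = 5\na + b\n"

# Compile the source into an instruction sequence without running it.
iseq = RubyVM::InstructionSequence.compile(source)

# Human-readable listing of the instructions.
puts iseq.disasm

# The last element of to_a is the body; instructions are the Array entries
# (the other entries are line numbers and labels).
instructions = iseq.to_a.last.select { |entry| entry.is_a?(Array) }.map(&:first)

# Hand the sequence to the VM for execution.
result = iseq.eval
```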
00:06:59
We've learned to interpret these sequences in a manner akin to interpreting the AST.
00:07:07
This allows the virtual machine to optimize and execute faster.
00:07:13
Each instruction sequence points to the next, streamlining execution.
00:07:25
We'll now discuss a new feature introduced in Ruby 2.3.
00:07:32
This feature allows us to take these instruction sequences that were previously compiled and persist them.
00:07:41
This introduces a binary format that we can read and write, allowing us to save these instruction sequences to a file.
00:07:55
This way, we can quickly load them without recompiling.
00:08:01
Let me show you how this works. When we compile a file, we get back an instance of the 'RubyVM::InstructionSequence' class.
00:08:10
This class contains all the necessary metadata for that program.
00:08:18
We can write this binary data, which isn't human-readable, but contains all the vital information needed to execute that file.
00:08:26
For instance, one example I encountered was surprisingly large, and it contained a wealth of metadata.
00:08:38
We can also read this binary data back into an instruction sequence and evaluate it.
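A minimal round trip might look like the following. The file name `example.yarb` and the source string are assumptions for illustration; `to_binary` and `load_from_binary` are the CRuby methods being described.

```ruby
require "tmpdir"

# Assumed stand-in for the program on the slide.
source = "a = 5\nb = 5\na + b\n"

iseq = RubyVM::InstructionSequence.compile(source)

# Serialize the compiled instructions; the result is a binary String.
binary = iseq.to_binary

loaded_result = Dir.mktmpdir do |dir|
  path = File.join(dir, "example.yarb") # hypothetical file name
  File.binwrite(path, binary)

  # Later (in the same Ruby version), read the bytes back and run them
  # without parsing or compiling the source again.
  RubyVM::InstructionSequence.load_from_binary(File.binread(path)).eval
end

puts "#{binary.bytesize} bytes, result #{loaded_result}"
```

The binary is tied to the Ruby that produced it, so treat it as a cache rather than a portable artifact.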
00:08:48
This is helpful, but the real advantage comes with 'load_iseq.'
00:08:56
'load_iseq' lets us change our flowchart: when a file is required, it can return already-compiled instruction sequences, so the file does not have to be parsed and compiled again.
00:09:03
This results in a significant speed boost, roughly 30% faster.
00:09:13
Internally, this process goes through the 'rb_load_internal' function when a file is required.
00:09:19
This function checks whether instruction sequences can be loaded from our binary data.
00:09:30
If they can, execution continues without recompiling the entire file.
00:09:42
This example implementation provides insight into how we can use these techniques effectively.
00:09:53
The important point to note is that the source file must remain unchanged.
00:10:00
If the source file has not been modified since the instruction sequences were created, we can load them directly.
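Here is one way such a hook could be written. It is a sketch, not the implementation shown in the talk: the `IseqCache` module, its temporary cache directory, and the modification-time comparison are all assumptions. The only Ruby-provided pieces are `load_iseq` (the hook Ruby looks for), `compile_file`, `to_binary`, and `load_from_binary`.

```ruby
require "digest/sha1"
require "tmpdir"

module IseqCache # hypothetical helper, not part of Ruby
  DIR = Dir.mktmpdir("iseq-cache") # assumed cache location

  def self.cache_path(source_path)
    File.join(DIR, "#{Digest::SHA1.hexdigest(source_path)}.yarb")
  end

  def self.fetch(source_path)
    cached = cache_path(source_path)

    if File.exist?(cached) && File.mtime(cached) >= File.mtime(source_path)
      # Source unchanged since we compiled it: skip parsing and compiling.
      RubyVM::InstructionSequence.load_from_binary(File.binread(cached))
    else
      iseq = RubyVM::InstructionSequence.compile_file(source_path)
      File.binwrite(cached, iseq.to_binary)
      iseq
    end
  rescue SyntaxError, StandardError
    nil # returning nil lets Ruby fall back to its normal compile path
  end
end

class RubyVM::InstructionSequence
  # If this method is defined, Ruby calls it for each file it loads.
  def self.load_iseq(source_path)
    IseqCache.fetch(source_path)
  end
end
```

A cache meant for real use would probably also key on the Ruby version, since the binary format is not stable across releases.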
00:10:08
This whole process allows us to monkey-patch the compilation sequences.
00:10:18
There are major examples in the ecosystem that use similar approaches.
00:10:27
One is Shopify, which does excellent work on optimizing load paths, and others focus on compilation.
00:10:37
These methods significantly enhance performance and efficiency.
00:10:43
Returning to our discussion, 'load_iseq' allows us to break things down, opening up exciting possibilities.
00:10:57
For instance, we can modify the source string without actually altering the original file.
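As a sketch of that idea, the hook below rewrites the source in memory before compiling it. The `~n(123)` macro is a made-up rewrite rule used only for illustration, and the demo file is written to a temporary directory.

```ruby
require "tmpdir"

class RubyVM::InstructionSequence
  def self.load_iseq(path)
    source = File.read(path)
    return nil unless source.include?("~n(") # untouched files take the normal path

    # Hypothetical macro: `~n(123)` is rewritten to the integer literal `123`.
    rewritten = source.gsub(/~n\((\d+)\)/, '\1')

    # Compile the rewritten string, but report the original path in backtraces.
    compile(rewritten, path, path)
  rescue SyntaxError, StandardError
    nil # fall back to Ruby's own compilation
  end
end

# Usage: the file on disk keeps the macro; only the compiled code changes.
demo_dir = Dir.mktmpdir
demo_path = File.join(demo_dir, "rewrite_demo.rb")
File.write(demo_path, "$rewrite_demo = ~n(4) + ~n(6)\n")
load demo_path
puts $rewrite_demo
```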
00:11:04
This flexibility enables powerful programmatic manipulation of Ruby’s internals.
00:11:13
We've even seen that 'load_iseq' was initially introduced to save memory and speed up execution.
00:11:24
It allows for a dynamic environment where we can programmatically load our custom bytecode.
00:11:31
This means that there are all sorts of fun things we can do with Ruby.
00:11:39
For example, we could use regex to dissect strings right within Ruby.
00:11:46
The flexibility that Ruby provides is incredible, as we can override built-in functions, such as integer multiplication.
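To see why the VM cannot simply assume what `*` means, here is a short demonstration of overriding it. The alias name `builtin_multiply` is made up for this sketch, and the patch is undone right away because nobody should leave this in place.

```ruby
class Integer
  alias_method :builtin_multiply, :* # keep a handle on the original

  def *(other)
    builtin_multiply(other) + 1 # deliberately wrong, to show the override is used
  end
end

patched = 2 * 3

class Integer
  alias_method :*, :builtin_multiply # put the original back
  remove_method :builtin_multiply
end

restored = 2 * 3

puts "patched: #{patched}, restored: #{restored}"
```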
00:12:01
While not typical, this showcases Ruby's capacity for customization.
00:12:08
Other languages don't share such flexibility. In those, operations are more rigidly defined.
00:12:21
Looking at the macro system, we can leverage this for code optimization.
00:12:30
Instruction elimination is about optimizing computations in Ruby.
00:12:38
For example, instead of making multiple method calls at runtime, we can compile the expression down to a single instruction.
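One way to see the difference is to compare the instructions for a spelled-out expression with those for its precomputed value. The expression `24 * 60 * 60` is an assumed example, and how many multiply instructions the unfolded form produces depends on the Ruby version, so the sketch only counts them.

```ruby
# The readable form and the value a compile-time rewrite could replace it with.
unfolded = RubyVM::InstructionSequence.compile("24 * 60 * 60")
folded = RubyVM::InstructionSequence.compile("86400")

# Count multiply instructions in the disassembly of each sequence.
count_mult = ->(iseq) { iseq.disasm.scan("opt_mult").size }

unfolded_mults = count_mult.call(unfolded)
folded_mults = count_mult.call(folded)
puts "unfolded: #{unfolded_mults} opt_mult, folded: #{folded_mults} opt_mult"

# Both sequences produce the same value when executed.
values = [unfolded.eval, folded.eval]
```

The source can keep the readable `24 * 60 * 60` while the compiled form carries only the constant.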
00:12:47
The beauty of this is maintaining readability while increasing performance.
00:12:57
We can write cleaner and more understandable code without sacrificing execution speed.
00:13:05
We can compile macros into Ruby, enabling advanced optimizations with ease.
00:13:16
This can even extend to date parsing, which can be optimized similarly.
00:13:23
The performance benefit from more straightforward date parsing is significant.
00:13:31
Moreover, I've developed a gem that allows for such optimizations to be applied.
00:13:38
It's part of a larger library called 'vernacular,' focusing on enhancing Ruby's flexibility.
00:13:46
What this gem does is hook into the compilation chain through 'load_iseq.'
00:13:55
This allows developers to modify their source code dynamically.
00:14:02
We can introduce more semantic richness through abstract syntax trees (ASTs).
00:14:08
Using parser gems to rewrite the source makes this even more powerful.
00:14:20
For example, you can modify method visibility and instance variables dynamically.
00:14:26
This gives room for innovation and creative coding solutions in the Ruby environment.
00:14:36
The 'parser' gem provides built-in support for rewriting source files seamlessly.
00:14:47
This gem allows you to manipulate ASTs for enhanced coding flexibility.
00:14:55
We can transform method calls based on the AST, automating some of our coding processes.
00:15:01
The result is a more efficient coding process without changing the original logic.
00:15:07
Moreover, we can wrap the method body in a begin-end structure to manage return types efficiently.
00:15:14
This wraps our logic cleanly while allowing Ruby to handle type inference smoothly.
00:15:22
And with that, we can extend Ruby’s capabilities quickly and rather inventively.
00:15:32
With so many tools and techniques available, you can enhance Ruby's interactivity effectively.
00:15:41
Looking at the compiled structure, we can analyze Ruby bytecode as well.
00:15:53
If you examine the binary structure of Ruby, you'll find interesting headers and segments.
00:16:02
For instance, the first four characters are always the magic string 'YARB', which identifies the file as Ruby bytecode.
00:16:14
What follows includes platform information and specifics on instruction sequences.
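You can peek at the start of that header from Ruby. The sketch assumes the layout in CRuby's source, where the four magic bytes are followed by two native unsigned integers for the major and minor version; that layout is an internal detail and may differ between releases.

```ruby
binary = RubyVM::InstructionSequence.compile("5 + 5").to_binary

# a4 = four raw bytes, L = native unsigned 32-bit integer (assumed layout).
magic, major, minor = binary.unpack("a4LL")

puts "magic=#{magic.inspect} major=#{major} minor=#{minor} bytes=#{binary.bytesize}"
```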
00:16:23
All this information helps us gain insight into how our Ruby files will execute.
00:16:30
By storing debugging info in the compiled bytecode, we unlock significant metadata for enhancing Ruby's evolution.
00:16:41
This way, 'load_iseq' becomes a tool for observing the relationships between object states in memory.
00:16:52
Thus, programming in Ruby is ultimately about exploring and pushing the boundaries of what we know.
00:17:01
The Ruby community continues to grow, allowing even more insights and optimizations to flourish.
00:17:09
In conclusion, the introduction of load_iseq has paved the way for innovative approaches to Ruby development.
00:17:18
Thank you for your attention!
00:17:34
Now, I'll open the floor for any questions.
00:17:55
All right, let's kick this off!
00:18:04
If you want to dive deeper into the ideas I've discussed, check out my GitHub repository.
00:18:12
I chose the parser gem because of its built-in rewriter support and straightforward API.
00:18:23
Being able to rewrite source efficiently has made my learning process much smoother.
00:18:31
Thank you for your time. I hope you find these concepts as intriguing as I do!