00:00:11.200
Hello everyone! My name is Kevin Newton, and I would like to formally welcome you all to the keynote session. I want to thank Matt for the introduction. Since I'm the first person speaking here in the keynote room, I feel privileged to kick off this live keynote.
00:00:22.560
I hope that joke landed well! You can laugh if you want; a little pity laugh would be appreciated. Anyway, as I mentioned, I'm Kevin Newton, and I work at Shopify on the YJIT team, alongside Aaron, Allen, Noah, and Maxime.
00:00:34.000
If you'd like to talk about that or any other topic, feel free to find me at the booth where the nerds with the green background are located. Now, I want to open up with a quick warning: I tend to speak pretty quickly when I'm nervous, and to be honest, I am a bit nervous. I've had a lot of caffeine today.
00:01:04.399
We're going to cover a somewhat complicated topic, and I've spent hours trying to make it accessible for everyone. So, if you're a junior developer, please don't leave the room. And if you're a senior developer, please stay with me—there's content in here for you as well. It might not start from the very basics, but it will be present.
00:01:21.920
Today, I want to talk to you about parsing Ruby. We'll explore how Ruby code has been parsed over time and how we can go from plain source text to a structure we can work with. To do this effectively, I need to backtrack a bit and discuss the fundamentals underlying these concepts to provide you with a comprehensive understanding.
00:01:58.240
Here's the game plan: we will build a grammar for a simple language, and I'll explain what that means in a moment. We'll then build a parser for this language, look at the history of the Ruby parser, and examine how it has evolved over time. Finally, we'll investigate Ripper, a standard library used to gather information during the parsing process, and how it works.
00:02:09.759
To start, we need to build a grammar. A grammar is a syntactical representation of what is allowed in a given language. In this context, 'language' is a broad term; it's not limited to English or Ruby, but refers to any set of constructs that can form a series of tokens.
00:02:40.160
For instance, if we're looking at a language that only accepted a single number, the grammar would look like this: a program points to a number. Although this is slightly more complicated than necessary—it could simply state that a program consists of a number—I'm doing this for illustrative purposes. The program serves as our root node, indicating that the only acceptable entity in this grammar is a number.
00:03:01.040
The number itself is a terminal token, meaning it is a final element in the parsing process and is not expanded any further. Essentially, this grammar accepts a single number token. I realize I've said 'token' many times; to be clear, this grammar accepts inputs like 1, 2, or 7, but it is not expressive enough for us to perform operations, so I'll expand it a bit.
00:03:25.680
Now, when we introduce addition, we begin accepting expressions like 'number plus number' or just an individual number. This change allows us to accept expressions like '1' and '1 + 2', but not '1 + 2 + 3'. The reason for this is that there is no recursive structure yet.
00:03:56.080
To enable recursion, we need to extend our grammar further. This adjustment creates a left recursive structure—essentially, the expression node in the tree points back at itself—allowing for more complexity. We can accept expressions like '1', '1 + 2', and even '1 + 2 + 3'. With this change, we can add more rules to encompass additional operations like subtraction.
00:04:02.480
At this stage, we're defining the nature of expressions and terms in our grammar. A term can be a single number or an expression defined with basic arithmetic operations. We're also considering operator precedence, which dictates how expressions are evaluated. Lastly, we will include the concept of parentheses, allowing for complex nested expressions.
00:04:45.680
A language with parentheses allows us to express operations like '1 * 2' comprehensively. In our grammar, an expression will either be a single term or a composition of terms connected by operations, and we can wrap these expressions in parentheses to indicate precedence. Thus, we have a complete grammar to guide our parsing process.
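As a rough illustration, the grammar described so far could be written down as plain Ruby data. The slides are not part of this transcript, so the exact rule layout, the `GRAMMAR` constant, the `factor` rule name, and the `left_recursive?` helper are all assumptions made for this sketch.

```ruby
# Hypothetical encoding of the grammar: each rule name maps to a list of
# alternatives, and each alternative is a sequence of symbols.
# Lowercase symbols are other rules, :NUMBER and strings are terminal tokens.
GRAMMAR = {
  program: [[:expr]],
  expr:    [[:expr, "+", :term], [:expr, "-", :term], [:term]],
  term:    [[:term, "*", :factor], [:term, "/", :factor], [:factor]],
  factor:  [[:NUMBER], ["(", :expr, ")"]]
}.freeze

# A rule is left recursive when one of its alternatives starts with the
# rule's own name, which is what lets "1 + 2 + 3" keep growing on the left.
def left_recursive?(grammar, name)
  grammar.fetch(name).any? { |alternative| alternative.first == name }
end

puts left_recursive?(GRAMMAR, :expr)
puts left_recursive?(GRAMMAR, :factor)
```

Splitting the rules into `expr`, `term`, and `factor` layers is one common way to encode precedence: multiplication sits one level deeper than addition, so it binds tighter.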
00:05:34.320
Next, we need to construct a parser capable of interpreting this grammar. The grammar we've established is an abstract concept, and now it's time to implement it. Imagine having a source file that contains simple expressions, for example, a .numbers file or any language you prefer.
00:05:52.639
We'll loop over this source file, processing it in Ruby. The goal is to handle it until the input string is empty. We'll utilize regular expressions to identify numbers and operators, skipping whitespace as needed, and yielding individual tokens, such as number tokens or operator tokens, based on the patterns we find in the source.
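The loop just described might look something like the following. This is a minimal sketch rather than the code from the talk: the `tokenize` method name, the `:NUMBER` and `:OPERATOR` token types, and the decision to treat parentheses as operator tokens are assumptions.

```ruby
# Hypothetical tokenizer: consumes the input from the front until it is
# empty, yielding [type, value] pairs and skipping whitespace.
def tokenize(source)
  return enum_for(:tokenize, source) unless block_given?

  input = source.dup
  until input.empty?
    if (match = input.match(/\A\s+/))
      # Whitespace carries no meaning here, so emit nothing.
    elsif (match = input.match(/\A\d+/))
      yield [:NUMBER, match[0].to_i]
    elsif (match = input.match(/\A[-+*\/()]/))
      yield [:OPERATOR, match[0]]
    else
      raise ArgumentError, "unexpected character: #{input[0].inspect}"
    end

    # Drop whatever was just matched and continue with the remainder.
    input = match.post_match
  end
end

tokenize("1 + 2 * (3 - 4)") { |token| p token }
```

Called without a block, `tokenize` returns an enumerator, so `tokenize("1 + 2").to_a` gives the whole token stream as an array.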
00:06:21.360
At this point, we've scanned the source and identified its tokens, forming what we refer to as a 'token stream'. A token stream is simply a list of tokens produced by analyzing the source. However, while we've accomplished our lexical analysis and created tokens, we don't yet have semantic meaning, that is, an interpretation of what these tokens mean together.
00:07:01.680
To advance, we will run through our grammar and start accepting input. This is where the concept of a semantic stack comes into play. This stack will hold our tokens as we shift and reduce them, essentially correlating them with their meanings according to our established grammar. As we shift tokens, we recognize their equivalence within the grammar, allowing us to reduce them until we achieve a full syntactic structure that represents a valid program.
00:07:56.560
As this process continues, we transform our input tokens through shifting and reducing operations until we arrive at the final grammar structure: a complete program. Each shift and reduction helps us build up a tree structure that corresponds to the relationships illustrated in our grammar.
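To make the shifting and reducing concrete, here is a small hand-written sketch. It is not the parser from the talk and not a table-driven LR parser: it assumes a layered `expr`/`term`/`factor` grammar, peeks at one lookahead token to decide whether to wait for a tighter-binding operator, and builds nested arrays as its tree. All method and constant names are hypothetical.

```ruby
MULTIPLICATIVE = ["*", "/"].freeze
ADDITIVE = ["+", "-"].freeze

# Simplified tokenizer for this sketch; unknown characters are silently ignored.
def tokenize(source)
  source.scan(/\d+|[-+*\/()]/).map do |text|
    text.match?(/\A\d+\z/) ? [:NUMBER, text.to_i] : [text, text]
  end
end

# Pop `count` entries off the stack and push the reduced [symbol, value].
def replace(stack, count, symbol, value)
  stack.pop(count)
  stack.push([symbol, value])
end

# Try a single reduction on the top of the stack; true if one applied.
def reduce(stack, lookahead)
  symbols = stack.map(&:first)
  values = stack.map(&:last)
  wait = MULTIPLICATIVE.include?(lookahead) # a tighter operator is coming

  if symbols[-1] == :NUMBER                                  # factor: NUMBER
    replace(stack, 1, :factor, values[-1])
  elsif symbols.last(3) == ["(", :expr, ")"]                 # factor: ( expr )
    replace(stack, 3, :factor, values[-2])
  elsif symbols[-3] == :term && MULTIPLICATIVE.include?(symbols[-2]) &&
        symbols[-1] == :factor                               # term: term * factor
    replace(stack, 3, :term, [symbols[-2].to_sym, values[-3], values[-1]])
  elsif symbols[-1] == :factor                               # term: factor
    replace(stack, 1, :term, values[-1])
  elsif !wait && symbols[-3] == :expr && ADDITIVE.include?(symbols[-2]) &&
        symbols[-1] == :term                                 # expr: expr + term
    replace(stack, 3, :expr, [symbols[-2].to_sym, values[-3], values[-1]])
  elsif !wait && symbols[-1] == :term                        # expr: term
    replace(stack, 1, :expr, values[-1])
  else
    return false
  end
  true
end

def parse(source)
  tokens = tokenize(source)
  stack = []
  loop do
    next if reduce(stack, tokens.first&.first) # reduce while possible
    break if tokens.empty?
    stack.push(tokens.shift)                   # otherwise shift a token
  end
  raise ArgumentError, "syntax error" unless stack.map(&:first) == [:expr]
  stack.first.last
end

p parse("1 + 2 * 3")
```

The order of the branches in `reduce` matters: the three-symbol rules are checked before the single-symbol ones so that `term * factor` collapses into one `term` instead of leaving two terms on the stack.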
00:08:46.160
Because this entire operation is repetitive and language agnostic, parser generators emerged in the late '80s and early '90s as solutions. These programs take a grammar and the actions associated with its rules, effectively taking the burden of shifting and reducing off your shoulders. In Ruby, when you're building a parser, the preferred generator is called Racc.
00:09:30.720
Utilizing Racc allows you to define operator precedence and include actions that are executed when rules are reduced. This way, you can immediately evaluate expressions when the necessary grammar rules are matched. This implementation makes it easier to work with the Ruby abstract syntax tree (AST), which reflects the structure of Ruby itself.
00:10:17.440
Historically, the parser generator used in Ruby is called Yacc, which stands for 'Yet Another Compiler Compiler'. It became popular in 1993, coinciding with the early development of Ruby. During this time, some notable changes were being made to the structure of Ruby itself, including modifications to the basic syntax. These changes became essential as Ruby evolved into the language we know today.
00:11:05.120
Over time, Ruby has implemented various enhancements and adjustments that reflect its unique characteristics. For instance, Ruby's syntax has been compared to both Python and C++. Changes in early versions allowed for the introduction of hash literals and the correction of syntactical errors, such as misspellings in keywords.
00:11:49.680
As Ruby matured into version 1.0, its syntax began to resemble the more recognizable form we see today. Features like access to singleton classes, specific regex flags, and the introduction of keywords for true and false values showcase the growing intricacies embedded within Ruby's syntax. The changes laid the groundwork for subsequent versions, ultimately leading to the Ruby we use today.
00:12:47.440
Throughout the years, additional features were added, including binary number literals and rescue modifiers. These developments followed the introduction of Ruby 1.9, which transitioned Ruby to a bytecode interpreter. Other significant milestones included the release of alternative implementations like JRuby and Rubinius, which reimagined how Ruby's parser could function in different environments.
00:13:35.520
As Ruby continued to evolve, many projects sought to capture the Ruby AST (Abstract Syntax Tree) in various forms, expanding its utility beyond execution. These efforts indicate the flexibility of Ruby's parser and the growing ecosystem surrounding language tooling. Yet, sustaining compatibility and ensuring consistent performance across all implementations has proven challenging.
00:14:23.280
Ruby 1.9 saw considerable advancements, including the merging of Ripper into the standard library. Ripper serves to provide a bridge between Ruby's parsing mechanism and external needs, allowing developers to gain valuable insights into the code structure. This functionality facilitates the development of syntax analysis tools and encourages further language experimentation.
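For a sense of what Ripper exposes, here is a short sketch using three of its class methods: `Ripper.tokenize`, `Ripper.lex`, and `Ripper.sexp`. The sample source string and variable names are only illustrative.

```ruby
require "ripper"

source = "1 + 2"

# Split the source into raw token strings (whitespace included).
token_strings = Ripper.tokenize(source)

# Lex with metadata: each entry holds [line, column], an event name, and
# the token text. Here we keep only the event names.
token_events = Ripper.lex(source).map { |token| token[1] }

# Build a nested-array (s-expression) view of the syntax tree.
tree = Ripper.sexp(source)

p token_strings
p token_events
p tree
```

The s-expression starts with a `:program` node, and the addition shows up as a `:binary` node whose operands are `:@int` tokens carrying their line and column positions.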
00:15:05.440
As we look back on the evolution of Ruby's parsing capabilities, we recognize the significance of Ripper in enabling developers to access and manipulate Ruby's internal grammar patterns. Ripper now operates as a standard library, demonstrating Ruby's flexibility in adapting various paradigms while encouraging ongoing community engagement.
00:15:54.240
In conclusion, today we've explored the journey of Ruby's parser and the pivotal role of tools supporting language parsing. Ripper stands as an invaluable resource for developers, providing essential functionalities to dissect and interact with Ruby's syntax. I hope this talk inspires you to delve deeper into creating tools that enhance Ruby's capabilities further and that we can collectively contribute towards building a richer Ruby ecosystem.
00:16:40.560
Thank you all for attending. I appreciate your time and attention, and I'm happy to open the floor for any questions or discussions.