Let's get started. In my efforts to explain as much as possible,
I have included too many slides in this presentation.
It will be quite a miracle if I finish on time.
We will cover a lot of information. My goal for this talk is to give you a broad overview of what this project is,
dive into some technical details, and ensure that you are not left behind if you are less familiar with some of the concepts I'm going to discuss.
This talk is about Syntax Tree.
It's a project I started about four years ago.
In essence, it became a project funded by the Ruby Association to create a standard library formatter.
There are many formatters available now, but when I started this work, there wasn't a suitable one.
Since then, the project has become much more than just a formatter: it has grown into a whole set of tools that I will discuss today.
My name is Kevin Newton, and I work on the Ruby and Rails infrastructure team at Shopify.
You may notice that there are quite a few of us here, so please come and talk to us
if you want to learn about some of the problems we're solving at Shopify, particularly around improving Ruby and enhancing the ecosystem in general.
You can follow me online at @kevinnewton on Twitter.
So, what is Syntax Tree? To explain what Syntax Tree is, we need to discuss a couple of concepts.
First, what is a parser?
What does a parser do? How does it function?
We also need to understand what a syntax tree is. Naming, as you may know, is one of the hardest things in computer science, and I unfortunately named my project Syntax Tree.
So you will hear me use 'syntax tree' to refer to the data structure and 'Syntax Tree' to refer to the project. Finally, we need to discuss the visitor pattern.
I want to explain how this pattern applies to syntax trees.
Let's start with what a parser is. You may have heard the phrase 'Matz is nice, so we are nice.' A parser's job is to take plain text and transform it into a data structure that a program can understand and work with.
The first step is lexical analysis, which takes the plain text and breaks it into chunks called tokens.
For example, we would classify 'Matz' as a noun, 'is' as a verb, 'nice' as an adjective, and 'so' as a conjunction.
The second step is syntactic analysis (parsing), where we look at how the tokens relate to one another. We give each of these relationships a name, and together those rules define the grammar.
The grammar defines how words fit together to form larger concepts. For example, a verb followed by an adjective forms a verb phrase ('is nice'), and a noun followed by a verb phrase forms a clause ('Matz is nice').
Joining two clauses with a conjunction gives us the complete sentence, and in our minds we have built a tree structure. Building that tree from plain text is exactly what a parser does.
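The same two phases apply to Ruby code. As a minimal sketch using only Ruby's standard library, Ripper can tokenize a snippet (lexical analysis) and turn it into a nested-array tree (syntactic analysis). This output is Ripper's raw s-expression format, not the node objects that Syntax Tree builds.

```ruby
require "ripper"

source = "1 + 2"

# Lexical analysis: split the source into tokens (whitespace included).
tokens = Ripper.tokenize(source)
p tokens # => ["1", " ", "+", " ", "2"]

# Syntactic analysis: build a tree describing how the tokens relate.
# Each literal carries its [line, column] position.
tree = Ripper.sexp(source)
p tree # => [:program, [[:binary, [:@int, "1", [1, 0]], :+, [:@int, "2", [1, 4]]]]]
```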
What does this look like in terms of a syntax tree? Let's assume we're in Ruby and create objects that represent the nodes.
This is straightforward: we create a separate class for every kind of node, each storing its own data, and link the instances together to represent the tree structure. If we rearrange them slightly, the nodes on the left are tokens, each wrapping a value from the source code, while the nodes on the right are branches in the middle of the tree.
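Here is a minimal sketch of that idea for the example sentence, assuming one plain Ruby class per kind of node. The class names (Token, VerbPhrase, Clause, Sentence) are hypothetical and chosen for this example only. Token is the leaf that wraps a word from the source, and the other classes are branches that hold other nodes.

```ruby
# Leaf node: wraps a single word taken directly from the source text.
class Token
  attr_reader :kind, :value

  def initialize(kind, value)
    @kind = kind   # e.g. :noun, :verb, :adjective, :conjunction
    @value = value # the original word
  end

  def to_s
    value
  end
end

# Branch node: a verb followed by an adjective ("is nice").
class VerbPhrase
  attr_reader :verb, :adjective

  def initialize(verb, adjective)
    @verb = verb
    @adjective = adjective
  end

  def to_s
    "#{verb} #{adjective}"
  end
end

# Branch node: a subject followed by a verb phrase ("Matz is nice").
class Clause
  attr_reader :subject, :verb_phrase

  def initialize(subject, verb_phrase)
    @subject = subject
    @verb_phrase = verb_phrase
  end

  def to_s
    "#{subject} #{verb_phrase}"
  end
end

# Root node: two clauses joined by a conjunction.
class Sentence
  attr_reader :left, :conjunction, :right

  def initialize(left, conjunction, right)
    @left = left
    @conjunction = conjunction
    @right = right
  end

  def to_s
    "#{left}, #{conjunction} #{right}"
  end
end

tree = Sentence.new(
  Clause.new(Token.new(:noun, "Matz"),
             VerbPhrase.new(Token.new(:verb, "is"), Token.new(:adjective, "nice"))),
  Token.new(:conjunction, "so"),
  Clause.new(Token.new(:pronoun, "we"),
             VerbPhrase.new(Token.new(:verb, "are"), Token.new(:adjective, "nice")))
)

puts tree # prints "Matz is nice, so we are nice"
```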
Next, let’s expand one of these nodes.
Once we have this tree, we can walk it and perform interesting operations. We can also give every node support for pattern matching, comparison methods, and a copy method that returns a new node with some fields changed, since the nodes themselves are immutable.
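A minimal sketch of those extra methods on a single hypothetical Word node, assuming the node should be immutable: 'freeze' prevents reassigning its fields, '==' compares by value, 'copy' returns a new node with some fields replaced, and 'deconstruct_keys' is the hook Ruby's pattern matching calls for hash patterns.

```ruby
# Hypothetical immutable node: a single word with a kind and a value.
class Word
  attr_reader :kind, :value

  def initialize(kind:, value:)
    @kind = kind
    @value = value
    freeze # shallow freeze: the fields can no longer be reassigned
  end

  # Structural equality: same class and same field values.
  def ==(other)
    other.class == self.class && other.kind == kind && other.value == value
  end
  alias eql? ==

  # Keep #hash consistent with #eql? so nodes work as Hash keys.
  def hash
    [self.class, kind, value].hash
  end

  # Returns a new node with some fields replaced; self is left untouched.
  def copy(**fields)
    Word.new(**{ kind: kind, value: value }.merge(fields))
  end

  # Called by Ruby's pattern matching for hash patterns like `in { kind: }`.
  def deconstruct_keys(_keys)
    { kind: kind, value: value }
  end
end

nice = Word.new(kind: :adjective, value: "nice")
great = nice.copy(value: "great")

p nice == Word.new(kind: :adjective, value: "nice") # => true
p nice.value  # => "nice" (unchanged by copy)
p great.value # => "great"

case great
in { kind: :adjective, value: }
  puts "adjective: #{value}" # prints "adjective: great"
end
```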
The most important methods here are 'accept' and 'child_nodes'. The 'accept' method lets a visitor interact with the node, which is what makes the visitor pattern work.
The visitor pattern relies on double dispatch: when we call 'accept' on a node and pass it a visitor, the node calls back into the visitor method for its own type, which lets a visitor define different behavior for different node types.
As I mentioned earlier, the 'child_nodes' method returns a node's children, which defines the relationships between nodes and ultimately lets us iterate through the whole tree.
In summary, we can use our visitor pattern to implement specific visitors that only focus on a subset of nodes in a tree.
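The following sketch puts 'accept', 'child_nodes', and double dispatch together. The WordNode, PhraseNode, Visitor, and AdjectiveCollector classes are hypothetical. The base visitor simply descends into every child, and the subclass overrides only the one visit method it cares about.

```ruby
# Leaf node: a single word.
class WordNode
  attr_reader :kind, :value

  def initialize(kind, value)
    @kind = kind
    @value = value
  end

  # Double dispatch: the node picks the visitor method for its own type.
  def accept(visitor)
    visitor.visit_word(self)
  end

  def child_nodes
    []
  end
end

# Branch node: an ordered list of child nodes.
class PhraseNode
  attr_reader :parts

  def initialize(*parts)
    @parts = parts
  end

  def accept(visitor)
    visitor.visit_phrase(self)
  end

  def child_nodes
    parts
  end
end

# Base visitor: by default every visit method just descends into children.
class Visitor
  def visit(node)
    node&.accept(self)
  end

  def visit_all(nodes)
    nodes.map { |node| visit(node) }
  end

  def visit_child_nodes(node)
    visit_all(node.child_nodes)
  end

  def visit_word(node)
    visit_child_nodes(node)
  end

  def visit_phrase(node)
    visit_child_nodes(node)
  end
end

# A visitor that only cares about a subset of nodes: adjectives.
class AdjectiveCollector < Visitor
  attr_reader :adjectives

  def initialize
    @adjectives = []
  end

  def visit_word(node)
    @adjectives << node.value if node.kind == :adjective
  end
end

tree = PhraseNode.new(
  PhraseNode.new(WordNode.new(:noun, "Matz"),
                 PhraseNode.new(WordNode.new(:verb, "is"), WordNode.new(:adjective, "nice"))),
  WordNode.new(:conjunction, "so"),
  PhraseNode.new(WordNode.new(:pronoun, "we"),
                 PhraseNode.new(WordNode.new(:verb, "are"), WordNode.new(:adjective, "nice")))
)

collector = AdjectiveCollector.new
collector.visit(tree)
p collector.adjectives # => ["nice", "nice"]
```

Because the base class already knows how to walk the whole tree, a new visitor only has to describe what is interesting about the nodes it targets.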
Syntax Tree also lets us format these trees. The formatter uses a pretty-printing algorithm, which lays the code out in a consistent structure.
Formatting involves grouping content to determine where line breaks may occur. When the content does not fit within the maximum line width, the printer breaks the outermost group first.
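Ruby's standard library includes a 'prettyprint' library built on this same model of text, breakables, and groups, so it can illustrate the behavior without Syntax Tree's own printer. In this sketch (the helpers array_doc and format_array are hypothetical), every nested array is its own group. At a narrow width only the outermost group breaks, while the inner arrays still fit and stay on one line.

```ruby
require "prettyprint"

# Emits an array literal: each level is its own group, and a breakable
# after each comma marks where a line break may go.
def array_doc(q, items)
  q.group(1, "[", "]") do
    items.each_with_index do |item, index|
      if index > 0
        q.text(",")
        q.breakable
      end
      item.is_a?(Array) ? array_doc(q, item) : q.text(item.to_s)
    end
  end
end

def format_array(items, width)
  PrettyPrint.format(+"", width) { |q| array_doc(q, items) }
end

puts format_array([[1, 2], [3, 4]], 80)
# [[1, 2], [3, 4]]

puts format_array([[1, 2], [3, 4]], 10)
# [[1, 2],
#  [3, 4]]
```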
By building a syntax tree for Ruby, we can implement all sorts of tools on top of it, such as linters, formatters, and semantic analysis.
Syntax Tree is not just a formatting tool; instead, it forms an object layer that represents the result of parsing Ruby code.
It offers tools to interact with and manipulate this object layer to perform diverse tasks, like refactoring and code improvement.
In total, there are five main functionalities of Syntax Tree: building a syntax tree, formatting it, creating a command-line interface (CLI) to facilitate interaction, developing a language server, and translating syntax trees into other formats.
Now, let's first take a look at building the syntax tree.
Building a syntax tree involves defining all the nodes. In the 'node.rb' file within the Syntax Tree repository, you will see definitions for all the nodes in Ruby.
We provide named fields for every single value and sub-node in the tree, source location data, comments attached to the tree, 'accept' and 'child_nodes' methods, and immutable behavior by default.
Ripper, the Ruby parser that ships with the standard library, emits the events we use to build the nodes of our syntax tree.
Ripper has over 190 events, and each one has to be handled to build the corresponding node.
We also keep track of location information as we go, so that every node records exactly where it starts and ends in the source.
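To give a feel for how Ripper's events become nodes, here is a minimal sketch that subclasses Ripper and handles only the handful of events needed for integer arithmetic. IntNode, BinaryNode, and TinyBuilder are hypothetical names; a real builder has to handle every event and builds far richer nodes. Whatever each on_* method returns is passed as an argument to the handlers of the enclosing constructs, which is how the tree is assembled bottom up.

```ruby
require "ripper"

# Hypothetical node types for this sketch only.
IntNode = Struct.new(:value, :line, :column)
BinaryNode = Struct.new(:left, :operator, :right)

class TinyBuilder < Ripper
  # Scanner event: an integer literal token, with its position.
  def on_int(token)
    IntNode.new(Integer(token), lineno, column)
  end

  # Parser event: `left operator right`, where operator is a Symbol.
  def on_binary(left, operator, right)
    BinaryNode.new(left, operator, right)
  end

  # Statement lists: collect each top-level statement into an Array.
  def on_stmts_new
    []
  end

  def on_stmts_add(statements, statement)
    statements << statement
  end

  # The value returned here is what #parse returns.
  def on_program(statements)
    statements
  end
end

statements = TinyBuilder.new("1 + 2 * 3").parse
p statements.first.operator       # => :+
p statements.first.right.operator # => :*
p statements.first.right.left     # IntNode for 2, with its line and column
```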
We aim to present an interface that abstracts away the complexities of using Ripper, allowing users to focus on Syntax Tree's functionality.
Ripper does not handle everything for us; for example, it does not attach comments to nodes, so we have to manage that ourselves.
To walk the tree, we implement visitors, defining a visit method for each node type we want to handle.
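Ripper does report comments, but only as scanner events, separate from the tree. A minimal sketch of the first half of that job, collecting each comment with its line number, might look like this. CommentCollector is a hypothetical name, and attaching each comment to the nearest node is left out.

```ruby
require "ripper"

class CommentCollector < Ripper
  attr_reader :comments

  def initialize(source)
    super(source)
    @comments = []
  end

  # Scanner event fired for each comment token (including its newline).
  def on_comment(token)
    @comments << [lineno, token.chomp]
    token
  end
end

source = <<~'RUBY'
  # Greets the given name.
  def greet(name) # trailing comment
    puts "Hello, #{name}"
  end
RUBY

collector = CommentCollector.new(source)
collector.parse
p collector.comments
# => [[1, "# Greets the given name."], [2, "# trailing comment"]]
```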
We can leverage these methods to perform certain operations, such as counting sentences or omitting unnecessary nodes.
In addition, we can serialize the entire abstract syntax tree (AST) to JSON, allowing for easy data access and manipulation.
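A minimal sketch of JSON serialization, assuming each hypothetical node type (IntLiteral, BinaryOp) can describe itself as a Hash: a recursive to_h builds nested Hashes, and the standard json library turns them into a string.

```ruby
require "json"

# Hypothetical node types; each converts itself (and its children) to a Hash.
IntLiteral = Struct.new(:value) do
  def to_h
    { type: "int", value: value }
  end
end

BinaryOp = Struct.new(:left, :operator, :right) do
  def to_h
    { type: "binary", left: left.to_h, operator: operator.to_s, right: right.to_h }
  end
end

tree = BinaryOp.new(IntLiteral.new(1), :+, IntLiteral.new(2))
json = JSON.generate(tree.to_h)
puts json
# {"type":"binary","left":{"type":"int","value":1},"operator":"+","right":{"type":"int","value":2}}
```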
For formatting, we use a pretty-printing algorithm based on Philip Wadler's paper 'A prettier printer' from the late '90s.
Its basic building blocks are text and breakables (points where the printer may insert a line break), and groups of these determine how the Ruby code is laid out.
Then, we developed a command-line interface (CLI) that allows users to work with the AST more easily.
The CLI provides useful functionalities, like generating a tree representation of the source code, formatting it, and searching for specific patterns.
A significant feature is the 'match' command, which outputs a Ruby pattern-matching expression that matches the given source, a great starting point for writing your own patterns.
Using the CLI, we can also quickly check how an expression is parsed and how it will be formatted.
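The exact output of the 'match' command isn't reproduced here. As a hedged illustration of the kind of matching such an expression enables, this sketch hand-writes a Ruby pattern against Ripper's s-expression output; integer_addition? is a hypothetical helper.

```ruby
require "ripper"

# Returns true if the source is a single addition of two integer literals.
def integer_addition?(source)
  case Ripper.sexp(source)
  in [:program, [[:binary, [:@int, _, _], :+, [:@int, _, _]]]]
    true
  else
    false
  end
end

p integer_addition?("1 + 2") # => true
p integer_addition?("1 - 2") # => false
p integer_addition?("a + 2") # => false
```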
Syntax Tree also includes a language server. Because it implements the Language Server Protocol (LSP), any editor that speaks the protocol can use its features without needing to know anything about the implementation details.
I also want to highlight the Ruby LSP project at Shopify, which uses Syntax Tree to power features like document highlighting, folding ranges, and semantic highlighting.
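Under LSP, the editor and the server exchange JSON-RPC messages, each framed with a Content-Length header (counted in bytes) and a blank line. This sketch shows that framing for a textDocument/formatting request. The helper names and the file URI are hypothetical, and a real server would read from standard input rather than from a string.

```ruby
require "json"

# Frames a JSON-RPC message the way LSP expects over stdio:
# a Content-Length header (in bytes), a blank line, then the JSON body.
def encode_lsp_message(message)
  body = JSON.generate(message)
  "Content-Length: #{body.bytesize}\r\n\r\n#{body}"
end

# Reads one framed message back out of a string.
def decode_lsp_message(data)
  header, body = data.split("\r\n\r\n", 2)
  length = header[/Content-Length: (\d+)/, 1].to_i
  JSON.parse(body.byteslice(0, length))
end

request = {
  jsonrpc: "2.0",
  id: 1,
  method: "textDocument/formatting",
  params: {
    textDocument: { uri: "file:///example.rb" },
    options: { tabSize: 2, insertSpaces: true }
  }
}

framed = encode_lsp_message(request)
p framed.start_with?("Content-Length: ") # => true
p decode_lsp_message(framed)["method"]   # => "textDocument/formatting"
```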
This provides our users with enriched experiences and powerful tools while working with Ruby code.
Finally, we can translate syntax trees into the formats used by other Ruby parsers, which keeps us compatible with existing tools built on those formats.
The core takeaway is that tools built on syntax tree nodes can work interchangeably with different parsers, which keeps them robust.
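As a minimal sketch of translation, this converts a tiny subset of Ripper's s-expressions into nested arrays shaped like the parser gem's AST, where '1 + 2' is a :send node holding the receiver, the method name, and the argument. The translate method is hypothetical and raises NoMatchingPatternError for anything outside that subset.

```ruby
require "ripper"

# Translates a tiny subset of Ripper's s-expressions into nested arrays
# shaped like the parser gem's AST (node type first, then children).
def translate(node)
  case node
  in [:program, [statement]]
    translate(statement)
  in [:binary, left, operator, right]
    [:send, translate(left), operator, translate(right)]
  in [:@int, value, _location]
    [:int, Integer(value)]
  end
end

p translate(Ripper.sexp("1 + 2"))
# => [:send, [:int, 1], :+, [:int, 2]]
```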
By leveraging all these functionalities, we're developing a truly versatile and powerful toolkit for Ruby developers.
The goal is for you to be able to interact with Syntax Tree effortlessly, regardless of changes in the underlying parser.
Thank you very much for your time. I appreciate your attention.
Does anyone have any questions that I can answer?
One question here is about providing a syntax rewriter.
The answer is yes; you can already do that today. My goal is to build more user-friendly and advanced features into the project.
For example, this could be a script run through the Rails CLI that automatically updates deprecated code.
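As a hedged illustration of an automated update, this sketch renames Rails' deprecated before_filter to before_action at the token level using Ripper.lex. The rename_identifier helper is hypothetical. Because it only looks at tokens, it cannot check receivers or surrounding context the way a rewrite over a full syntax tree could.

```ruby
require "ripper"

# Renames every bare identifier `from` to `to`, leaving all other tokens
# (including whitespace and comments) untouched.
def rename_identifier(source, from, to)
  Ripper.lex(source).map do |_position, event, token, _state|
    event == :on_ident && token == from ? to : token
  end.join
end

source = <<~'RUBY'
  class PostsController < ApplicationController
    before_filter :authenticate
  end
RUBY

puts rename_identifier(source, "before_filter", "before_action")
```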
A follow-up question was about the Prettier plugin for Ruby (prettier/plugin-ruby), which originally implemented Ruby formatting on its own.
That plugin now uses Syntax Tree, making it a lightweight wrapper around the tools we now have available.
As a result, its complexity has dropped dramatically, and it now comprises only a few hundred lines of code.
As for dependencies, Syntax Tree relies primarily on Ripper, but we are working on a new Ruby parser for better efficiency.
The overall aim is to ensure users can rely on Syntax Tree for consistent performance and features.
Thank you again for your time, and feel free to reach out if you have further questions!
I hope you enjoyed the presentation, and I look forward to engaging with you further.