Syntax Tree

by Kevin Newton

In the presentation titled 'Syntax Tree' at RubyConf 2022, Kevin Newton discusses Syntax Tree, a toolkit for interacting with Ruby's parse tree. The tool aims to make analyzing, debugging, and formatting Ruby code easier, and it has grown from a standard library formatter into a much broader set of tools.

Key points of the talk include:
- Introduction to Syntax Tree: Kevin introduces Syntax Tree, a project he started around four years ago and that was funded by the Ruby Association. It was originally conceived as a formatter but has since expanded significantly.
- Understanding Parsers and Syntax Trees: Explanation of the roles of parsers and the concept of syntax trees, including lexical and semantic analysis to transform plain text into structures that programming languages can understand.
- Visitor Pattern: Discussion of the visitor pattern, particularly the `accept` and `child_nodes` methods, which make it possible to walk the syntax tree and operate on its nodes.
- Core Functionalities: Syntax Tree offers five main functionalities: 1) building a syntax tree, 2) formatting it, 3) a command-line interface (CLI), 4) a language server, and 5) translating syntax trees.
- Building the Syntax Tree: Kevin elaborates on the construction of the syntax tree, leveraging Ripper to define node structures and how to traverse them, while managing comments and keeping track of the nodes' positional context.
- Pretty Printing: The talk explores the formatting capabilities through pretty print algorithms that ensure organized Ruby code representation.
- CLI Features: The CLI serves various functions such as generating tree representations, formatting, and pattern matching, enhancing user interaction.
- Integration with Language Server Protocol: The implementation of a language server helps unify interactions with various development tools, making it easier for developers to use Syntax Tree without detailed technical knowledge.
- Future Directions: Kevin discusses the plans for further development of the toolkit, including the syntax rewriter and ongoing optimizations for dependencies.

The primary takeaway from the session is the versatility of Syntax Tree: it is a comprehensive toolkit for working with Ruby source code that supports analysis, formatting, and rewriting.

00:00:11.120 Let's get started. In my efforts to explain as much as possible,
00:00:13.920 I have included too many slides in this presentation.
00:00:16.800 It will be quite a miracle if I finish on time.
00:00:18.600 We will cover a lot of information. My goal for this talk is to give you a broad overview of what this project is,
00:00:22.140 dive into some technical details, and ensure that you are not left behind if you are less familiar with some of the concepts I'm going to discuss.
00:00:28.920 This talk is about Syntax Tree.
00:00:30.420 It's a project I started about four years ago.
00:00:32.399 In essence, it became a project funded by the Ruby Association to create a standard library formatter.
00:00:34.500 There are many formatters available now, but when I started this work, there wasn't a suitable one.
00:00:37.200 Since the inception of the project, it has become much more than just a formatter. It has evolved into a whole host of functionalities that I will discuss today.
00:00:48.059 My name is Kevin Newton, and I work on the Ruby and Rails infrastructure team at Shopify.
00:00:49.980 You may notice that there are quite a few of us here, so please come and talk to us
00:00:51.719 if you want to learn about some of the problems we're solving at Shopify, particularly in improving Ruby and generally trying to enhance the ecosystem.
00:00:55.620 You can follow me online at @kddnewton on Twitter.
00:00:57.420 So, what is Syntax Tree? To explain what Syntax Tree is, we need to discuss a couple of concepts.
00:01:01.000 First, what is a parser?
00:01:02.100 What does a parser do? How does it function?
00:01:03.960 We also need to understand what a syntax tree is. Naming, as you may know, is one of the hardest things in computer science, and I unfortunately named my project Syntax Tree.
00:01:11.960 So you might hear me use 'syntax tree' to refer to the structure and 'Syntax Tree' to refer to the project. Now, let's also discuss the visitor pattern.
00:01:18.600 I want to explain how this pattern applies to syntax trees.
00:01:19.739 Let's start by discussing what a parser is. You may have seen the phrase, 'Matz is nice, so we are nice.' A parser's job is to take plain text content and transform it into a data structure that a programming language can understand.
00:01:22.440 The first step is lexical analysis, which takes a plain text segment and breaks it into chunks.
00:01:27.240 For example, we would classify 'Matz' as a noun, 'is' as a verb, 'nice' as an adjective, and 'so' as a conjunction.
00:01:29.700 The second step is semantic analysis, where we take two segments of the sentence to classify their relationship. We give a name to the concept, defining the grammar.
00:01:34.680 We create a definition of how the words can fit together to form larger concepts, such as a verb phrase. When we combine these segments, we end up with a subject phrase, which is a noun followed by a verb phrase.
00:01:42.720 By adding a conjunction, we form a tree structure in our minds. Building up a complete understanding of the text in this way is what a compiler does.
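The two steps described above can be sketched in plain Ruby. This is a minimal illustration and not code from the talk: the `LEXICON` table, the `Lexeme` struct, and the `tokenize`, `parse_clause`, and `parse_sentence` helpers are all hypothetical names, and the grammar only covers the example sentence.

```ruby
# Hypothetical lexicon: maps each known word to its part of speech.
LEXICON = {
  "Matz" => :noun, "we" => :noun,
  "is" => :verb, "are" => :verb,
  "nice" => :adjective,
  "so" => :conjunction
}.freeze

Lexeme = Struct.new(:type, :value)

# Lexical analysis: split the text into words and classify each one.
def tokenize(sentence)
  sentence.scan(/\w+/).map do |word|
    type = LEXICON.fetch(word) { raise ArgumentError, "unknown word: #{word}" }
    Lexeme.new(type, word)
  end
end

# Grammar rule: a subject phrase is a noun followed by a verb phrase,
# and a verb phrase is a verb followed by an adjective.
def parse_clause(lexemes)
  noun, verb, adjective = lexemes.shift(3)
  types = [noun, verb, adjective].map { |lexeme| lexeme&.type }
  raise ArgumentError, "expected noun, verb, adjective" unless types == [:noun, :verb, :adjective]

  [:subject_phrase, noun.value, [:verb_phrase, verb.value, adjective.value]]
end

# Grammar rule: a sentence is one clause, optionally joined to another
# sentence by a conjunction.
def parse_sentence(lexemes)
  lexemes = lexemes.dup
  left = parse_clause(lexemes)
  return left if lexemes.empty?

  conjunction = lexemes.shift
  raise ArgumentError, "expected conjunction" unless conjunction.type == :conjunction

  [:sentence, left, conjunction.value, parse_sentence(lexemes)]
end

lexemes = tokenize("Matz is nice, so we are nice.")
tree = parse_sentence(lexemes)
```

Here `tree` is a nested array in which each inner array is one branch of the tree, with the words themselves at the leaves.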
00:01:51.720 What does this look like in terms of a syntax tree? Let's assume we're in Ruby and create objects that represent the nodes.
00:01:55.920 This is a straightforward process: we can create a separate class for every single node. We can store data and use the nodes to represent the tree structure. We can also rearrange them slightly. The nodes on the left are tokens, wrapping a value from the source code, while the nodes on the right are other branches in the middle of the tree.
00:02:06.300 Next, let’s expand one of these nodes.
00:02:07.200 We can walk this tree and perform interesting operations. We can add pattern matching features and comparison methods, as well as a copy method for immutable copies.
00:02:10.200 The `accept` method is the key focus, along with the `child_nodes` method. The `accept` method is what lets a visitor interact with the node, which is the basis of the visitor pattern.
00:02:16.200 The double dispatch visitor pattern allows for dynamic dispatch: when we call the 'accept' method on a node, it calls back into the visitor, which allows us to have different visitor methods for different node types.
00:02:22.740 As I mentioned earlier, the `child_nodes` method allows us to define the relationships between nodes, ultimately letting us iterate through them.
00:02:29.460 In summary, we can use our visitor pattern to implement specific visitors that only focus on a subset of nodes in a tree.
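The pattern described above can be sketched as follows. This is a minimal illustration in plain Ruby, not Syntax Tree's actual classes: `Word`, `Phrase`, `BasicVisitor`, and `NounCollector` are hypothetical names, and only the `accept` and `child_nodes` method names come from the talk.

```ruby
# Leaf node wrapping a value taken from the source text.
class Word
  attr_reader :kind, :value

  def initialize(kind, value)
    @kind = kind
    @value = value
  end

  # Double dispatch: the node chooses the visitor method for its own type.
  def accept(visitor)
    visitor.visit_word(self)
  end

  def child_nodes
    []
  end

  # Lets the node be used with hash patterns in `case ... in`.
  def deconstruct_keys(_keys)
    { kind: kind, value: value }
  end
end

# Branch node that groups other nodes, such as a verb phrase.
class Phrase
  attr_reader :name, :parts

  def initialize(name, parts)
    @name = name
    @parts = parts
  end

  def accept(visitor)
    visitor.visit_phrase(self)
  end

  def child_nodes
    parts
  end
end

# Default visitor: walks every node without doing anything else.
class BasicVisitor
  def visit(node)
    node.accept(self)
  end

  def visit_child_nodes(node)
    node.child_nodes.each { |child| visit(child) }
  end

  alias visit_word visit_child_nodes
  alias visit_phrase visit_child_nodes
end

# A specific visitor that only cares about one kind of node.
class NounCollector < BasicVisitor
  attr_reader :nouns

  def initialize
    @nouns = []
  end

  def visit_word(node)
    case node
    in { kind: :noun, value: }
      nouns << value
    else
      # Not a noun, so there is nothing to record.
    end
  end
end

tree = Phrase.new(:sentence, [
  Phrase.new(:subject_phrase, [
    Word.new(:noun, "Matz"),
    Phrase.new(:verb_phrase, [Word.new(:verb, "is"), Word.new(:adjective, "nice")])
  ]),
  Word.new(:conjunction, "so"),
  Phrase.new(:subject_phrase, [
    Word.new(:noun, "we"),
    Phrase.new(:verb_phrase, [Word.new(:verb, "are"), Word.new(:adjective, "nice")])
  ])
])

collector = NounCollector.new
collector.visit(tree)
```

Because the base visitor already knows how to walk every node type, `NounCollector` only overrides the one method it needs.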
00:02:34.740 Syntax Tree also allows us to format trees based on various algorithms. One such algorithm is pretty print, which organizes code by applying a consistent structure.
00:02:39.900 Formatting involves creating groups of nodes to determine where line breaks should occur. For example, if we reach the end of a line, we break the outermost group first.
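To illustrate groups and line breaks, here is a small sketch using the `prettyprint` library from Ruby's standard library. It is not Syntax Tree's formatter: the `emit` and `layout` helpers are hypothetical, and nested arrays stand in for a syntax tree.

```ruby
require "prettyprint"

# Emit a nested array as bracketed groups. Each array becomes one group,
# and each separator is a breakable that turns into a newline only when
# the group it belongs to has been broken.
def emit(q, value)
  if value.is_a?(Array)
    q.group(2, "[", "]") do
      value.each_with_index do |item, index|
        if index > 0
          q.text(",")
          q.breakable
        end
        emit(q, item)
      end
    end
  else
    q.text(value.to_s)
  end
end

# Lay out the value within at most `width` columns.
def layout(value, width)
  PrettyPrint.format(String.new, width) { |q| emit(q, value) }
end

wide = layout([[1, 2], [3, 4]], 80)
narrow = layout([[1, 2], [3, 4]], 10)
```

With plenty of room, everything stays on one line. With the narrow width, the outer group is broken first, while the inner groups remain on a single line each because they still fit.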
00:02:44.100 By building a syntax tree for Ruby, we can implement all sorts of analyses and tools, such as linting, formatting, and semantic analysis.
00:02:51.300 Syntax Tree is not just a formatting tool; instead, it forms an object layer that represents the result of parsing Ruby code.
00:02:58.620 It offers tools to interact with and manipulate this object layer to perform diverse tasks, like refactoring and code improvement.
00:03:07.620 In total, there are five main functionalities of Syntax Tree: building a syntax tree, formatting it, creating a command-line interface (CLI) to facilitate interaction, developing a language server, and translating syntax trees into other formats.
00:03:16.740 Now, let's first take a look at building the syntax tree.
00:03:22.740 Building a syntax tree involves defining all the nodes. In the 'node.rb' file within the Syntax Tree repository, you will see definitions for all the nodes in Ruby.
00:03:30.840 We provide named fields for every single value and sub-node in the tree, source location data, comments attached to the tree, `accept` and `child_nodes` methods, and immutable behavior by default.
00:03:43.140 The interaction with Ripper, the parser that ships in Ruby's standard library, helps us build the nodes for our syntax tree.
00:03:46.920 Ripper itself has over 190 events, helping us parse Ruby code effectively.
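For readers who have not used Ripper directly, the following sketch shows what its raw output looks like using only the standard library. The variable names are illustrative, and the exact number of events in `Ripper::EVENTS` depends on the Ruby version.

```ruby
require "ripper"

source = "1 + 2"

# Parser events assembled into nested arrays (an s-expression).
sexp = Ripper.sexp(source)

# Scanner events: one entry per token, with position, event name, and text.
token_events = Ripper.lex(source).map { |token| token[1] }

# Every event name that a Ripper subclass can handle.
event_count = Ripper::EVENTS.length

puts sexp.inspect
puts token_events.inspect
puts event_count
```

The nested arrays carry positions only on the leaf tokens, which is part of why a layer of named node objects on top is easier to work with.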
00:03:55.140 Furthermore, we utilize techniques to keep track of nodes' positional context to ensure proper mapping and referencing.
00:03:58.740 We aim to present an interface that abstracts away the complexities of using Ripper, allowing users to focus on Syntax Tree's functionality.
00:04:04.500 Ripper does not currently provide support for everything; some features, like handling comments, are something we must manage ourselves.
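As a rough idea of what managing comments involves, the sketch below subclasses Ripper and records each comment with its position through the `on_comment` scanner event. The `CommentCollector` class is hypothetical and much simpler than what Syntax Tree does, which also has to attach each comment to a node.

```ruby
require "ripper"

# Records every comment in the source along with where it was found.
class CommentCollector < Ripper
  attr_reader :comments

  def initialize(source)
    super(source)
    @comments = []
  end

  # Scanner event fired for each comment token. `lineno` and `column`
  # report the position of the token currently being processed.
  def on_comment(value)
    @comments << { line: lineno, column: column, text: value.chomp }
  end
end

collector = CommentCollector.new("x = 1 # set x\n# trailing\n")
collector.parse
```

After `parse` returns, `collector.comments` holds one hash per comment, which a later pass could match against node locations.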
00:04:12.960 Walking the tree allows us to implement visitor functionality, where adding visit methods for each node type enhances interaction.
00:04:20.700 We can leverage these methods to perform certain operations, such as counting sentences or omitting unnecessary nodes.
00:04:29.280 In addition, we can serialize the entire abstract syntax tree (AST) to JSON, allowing for easy data access and manipulation.
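The idea of serializing a tree to JSON can be tried with the standard library alone. This sketch uses the nested arrays from `Ripper.sexp` as a stand-in for Syntax Tree's nodes, so the resulting JSON shape is an assumption for illustration and not the format Syntax Tree produces.

```ruby
require "json"
require "ripper"

# Nested arrays of symbols, strings, and integers describing the program.
tree = Ripper.sexp("puts 1")

# Symbols are written out as strings, so the result is plain JSON data.
json = JSON.pretty_generate(tree)

# Any JSON-aware tool, in any language, can now read the structure back.
restored = JSON.parse(json)

puts json
```

Note that the round trip is lossy with respect to Ruby types: symbols such as `:program` come back as the string `"program"`.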
00:04:39.840 Formatting requires a pretty print algorithm, which is based on a foundational paper known as 'A Prettier Printer' by Philip Wadler from the late '90s.
00:04:51.180 The basics of pretty printing include managing text nodes and breakables to control how Ruby code is presented in an unobtrusive manner.
00:05:00.300 Then, we developed a command-line interface (CLI) that allows users to work with the AST more easily.
00:05:06.960 The CLI provides useful functionalities, like generating a tree representation of the source code, formatting it, and searching for specific patterns.
00:05:14.640 A significant feature is the 'match' command, which allows capturing Ruby expressions that correspond to specific nodes, greatly aiding in pattern matching.
00:05:23.880 Using the CLI, we can quickly check how expressions will be interpreted based on the coding style.
00:05:28.500 By implementing the language server protocol, we can integrate with editors and other development tools without their users needing to delve into intricate implementation details.
00:05:38.760 I also wanted to highlight the Ruby LSP project at Shopify, which uses Syntax Tree to aid various functionalities like document highlighting, folding ranges, and semantic highlighting.
00:05:46.440 This provides our users with enriched experiences and powerful tools while working with Ruby code.
00:05:55.620 Finally, we can translate Syntax Trees to connect with various Ruby parsers, maintaining compatibility and increasing efficiency in code analysis.
00:06:02.880 The core takeaway is that syntax tree nodes can be translated into the formats used by other parsers, so tools built on those parsers keep working.
00:06:09.300 By leveraging all these functionalities, we're developing a truly versatile and powerful toolkit for Ruby developers.
00:06:15.840 The goal is for users to be able to interact with Syntax Tree effortlessly, regardless of changes in the underlying parser.
00:06:20.940 Thank you very much for your time. I appreciate your attention.
00:06:25.640 Does anyone have any questions that I can answer?
00:06:30.920 One question here is about providing a syntax rewriter.
00:06:34.920 The answer is yes; you can already do that today. My goal is to build more user-friendly and advanced features into the project.
00:06:40.920 This could involve running a script using Rails CLI to automatically update deprecated code.
00:06:44.460 A follow-up question relates to the Prettier Ruby project, which started as a Prettier plug-in for Ruby formatting.
00:06:51.900 Prettier Ruby has since been changed to use Syntax Tree, making it a lightweight wrapper around the tools we now have available.
00:06:59.100 This project has greatly reduced its complexity, now comprising only a few hundred lines of code.
00:07:02.640 In terms of the dependencies for Syntax Tree, Ripper is used primarily, but we are seeking to create a new Ruby parser for better efficiency.
00:07:10.740 The overall aim is to ensure users can rely on Syntax Tree for consistent performance and features.
00:07:16.520 Thank you again for your time, and feel free to reach out if you have further questions!
00:07:20.700 I hope you enjoyed the presentation, and I look forward to engaging with you further.