Talks

Syntax Tree

RubyKaigi 2022

00:00:01.159 Hello! I am here today to talk about a new project called Syntax Tree. My name is Kevin Newton, and you can find me online at kddnewton. Currently, I work on the YG team at Shopify.
00:00:07.560 Syntax Tree was a project that came about during the last round of the Ruby Association Grant. It was pitched as a project to build a new standard library Ruby formatter. Since then, it has evolved to provide a lot of additional functionality, supporting various tools and different implementations of language servers, linters, and more.
00:00:34.620 There are a few components that contribute to building something like this, and I will discuss each of them in turn. The first step is to actually build a syntax tree from an input source. Next, I aimed to format the syntax tree. Alongside that, I planned to build a CLI to interact with that formatting tool. After that, I developed a language server that interacts with your IDE to incorporate this functionality directly. Finally, I will discuss translating the syntax tree into other syntax trees and the implications of that process.
00:01:06.299 So, let’s dive into the first step: building a syntax tree. For each node in the tree, I needed to define its shape, types, and various methods that would be common to each kind of node. The general shape of a node includes several key components.
00:01:39.899 For example, in this instance, a node has a method call, which can represent either a method or a local variable. Additionally, it has a value, which is the actual identifier contained within it. Each node also specifies a location that indicates where it came from in the source code. This is important because it differentiates it from other tools like Ripper, as every node in the tree has associated location information. Furthermore, each node also contains comments, which are any inline comments appearing before, after, or on the same line as that node.
00:02:08.520 These elements—location, comments, and additional methods—like `accept` and `child_nodes` allow for iteration over the entire tree. The method `child_nodes` facilitates walking through the tree for every single node, while `accept` supports the visitor pattern, which we will touch on later. In essence, a node can accept a visitor and invoke a specific method on that visitor with `self` as an argument.
00:02:43.099 Every node in the tree also responds to `deconstruct` and `deconstruct_keys`, which enables pattern matching. An important point to note is that every node is immutable: all fields are read-only, and it's designed to be used in this way. If you want to modify the tree, you would need to use additional tools for that.
00:03:04.980 When defining each node, it’s essential to recognize the number of different node types available. Ripper, the standard library for parsing Ruby, generates nodes from a grammar file. I used Ripper to build my tree. There are 190 unique Ripper events, and while not all correspond to a node, many of them do.
00:03:35.299 It's worth noting that Ripper dispatches events for creating nodes in the syntax tree for things like keywords, and some keywords can even become nodes by themselves. In the context of Ripper, it has an interesting syntax defined in the parser grammar file, which includes special comment structures. These comments contain logic that gets processed by a preprocessor to generate another grammar file, which is further used by Bison to yield the final Ripper parser.
00:04:10.020 For instance, when you parse specific Ruby code, you could create a subclass of Ripper that implements methods handling dispatched events. Each method can be defined to act when specific parts of the syntax tree are accessed.
00:04:39.660 As you parse the code, you may note the lack of location information since Ripper only provides location information on the leaves of the tree. This means it provides details only for scanner events, like constants and identifiers, while beginning and class tokens may not have such information right away. To connect these dots, it’s necessary to maintain a stack of tokens as they are found, ensuring you have access to the correct context for any node.
00:05:27.840 While building the stack, we can interact with keywords in the scope. When handlers process specific keywords, we can pop off the most recent keyword stored to use as the location information, required for formatting.
00:06:21.600 The Ruby syntax tree is a complex representation that often does not quite align with the general abstract syntax tree (AST) perspective, as the AST only represents general ideas and lacks direct ties to the source. Instead, this effort moves closer to a concrete syntax tree model that includes actual location data from the source code.
00:06:47.760 Each keyword node we maintain on our stack allows us to pop relevant tokens as we process nodes within the syntax tree. This results in keeping the necessary information for placement and reformatting as required. Ripper can only handle certain assertions; hence, there are functionalities within the Syntax Tree that surpass what Ripper accomplishes.
00:07:47.880 One significant challenge comes with handling comments. As one can surmise, comments can occupy various locations in a Ruby file. To format these accurately, I maintain an extensive list, capturing every scanner event related to comments. When constructing the AST, we can refer to this list to place comments in their appropriate locations.
00:08:24.700 Walking the tree is a critical implementation, and I added a visitor pattern to accommodate this process more effectively. Initially, the visitor wasn’t integrated into the original Syntax Tree release. However, after adoption by Shopify for their Language Server Protocol (LSP), it became clear a visitor pattern was necessary to simplify navigation through the nodes while maintaining the tree structure.
00:09:54.960 A visit method lets us pass a visitor into the node, allowing for deeper traversal by instantiating methods through `accept`. This pattern facilitates running specific logic for every type of node efficiently.
00:10:28.680 Continuing with our example, if you wanted to interact with specific node types, you would create visits for each unique node type, allowing for detailed traversals that facilitate building additional functionalities.
00:11:07.680 Now that we have defined how to construct the syntax tree and traverse it, let’s discuss formatting that tree. There are various existing algorithms for tree formatting, and a notable one is the algorithm connected to the Pretty Print gem, which has been utilized for many years to transform complex structures into a reader-friendly string format.
00:11:50.760 As we begin formatting, the outermost groups are prioritized for breaking on width constraints. Using examples such as massive arrays, we can see where breaking occurs, examining how top-level elements are handled based on print width.
00:12:32.999 Within that context, the formatter builds a representation through nodes—whether text that needs breaking or unbreakable text for organized viewing. The overall algorithm seeks to format appropriately across various levels while ensuring maintainability of all text and comments.
00:13:19.080 Incorporating the ability to insert new lines becomes rather complex as comments are scattered throughout. Fortunately, we can implement a method to track the appropriate formatting of nodes and the necessary breaks that occur while maintaining all relevant comments.
00:14:09.660 As we explore formatting various constructs, we see opportunities to smartly modify syntax based on spacing within conditional statements, allowing us to utilize more concise constructs where suitable.
00:15:00.960 Thus, the formatting engine I built includes all these functionalities, carefully considering Ruby’s nuances and providing enhancements to many formatting aspects that help developers improve readability.
00:15:39.720 The CLI created for Syntax Tree integrates with various commands, allowing users to interact easily. Running the tool can generate an AST, serve documentation, and provide built-in visitors that help reformat Ruby code efficiently, giving an interface for debugging purposes.
00:16:07.740 As we extend the functionality, we also incorporate a language server through standardized protocols emerged by Microsoft, allowing different functionalities like text document formatting and inlay hints to show operator precedence.
00:16:57.740 My initial implementation of the language server may be limited in scope compared to the more robust capabilities of Shopify’s Ruby LSP project. However, it emphasizes ensuring a better experience for Ruby developers overall, particularly by facilitating seamless formatting.
00:18:22.680 With ongoing exploration, a key area of focus remains translating Syntax Tree’s primal representation into traditional parser outputs. My aim is to align this representation with existing tools to enhance interoperability across various static analysis tools.
00:19:02.160 In conclusion, Syntax Tree represents a pioneering approach to manage fast parsing and formatting using Ruby's standard library effectively. Continued efforts will develop its capabilities, ensuring users benefit from quick, efficient formatting mechanisms.
00:19:47.880 I encourage anyone who is in need of formatting assistance or has queries to reach out. Thank you very much for your time!