Richard Huang

Summarized using AI

Find and Replace Code based on AST

Richard Huang • May 11, 2023 • Nagano, Japan

In his presentation at RubyKaigi 2023, Richard Huang discusses the concept of "Find and Replace Code Based on AST" (Abstract Syntax Tree) and its relevance in modern coding practices. The AST is a critical data structure that allows programmers to understand and manipulate code more effectively, representing the code as a tree of nodes where each node symbolizes a distinct code structure, such as method definitions or class declarations.

Key points discussed include:
- Definition and Importance of AST: Richard explains how AST is structured, illustrating with an example of a class in Ruby. The AST provides structured representation, making it easier to search and replace code precisely compared to traditional text-based methods.
- Benefits of AST-based Tools: He highlights the advantages of using AST for finding and replacing code, allowing for more accurate searches. For instance, distinguishing between different usages of the same term in code and avoiding false positives in string representations.
- Available Tools: Richard mentions various AST-based tools available within the Ruby ecosystem, such as Reek and RuboCop, as well as his own tool, Synvert, which he created for upgrading Rails projects. Synvert simplifies the process by automating syntax changes during version upgrades.
- Synvert Architecture and Functionality: Synvert uses a domain-specific language (DSL) for defining transformations. Richard demonstrates how Synvert allows users to perform complex queries and mutations of the code’s AST, along with providing tools that cater to both seasoned and novice developers.
- Real-world Applications: He provides examples of how users can execute queries to locate specific structures in the code, like searching for hash rocket keys or identifying setup methods that do not call a super method. He explains how quick updates can be accomplished through Synvert's user-friendly interface.
- Interactive Features and Community Feedback: Richard shares insights about the GUI application for Synvert, enhancing usability for developers by allowing them to review, accept, or reject changes before applying them to their code base. He invites feedback and contributions from the community to continually improve Synvert.

Conclusions and Takeaways: Richard emphasizes the importance of leveraging AST for code manipulation, the power of Synvert, and its user-friendly features which can accelerate development and streamline code upgrades over traditional methods. This presentation serves as a guide for developers looking to enhance their code maintenance practices using AST-based tools.

RubyKaigi 2023

00:00:01.860 Hello everyone and good afternoon.
00:00:04.700 My name is Richard, and you can contact me on GitHub or Twitter at flyerhzm.
00:00:08.280 Today, my topic is 'Find and Replace Code Based on AST.'
00:00:15.839 So, what is an AST? AST stands for Abstract Syntax Tree. The AST is a data structure that makes it easier to understand and manipulate code.
00:00:23.580 It represents the code as a tree of nodes. Each node in the tree corresponds to a different structure found in the code such as a class definition or a variable assignment.
00:00:34.440 For example, here I define a class named RubyKaigi with two instance methods: year and location. The right part shows the corresponding AST: the class node.
00:00:51.899 For the class node, the node type is 'class', and the rest are class name, parent class, and class body. For the instance methods, the node type is 'def', and the other properties include method name, arguments, and method body.
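The class and 'def' nodes described above can be inspected with a short sketch. It uses Ripper from Ruby's standard library rather than the parser gem shown in the talk, so the exact node layout differs from the slide; the `outline` helper and the method bodies are illustrative assumptions.

```ruby
require 'ripper'

# Walk the nested arrays returned by Ripper.sexp and record class and
# method definitions in the order they appear in the source.
def outline(sexp, found = [])
  return found unless sexp.is_a?(Array)

  case sexp[0]
  when :class then found << "class #{sexp[1][1][1]}" # [:class, [:const_ref, [:@const, name, pos]], parent, body]
  when :def   then found << "def #{sexp[1][1]}"      # [:def, [:@ident, name, pos], params, body]
  end
  sexp.each { |child| outline(child, found) }
  found
end

source = <<~RUBY
  class RubyKaigi
    def year
      2023
    end

    def location
      "Nagano"
    end
  end
RUBY

puts outline(Ripper.sexp(source))
# class RubyKaigi
# def year
# def location
```

Each entry corresponds to one node: the type comes first, followed by the name and the remaining children, which is the same shape the talk describes for the class and 'def' nodes.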
00:01:13.619 So, why do we use the AST? Finding and replacing code based on AST offers several benefits over traditional text-based or regular expression-based find-and-replace methods. The AST provides a structured representation of the code, allowing for more precise searching and replacement.
00:01:36.180 For example, when searching for puts calls, accurately locating all occurrences can be challenging using text-based or regular expression-based searches. We want to find all the puts calls in the left part of the code but not those in the right part, such as when puts is contained in a string, a header, or a message named custom_puts.
00:02:07.560 By leveraging AST-based searching, we can precisely identify all intended puts calls. Moreover, removing them becomes straightforward using the AST, and later I will show you how.
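A minimal sketch of the difference between a text search and an AST search, taking `puts` as the call being searched for. It relies on Ripper from the standard library, not on the tools shown in the talk, and `puts_call_lines` is a hypothetical helper.

```ruby
require 'ripper'

# Return the line numbers of receiver-less `puts` calls.
def puts_call_lines(sexp, lines = [])
  return lines unless sexp.is_a?(Array)

  # Ripper reports `puts "x"` as :command, `puts("x")` as :fcall
  # and a bare `puts` as :vcall; each holds an [:@ident, name, [line, col]] token.
  name = sexp[1]
  if %i[command fcall vcall].include?(sexp[0]) && name.is_a?(Array) &&
     name[0] == :@ident && name[1] == 'puts'
    lines << name[2][0]
  end
  sexp.each { |child| puts_call_lines(child, lines) }
  lines
end

source = <<~RUBY
  puts "hello"
  message = "puts is only text here"
  custom_puts "not a real puts call"
  puts("done")
RUBY

# A plain text search reports every line that mentions the word.
text_hits = source.each_line.with_index(1).select { |line, _| line.include?('puts') }.map(&:last)
ast_hits = puts_call_lines(Ripper.sexp(source))

p text_hits # [1, 2, 3, 4]
p ast_hits  # [1, 4]
```

The text search cannot tell the string on line 2 or the `custom_puts` call on line 3 from a real call, while the tree walk only reports nodes whose method name is exactly `puts`.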
00:02:30.060 AST-based find-and-replace tools are powerful because they can perform complex searching and replacement. They can find hash pairs where keys and values are identical and delete values. They can also find old hash rocket key syntax and replace it with the new hash syntax.
00:02:53.820 Additionally, they can find any test setup methods that do not call super and prepend a super call.
00:03:17.040 So, what tools based on AST are available in the Ruby world? There are several tools that use AST for a variety of tasks, including code analysis, linting, and formatting. Some popular tools include Reek, a Ruby code smell detector that uses AST to identify common code smells; RuboCop, a popular Ruby linter that uses AST for its analysis; and Rufo, a Ruby formatter that uses AST to format Ruby code.
00:03:38.519 I have an alternative solution for you called Synvert. This tool enables you to write snippets of code to rewrite your source code. I initially developed Synvert to simplify the process of upgrading a Rails project.
00:04:00.960 At that time, I was working on a Rails project that used version 2.3, and I needed to upgrade it to version 3. The upgrade involved numerous syntax changes, and it was difficult to manually find and replace all of them, so I created Synvert to help automate the process. It worked successfully. Over time, I continued to use Synvert to upgrade the Rails project to versions 4, 5, and 6.
00:04:39.960 In 2014, I gave a presentation at RubyKaigi titled 'Write Ruby to Change Ruby Code', which introduced Synvert and its capabilities. This was the architecture of Synvert at that time.
00:05:00.180 Similar to RuboCop, Synvert uses AST to analyze and transform Ruby code and provides a set of DSLs for defining code transformations in a concise and readable way. The tool uses snippets that are predefined code transformations.
00:05:30.660 It is a command line tool that reads Synvert snippets and uses Synvert core to automate the code transformation. In recent years, I have focused on making Synvert more user-friendly and accessible to a wider audience of developers.
00:05:52.200 To achieve this goal, I've been working on improving the APIs and developing a GUI application. The improved API of Synvert simplifies the process of writing code snippets, while the GUI application enables users to use Synvert without writing any code at all.
00:06:16.620 This means that Synvert can be used by both experienced and inexperienced developers to easily find and replace code based on the AST.
00:06:58.560 To begin, I extracted two gems from Synvert core: 'node query' and 'node mutation'. These two gems focus on querying and mutating code using the AST.
00:07:22.499 The 'node query' gem defines a node query language (NQL) and node rules to query nodes. NQL is a CSS-like node query language, which is more expressive and powerful, while node rules are simple hash objects.
00:07:45.300 Let's see some examples. To find puts and debugger code, the top one is the NQL query. '.send' matches nodes whose node type is 'send', and the expressions in square brackets match the node's attributes.
00:08:10.740 Here, it matches on the receiver and on a message that is either 'puts' or 'p'. The one below is the node rule, which is similar but uses a hash object. Another example finds the 'send' node whose message is 'gsub' and which has two string arguments.
00:08:42.240 We can also use NQL to find hash rocket keys. The NQL will locate the hash pair node whose key is a symbol node, applying a regular expression to match a hash rocket key.
00:09:14.100 Node rules can also match against a regular expression. To find hash pairs where keys and values are identical, the NQL query matches hash pairs where the key equals the value, with the value evaluated in double curly braces.
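The two hash queries above can be approximated without NQL. The following is a rough sketch on top of Ripper from the standard library; it is not how node query works internally, and `each_node`, `hash_rocket_symbol_keys`, and `identical_key_value_pairs` are names made up for this illustration.

```ruby
require 'ripper'

# Yield every node (an array whose first element is a Symbol) in the tree.
def each_node(sexp, &block)
  return unless sexp.is_a?(Array)

  block.call(sexp) if sexp[0].is_a?(Symbol)
  sexp.each { |child| each_node(child, &block) }
end

# Symbol keys written with a hash rocket, e.g. `:host => "localhost"`.
def hash_rocket_symbol_keys(source)
  keys = []
  each_node(Ripper.sexp(source)) do |node|
    # A rocket-style symbol key is a :symbol_literal; `host:` is a :@label instead.
    next unless node[0] == :assoc_new && node[1][0] == :symbol_literal

    keys << node[1][1][1][1]
  end
  keys
end

# Pairs such as `port: port`, where the value is a bare identifier named like the key.
def identical_key_value_pairs(source)
  keys = []
  each_node(Ripper.sexp(source)) do |node|
    next unless node[0] == :assoc_new

    key = node[1]
    value = node[2]
    next unless key[0] == :@label && value.is_a?(Array) && %i[var_ref vcall].include?(value[0])

    name = key[1].chomp(':')
    keys << name if value[1][1] == name
  end
  keys
end

source = 'config = { :host => "localhost", port: port, name: "app" }'
p hash_rocket_symbol_keys(source)   # ["host"]
p identical_key_value_pairs(source) # ["port"]
```

Both checks look at the pair node's key and value children, which mirrors the attribute matching described for NQL, only written by hand.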
00:09:45.299 Now, let's discuss a complex case: finding Minitest setup methods that do not call 'super'. The NQL query uses descendant and sibling selectors, similar to CSS selectors.
00:10:02.339 It searches for class nodes whose parent class is a Minitest class. Within each class node, it looks for the 'def' node with the name 'setup'. The sibling selector ensures there is no 'super' node within the 'def' node.
00:10:27.599 It is essential to note that this cannot be accomplished in a single node rule. Instead, it must be broken down into three steps. The first step is to locate the class node. The second step is to find the 'def' node, and the last step is to verify the absence of a 'super' node.
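The three steps can be sketched as plain tree walks. This version uses Ripper from the standard library instead of node rules, assumes the parent class is written literally as `Minitest::Test`, and uses hypothetical helper names throughout.

```ruby
require 'ripper'

# Collect every node of the given types at or beneath sexp.
def nodes_of_type(sexp, *types, found: [])
  return found unless sexp.is_a?(Array)

  found << sexp if types.include?(sexp[0])
  sexp.each { |child| nodes_of_type(child, *types, found: found) }
  found
end

# Step 1: class nodes whose parent class is written as Minitest::Test.
def minitest_classes(sexp)
  nodes_of_type(sexp, :class).select do |klass|
    parent = klass[2]
    parent.is_a?(Array) && parent[0] == :const_path_ref &&
      parent[1][1][1] == 'Minitest' && parent[2][1] == 'Test'
  end
end

# Steps 2 and 3: a `def setup` node with no super call anywhere inside it.
# Ripper uses :zsuper for a bare `super` and :super for `super(...)`.
def setup_without_super?(klass)
  nodes_of_type(klass, :def).any? do |method|
    method[1][1] == 'setup' && nodes_of_type(method, :super, :zsuper).empty?
  end
end

def classes_missing_super(source)
  minitest_classes(Ripper.sexp(source))
    .select { |klass| setup_without_super?(klass) }
    .map { |klass| klass[1][1][1] }
end

source = <<~RUBY
  class GoodTest < Minitest::Test
    def setup
      super
      @user = :alice
    end
  end

  class BadTest < Minitest::Test
    def setup
      @user = :bob
    end
  end
RUBY

p classes_missing_super(source) # ["BadTest"]
```

Each helper narrows the search one level further, which is why a single flat rule is not enough for this query.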
00:11:02.000 Additionally, to find a hash value, we can query the value of a hash key using NQL to find the hash node and check the value of the status key.
00:11:25.440 The NQL can do much more. I used Rex and Racc to build the lexer and parser of the node query. If you are interested in this, you can check out the source code on GitHub.
00:11:59.940 Now regarding node mutations, the 'node mutation' provides APIs to rewrite the source code based on the AST. It tracks the start and end positions and the new code to replace while generating the new source code.
00:12:32.160 Here are some examples. If the source code assigns a string to errors[:base], we call 'replace with', passing the original receiver followed by '.add' and two arguments.
00:13:06.840 The source code will thus be replaced with 'errors.add', with the symbol :base as the first argument and the string as the second. If the source code is class Post inheriting from ActiveRecord::Base, we call 'replace' on the parent class with ApplicationRecord, and the parent class in the source code gets replaced by ApplicationRecord.
00:13:33.360 Here is an example of inserting 'URI.' at the beginning, changing the source code to 'URI.open'.
00:14:05.099 You can also use 'prepend' to add a super call to the beginning of a setup method and 'append' to add a super call to the end of a teardown method.
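The idea of tracking start and end positions together with the new code can be sketched in a few lines of plain Ruby. This is not the node mutation gem's API: `apply_actions`, `replace_text`, and `insert_before` are hypothetical helpers, and positions are found with `String#index` here, whereas a real tool would take them from AST nodes.

```ruby
# Apply pending edits from the end of the source backwards, so that the
# offsets of earlier edits stay valid after each replacement.
def apply_actions(source, actions)
  result = source.dup
  actions.sort_by { |action| action[:start] }.reverse_each do |action|
    result[action[:start]...action[:end]] = action[:new_code]
  end
  result
end

# Record a replacement of the first occurrence of old_code.
def replace_text(source, old_code, new_code)
  start = source.index(old_code)
  { start: start, end: start + old_code.length, new_code: new_code }
end

# Record an insertion directly before the first occurrence of anchor.
def insert_before(source, anchor, new_code)
  start = source.index(anchor)
  { start: start, end: start, new_code: new_code }
end

source = <<~RUBY
  class Post < ActiveRecord::Base
    def fetch(url)
      open(url)
    end
  end
RUBY

actions = [
  replace_text(source, 'ActiveRecord::Base', 'ApplicationRecord'),
  insert_before(source, 'open(url)', 'URI.')
]

puts apply_actions(source, actions)
# class Post < ApplicationRecord
#   def fetch(url)
#     URI.open(url)
#   end
# end
```

Collecting all edits first and applying them in one pass is what lets several independent changes target the same file without invalidating each other's positions.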
00:14:37.020 You can also use 'delete' with the receiver and the dot to remove the 'FactoryBot.' prefix, altering the source code accordingly.
00:15:01.680 The 'remove' method deletes the entire node. We also have helpers for hash nodes, used here when replacing the message 'after_commit'.
00:15:22.380 For instance, the last argument is a hash node, and its evaluated value is used in the replacement to produce 'after_create_commit'.
00:15:50.520 This allows the source code to be transformed into a cleaner, more efficient format. You can use the node query and node mutation outside of the Synvert context.
00:16:02.220 Here's an example demonstrating how to use them to remove puts statements. First, include the necessary dependencies. We use the parser gem to convert source code to AST nodes.
00:16:43.680 We require parser_node_ext to assign names to child nodes in the parser node. Then we use the parser to transform source code to AST nodes and initialize a node query.
00:17:19.320 Following this, we call 'query nodes' to find the matching nodes, then initialize a node mutation object with the original source code, calling the 'remove' method to eliminate any matched nodes.
00:17:48.960 Lastly, using the 'process' method, we obtain the updated source code, which no longer contains the puts statements.
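The same parse, query, and remove flow can be sketched with only the standard library. This version swaps in Ripper for the parser gem and a line filter for node mutation, so it is an approximation of the steps rather than the code from the slide; it assumes each `puts` statement sits on a line of its own, and both helper names are hypothetical.

```ruby
require 'ripper'

# Query step: line numbers of receiver-less `puts` calls.
def puts_lines(sexp, lines = [])
  return lines unless sexp.is_a?(Array)

  name = sexp[1]
  if %i[command fcall vcall].include?(sexp[0]) && name.is_a?(Array) &&
     name[0] == :@ident && name[1] == 'puts'
    lines << name[2][0]
  end
  sexp.each { |child| puts_lines(child, lines) }
  lines
end

# Mutation step: drop the matched lines and return the new source.
def remove_puts(source)
  doomed = puts_lines(Ripper.sexp(source))
  source.lines.reject.with_index(1) { |_line, number| doomed.include?(number) }.join
end

source = <<~'RUBY'
  def greet(name)
    puts "debug: #{name}"
    "puts stays inside this string"
  end
RUBY

puts remove_puts(source)
# def greet(name)
#   "puts stays inside this string"
# end
```

Removing whole lines is a simplification. Working from exact node start and end positions, as node mutation is described as doing, also handles statements that share a line or span several lines.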
00:18:10.560 I have created an adaptable interface to accommodate different parsers within node query and node mutation. Currently, it utilizes the parser gem to convert Ruby code into AST nodes.
00:18:24.480 However, it is designed to be flexible, capable of integrating alternative parsers. For instance, I am currently working on a syntax tree adapter.
00:18:39.240 Synvert offers a set of DSLs built on top of node query and node mutation, making it easier to write snippets. It uses 'within files' to select all Ruby files and 'find node', which accepts an NQL string, to query nodes.
00:18:56.460 It also delegates 'remove' to node mutation, which keeps the snippet short.
00:19:11.520 Other DSLs let you check dependencies, add or remove files, and add other snippets. You can write many small snippets and combine them into larger transformations.
00:19:37.920 Additionally, Synvert supports view files, allowing you to rewrite calls in ERB, Haml, or Slim files. Synvert has a GUI that enables users to execute local or remote snippets.
00:20:05.640 At present, I've introduced most of the Synvert DSLs that developers can use to write their snippets. However, there may still be a learning curve in mastering these DSLs.
00:20:43.860 To enhance the developer experience, I've created a GUI application that allows developers to execute code snippets without writing any code at all.
00:21:14.460 The Synvert application supports both Windows and Mac OS, enabling users to run code snippets and see the differences before applying changes.
00:21:57.240 This facilitates an overview of changes, allowing developers to review alterations individually and decide which ones to accept or reject.
00:22:30.600 Once reviewed, developers can selectively apply changes to the code base. The application includes a feature for importing some code with expected outputs.
00:22:55.560 It will automatically generate the corresponding snippet code for your source code. The generated snippet can then find or replace in your source code. It also provides a list of official code snippets that can be selected and run.
00:23:43.200 These snippets are well-defined and have already been reviewed and tested, ensuring they are safe to use.
00:24:05.520 Here is a demo using Synvert without writing any code at all.
00:24:31.320 On March 2nd, 2023, I read a tip in Ruby Weekly saying that 'value.nil?' is a lot faster than 'value == nil', nearly five times faster.
00:24:56.679 So, I wanted to update my code base accordingly. I opened this Synvert application and copied the code from the tip, pasting it into the inputs and outputs.
00:25:23.339 Then I clicked the 'generate snippet' button.
00:25:54.300 After reviewing the snippet, I clicked the search button to find all occurrences of 'value == nil' in my code base.
00:26:16.920 I can selectively apply the changes or click the 'replace all' button to apply them all at once.
00:26:39.780 I can run git diff to view all the changes made to my code base.
00:27:17.520 As you see, it took about one to two minutes to see the tip applied to my code base.
00:27:29.580 We don't need to write any code.
00:27:40.440 You may also know that RuboCop has a cop to check and fix this issue.
00:27:50.100 Let me take a look — this is not a complete source code as I removed unrelated code.
00:28:05.640 RuboCop can identify the issue and auto-correct it using its own query language, NodePattern.
00:28:28.800 It finds nodes that match the pattern where the node type is 'send', ignoring the receiver and checking if the message is '==' with an argument of nil.
00:28:48.960 In the auto-correct method, it uses a regular expression to replace the source code.
00:29:05.640 This is the Synvert snippet code that initializes a Rewriter to find all Ruby files.
00:29:46.740 It uses 'find node' with an NQL query to locate the 'send' node.
00:30:13.200 It checks if the node type is 'send', with the message being '==', where the argument size is one and the first argument is nil.
00:30:39.120 Then, it replaces it with the original receiver followed by '.nil?'.
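A rough standard-library sketch of this find-then-replace step, assuming the comparison in question is `value == nil` and the target is `value.nil?`. It is neither the RuboCop cop nor the generated Synvert snippet: the tree walk uses Ripper to locate the comparisons, and a regular expression is then applied only to the matched lines. Both helper names are hypothetical.

```ruby
require 'ripper'

# Line numbers of `something == nil` comparisons.
# Ripper represents them as [:binary, left, :==, [:var_ref, [:@kw, "nil", [line, col]]]].
def nil_comparison_lines(sexp, lines = [])
  return lines unless sexp.is_a?(Array)

  right = sexp[3]
  if sexp[0] == :binary && sexp[2] == :== && right.is_a?(Array) &&
     right[0] == :var_ref && right[1][0] == :@kw && right[1][1] == 'nil'
    lines << right[1][2][0]
  end
  sexp.each { |child| nil_comparison_lines(child, lines) }
  lines
end

# Rewrite only the lines the tree walk matched, leaving strings and comments alone.
def use_nil_predicate(source)
  targets = nil_comparison_lines(Ripper.sexp(source))
  source.lines.each_with_index.map do |line, index|
    targets.include?(index + 1) ? line.sub(/\s*==\s*nil\b/, '.nil?') : line
  end.join
end

source = <<~RUBY
  missing = value == nil
  label = "value == nil"
RUBY

puts use_nil_predicate(source)
# missing = value.nil?
# label = "value == nil"
```

Restricting the regular expression to lines found through the AST keeps the string on the second line untouched, which a project-wide text replace would not.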
00:31:03.060 Using Synvert snippets is much easier to read, understand, and share.
00:31:24.339 You can simply copy and paste the snippet code from anywhere and run it within the Synvert UI application.
00:31:47.820 I also built a VS Code extension for Synvert, working similarly to the Synvert UI application.
00:32:04.560 You can use it to run the code snippet, generate snippets by input and output, and search for official snippets.
00:32:26.040 While it cannot run snippets, it can test snippets, generate snippets, and create AST nodes from source code.
00:32:46.560 This will help you experiment with the AST nodes.
00:33:05.640 In addition to Ruby, I frequently use JavaScript and TypeScript, so I created Synvert for JavaScript.
00:33:44.940 With this tool, you can write snippets to rewrite your JavaScript and TypeScript source code, similarly to Synvert for Ruby.
00:34:12.300 I posted video tutorials on Substack demonstrating how to use Synvert for specific tasks such as migrating from jQuery and upgrading Rails 4 to 5.
00:34:41.760 These tutorials provide detailed guides on using Synvert through a series of step-by-step instructions.
00:35:07.740 Synvert is not yet mature. If you have any issues writing code snippets or suggestions and feedback, please do not hesitate to contact us.
00:35:29.760 Thank you.