00:00:00.000
Ready for takeoff.
00:00:16.920
Hello everybody and welcome to my talk.
00:00:20.039
Today, I'll be analyzing an analyzer: a dive into how RuboCop works.
00:00:23.039
RuboCop is quite complex, and I don't think we can do an exhaustive study on it, so this will be more of a regular dive, not a deep dive.
00:00:26.640
My name is Kyle d'Oliveira, and I'm based out of Vancouver, Canada.
00:00:30.960
I've been working with Ruby and Rails for well over a decade now.
00:00:33.840
I love the language, I love the community, and I enjoy spaces like this where we can interact with each other and get a sense of what the Ruby community feels like.
00:00:39.600
I'm particularly drawn to tools that can benefit the entire community, and RuboCop is no exception to that. My hope is that after listening to this talk, you'll understand some of the basics about how RuboCop can analyze and correct code.
00:00:48.719
Maybe some of you will feel inspired to contribute custom rules for yourselves, your organization, or even to the open source repository, or to start playing around with new tools that utilize similar ideas or concepts.
00:00:52.860
I've been working at Aha! for the past two years, and it is one of the best workplaces I've been a part of. We are a human-centric company that helps other companies build the products that matter to them with our suite of products.
00:01:01.440
We have an amazing team that is distributed, by design, all over the world. We have one of the best company cultures I have seen, powered by The Responsive Method, which provides us a framework of shared values that we all agree upon and embody.
00:01:05.400
It really helps empower us so we can move quickly and stay aligned. So, if you'd like to be part of that culture, of course, we're hiring.
00:01:08.880
Linters are static code analysis tools that can flag programming errors, suspicious constructs, and stylistic errors. But they can do more, too; they can alert around security issues, and they can be used as a tool for training other engineers.
00:01:15.420
RuboCop is one of the most popular linters for Ruby. I’m just curious: by a quick show of hands, has anyone here not worked with RuboCop before?
00:01:19.200
I don’t think I see a single hand, which is about what I expected.
00:01:23.820
Rails is one of those gems that is very closely tied to Ruby, so much so that if you say you're working with Ruby, people often just assume you're working with Rails.
00:01:30.960
To put things in perspective, Rails has been downloaded about 387 million times from RubyGems and is about the 40th most popular gem.
00:01:37.680
RuboCop has been downloaded about 270 million times and is about the 76th in popularity. Although it is not as popular as Rails, RuboCop is significant enough that when you talk about Ruby, you're also thinking about RuboCop.
00:01:45.360
The original creator of RuboCop gave a talk about this in 2018 at RubyKaigi, so after this talk, if you're really curious about learning more about RuboCop, this is a great resource.
00:01:50.040
I wanted to begin today's talk with a little personal story.
00:01:53.260
I first got into using RuboCop several years ago at a time when we were trying to establish an agreed-upon style for all the code we were writing.
00:01:59.580
However, we often ended up with pull requests full of nitpicky comments.
00:02:01.320
These comments wouldn’t address the content of the pull request but would focus on how the code looked.
00:02:06.180
Sometimes these critiques were valid, but other times, we would get into long, pointless debates over trivial matters.
00:02:11.760
Debates would arise over the use of single quotes versus double quotes, the maximum line length, or whether we needed an extra line at the end of a guard clause.
00:02:16.200
We thought that if RuboCop could handle our linting and styling for us, we could focus all the comments on the actual content of the pull requests, which helped quite a bit.
00:02:21.180
However, it was also incredibly frustrating to work with. We'd write code, push it up to CI, and it would get rejected because RuboCop flagged violations.
00:02:25.860
We would then go back, fix those issues, and do it all over again. I had a project in which I needed to namespace a large number of constant references.
00:02:31.259
As a result, all the lines became longer, and the maximum line length rule became the bane of my existence. I had to hunt down every line that exceeded the length and figure out how to break it up into pieces, which was quite tedious.
00:02:37.020
Many engineers shared a similar experience, and the message of dissatisfaction with how RuboCop was rolled out became mixed with expressions of disdain for RuboCop itself.
00:02:42.180
However, we eventually leaned into RuboCop and learned how to use its autocorrect feature, and much of the frustration began to fade away. We discovered we could have RuboCop fix our code automatically when it detected violations.
00:02:51.420
Most of the time, those changes were spot on, so we continued to explore this further.
00:02:55.380
We started writing our own custom cops to help us transition from bad patterns to good ones. We also developed some to keep deprecations under control as we were upgrading Rails.
00:02:59.880
We utilized RuboCop's error messages as a way to explain concepts related to our internal documentation and to justify various decisions being made.
00:03:06.600
I gave a talk at RailsConf 2020 illustrating how RuboCop can be used to communicate information about bad patterns in code.
00:03:10.320
That is another resource you can look up later if you're interested in learning more.
00:03:14.040
This year, RuboCop turns ten, and in the open source world, this is quite a big deal.
00:03:18.000
Development of RuboCop has remained consistent over the years, and given where we are now, I don't see RuboCop going away anytime soon.
00:03:22.800
However, it is large enough that understanding how RuboCop works can be quite tricky; there are thousands of commits, thousands of closed issues, and hundreds of contributors and releases.
00:03:29.880
This is not going to be a code review of RuboCop.
00:03:31.620
I don’t think I could cover it thoroughly, and it would likely be overwhelming.
00:03:34.440
With the history of RuboCop being somewhat of a marathon, we have just 30 minutes here, which feels like more of a sprint.
00:03:38.520
Instead, this talk will focus on how the basics work. I'll outline the basics and help illustrate how some of the processes function within RuboCop.
00:03:41.880
The goal is to stay as close to how RuboCop works as possible, although I may simplify some points for the sake of clarity.
00:03:44.230
So, let's dig into this! How does RuboCop work? A good way to dive into the details is by looking at the command line interface.
00:03:47.820
Let's see how it operates for a single file and a specific cop.
00:03:52.560
When you run the rubocop command, it first runs an executable file, which loads the RuboCop Ruby library and hands control over to it.
00:03:56.699
Then it loads some configuration files. These steps are the easy part; I’ll touch on them for completeness.
00:03:59.340
The real substance comes when RuboCop needs to process the file—this involves taking the existing file and performing actions to make decisions about the code inside it.
00:04:04.560
Once it has processed the code, it will run through a series of cops that will determine whether any offenses exist.
00:04:08.700
If there are any violations, it can write or rewrite the source code and change the file accordingly.
00:04:14.040
It will then loop back to finish making any adjustments and may start the process over again.
00:04:19.200
This loop is crucial, as multiple different cops can adjust the same line of code, sometimes requiring multiple passes.
00:04:22.920
Also, be cautious of infinite loops! We'll break down this process into clearer steps.
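As a rough sketch of that loop, assuming each cop can be modeled as a lambda that returns corrected source (or the same string when it has nothing to fix): the names `run_until_stable` and `MAX_PASSES` are hypothetical and are not RuboCop's own classes or settings.

```ruby
# Hypothetical stand-ins: each "cop" is a lambda that takes source code and
# returns corrected source, or the same string when it finds nothing to fix.
MAX_PASSES = 10 # assumed guard value, chosen only for illustration

def run_until_stable(source, cops)
  MAX_PASSES.times do
    corrected = cops.reduce(source) { |code, cop| cop.call(code) }
    return corrected if corrected == source # no cop changed anything: done
    source = corrected                      # something changed: inspect again
  end
  raise "possible infinite correction loop"
end

strip_trailing_space = ->(code) { code.gsub(/[ \t]+$/, "") }
double_to_single     = ->(code) { code.gsub(/"([^"'#\\]*)"/, "'\\1'") }

result = run_until_stable(%(puts "hi"   \n), [strip_trailing_space, double_to_single])
puts result
```

The pass limit is the guard against two corrections undoing each other forever: if the source never settles, the loop gives up instead of spinning.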
00:04:28.920
The first element in this entire process is the command line interface.
00:04:31.680
This part is straightforward, so I'll keep it brief, as this isn't primarily focused on command line interfaces.
00:04:35.040
There is an executable file called rubocop that is run with Ruby. It adds the appropriate library directory to the load path, requires RuboCop, and then hands off to it to do the processing.
00:04:38.639
Now we transition from the command line to the Ruby realm. This serves as RuboCop’s entry point to perform its intended tasks.
00:04:46.560
The next step is loading configuration, determining which cops are active, and defining what options need to be provided.
00:04:51.420
Generally, this is all done through YAML files. I won’t delve into the options here, as thorough documentation is available online.
00:04:56.760
Basically, there is a large YAML file where options can be set for all cops.
00:05:04.500
For example, you can specify which Ruby version you’re using, as well as file patterns to include or exclude.
00:05:10.260
You can also provide specific configurations for individual cops by nesting the settings according to the cop's name.
00:05:15.060
That’s a quick overview of YAML config.
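To make that concrete, here is a small sketch of what such a configuration might look like, inlined as a string and loaded with Ruby's standard YAML library. The keys are ones RuboCop documents (`AllCops`, `TargetRubyVersion`, `Exclude`, and per-cop sections), but the specific values are made up for this example.

```ruby
require "yaml"

# A small .rubocop.yml-style document, inlined for illustration.
# The Ruby version, paths, and line length here are invented values.
config_text = <<~YAML
  AllCops:
    TargetRubyVersion: 3.1
    Exclude:
      - "vendor/**/*"
      - "db/schema.rb"

  Style/ArrayJoin:
    Enabled: true

  Layout/LineLength:
    Max: 120
YAML

config = YAML.safe_load(config_text)

# Global options live under AllCops; cop options are nested under the cop name.
puts config["AllCops"]["Exclude"].inspect
puts config["Style/ArrayJoin"]["Enabled"]
puts config["Layout/LineLength"]["Max"]
```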
00:05:16.760
Now onto the more interesting part: processing the code.
00:05:21.600
This topic might feel a bit meta because we're discussing code that is designed to understand code.
00:05:29.460
To illustrate how this works, let’s consider a specific example: RuboCop has a style cop called ArrayJoin.
00:05:34.560
Its purpose is to check whether the star method is used to join values of an array.
00:05:39.540
Imagine we have some code, and we want to determine if any bad patterns are present.
00:05:43.920
If it detects a violation, we want it to flag that code accordingly.
00:05:49.560
For instance, if we have an array containing 'foo', 'bar', and 'baz' that is being joined with the star method, it clearly violates the rule.
00:05:54.420
While humans can easily recognize this, we need to write code that identifies it programmatically.
00:05:59.040
One way to do this would be to use regular expressions, starting with a seemingly complex one.
00:06:03.960
It can quite literally search for patterns like an opening bracket, a bunch of characters up until a closing bracket, then the star method, and additional spaces.
00:06:10.800
However, regex becomes very convoluted when accounting for varying code styles.
00:06:15.960
What if we use single quotes, don’t include spaces, or vary the number of arguments passed? In these cases, regular expressions can rapidly become inadequate.
00:06:20.760
There must be a more efficient way to achieve this than endless variations of regex.
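Here is a quick illustration of how that goes wrong. The pattern below is a deliberately naive one written for this example, not anything RuboCop uses, and the sample lines are made up.

```ruby
# A naive pattern for `[...] * "sep"`: an opening bracket, anything up to a
# closing bracket, then a star with single spaces around it and a
# double-quoted string.
NAIVE_ARRAY_JOIN = /\[.*\] \* ".*"/

samples = [
  %q(["foo", "bar", "baz"] * ","), # the style the pattern was written for
  %q(['foo', 'bar', 'baz'] * ','), # single quotes
  %q(["foo", "bar", "baz"]*","),   # no spaces around the star
  %q(%w(foo bar baz) * ","),       # %w literal, no square brackets at all
  %q(counts[0] * "2".to_i)         # indexing plus multiplication, not a join
]

samples.each do |code|
  puts format("%-32s matched: %s", code, NAIVE_ARRAY_JOIN.match?(code))
end
```

Only the first and last samples match, so the pattern misses three real joins and flags one line that is not a join at all. Every fix for one of those cases makes the expression harder to read and tends to open up another gap.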
00:06:24.460
The solution is to take Ruby code and convert it into an abstract syntax tree, or AST.
00:06:31.320
The AST serves as a structured representation of the code.
00:06:38.040
For example, consider a simple begin rescue block. This illustration will help visualize its corresponding AST.
00:06:43.620
A gem called parser handles this conversion. It is installed as a dependency of RuboCop, so if you already have RuboCop, there is nothing extra to install.
00:06:51.240
Taking our example from earlier, imagine we want to see what the AST for our array looks like.
00:06:56.640
The entire construct represents a method call—a send node, with children that represent various elements.
00:07:01.560
The first child indicates the receiver (in this case, our array), while the second identifies the method being called (here, the star method).
00:07:06.780
Additional children will provide the arguments being passed, which in this example contains the string that specifies the joiner.
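To show the shape of that tree without needing any gems, here is a sketch that builds it by hand for the star example from earlier. `Node` and the `s` helper are hypothetical stand-ins written for this example; they only imitate the type-plus-children shape of the nodes the parser gem produces.

```ruby
# Minimal stand-in for an AST node: a type plus an ordered list of children.
# This is a hypothetical helper, not the parser gem's own node class.
Node = Struct.new(:type, :children) do
  def to_sexp
    parts = children.map { |c| c.is_a?(Node) ? c.to_sexp : c.inspect }
    "(#{[type, *parts].join(' ')})"
  end
end

def s(type, *children)
  Node.new(type, children)
end

# Hand-built tree for: ["foo", "bar", "baz"] * ","
ast = s(:send,
        s(:array, s(:str, "foo"), s(:str, "bar"), s(:str, "baz")),
        :*,
        s(:str, ","))

# A send node's children are: receiver, method name, then the arguments.
receiver, method_name, *arguments = ast.children

puts ast.to_sexp
puts receiver.type
puts method_name
puts arguments.first.children.first
```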
00:07:11.880
Now, if we want to determine whether our code violates any rules, we can apply the parser to obtain this AST.
00:07:14.880
Expressing the AST in a formatted manner will yield an easily digestible output.
00:07:18.380
The beauty of the AST representation is its consistency.
00:07:21.600
If our original code is amended—say, by switching quotes or adjusting spacing—the underlying AST remains unchanged.
00:07:26.520
This stability permits RuboCop to analyze code with confidence, leading to reliable conclusions.
00:07:30.600
A utility gem called rubocop-ast builds on this, giving us a more user-friendly way to interact with our AST.
00:07:36.360
A class called ProcessedSource within this gem takes some code as a string, along with the Ruby version it should be parsed as.
00:07:41.940
From there, we can delve into the AST and query it as per our needs.
00:07:44.520
In our previous example, we can check if it violates the array join rule.
00:07:52.440
We can ask whether the node type is send, whether its receiver is an array, and whether the method being called is the star method.
00:07:58.320
These checks allow us to ascertain if a violation exists.
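Those questions can be sketched as a single predicate. This is a simplified check over a hand-built tree, using a hypothetical `Node` struct rather than rubocop-ast's node classes, and it is not the actual implementation of the ArrayJoin cop.

```ruby
# Hypothetical stand-in for an AST node: a type plus its children.
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

# True when the node looks like `<array literal> * <string literal>`.
def array_join_violation?(node)
  return false unless node.is_a?(Node) && node.type == :send

  receiver, method_name, *arguments = node.children
  argument = arguments.first

  receiver.is_a?(Node) && receiver.type == :array &&
    method_name == :* &&
    arguments.size == 1 &&
    argument.is_a?(Node) && argument.type == :str
end

star_join  = s(:send, s(:array, s(:str, "foo"), s(:str, "bar")), :*, s(:str, ","))
repetition = s(:send, s(:array, s(:int, 1), s(:int, 2)), :*, s(:int, 3))
plain_join = s(:send, s(:send, nil, :words), :join, s(:str, ","))

puts array_join_violation?(star_join)
puts array_join_violation?(repetition)
puts array_join_violation?(plain_join)
```

Checking the argument type matters because `[1, 2] * 3` is array repetition, not a join, even though the receiver and the method name are the same.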
00:08:01.680
Now that we have some understanding of how RuboCop processes the code, let’s explore how it applies specific cops.
00:08:06.780
Let’s introduce a bit of complexity. What if our joining code is wrapped in a method?
00:08:09.840
This changes the approach; we can no longer simply check for send type.
00:08:14.400
Instead, the top-level node is now a method definition node, which has no receiver.
00:08:19.740
However, we can still identify the arguments passed to this definition, allowing us to inspect the inner workings.
00:08:24.960
To navigate through the nodes, we need to traverse the AST and analyze its components.
00:08:29.700
Through this traversal, we can visit every node and gather the necessary data.
00:08:34.260
For instance, let’s create a method called walk that will allow us to visit every node within the AST.
00:08:39.000
For each node type, we will specify a corresponding method that starts with 'on'.
00:08:44.520
For example, we need an on_def method since the def is the top-level node.
00:08:47.880
Next, we determine how to process the information within our definition node.
00:08:52.500
The arguments passed will yield their own children while the body of the definition may contain a varying number of types.
00:08:56.460
By defining an on_send method, we can analyze the operation more directly.
00:09:01.800
The on_send method will help dig further into understanding what is happening at this node.
00:09:06.180
Returning to the on_args method, we can now work through each child of this AST and invoke appropriate methods.
00:09:09.780
With these methodologies in place, we can ensure we’re accessing every necessary part of the AST.
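Here is a compact sketch of that idea. Instead of writing one hand-rolled method per node type that recurses into its own children, this version uses a single generic `walk` that dispatches to an `on_<type>` method when one is defined. `Node`, `Walker`, and `ArrayJoinFinder` are hypothetical names for this example, and the tree is built by hand.

```ruby
# Hypothetical stand-in for an AST node: a type plus its children.
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

# Depth-first walk: visit a node, call on_<type> if it exists, then recurse.
class Walker
  attr_reader :visited

  def initialize
    @visited = []
  end

  def walk(node)
    return unless node.is_a?(Node) # skip symbols, strings, and nil children

    @visited << node.type
    handler = :"on_#{node.type}"
    public_send(handler, node) if respond_to?(handler)
    node.children.each { |child| walk(child) }
  end
end

# Only cares about send nodes; every other node type is simply walked past.
class ArrayJoinFinder < Walker
  attr_reader :offenses

  def initialize
    super
    @offenses = []
  end

  def on_send(node)
    receiver, method_name, = node.children
    return unless receiver.is_a?(Node) && receiver.type == :array

    @offenses << node if method_name == :*
  end
end

# Hand-built tree for:
#   def to_csv
#     ["foo", "bar", "baz"] * ","
#   end
ast = s(:def, :to_csv, s(:args),
        s(:send,
          s(:array, s(:str, "foo"), s(:str, "bar"), s(:str, "baz")),
          :*,
          s(:str, ",")))

finder = ArrayJoinFinder.new
finder.walk(ast)
puts finder.visited.inspect
puts finder.offenses.size
```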
00:09:14.280
The rubocop-ast gem has abstracted a lot of this away via a module called RuboCop::AST::Traversal.
00:09:19.560
This module allows us to traverse over any arbitrary AST in a depth-first manner.
00:09:23.820
When visiting each node, if any of the active cops define methods that are pertinent to these nodes, they will be handed over for further processing.
00:09:29.040
When defining our own class, we can include this module to navigate any given AST.
00:09:32.520
Each node type will correspond to methods we designate in this class.
00:09:36.600
Thus, if we had our on_send method, we could apply it to each relevant node and perform the necessary assessments.
00:09:40.080
Our aim is to ask questions of the node: is the receiver an array, and is the method name the one we're looking for? If all of those conditions are satisfied, we register a violation.
00:09:45.480
This method serves as the backbone of how we determine if a violation exists.
00:09:50.760
For further examples, we could observe another cop—say, a MinMax cop.
00:09:55.620
This cop would take advantage of its own on_array method to analyze the specific arrays it encounters.
00:10:01.380
From there, it evaluates the array, for example checking whether its elements are the minimum and the maximum of the same object, which could be written as a single minmax call.
00:10:04.500
This structure allows the cop to focus specifically on parts of the AST that it cares about and provide precise evaluations.
00:10:08.220
There are a plethora of different node types possible in ASTs—from variable assignments to class definitions, each wielding its own specific evaluations.
00:10:14.459
The ability to recognize and represent them allows RuboCop to engage meaningfully with the patterns it is looking for.
00:10:19.800
Now, let's see how RuboCop rewrites the code and autocorrects the identified issues.
00:10:24.600
If we revisit the earlier method that needs adjusting, we would replace the star with a call to join, wrapping the argument in parentheses.
00:10:29.040
Luckily for us, the parser gem incorporates a class called TreeRewriter, which streamlines the rewriting process. This class manages various transformations in the proper order.
00:10:35.040
Now, the immediate questions that arise involve determining the range and deciding the new content.
00:10:41.520
Diving into the range, this is defined by a particular class that indicates character spans across Ruby expressions.
00:10:46.560
Every piece of parsed code has an accessible location method, revealing interesting facets of that node.
00:10:50.340
One of these facets is the expression itself, indicating precisely where that piece of the AST exists.
00:10:54.600
Focusing on the send node, identified as the third child of the overall AST, we can utilize location to find its corresponding source code.
00:11:01.680
By checking the location.expression, we discern what part of the source code requires alteration.
00:11:05.640
Once we establish the content to be substituted, we can ask the receiver for its source, revealing how it was originally expressed.
00:11:09.960
Likewise, we can assess the arguments passed in and obtain their original string representations.
00:11:16.500
Bringing these together, we set up the stage for our rewriting procedure, focusing explicitly on that send node and gathering its associated data.
00:11:24.840
Employing this knowledge, we apply the corresponding replace method to adjust the AST per our requirements.
00:11:29.760
With the tree successfully rewritten, we achieve the intended alterations, seamlessly transforming the source code as needed.
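To make the range-and-replacement idea concrete, here is a small sketch that works on plain character offsets. `TinyRewriter` is a hypothetical stand-in written for this example; it borrows the `replace` and `process` method names from the parser gem's TreeRewriter but is much simpler. The offsets are found by searching the string, where a real cop would get them from the node's location.

```ruby
# Hypothetical, simplified rewriter: collects (range, replacement) pairs and
# applies them to a copy of the source.
class TinyRewriter
  def initialize(source)
    @source = source
    @edits = []
  end

  def replace(range, content)
    @edits << [range, content]
    self
  end

  def process
    result = @source.dup
    # Apply edits from the end of the source backward so that earlier
    # offsets are still valid after later text changes length.
    @edits.sort_by { |range, _| -range.begin }.each do |range, content|
      result[range] = content
    end
    result
  end
end

source = %(def to_csv\n  ["foo", "bar", "baz"] * ","\nend\n)

# Stand-ins for what the send node's receiver and argument report as source.
receiver_source = '["foo", "bar", "baz"]'
argument_source = '","'

# Stand-in for the send node's expression range (character offsets).
expression = "#{receiver_source} * #{argument_source}"
start = source.index(expression)
range = start...(start + expression.length)

rewritten = TinyRewriter.new(source)
                        .replace(range, "#{receiver_source}.join(#{argument_source})")
                        .process
puts rewritten
```

Because the replacement is built from the receiver's and argument's original source text, the rewrite keeps whatever quoting and spacing the author used inside them.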
00:11:35.520
Effectively, this encapsulates how RuboCop operates. Upon detecting violations, the auto-correcting mechanisms within it will enact necessary changes.
00:11:40.680
As corrections are finalized, RuboCop writes the changes back into the file, reprocessing if further alterations are warranted.
00:11:44.160
Now that we've journeyed through this crash course on RuboCop, it's important to reflect on the insights gained.
00:11:49.320
We’ve explored components of the command line interface, seen how files convert into ASTs, and analyzed how RuboCop interacts with that tree.
00:11:53.880
I hope this enhances your understanding of RuboCop's functionalities, inspiring you to think about how to leverage this knowledge.
00:11:58.899
If you are interested in building a tool that needs to analyze source code, the concepts we've discussed about traversing and parsing the abstract syntax tree could be very beneficial.
00:12:05.880
Thank you for your time; I hope you found this engaging. If there are questions, feel free to approach me after the talk.