Communicating with Cops

by Kyle d'Oliveira

In his talk titled "Communicating with Cops" at RailsConf 2020 CE, Kyle d'Oliveira discusses the use of Rubocop as a tool for standardizing coding practices and improving developer onboarding within software development teams. He emphasizes the importance of communication and education in coding practices, especially when onboarding new developers into a codebase. His main arguments include the need to alleviate cognitive overload from excessive comments on pull requests and the benefits of automating feedback through static analysis tools like Rubocop.

Key points discussed in the presentation include:
- Challenges in Developer Onboarding: Kyle outlines how his team at Cleo previously utilized a "pet project" to help new developers acclimate to their codebase. He noted that this often resulted in overwhelming feedback and comments during code reviews.
- Cognitive Overload: The discussion highlights the issues of cognitive overload when numerous comments are provided on pull requests, hindering a new developer's ability to process useful information.
- Automated Education with Rubocop: Kyle presents Rubocop as a solution to automatically catch common styling and behavior issues in code. This simplifies the job of code reviewers and allows them to focus on the actual functionality of code rather than stylistic elements.
- Custom Rule Development: Kyle discusses the flexibility of Rubocop, mentioning that anyone can write their own set of rules (cops) to enforce specific coding standards relevant to their organization. He provides examples of customizing cops to catch problematic patterns and educate developers on best practices.
- Case Study: He showcases a real example of a subtle bug caused by improper handling of global variables in tests, and how custom cops could automate the prevention of such issues by providing immediate feedback during coding.
- Community Benefits: The talk also emphasizes the community aspect of Ruby and how shared tools and knowledge can significantly improve code quality and developer experience across teams.

In conclusion, d'Oliveira advocates for incorporating Rubocop as a valuable educational tool that can enhance both the quality and consistency of codebases while easing the onboarding process for new developers. The message is clear: by automating the checking of common errors and educating developers at the moment of writing code, teams can sustain high code standards without excessive human intervention, reducing bugs and leveraging collective knowledge effectively.

00:00:08.580 Welcome to my talk, 'Communicating with Cops'. In this entirely virtual conference, we will explore using Rubocop to communicate best practices and provide education to developers in an automated fashion.

00:00:15.129 My name is Kyle d'Oliveira, and I am from beautiful British Columbia, Canada. I've been working on Rails since the pre-1.0 days, and I've been hooked on Ruby on Rails not only because of how great it is to work with the language and framework, but also due to the amazing community.

00:00:28.130 Currently, I am a staff software developer at Clio, a 13-year-old SaaS company focused on transforming the practice of law for good. I work on their backend infrastructure team, which has three major areas of focus: scalability, approachability, and overall developer experience.

00:00:43.100 Back in the early days at Clio, when a developer was first on-boarded, we assigned them to a project called the 'Pet Project.' Clio is a platform that lawyers can use to manage their cases and contacts, alongside various other functionalities. We had a fake feature where new developers could implement functionality allowing lawyers to manage various pets as well.

00:01:09.590 The goal of this project was to familiarize developers with our codebase and align them on issues such as consistent styling and behavior traps. For instance, we often noticed that new developers would sometimes skip important elements like pagination, which allowed us to educate them through practical experience.

00:01:20.510 This project proved effective at acclimating developers to our codebase and getting them up to speed. We often received comments on pull requests like the one displayed here, where someone pointed out that it's easier to read code when there's a new line separating various contact blocks. Such comments, though small and concise, open the door for discussion and potential disagreements.

00:01:43.100 In other cases, developers might catch something that should be removed, such as a method that didn't contain any code. This observation is valuable; we should not be committing code that is unused as it complicates future work with these classes. However, the issue lies in the fact that neither of these comments truly addressed the actual behavior of the code.

00:02:01.970 The reviewer needs to provide enough context for the developer writing the pull request to understand the feedback, which can become quite labor-intensive and ultimately unnecessary. We observed that these pet projects often ended up as a dumping ground filled with various comments such as these. The intention was good since we aimed to use this as a teaching mechanism, but it was not uncommon to see hundreds of comments on a single pull request.

00:02:31.380 While you may not have faced a similar feature, I'm sure many of you have experienced situations where pull requests from new developers are filled with comments to align them on styling. As a developer writing that pull request, this can lead to information overload.

00:02:38.810 Imagine dealing with hundreds of pieces of feedback that you need to process and retain. Although all this information is aimed at educating and informing, when delivered in such a large dump, it leads to cognitive overload. What if we could provide this information to the developer while they were writing the code or before they committed it?

00:03:02.500 If we could do this, developers would have far more context about what's happening, making it much easier for them to process the information. Think about the reviewer’s experience as well—if they are constantly correcting style or common behavior traps, it can distract them from focusing on the actual behavior of the code.

00:03:24.799 For instance, in Ruby, did you know that you can call super without needing any parentheses? The reviewer here knew that and wanted to highlight it, but in this situation, parentheses were necessary because the number of arguments had changed. Imagine if all the styling and common issues could be automatically identified and corrected for the writer; the reviewer would no longer need to focus on those aspects and could concentrate on the actual behavior of the code.

00:03:51.210 This shift could lead to higher-quality code and result in fewer bugs. We need to find a better way to simplify this knowledge exchange and reduce the human element involved. There's no need to rely entirely on humans to catch these smaller errors, which can be error-prone and overwhelming.

00:04:12.650 Likewise, we don’t need to depend entirely on the tribal knowledge that developers gather over time. Instead, we can codify that knowledge and use it to educate developers at the moment it becomes relevant. By doing so, we can keep bad patterns and inconsistent styles out of the codebase permanently. Mistakes are inevitable, as evidenced by this code written by someone at Clio in early 2020, which contains a subtle bug.

00:04:42.880 While this test runs alongside a group of other tests and may execute in a specific order, it can cause those other tests to fail. However, I won't immediately point out what's wrong here. I've just discussed the need to minimize the human element in teaching these lessons, and simply indicating the issue relies on my tribal knowledge for education.

00:05:08.080 Let me illustrate how we can prevent this kind of issue from happening in the future. One effective approach to address this problem is through linters. Ruby has a fantastic linter called Rubocop, which benefits from extensive community support and offers powerful tools for static code analysis.

00:05:34.640 However, there are misconceptions surrounding what can be achieved with a linter that I want to clarify. When the topic of linting arises, it's common for people to think it's only about style or code layout. While Rubocop does provide built-in cops to manage these rules, such as highlighting empty else blocks, many of these do not influence the actual behavior of the code.

00:05:57.520 For instance, an empty else block does not impact the code's behavior, although it may affect readability and maintainability. Linters are capable of much more; they can handle complex behavior checks and provide education when a rule violation occurs.

00:06:12.400 Looking again at the dot positioning rule, one of the key features of Rubocop is its ability to auto-correct code. While not all cops provide this feature, many do, and I have found that the ability to auto-correct issues greatly helps overcome developer resistance to using linters, particularly those that are controversial.

00:06:38.500 When a linter can auto-correct code, it simplifies the process for developers, as they don’t have to do additional work. As I mentioned earlier, Rubocop transcends mere stylistic or layout changes. It can be used to perform security checks; for example, did you know there are certain methods in the llama class that pose security risks and could lead to remote code execution?

00:07:00.680 If you don't have a linter verifying this, then it could pose a considerable security risk during code reviews. However, having a linter in place alleviates this burden, reducing what developers and reviewers need to think about. As more of these rules are automated and enforced, they further simplify the development process.

00:07:27.850 Additionally, linters can monitor code for use of rescue from the general Exception class. Typically, this is not the right approach; instead, you should be rescuing from StandardError or a specific list of exceptions. I’ll elaborate on this later, but for example, many of you may have been involved in a Rails upgrade at some point. The community has developed cops to assist with this as well.

00:07:51.950 For instance, when upgrading from Rails 5.0 to 5.1, the method for passing parameters in controller tests changed from a reliance on argument position to using keyword parameters. This means that if you're undergoing an upgrade, you can not only add this cop to prevent the deprecated pattern's future usage but also leverage it to auto-correct existing instances.

00:08:13.350 This way, you can minimize work without losing any productivity. However, one of the most potent aspects of Rubocop, in my opinion, is its customizability. Anyone can create their own cops to filter out unnecessary warnings and provide just-in-time education for developers.

00:08:37.030 For instance, you can write a cop tailored specifically for your workplace, or create one that benefits the broader community. As previously mentioned, one of the standout elements of Ruby and Rails is the vibrant community. The community has not only supported Rubocop but has also developed numerous custom cops that can be immediately utilized by everyone.

00:09:09.170 Some individuals have created cops for performance analysis, helping protect against common performance pitfalls, alongside several Rails- and RSpec-specific cops. However, determining where to start when crafting a cop can be daunting.

00:09:35.890 So, let's take a deeper look at how Rubocop operates. Ruby profoundly supports metaprogramming, and under the hood, Rubocop analyzes Ruby code, generating an abstract syntax tree. It then provides hooks for executing Ruby code on various segments of that abstract syntax tree.

00:09:50.270 While I will be providing a high-level overview of abstract syntax trees and how they function in Ruby, there are plenty of fascinating concepts to discover regarding abstract syntax trees. For those interested in further learning, there’s a gem available called 'parser.' This tool, named Ruby Parse, takes arbitrary Ruby and converts it into its corresponding abstract syntax tree.

00:10:11.270 However, viewing this can appear complex and difficult to comprehend. Therefore, let's simplify the syntax using a couple of examples. Let’s revisit the dot position example in question: whether to use a leading or trailing dot. Here’s how Rubocop handles this internally.

00:10:41.480 We first examine the syntax tree generated by the code. The code is structured as 'something.method.' When observed in the syntax tree, this represents a method call (the send node). This send node has two or more children: the first is the receiver, along with the method name, which is our second child. If there are any arguments passed to the method, they follow as additional children.

00:11:00.530 Here, we have a method named 'method', which is the second child of the node. However, since we have no arguments, there are no further children. Now, let's explore the first child; in this case, it is also a method call represented by a send node. We continue this process recursively by analyzing the piece of code further.

00:11:27.250 We find that this method is being called with no receiver—it’s nil. Thus, we outline the complete abstract syntax tree corresponding to that single line of code. To guess how Rubocop will respond to this, we must watch for a send node being called on an object and verify the dot's position.

00:11:42.340 The Ruby code powering that rule demonstrates this concept. Rubocop provides an 'on_send' method triggered whenever the send node is encountered. Numerous such hooks exist in Rubocop. It enables us to write standard Ruby code to accomplish various tasks. We can examine the node and determine whether it features a dot or an ampersand dot, indicating it’s invoked on an object. We also ascertain if the dot is in the correct position; if it’s not, we raise an offense with an error message.

00:12:02.750 Upon executing Rubocop on the file, we will see the generated error. This is a relatively straightforward rule, consequently, the message is equally straightforward. It informs the developer that the dot should appear on the next line along with the method name, specifying the exact file and line where the error exists.

00:12:24.960 While the error message doesn’t clarify the reason why, this issue can be automatically corrected, making the rationale less critical. Developers can easily see where the error occurred without needing to resolve it manually. In fact, plugins can be set up to execute Rubocop within your editor, or you could make use of Git hooks, allowing for autocompletion to happen nearly simultaneous with code writing.

00:12:45.770 Now, let’s move on to an example that has slightly more complexity: examining whether we are rescuing from exceptions properly. In the previous example, we only dealt with a single line of code. So, let’s consider these five lines of code: it starts with a 'kw_begin' node. I assume 'kw' represents keyword, and there exist different 'begin' nodes; this is the one being rendered here.

00:13:09.460 This node has a single child. The child of the kW begin indicates the rescue keyword. Within the 'rescue' node, three children exist: the first child represents what occurs before the rescue, in our case, this line corresponds with a send node invoking 'do_something' on nil.

00:13:34.780 The second child represents the contents of the rescue block, depicted by the 'resbody' node, which too has three children. The first child is the list of exceptions we wish to rescue, represented here as an array of constants, in this case, 'exception' with no space in between. We only have one constant.

00:13:58.150 The next child of 'resbody' is the local variable name assigned for the exception—here, we're assigning that value to nil. The last child houses the code within the rescue itself: this is again a send node invoking a method called 'handle_exception' on nil. Lastly, I mentioned that 'rescue' holds three children; the final child pertains to the contents of the 'else' block, which, in this instance, is nil since we did not provide one.

00:14:22.970 This effectively shows what the syntax tree of those five lines of code looks like. To prevent rescuing from exceptions, we need to search for the 'resbody' node and inspect that list of exceptions, specifically its first child, to determine if any of them correspond to the class 'exception.'

00:14:50.350 Examining the Ruby code for the cop, it functions precisely as described: examining the children of 'resbody'. If it lacks any specified exceptions (is simply an empty rescue), that's acceptable. However, if any exceptions are present, we need to verify if any correspond to the class 'exception.' Should that be true, we can record an offense.

00:15:20.950 Executing Rubocop here shows us the error message, which can appear both from the command line or within our editor, providing timely feedback as we write the code. This message confirms what's wrong—indicating how to address it—though it still lacks some information. A developer unfamiliar with this problem would still need to research to comprehend why this is a problematic pattern.

00:15:48.370 If one reviews the Rubocop documentation for this rule, they may come across a link leading to a page explaining the risks associated with blind rescues. When you rescue exceptions indiscriminately, you may inadvertently rescue signal exceptions—causing the Ruby process to not terminate as expected. The goal here is to ensure developers are aware of these links early in the process.

00:16:19.510 Let's return to that initial bit of code I mentioned earlier, which had an identified problem. I now need to disclose what's wrong with this code. The issue lies in this specific line: we are assigning the value 'true' to 'permit_all_parameters' within our configuration.

00:16:37.780 By examining the 'permit_all_parameters' method, we recognize it to be a class accessor, which effectively serves as a global variable. Assigning a global variable within a test, as illustrated, can be acceptable, but the key takeaway is that we didn't reset it to its original value prior to the test conclusion. Thus, when that test runs, it could modify the global state, resulting in subtle bugs.

00:17:12.790 If we didn’t reset this, the correct code should look something like this: whenever we set a value to a global variable, we should be within a 'begin' block, remembering to store the original value. Furthermore, we must ensure to reset the original value after the test concludes, which might seem complicated initially, but it can be manageable.

00:17:29.420 Let's remember that Rubocop is simply Ruby architecture, and we can write tests for Ruby code effectively. It can feel somewhat meta—writing Ruby code for a class that scrutinizes Ruby code—but taking a TDD approach with this cop is indeed possible. First, we can start with boilerplate, then write a test stipulating that if someone assigns a value to a global variable, the custom cop code must view this as an offense.

00:18:03.060 Next, we need to explore what the cop should look like. As we identify that 'permit_all_parameters' is a method call, we can tie this into the 'on_send' method, implementing a guard clause dictating that this method must modify a global variable. This could involve maintaining a whitelisted array of methods deemed problematic. Moreover, we can ensure the code is nestled within a begin block.

00:18:28.360 While this doesn’t entirely define the rule, it sets our test path. Continuing to iterate, if the assignment within the begin block is inappropriate, we should enforce another test—thereby enclosing that assignment in a begin-end block and asserting it fails in this scenario.

00:18:53.130 Now, we can implement the expected behavior: why is it not sufficient for the assignment to be encapsulated within the begin block? This is because we have yet to cache the original value. Rubocop enables us to navigate through abstract syntax trees comprehensively; we can even access the local variable names.

00:19:15.730 For instance, if a local variable doesn’t exist or hasn’t been written, we can simply record an offense. Our tests should pass again, indicating we make progress but still do not cover the full behavior. So, what renders the assignment inside the begin block inappropriate if it caches the variable?

00:19:44.330 The critical aspect is that it doesn't reset the value at the end. We will need to outline a test incorporating all the necessary elements: we have a begin block, we're capturing the original value, and we possess an ensure block. We then have to determine what occurs if we don’t assign it back to its initial value, raising an offense if this is the case.

00:20:04.580 Currently, our rule appears functional, but let’s write one more piece to ensure it truly executes as anticipated. We’ll write a final test affirming we possess all these elements: a begin block, cached original value, an ensure block, and a reset of the original value in the end. Under these conditions, there should not be an offense.

00:20:30.450 Now, having created a fully functioning custom cop, we finally have control over the error message and can be extensive in our explanations. When a new developer encounters this issue in their code, they will have the linter active while they write it, receiving immediate feedback regarding what they wrote.

00:20:57.660 This capability is invaluable because it conveys that they modified a global variable, which could lead to transient spec failures if the original variable's state isn’t preserved. Given the full context, we are effectively alerting them to an issue they may not even be aware of, while this knowledge allows reviewers to pass over these problems in the future since it has now been codified into the patterns we're enforcing.

00:21:24.610 Crucially, enabling this check within our codebase allows other developers to enhance it further, making it applicable to similar situations or different methods. We routinely employ such cops at Clio, with many alternatives rolled out over time.

00:21:53.420 Some other examples include issues we’ve encountered, such as when someone created a constant that uses 'time_now' or 'two_days_from_now.’ This constant would have its value set upon booting and typically wouldn't match the developer's expectation for this to continuously equal two days from the present, leading to unexpected errors.

00:22:20.390 Somebody learned this lesson, and we can now prevent it from occurring again with a simple validation check, simultaneously providing context and education as the developer writes the code. We also have the potential to extend beyond code and apply this to the Ruby language overall.

00:22:46.570 For instance, we devised a custom rule ensuring that we adhere to Rails conventions when naming items. In one case, a developer made a typo when crafting a test file, mistakenly pluralizing the filename even though it belonged to a singular 'user' class nested within a managed module. We can employ a custom check to ensure that these file names comply with expectations.

00:23:14.180 Developers indeed wrote the test and have full context surrounding the error we are addressing. It's essential to remember that these rules don't bombard developers with all this information at once when review time arrives; rather, they provide context and education as close to the writing process as feasible.

00:23:48.030 At Clio, we established multiple custom cops and consistently develop new ones. For example, we have custom cops that educate on poor stubbing patterns in specs, deprecated classes we intend to retire, and warnings for methods that may no longer be recommended, advising on replacements. We can also ensure tmp files remain closed—alerting developers to the risks of leaving them open.

00:24:09.770 Through this proactive approach, we continuously enhance our codebase, addressing recurring issues with custom cops designed to prevent these patterns from arising again. Leveraging the linter grants us consistent styling throughout our application while guaranteeing that problematic behaviors are eliminated from our codebase.

00:24:33.580 Moreover, since reviewers no longer feel burdened to focus on inconsistencies, they can direct their attention toward assessing the quality of the code, enhancing the overall craftsmanship and leading to fewer bugs. We are able to codify all the valuable lessons learned over time.

00:25:03.870 Rather than relying on individual developers to catch and educate others, we can start amassing these insights into an automated system that checks for commonly seen issues. In essence, we can deliver tools that benefit the community, enhancing code quality for all.

00:25:27.350 Now, if you're interested in getting started with linter implementation, here are insights we’ve gathered at Clio that have effectively helped with developer buy-in. Firstly, Rubocop comes equipped with numerous rules; some may resonate with your organization, while others could be contentious.

00:25:48.700 You aren't compelled to enable every rule by default; in fact, you can choose to run custom code solely. You may begin with the rules that align best with your standards and gradually incorporate more as you see fit. Next, it's crucial to utilize auto-correct features where possible.

00:26:13.190 While I have not elaborated on integrating auto-correct with custom cops, this capability exists, and you could easily implement it to provide custom auto-correct behavior too. If the rules are enforced with auto-correct features, many conflicts over styling disappear; who cares if there's a leading or trailing dot when those details are automatically resolved?

00:26:40.640 It's essential to provide developers with feedback as rapidly as possible in as many locations as feasible. Encourage them to set up linting within their editors to receive instant feedback while writing code. If you leverage Git, establish a pre-commit hook that runs linting on modified or newly-added files; this way, developers are alerted as soon as they attempt to commit code.

00:27:01.990 While this method may be slightly slower than the immediate feedback provided in the editor, it still comes close enough to the writing process. Additionally, setting up a continuous integration server that conducts a full linting pass is advisable. At Clio, our extensive codebase allows such steps to be completed in parallel with our testing suite.

00:27:24.350 This setup guarantees that all the code in every pull request adheres to the expected rules. However, and perhaps most importantly, we should treat linting as a method for educating and training our developers. To gain buy-in within an organization, share how codifying development knowledge and delivering just-in-time education will mitigate bad patterns and uplift the skills of every developer.

00:27:47.340 Likely, they will see the inherent value in investing their time in a tool designed to improve all developers' skillsets while elevating overall code quality. Thank you.