Summarized using AI

Escaping The Tar Pit

by Ernesto Tagwerker

In his RubyConf 2019 talk "Escaping The Tar Pit," Ernesto Tagwerker addresses how to manage technical debt and keep projects from devolving into the excessively complicated, unmanageable state he likens to a "tar pit." The session equips developers with tools and strategies to assess and improve code quality, whether they are avoiding the tar pit or already trying to climb out of one.

Key Points Discussed:

- Introduction to Technical Debt: Tagwerker defines the tar pit as a metaphor for projects that have become difficult to manage due to accumulated technical debt, which often shows up as budget overruns and small changes that take far too long to ship.

- Open Source Contributions: He shares his open-source experience, highlighting gems he maintains: Database Cleaner (cleans the database between test runs), Bundler Leak (finds leaky dependencies), and Next Rails (finds incompatibilities ahead of a Rails upgrade).

- Assessment Methodology: The talk outlines approaches for quickly assessing code quality, emphasizing the importance of tools like SimpleCov for code coverage and RubyCritic for quality assessment. Tagwerker distinguishes between code complexity and coverage as critical metrics.

- The Stink Score: A notable concept introduced is the "stink score," which combines code quality and code coverage metrics to quantify and prioritize files requiring immediate attention. This metric helps identify which parts of the code are most in need of refactoring.

- Practical Strategies to Address Technical Debt: Tagwerker presents methods to manage technical debt, such as eliminating unused files via Coverband, refactoring complex files, and enhancing test coverage for critical components.

- Continuous Improvement: He discusses the importance of monitoring and communicating progress in addressing technical debt through tools like Skunk, which calculates and compares stink scores over time.

- Final Thoughts: In conclusion, Tagwerker emphasizes the perspective shift required from simply maintaining legacy projects to actively improving code quality and reducing technical debt.

Conclusions and Takeaways:

- The effective management of technical debt is crucial in software development and requires a structured approach.

- Tools that assess code quality can significantly aid in prioritizing refactoring efforts and improving the overall maintainability of a project.

- Successful navigation out of the tar pit involves continuous assessment, collaborative communication, and a commitment to improving software quality.

- Tagwerker encourages developers not to weaponize these tools but to foster a collaborative environment focused on quality improvement.

00:00:12.200 Well, welcome back. I'm here to talk about the tar pit: how to avoid it in the first place and, if you're stuck in it, how to get out of it. My name is Ernesto Tagwerker, and you can find me on Twitter, GitHub, or anywhere with the handle @etagwerker. My pronouns are he/him/his. I'm honored to be here in the code quality track.
00:00:29.850 I'm originally from Argentina, so if you hear an accent or a weird word, it's because English is not my first language. I’ll try to do my best. I have been living in Philadelphia for the past three years with my wife and daughter, and I love open source. I wouldn't be here today if it weren't for open source. A lot of the code that I'm going to talk about was written by other people; maybe some of you out there contributed to it. I wrote only about 5% of the code that I'm going to discuss today, and it's basically just gluing together other open-source libraries.
00:01:02.370 I am the founder of a small software development shop called Ombu Labs. We focus on working with Ruby on Rails, JavaScript, and open source. We try to give back to the community as much as possible through a bunch of gems that you might have heard of. A few years ago, we found ourselves working a lot with Rails upgrades, and we thought, why not make a service out of this? That's how FastRuby.io was born. It's basically a productized service that helps companies upgrade their Rails applications in just a few weeks.
00:01:30.149 Now, the problem is that we get a ton of projects that are not ready for an upgrade. So a lot of the insights I'm going to share with you today come from our experience doing Rails upgrades, but also from assessing code quality very quickly. We spend one week diving into our clients' projects, and based on that assessment, we decide whether to take them on or not. We'll never do a Rails upgrade for a project that has no code coverage, so that's one of the things to keep in mind.
00:02:10.229 When I'm not working on Ombu Labs or FastRuby.io, I like to maintain a few gems. Database Cleaner is a tool that helps you clean your database between test runs. Bundler Leak is a gem that lets you find leaky dependencies in your application. There's a known database of leaky dependencies out there, and you might be using one of them; Bundler Leak can help you with that.
00:02:30.149 NextRails is a toolkit that helps you find incompatibilities between your current version of Rails and your next version, assisting you with tasks like dual booting.
00:03:00.989 The inspiration for this talk comes from a book published many years ago called "The Mythical Man-Month." The very first chapter talks about the tar pit. Fred Brooks describes a prehistoric scene where beasts would get stuck in tar pits and struggle to escape. I love this quote: "The fiercer the struggle, the more entangling the tar, and no beast is so strong or so skillful but that he ultimately sinks."
00:03:42.209 He goes on to discuss large-system programming, and to be honest, this book was published in 1975, and today we still face the same issues it describes, even though it was written in the era of mainframes. You can replace 'large system programming' with 'cloud computing' or 'minimum viable product development.' Fred Brooks also authored an essay called 'No Silver Bullet,' which I highly recommend. In it, he argues that our problems aren't solved by any one technology; our software engineering problems are more about processes, communication, teams, and people.
00:04:34.689 So returning to the tar pit, I believe we're in one of two states: we're either in the tar pit right now, or we're trying to avoid it. If we are in the tar pit, we're attempting to get out. We've all inherited projects that sucked, thinking, 'Yeah, I'm going to pay off this technical debt, and it’s going to be awesome!' But then you dive into the project, start refactoring, and the more you refactor, the harder it gets.
00:05:15.910 To bring this back to a modern context, I like to show a GIF from The Simpsons at this point. But returning to reality, the tar pit sometimes manifests itself in projects that are running over budget. You might say you'll ship something in a month, and then six months later, you have no idea when it will be done. Small changes seem to take forever to ship, and you end up sacrificing quality while increasing technical debt to be paid off later, yet later never comes, and the tar pit just becomes stickier and stickier.
00:05:51.810 For example, you could find yourself in a situation where your boss says, 'Yeah, could you come in on this great legacy project and maintain it from now on? That would be great.' It's often a client with a Rails project that’s totally out of date, needing an upgrade.
00:06:07.610 The problem is that if you agree to maintain this project, the next question you'll get is, 'How long will it take to ship this small change?' Of course, the follow-up will be, 'Because I need it by the end of day today.' You'll have to quickly assess code quality before committing to maintaining this project. Sometimes you can say no, but other times you're forced to maintain it, which can be quite challenging.
00:06:39.099 Part two will build on top of this and will use some of the tools that I’m going to discuss in the first part of the presentation. I’ll address the question: how can we gradually pay off this technical debt?
00:07:04.800 Alright, let's get started. Imagine next week at work, your boss comes to you and says, 'Yeah, Carl quit.' He's the only one who knew how to maintain the legacy system. 'It can't be that hard,' they say, as you end up stuck in the tar pit.
00:07:24.280 In our industry, it looks like this: a Ruby project, right? Ruby was designed for programmer happiness, so it should make you happy. Well, yes, that was the vision, but then we encountered some situations that don't inspire joy. How can we quickly assess code quality? Sure, there are a bunch of ways to do it. The main ones are: you can pay someone to do it for you, use tools like Code Climate, or you can use a free and open-source gem.
00:08:00.800 For the sake of this talk, I’m going to focus on open-source gems, because even the paid services use some of these gems to generate scores for applications. These gems offer various functions, one of which is static code analysis. They read every single line of code in your application and assign a complexity score to each file, allowing you to identify how complex the project is. Additionally, you can determine your test coverage, which tells you what proportion of your code base your test suite exercises.
00:08:33.970 Finally, there are tools that recognize all the known code smells in the world. They compare a database of code smells with your code base to evaluate how smelly your code is. By doing so, you gain a comprehensive picture of software quality within your project. So, what is software quality anyway? Unfortunately, there are countless definitions for this concept scattered across books and models, and I will focus on a few key definitions.
00:09:37.390 The first definition is from the IEEE, stating: 'It's the degree to which a system, component, or process meets implicit and explicit requirements.' This definition is accurate, but when I need to maintain the project, it sounds more like, 'It works as expected.' To me, software quality also includes the criteria that it’s not a pain to maintain.
00:10:02.070 Next, we have the ISO 9126 definition, which breaks software quality into several characteristics; the one we care about here is maintainability. In terms of maintainability, I like to emphasize two aspects, and the order is crucial: the first aspect is code coverage, meaning how many statements in my application are exercised by my test suite. This is important because it affects the second aspect: code quality.
00:10:39.320 I will identify complex files and smelly files while I refactor, and it’s essential to have tests to ensure that the behavior remains consistent as I improve code quality. The good news is that in Ruby, we have several excellent tools available that are very easy to set up. For code coverage, we have SimpleCov. SimpleCov will let us know how many statements are executed by our test suite.
00:11:26.760 However, the issue is that it won't indicate whether the tests are good or bad; it only indicates whether the statements are executed, which is still better than nothing. Once set up in your project, it generates a visually appealing HTML report that tells you, for instance, that 82% of the project's statements are exercised by the test suite. That single number is a useful signal about the project's state.
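For reference, a minimal SimpleCov setup is only a few lines at the top of the test helper. The 'rails' profile and the filter below are common choices, not something prescribed in the talk:

```ruby
# spec/spec_helper.rb (or test/test_helper.rb) -- SimpleCov must start
# before any application code is loaded, or those files won't be tracked.
require 'simplecov'

SimpleCov.start 'rails' do
  add_filter '/spec/'  # don't count the tests themselves as covered code
end
```

After a test run, the HTML report lands in coverage/index.html by default.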
00:12:09.810 For code quality assessment, it becomes more complex. There are many tools out there like flog, flay, reek, metric_fu, and RubyCritic, among others. For this discussion, I decided to showcase RubyCritic. I like it because it's currently maintained and active, and it builds on other gems like flog, flay, reek, and SimpleCov.
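RubyCritic is typically run from the command line; a common setup (the paths and Gemfile group below are conventions, not details from the talk) looks like this:

```ruby
# Gemfile
group :development do
  gem 'rubycritic', require: false
end

# From the shell:
#   bundle exec rubycritic app lib
#
# By default this writes an HTML report under tmp/rubycritic/,
# including the GPA and churn-vs-complexity views discussed below.
```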
00:12:54.520 Now, two concepts that RubyCritic uses are churn and complexity. Churn is interesting; it tells us how many times a file has been changed since the project's inception. For this, we can use Git to determine how often a file has been committed. This metric itself isn’t particularly enlightening; it shows which files have changed the most, and while there’s some value in that, it becomes more interesting when combined with complexity.
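Churn can be derived straight from git history. A rough sketch of the idea (not RubyCritic's actual implementation):

```ruby
# Count how many commits have touched a file since the repo began.
# Run from the root of a git repository.
def churn(path)
  `git log --follow --oneline -- #{path}`.lines.count
end

churn('app/models/order.rb')  # => e.g. 27
```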
00:13:37.140 Complexity, as calculated by RubyCritic, is determined by flog, a gem developed by Ryan Davis that has been around for over ten years. You can use it by running a simple command. Flog assigns a numeric value to various operations and statements within methods, producing a complexity score for the file. By examining this score, we can ascertain whether a file is complex. But when churn is factored in, things become more interesting.
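Flog can be run from the shell or scripted from Ruby; a minimal scripted run, assuming flog's documented API (the file path is an invented example):

```ruby
# Mirrors `gem install flog` followed by `flog app/models/order.rb`
# on the command line.
require 'flog'

flog = Flog.new
flog.flog 'app/models/order.rb'
flog.report  # prints the total score, the per-method average,
             # and each method's score, highest (most complex) first
```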
00:14:36.610 Michael Feathers has written extensively on this concept. Moreover, Sandi Metz wrote an insightful article explaining a graph with the y-axis representing complexity and the x-axis representing churn. If a file changed only twice but had a complexity score of 100, it would sit in the upper left quadrant. Conversely, if it had changed 27 times but had a low complexity score, it would land in the lower right quadrant.
00:15:14.370 This assessment itself isn’t particularly revealing. But when you focus on all the files, it yields remarkable insights. For instance, RubyCritic can generate reports that provide visual representations of your project’s overall state.
00:15:54.760 We can use a GPA pie graph where every file in the project gets graded, from A to F. For those unfamiliar with the American grading system, think of it as going from 100% down to 0%. The GPA is calculated using metrics from reek for code smells and flog for file complexity, merging all that data into a single score. Looking at these scores, you might find that 60% of the project is graded either a B or an A, but you might also discover that 25% falls into the D or F range, which tells you a meaningful chunk of the project is in poor shape.
00:16:49.090 On the other side of the analysis, if you examine the churn versus complexity graph, every single file in the project gets plotted. At first glance, this seems like an overload of information, making it difficult to interpret objectively. It helps to visualize dividing the graph into four quadrants, based on a theoretical asymptote. The best quadrant, the 'good' quadrant, is where you want your files to be—those with low churn and low complexity.
00:17:32.020 In the upper left quadrant, you find files that are complex but haven't changed much; they're probably working fine and aren't a priority, since you rarely have to touch them. The lower right quadrant features files that change frequently but are simple, so they aren't significant priorities for refactoring either. Things get most interesting in the upper right quadrant, where modules are both complex and change frequently. Michael Feathers notes that sometimes a class becomes so complex that refactoring appears daunting.
00:18:08.680 You realize the need to refactor, but every time you attempt it, you lose too much time, and it can often feel unattainable in a single sprint. This visual analysis can help you ascertain whether you're stepping into a tar pit, whether it’s a dumpster fire, or if it’s a genuinely maintainable project. At times, you might glance at that graph and conclude, 'No, thank you, I don't want to work on this project.'
00:19:10.670 So far, we have introduced several metrics: code quality and code coverage. These serve as signals regarding the state of your project. You have one signal indicating code coverage and another assessing complexity. Since both analyze the same code base, combining them into a single metric can amplify the signal you derive from your code.
00:19:50.440 I like to call this the stink score. Of course, the higher the stink score, the worse the file. This stink score stems from code quality and code coverage, while code quality comprises both complexity and various code smells. The idea behind this metric is to indicate that files lacking tests should be penalized, emphasizing those files that are especially complex and have insufficient coverage.
00:20:50.440 Let’s say we have a file, foo.rb, with its complexity and its respective smells. If we multiply the complexity score by its smell points, we can evaluate how 'stinky' the file is. For instance, suppose we have two files, foo.rb and bar.rb, both exhibiting a hundred smell points. Thus, at face value, we might consider them equally stinky. However, if we further analyze their code coverage, where foo.rb has 0% coverage and bar.rb has perfect coverage, we can derive an accurate stink score.
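To make the penalty idea concrete, here is a toy version (an illustration of the concept only, not skunk's exact formula):

```ruby
# Toy stink score: the file's quality cost, inflated by missing coverage.
# NOT skunk's real formula -- just the idea: untested files get penalized.
def stink_score(smell_points, coverage_percent)
  penalty = (100 - coverage_percent) / 100.0  # 0.0 when fully covered
  smell_points * (1 + penalty)
end

stink_score(100, 0)    # foo.rb: 100 smell points, no tests   => 200.0
stink_score(100, 100)  # bar.rb: 100 smell points, full tests => 100.0
```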
00:21:30.570 This tells us that foo.rb is 'stinkier' than bar.rb, which is a far more actionable conclusion. Of course, doing this comparison by hand is tedious, so we've built a tool to automate it. Enter skunk: a stink score calculator designed to provide quantitative assessments of file quality.
00:22:34.070 Simply put, it utilizes RubyCritic and integrates many of the metrics already mentioned, such as SimpleCov. You can install it easily, and although the current version isn’t very user-friendly, the key advantage is that it generates a sorted list of files from the stinkiest to the least stinky. The reports highlight the files needing immediate attention.
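Getting skunk running looks roughly like this (the Gemfile group is a convention; the run-tests-first ordering is explained in the next paragraph):

```ruby
# Gemfile
group :development, :test do
  gem 'skunk', require: false
end

# From the shell:
#   bundle exec rspec       # run the suite first so SimpleCov writes
#                           # its results to coverage/.resultset.json
#   bundle exec skunk       # rank files from stinkiest to least stinky
#   bundle exec skunk lib/  # or limit the analysis to one directory
```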
00:23:37.420 While the stink score is particularly useful for visualizing the project's health over time, skunk also reports a total score across the whole project. It's worth noting that as more features are added, the total stink score tends to increase. That's why the stink score average, the total divided by the number of modules, serves as the key measure of success: it reveals the overall quality of your project independent of its size.
00:24:15.600 Nonetheless, bear in mind that skunk is still a work in progress. One caveat is that you need to run your tests with coverage enabled before running skunk. Skunk loads the JSON file of coverage results from SimpleCov to determine which files are covered and which ones are not.
00:25:11.860 Feedback on this tool is welcome, as my aim is to spark a conversation about code quality and the efforts we undertake to manage technical debt. Alright, so now we have three key maintainability metrics: code coverage, code quality, and the stink score. The good news is that by understanding where we stand, we also understand where we need to go regarding technical debt.
00:25:50.450 So, part two delves into utilizing knowledge gained from part one to effectively pay off technical debt. Picture yourself in a scenario where you can’t turn down a project because you appreciate your job and the associated benefits, but now you must balance shipping features, patches, and improvements to code quality. Where do you begin towards this goal?
00:27:01.100 One effective approach is to identify and eliminate unused files. We all love that, right? A valuable tool in this process is a gem called Coverband, which you can install and run in production to track every code statement executed by actual user interactions, generating a report that indicates everything in use.
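A minimal production setup, following the Redis-backed configuration in Coverband's documentation (the store choice and mount path here are assumptions, not details from the talk):

```ruby
# Gemfile
gem 'coverband'  # (the Redis store below also requires the redis gem)

# config/coverband.rb -- picked up automatically by Coverband
Coverband.configure do |config|
  # Record hits in Redis so data survives across processes and deploys.
  config.store = Coverband::Adapters::RedisStore.new(
    Redis.new(url: ENV['REDIS_URL'])
  )
end

# config/routes.rb -- browse the "what actually runs in production" report
Rails.application.routes.draw do
  mount Coverband::Reporters::Web.new, at: '/coverage'
end
```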
00:27:36.920 The report generated by Coverband, similar to SimpleCov, will help you identify files or methods that are no longer utilized. Another method to reduce your stink score involves refactoring complex files. This is a common task in any project, and as you tackle it, it’s essential to analyze the complexity. We typically have many candidates for refactoring, but you might not know where to start.
00:28:15.230 This is where assessing churn against complexity falls short in indicating where to prioritize. However, utilizing the stink score can assist greatly. You can scrutinize all files, pinpoint those with sufficient code coverage—generally, files with 60% or more coverage are ideal candidates for refactoring.
00:28:49.640 Let’s take the example of gate.rb, which offers 60% code coverage and is currently among the stinkiest files. Once you identify such files, it’s time to conduct a thorough analysis of their content, looking for methods to refactor. You can achieve this by splitting files with multiple responsibilities; for instance, separating data fetching from statistics calculations into distinct files.
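The talk doesn't show gate.rb's contents, so here is a purely hypothetical sketch of that kind of split, with invented names:

```ruby
# Hypothetical "before": one class fetches data AND computes statistics.
class Gate
  def average_wait_time
    rows = fetch_rows            # data access
    rows.sum / rows.size.to_f    # calculation
  end

  def fetch_rows
    [4, 8, 15]                   # stand-in for a real query or API call
  end
end

# Hypothetical "after": the responsibilities are split so each piece
# is small, focused, and easy to cover with tests.
class GateRepository
  def rows
    [4, 8, 15]                   # stand-in for a real query or API call
  end
end

class GateStats
  def self.average(rows)
    return 0.0 if rows.empty?
    rows.sum / rows.size.to_f
  end
end

GateStats.average(GateRepository.new.rows)  # => 9.0
```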
00:29:37.439 This process will lead you to pay off technical debt, but how do you quantify your improvements? One challenge with tackling technical debt is effectively communicating progress. It’s essential to articulate this in numerical values, as this communicates the impact of your refactoring efforts.
00:30:14.350 Fortunately, Skunk accounts for this by enabling comparisons between your branch and a standard branch—essentially comparing stink scores to illustrate improvements in your project. Consequently, you can confidently state, 'Look, in this sprint, I improved the file by 10%.'
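skunk exposes this through its branch-comparison flag; here is a hypothetical Rake wrapper (the task name is invented, the -b flag is skunk's):

```ruby
# Rakefile -- convenience task; -b asks skunk to compare the stink
# score of the files changed in your branch against another branch.
task :debt_diff do
  sh 'bundle exec skunk -b master'
end

# Usage: rake debt_diff
```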
00:30:49.750 Another effective method for addressing technical debt is to focus on developing tests. Writing tests helps to reduce the penalty factor associated with each file. You can examine the score table once again, identifying files currently at 0% coverage to develop testing for them.
00:31:31.460 The first two files in that list will typically be test files (which shouldn't be counted at all), but the third will often be a heavily used class that you can write coverage for. Having that ranked list laid out is crucial, because it tells you exactly where added coverage will pay off first.
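A first test for a previously uncovered file doesn't need to be elaborate. Something like this hypothetical spec (the class and behavior are invented for illustration) already starts shrinking the coverage penalty:

```ruby
# spec/models/gate_spec.rb -- hypothetical first spec for a file that
# previously had 0% coverage.
require 'rails_helper'

RSpec.describe Gate do
  describe '#open?' do
    it 'is closed by default' do
      expect(Gate.new.open?).to be(false)
    end
  end
end
```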
00:32:18.780 It would be tremendously beneficial to establish a system that helps communicate this data clearly. The key takeaway is that you now have a stink score average to monitor. You can run reports periodically, every few weeks, to establish a trend line and spot any technical debt that is escalating.
00:33:07.090 The primary goal of this presentation has been to discuss measuring technical debt while also addressing the progress made in mitigating it. Communicating the percentage of technical debt addressed provides valuable context to non-technical stakeholders.
00:33:46.830 If you're looking for alternatives, you can use SimpleCov and RubyCritic independently to assess the overall quality of code in your project. Please remember, however, the intent is not to weaponize these tools against others. It's essential to foster a collaborative environment instead of encouraging blame.
00:34:12.790 Ultimately, this tool is intended to serve as your compass guiding you out of the tar pit. It can offer numeric values showing whether you're genuinely making progress or heading in the wrong direction. Thank you so much for your attention! If you have any ideas or questions you'd like to discuss, feel free to engage with me about the stink score and the skunk project. A big thank you to Ombu Labs for sponsoring my appearance here. We are hiring and fully remote, so feel free to check us out! And if you want to reach out via Twitter, I'm always eager to hear more ideas. Thank you.