Escaping The Tar Pit

by Ernesto Tagwerker

In the talk titled "Escaping The Tar Pit," Ernesto Tagwerker addresses the challenges of maintaining legacy software projects and the concept of software quality. He emphasizes the importance of assessing code quality through three main metrics: code coverage, complexity, and code smells.

Key points discussed include:
- Understanding Technical Debt: Tagwerker reflects on the difficulty of managing technical debt and how it can lead to projects being stuck in a "tar pit." He shares insights from his consulting experience with projects that often run over budget and sacrifice quality.
- Assessment Tools: The speaker introduces various tools for evaluating code quality:
  - SimpleCov for measuring code coverage, which shows how much of the codebase is tested.
  - Flog, Flay, and RubyCritic for measuring code complexity and identifying code smells.
- The StinkScore: Tagwerker presents a new metric he developed, called the StinkScore, which combines metrics of code quality and coverage to prioritize refactoring efforts for complex files lacking adequate tests.
- Decision Framework: Before taking on a project, stakeholders are encouraged to assess how long it would take to ship features or fix bugs, alongside the project's technical debt.
- Improvement Strategies: To reduce technical debt, Tagwerker suggests removing unused files, refactoring problematic code, and increasing test coverage. He highlights the importance of tracking the StinkScore over time to measure progress on technical debt.

In concluding his talk, Tagwerker encourages the open-source community to engage in discussions about software quality and to use tools like the StinkScore as a compass for improving maintainability within codebases. He invites feedback on his projects and emphasizes the importance of shared knowledge in tackling technical challenges in software development.

00:00:00.120 Whoo! So, our next speaker, Ernesto Tagwerker, has a great surname. I love German words, especially the really long ones, but 'Tagwerker' is a great name. I know it means 'day worker.' According to the internet, there are only 443 people with this surname, and we have one on stage today. Ernesto is a prolific Ruby blogger; I've read some of your posts. He's also the founder of Ombu Labs and Fast Ruby IO, and he's an open-source maintainer. Today, he's going to show us how to escape the tar pit. So, welcome Ernesto!
00:01:02.309 Thank you very much! My name is Ernesto Tagwerker, and you can find me on Twitter and GitHub with the handle @tagwerker. This is my very first time in Australia and at RubyConf Australia, so thanks a lot for having me! Like Mel said, I am originally from Argentina. I am actually one of three Tagwerkers in Argentina, which is weird because I know all of them. You might hear some funny words since Spanish is my native language, but I've been living in Philadelphia for the past three years, so you'll probably hear an American accent, which is also a bit odd.
00:01:36.000 I love open source! I wouldn't be here speaking in front of you today if it wasn't for a lot of other people who published many gems that I use on a daily basis. I maintain a few gems that you might have heard of: 'database_cleaner' is a library that helps you keep a clean state between test runs; 'bundler_leak' is a gem that you can use to find gems known to have memory leaks in your application; and 'next_rails' is a toolkit that assists you in upgrading to the next version of Rails. You can perform tasks like dual-booting your application and finding incompatibilities with the next version of Rails.
00:02:06.970 If you're interested in any of those projects, feel free to check out my GitHub page. I am the founder of a small software development shop called Ombu Labs. We love to work with Rails, Ruby, JavaScript, and we try to contribute as much as possible back to the community, which is how these projects came to be.
00:02:52.840 A few years ago, we found ourselves working on a lot of Rails upgrade projects and decided to launch a service dedicated to these upgrades, which we called Fast Ruby IO. Every month, we assess technical debt in various projects we've never heard of before. We work a lot with legacy applications that are running outdated versions of Rails and really need our help to get them to the next version. We need to quickly assess code quality and technical debt in order to decide whether we want to take on a project or not. We don’t take on every project; that would be crazy! We also don’t want to get stuck in the tar pit because some of these projects have a lot of technical debt.
00:03:25.240 A lot of the insights I'm going to share with you today come from this service and our process. The inspiration for this talk comes from one of my favorite books of all time, 'The Mythical Man-Month' by Fred Brooks. In the very first chapter, he discusses the tar pit: a prehistoric scene of beasts entangled in tar, trying to escape but only becoming more stuck. Here's a great quote from the book: 'Large system programming has, over the past decade, been such a tar pit.' He goes on to state that 'the fiercer the struggle, the more entangling the tar,' and that no beast is so strong or skillful that it does not ultimately sink.
00:04:04.420 What I find fascinating is that this book was published in the 1970s, referring to the 1960s. To this day, we still have trouble maintaining and working with legacy projects. Fred Brooks also published an essay called 'No Silver Bullet,' which I highly recommend. In it, he discusses our struggles as an industry. I believe that we are often in one of two states: either we are in the tar pit, or we are trying to avoid it. Sometimes, while in the tar pit, we try so hard to get out that all our efforts resemble comical struggles.
00:04:54.160 Let's come back to reality—how does this issue show up in our day-to-day work? One example is projects running over budget. For instance, saying 'Yes, we’ll ship this project in three months,' then six months in, we still have no idea when we will be done. Another case is taking a long time to ship small changes; for example, saying 'Sure, I can do that in two days,' but then two weeks pass, and I still don’t know when I’ll finish. Lastly, there's the issue of sacrificing quality and increasing technical debt with the intention of paying it off later, which we all know rarely happens.
00:05:16.820 In our case, we often just switch jobs to find a new project. I decided to split this talk into two parts. The first part addresses how we can avoid getting into a project in the first place—how can we decide 'yes, we will take on this project' or 'no, there's no way I will take on this project.' The second part will cover what to do if we can't refuse the project. We're stuck with it, and we need to find ways to gradually and incrementally pay off technical debt in a controlled manner.
00:05:59.140 Let's get started. Next week, your boss might approach you about taking on a legacy project, asking if you could maintain it from now on. For us as a consultancy, it looks similar: a client tells us about their project and how their previous developer had to leave for another engagement. Before you commit to anything, you should consider these questions: How long will it take to ship this feature? How long will it take to patch this bug? And be aware that the next words out of your boss's mouth will probably be, 'because I need it today,' which always brings urgency to the situation.
00:06:41.420 So, how can we quickly assess technical debt or code quality before we take on a project or commit to a timeline? The good news is that we have two options. There are paid services available like Code Climate and CodeScene, along with open-source gems that you can use. I'm not here to sell you on the paid services, but I will talk about open-source gems, because even some paid services utilize them.
00:07:01.220 These gems can perform static code analysis by reading every statement in your application and telling you how complex it really is. They will also calculate code coverage, which won’t tell you if your tests are good or bad, but will inform you how many statements in your application are actually covered by tests. Additionally, these tools can identify code smells, comparing your codebase to known code smell patterns and informing you how many code smells exist in a specific file. This information gives a clearer picture of software quality.
00:07:45.750 What is software quality, anyway? The truth is that there are books written on this topic, and many definitions of quality models. However, here’s a small rant: just because there are hundreds of definitions out there doesn’t mean you can throw your arms up and say, 'It’s too complicated, we can’t do it.' Instead, you can define your own baseline for software quality. There are many tools in our Ruby community that can be configured to your preferences and then utilized as your baseline.
00:08:30.890 Returning to definitions, the IEEE essentially defines quality as conformance to requirements: the software works as expected. But it's not that simple. We don't want to maintain things that merely work as expected, because we know that underneath there might be a mess. My definition of software quality combines two ideas: it works as expected, and it is not a pain to maintain.
00:09:03.340 Looking at the ISO 9126-1 software quality model, this talk focuses on maintainability, which involves two key factors: code coverage and code quality. First, we need to assess code coverage: does the application have any tests? If it does, how many statements in the application are covered by automated tests? I emphasize this point because it greatly affects the next one: code quality.
00:09:51.070 We can judge the quality of code and certainly refactor it, but we’re not going to refactor untested code. We could, but that would require us to manually test that everything works, which is not efficient. The good news is that in our wonderful Ruby community, we have many tools to assist us with these two metrics.
00:10:17.880 For code coverage, we have SimpleCov, which can be easily installed. You can load it into your test suite prior to running your tests. It generates an HTML report displaying how much of your codebase is actually covered by your tests. By itself, this provides a data point—like knowing that 82% of your code is covered by tests, which is a good indication. However, we want to further gauge the quality of the project before taking it on.
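Getting that number typically takes only a few lines at the top of your test helper. Here is a minimal sketch assuming RSpec; the filter path is an example:

```ruby
# spec/spec_helper.rb -- SimpleCov must load before any application code.
require 'simplecov'

SimpleCov.start do
  add_filter '/spec/' # don't count the tests themselves as covered code
end

# ...then require your application and run the suite as usual.
```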
00:11:06.820 To analyze code quality, we have several options, including Flog, Flay, RubyCritic, and others. Some of these tools are abandoned, while others are actively maintained. For this talk, I chose RubyCritic. It's interesting because it combines several tools, including Flog, Reek, and SimpleCov, and it is currently an active project with ongoing development.
00:11:36.340 RubyCritic considers two main concepts: churn and complexity. Churn indicates how many times a file has changed over a given period, using Git to track this. For example, we can see how many times a specific file has been updated since the project began. While knowing change frequency is useful, it's even more insightful when combined with complexity measurements.
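The churn measurement itself is simple to picture. Here is a rough sketch of the idea, counting the commits that touched a file (RubyCritic computes this internally via Git; the path is hypothetical):

```ruby
# Count how many commits have touched a given file -- a toy version of churn.
def churn(path)
  `git log --follow --format=oneline -- #{path}`.lines.count
end

puts churn('app/models/user.rb') # hypothetical path
```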
00:12:15.040 Complexity is determined by Flog, a tool developed by Ryan Davis that reads every statement in your application and assigns it a complexity score based on various factors. For instance, an assignment operation carries a score of 1, while an eval operation scores 6. The overall complexity score is the sum of these individual statement values.
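As a toy illustration of how those per-statement weights accumulate, using only the two weights mentioned in the talk (real Flog weighting covers many more node types):

```ruby
def example
  x = 1         # assignment: +1
  eval('x * 2') # eval: +6 -- dynamic code is penalized heavily
end
# Flog's score for a method is roughly the sum of these per-statement weights.
```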
00:12:56.700 Once you combine churn and complexity data, you can create insights regarding the health of the code. Ruby Critic generates a report that helps you visualize this information. The x-axis represents churn, while the y-axis represents complexity, allowing you to assess files where the two metrics intersect.
00:13:38.320 To run RubyCritic, you simply install it and then execute it on your project. The generated HTML report grades each file in your application from 0% to 100%, or from F to A. Reviewing these results, you might notice that some of your files are graded D or F, indicating they need attention.
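For reference, running it looks like `gem install rubycritic` followed by `rubycritic` in the project root. It can also be wired into a build via the Rake task the gem ships; a sketch, with example paths:

```ruby
# Rakefile -- run RubyCritic as part of the build (paths are examples).
require 'rubycritic/rake_task'

RubyCritic::RakeTask.new do |task|
  task.paths = FileList['app/**/*.rb', 'lib/**/*.rb']
end
```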
00:14:04.840 RubyCritic also takes churn and complexity into account, calculating the maintenance cost of individual files. Through this analysis, you can begin to prioritize which files require refactoring based on their technical debt.
00:14:31.130 Michael Feathers and Sandi Metz have written extensively about the correlation between churn and complexity. Understanding this relationship allows you to group files into four quadrants, focusing on where you want most of your files to be: changing minimally while remaining simple.
00:15:08.240 The upper left quadrant contains files that aren’t often changed but could become problematic, as working on complex files might slow further development. Conversely, the lower right quadrant, filled with simpler files that change often, might not be your priority, as they are easy to manage.
00:15:48.560 The upper right quadrant signifies the most concerning files: those that change frequently and are highly complex, suggesting a need for refactoring due to their high technical debt. At a glance, you can determine whether or not to take on a new project based on where the majority of its code sits in these quadrants.
00:16:25.720 To summarize, we use SimpleCov for code coverage and RubyCritic for code quality, yielding two kinds of insight about the same codebase. At Fast Ruby IO, we wondered: what if we could combine these two metrics? Thus, we developed a new metric called the StinkScore.
00:17:20.310 The StinkScore is a function of code quality and code coverage, aiming to prioritize complex files that lack tests. To clarify the concept, imagine two functions, Foo and Bar, each with a complexity-and-smells rating of ten points. Initially, we might perceive them as equally 'stinky.' But once you consider code coverage, it becomes clear that if Foo has 0% coverage and Bar has 100%, the StinkScore shifts dramatically, showing that Foo is much 'stinkier.'
00:18:01.320 When it comes to addressing these issues, we need to automate the analysis so it isn't done manually. My company used to do this by hand, and it was a tedious process. We created a tool called Skunk to streamline it: Skunk builds on several concepts from RubyCritic and generates a command-line table of files ranked from the stinkiest to the cleanest.
00:18:42.510 Skunk also reports a StinkScore total, the cumulative score across all files, and a StinkScore average per module, which teams can track over time to see their progress. The StinkScore factors in both code smells and complexity while accounting for code coverage.
00:19:11.450 Three key metrics are evaluated: the maintenance cost of a file, the number of known code smells, and whether the file has adequate coverage. A crucial part of the measurement is RubyCritic's cost for a file, to which a penalty factor is applied.
00:20:05.600 The penalty factor is determined by coverage: if a file has perfect coverage, its StinkScore equals its cost, whereas a file with 20% test coverage incurs an 80-point penalty. This methodology gives us three meaningful metrics.
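Trying it out looks like `gem install skunk` followed by `skunk` in the project root. As for the arithmetic, here is a minimal sketch of the penalty idea as described above; it is an illustration, not Skunk's exact implementation:

```ruby
# Sketch of the StinkScore penalty: perfect coverage leaves the cost
# untouched; missing coverage multiplies it.
def stink_score(cost, coverage_percent)
  return cost if coverage_percent >= 100.0 # perfect coverage: score == cost

  penalty = (100.0 - coverage_percent) / 100.0 # 20% coverage => 0.8 penalty
  cost * (1.0 + penalty)
end

stink_score(10.0, 100.0) # => 10.0 (well tested)
stink_score(10.0, 20.0)  # => 18.0 (poorly tested, heavily penalized)
```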
00:20:41.090 Now that we have a clearer picture of the StinkScore average, we can identify where we stand with regard to our codebase. This kind of insight allows us to have meaningful discussions about our current debt and plan ways to pay it off incrementally. There are several approaches we can take, including removing files, refactoring, and increasing coverage.
00:21:25.750 One way to reduce technical debt is to remove unused files. For this, we can use another tool called Coverband. Run it in production for a week or so, and it collects code usage data, helping you identify unused or dead code that can be removed.
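Setup is small. A hedged sketch assuming Coverband's Redis-backed store (the adapter choice and environment variable are assumptions; see Coverband's README for current details):

```ruby
# config/coverband.rb -- minimal sketch of a production Coverband setup.
require 'coverband'
require 'redis'

Coverband.configure do |config|
  config.store = Coverband::Adapters::RedisStore.new(
    Redis.new(url: ENV['REDIS_URL']) # assumed env var for the Redis server
  )
end
```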
00:21:44.050 Another method is refactoring files, picking candidates from the churn vs. complexity graph; the StinkScore provides the clarity needed to decide which ones to prioritize. For example, if a file already has 60% coverage, refactoring it is safer: the existing tests give immediate feedback on the build and help ensure you don't break anything.
00:22:36.080 Increasing coverage is also vital: by writing tests for files with 0% coverage, we reduce the penalty applied to the StinkScore. Some files may be deliberately excluded from coverage, but others, like 'rake_tasks.rb', can be improved by adding tests, enhancing overall project quality.
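A first test for such a file can be tiny. A hypothetical example for a previously untested rake_tasks.rb (the file path and task name are assumptions):

```ruby
# spec/tasks/rake_tasks_spec.rb -- a first spec just to start reducing the
# coverage penalty; 'cleanup' is a hypothetical task name.
require 'spec_helper'
require 'rake'

RSpec.describe 'rake_tasks' do
  # Load the task definitions from an assumed location in the project.
  before { load File.expand_path('../../lib/tasks/rake_tasks.rb', __dir__) }

  it 'defines the cleanup task' do
    expect(Rake::Task.task_defined?('cleanup')).to be(true)
  end
end
```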
00:23:18.950 To recap, maintainability is contingent upon code coverage, complexity, and code smells. The StinkScore is a new way to talk about technical debt. Where we usually describe debt in vague terms like 'I've paid off some debt' or 'I've paid off a little,' the StinkScore provides a standardized measure.
00:23:50.400 The truth is there are algorithms out there to measure these qualities but not many are open source. Tools like Code Climate quantify technical debt costs, but they lack open-source flexibility. The StinkScore is open source, allowing for transparency and collaboration, thereby fostering healthier discussions on technical debt.
00:24:38.680 Instead of vaguely claiming to have reduced technical debt, you could say, 'I've reduced the debt by x%.' Conversations about potential projects could cite a project's StinkScore average, promoting transparency about code quality expectations across all projects.
00:25:20.020 The tools in this ecosystem are designed for easy configuration, allowing stakeholders to establish their own baselines for software quality and metrics fitting their conventions. Remember, like any metric, this tool can be misused if not approached properly; its true utility lies in advocating for positive change.
00:26:01.860 I'd like to think of the StinkScore as your compass out of the tar pit. Consistently tracking the metric also ensures you know whether your actions are making things better or worse. If you notice a steady rise in the average StinkScore, it might be worth discussing team culture.
00:26:45.480 I want to thank the organizers of RubyConf and Fast Ruby IO for sponsoring my visit here. I'm always looking for feedback on these projects. If you have comments, or want to try the tools on your own project, that would be greatly appreciated; you can find them on GitHub. If questions come up after the conference, feel free to contact me on Twitter or come speak with me afterward. I also have stickers to give away.
00:27:34.580 Thank you very much!