RailsConf 2018

Down The Rabbit Hole: An Adventure in Legacy Code


by Loren Crawford

In the talk "Down The Rabbit Hole: An Adventure in Legacy Code" at RailsConf 2018, Loren Crawford explores the complexities and challenges of working with legacy code while offering practical strategies for improvement. The main theme revolves around understanding legacy code, its pain points, and effective methods for enhancing it.

Key points discussed include:
- Definition of Legacy Code: Crawford defines legacy code as working software that is difficult to change and often contains complex business logic written by various developers over time. Legacy code, while cumbersome, often signifies the success of a company.

- When Not to Change Legacy Code: Highlighting Sandi Metz's advice, Crawford suggests that certain pieces of code, if seldom accessed, may not merit changes due to the risks of introducing bugs.
- Reasons to Change Legacy Code: Using Michael Feathers' framework, the speaker outlines four primary reasons for altering legacy code: adding features, fixing bugs, optimizing performance, and refactoring.
- Add Tests Before Changes: Crawford advocates for adding tests before modifying existing code to ensure current functionality remains intact. She emphasizes that tests serve as documentation and help prevent regressions.
- Incremental Improvements: The talk underscores the importance of incremental improvements to legacy code by integrating enhancements while addressing new functionalities, rather than setting aside time for large refactoring efforts.
- Testing and Bug Fixing: Crawford shares methods for writing tests to replicate bugs and ensure that changes made to the codebase do not disrupt existing functionalities.

Throughout the talk, Crawford shares practical examples, such as an online yarn marketplace where new features and bug fixes are addressed through proper testing strategies. She illustrates how to add tests for searching for specific yarn products and for handling unexpected inputs.

In conclusion, the key takeaways from Crawford’s presentation include:
- The value of understanding and embracing legacy code as a part of successful software development.
- The necessity of writing tests for untested code to enhance knowledge and documentation of the codebase.
- The importance of making incremental improvements to legacy code, rather than attempting to refactor everything at once.
- Encouragement to seek help and foster collaborative efforts within development teams for better handling of legacy code.

Crawford recommends several resources for further reading, including works by Michael Feathers and Martin Fowler, to support ongoing education about legacy code management.

00:00:11 Okay, thank you! I want to take a little bit of time to talk to you about myself. I'm a software engineer at MailChimp in Atlanta, Georgia. Throughout my career, I have developed a love for refactoring and testing, specifically test-driven development. I also have a wonderful dog, Luna, whose pictures I can't help but share. If you want to reach me on the internet, you can find me on Twitter at @elcrawfish.
00:00:44 So, what are we going to talk about today? You probably saw that I was going to discuss legacy code, and I’m not tricking you! We will talk about what's up with legacy code and what I define as legacy code. I'll cover some pain points illustrated by adding a feature to some legacy code I built and fixing a bug. After that, I’ll review some lessons I hope you learn today and give you some other recommendations.
00:01:18 Let's begin with the question: what's up with legacy code? Can I see some hands in the air if you currently work with legacy code? Oh yeah, have you ever felt like Ron Swanson? I certainly have. It seems like many of us work on legacy code, and we all encounter those frustrating moments where we don't know why something is happening or what's even broken.
00:01:34 But I want to start by discussing why legacy code isn't that bad. In the early stages of my career, I would look at some code I was working on and immediately think, 'Who wrote this?' I would whisper to myself, 'Who is this guy?' I had a negative impression of the people who created this horrible code. However, as I’ve grown, I've come to realize that the developers before me operated under different constraints. They had different information and faced tighter time constraints than I do now. Generally, they were trying to do their best with what they had.
00:02:50 I just want to emphasize that legacy code isn't awful. In fact, if you're working with legacy code, it often means you're at a successful company—at least one that makes enough money to pay you, hopefully.
00:03:05 I’d like to quote one of my colleagues at MailChimp, Dustin Shields Klaus. He says on Twitter, 'When someone makes fun of legacy code, remember: legacy is a funny way to spell successful. Your startup with perfect code will fail.' This point about legacy code is significant—it’s cumbersome, curmudgeonly, and doesn’t always function as we think it should, but it still indicates success and was created by individuals who contributed to that success.
00:03:30 Now that I've laid out that legacy code isn’t all that bad, I want to share my definition of legacy code. Legacy code is a system of functioning software written by many developers over time that is often hard to change and contains business logic complexities that are not easily distilled from the code. At the bottom of this slide, you'll see my working definition, and I want to explain that the reason I put this there is that, in the early stages of my career, I would have had a very different definition of legacy code. It might have had some not-so-nice words in it. But as I’ve progressed in my career, I've expanded my definition. I'm sure in the next five to ten years, my view will change again.
00:04:45 This talk is about how to change legacy code and how to handle it. But I also want to discuss when not to change it. I’m going to quote Sandi Metz, who once said, 'If you have this really ugly piece of code, like nested if statements or a switch statement, if no one ever opens that file, if no one ever reads that, don’t change it.' The risk of introducing bugs by changing it is significant, and if no one is accessing it, you aren't gaining anything from changing the software unless it’s broken.
00:05:15 Michael Feathers, in his book Working Effectively with Legacy Code, outlines four reasons why you would change legacy code: adding a feature (which many of us do), fixing bugs (though we hope we don’t do too much of that), optimizing performance, and refactoring. I’ve grouped these into two main categories: changing behavior and improving design. The reason for this separation is that, as you change behavior, you can also improve design. It’s a more pragmatic way of enhancing your application’s design.
00:06:04 Many of us work in environments where we need to add features and fix bugs, and asking the business for a couple of months to refactor messy sections of code likely won’t happen. Therefore, I encourage you to take the opportunity to improve design as you change behavior. When I refer to improving design, I mean adding tests and also refactoring, which we’ll discuss more shortly.
00:06:46 Let’s go down the rabbit hole with an example to illustrate this. First, we’re going to add a feature. Let’s assume we work for a company called Yarn Hearts, an online yarn marketplace where people from around the world can buy different types of yarn. You don’t need to know a lot about yarn to follow along since we’ll have plenty of examples.
00:07:21 First, let's discuss the requirements for our feature. Users want to be able to search our application for online sellers who sell alpaca yarn. This alpaca yarn must also be fair trade and organic. So, with these requirements in mind, what do we do? I'm going to advise you to start by looking at the tests. Tests are a fantastic place to begin!
00:08:02 In Rails, you already have a built-in test directory, with subdirectories for controllers, mailers, systems, and models. If you’re using RSpec, you’ll look under the spec directory. Either way, hopefully, you have a good directory of tests. If you’re unsure where to begin, think about what type of object you would be calling. Is it going to change data? Is it going to modify a view? That sort of thing. You may also want to reach out to your teammates.
00:08:38 If you don’t fully understand this new codebase, your teammates can be an excellent resource, especially if they’ve been at the company for a while. They hold valuable information, even if it's currently just stored in their memory. We may eventually develop a better way to transfer that knowledge, but for now, remember that expertise in their heads is priceless.
00:09:09 For the sake of this example, while we’re adding a feature, let’s assume that our Yarn model has some tests related to it. Quoting Michael Feathers again, he mentions that it's better to confront the Beast than to hide from it. What he means here is that if you have no tests surrounding current functionality, you should pause to add tests to preserve current functionality. Let’s move ahead and add tests to maintain current behavior.
00:09:57 You may be thinking: why should I be testing? Hopefully, you all understand the value of testing, but if you’re not fully convinced, let me share my reasons. First, I want to ensure that my code works and that the assumptions I’m making about my code are accurate. Testing can also help you uncover real bugs. I can’t count the number of times tests have revealed a bug where I expected something to accept a string parameter, but it received an integer instead.
00:10:46 Another key reason for testing is that it acts as documentation, illustrating how your code is actually functioning. I’ve experienced situations where there’s a divergence between how code is supposed to work and how it actually does, which can be confusing for new programmers entering the codebase. Finally, testing can help you design better code; if something is difficult to test, it’s likely not well designed.
00:11:31 We’ve all been in situations where we think we don’t have time to write tests. It’s easy to think, 'I need to do this other ticket first.' But I would argue that you probably do have the time. And during this exploration, let me invoke Lewis Carroll's words from Alice in Wonderland, where the White Rabbit said, 'Oh dear! I’m late!' This serves as a caution against the mindset of believing you don’t have time for tests.
00:12:11 Making that choice is a trade-off—you’re deciding that the technical debt you’re accumulating will eventually need to be repaid. Hopefully, your legacy code is successful, and someone will eventually have to work with that code again. You and your team should recognize that this code will decay, and by not improving its structure, you're risking future problems.
00:12:51 If you’re short on time but need to add tests, I'd recommend reading Michael Feathers' book, Working Effectively with Legacy Code. He has an entire chapter dedicated to scenarios where you don’t have the time but need to change an area of code. Okay, let's dive into an example. I’ve done a lot of talking, so let’s look at some code.
00:13:33 Returning to our requirements, our users want to search for online sellers of alpaca yarn, ensuring that the returned sellers will provide yarn that is fair trade and organic. We’ll explore our legacy code, focusing on the Yarn model to check the database. We see the model has attributes like yarn name, gauge, weight, color, organic, fair trade, and seller ID. I’m presuming that seller ID is a foreign key referencing the sellers table. Looking at the sellers table, we find attributes like name, address, good standing, organic, and fair trade, and I discover that the Yarn model has a class method called get_sellers.
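The two tables described above can be sketched as plain Ruby structs. This is a hypothetical stand-in for the talk's ActiveRecord models: the attribute names come from the slides, but the class shapes and sample values are assumptions for illustration.

```ruby
# Hypothetical stand-ins for the ActiveRecord models described in the talk.
# Attribute names come from the slides; everything else is assumed.
Seller = Struct.new(:name, :address, :good_standing, :organic, :fair_trade,
                    keyword_init: true)

Yarn = Struct.new(:yarn_name, :gauge, :weight, :color, :organic, :fair_trade,
                  :seller, # stands in for seller_id, the foreign key
                  keyword_init: true)

# A sample record wired up the way the schema implies: a yarn belongs to a seller.
wool = Yarn.new(yarn_name: "wool", organic: true, fair_trade: true,
                seller: Seller.new(name: "Ewe First", good_standing: true,
                                   organic: true, fair_trade: true))
```

Sketching the schema this way makes the foreign-key relationship concrete before digging into get_sellers.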
00:14:27 As I work through adding this feature, I think I know where it needs to go. I look at the code—please don’t read this too closely, it might be overwhelming—there’s a lot of nested logic, multiple case statements, and various queries. It's a complex situation! Now that we're focused on adding tests to preserve the current behavior, it's clearly a significant undertaking.
00:15:13 Let's start by examining the more intricate pieces of the code, particularly the nested logic within the case statement related to 'wool.' We’re looking at the Yarn model and concentrating on the class method called get_sellers. It accepts a name and some optional parameters, and we’ve initialized an empty array called preferred_sellers. Within this, we iterate over sellers, checking their good standing, organic status, and fair trade status.
00:16:14 If a seller checks all the boxes, we add them to our preferred_sellers list. If they are organic but not fair trade, we put them in mid_sellers. I realize we’ve done a lot of background work regarding what the code currently does, and it's essential to understand the decisions made during the writing process.
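The classification rules just described (good standing plus organic plus fair trade goes to preferred_sellers; organic but not fair trade goes to mid_sellers) can be sketched in plain Ruby. The hash shapes and method name here are assumptions for illustration; the talk's real code queries ActiveRecord models.

```ruby
# Simplified sketch of the nested classification logic described in the talk.
# Sellers are plain hashes here; the real code works with model objects.
def classify_sellers(sellers)
  preferred_sellers = []
  mid_sellers = []

  sellers.each do |seller|
    next unless seller[:good_standing]

    if seller[:organic] && seller[:fair_trade]
      preferred_sellers << seller # checks all the boxes
    elsif seller[:organic]
      mid_sellers << seller       # organic but not fair trade
    end
  end

  # Preferred sellers come back ahead of mid sellers.
  preferred_sellers + mid_sellers
end

sellers = [
  { name: "A", good_standing: true,  organic: true, fair_trade: true  },
  { name: "B", good_standing: true,  organic: true, fair_trade: false },
  { name: "C", good_standing: false, organic: true, fair_trade: true  },
]
```

Running `classify_sellers(sellers)` on the sample data keeps A (preferred) and B (mid), and drops C for not being in good standing.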
00:17:24 At this point, I recognize that adding tests to preserve current behavior is vital before introducing any new features. Our first step is to write the simplest test possible. For this, we’ll create a context. When we pass in 'wool' as the yarn name to get_sellers, the expected outcome is that it returns one found seller. Hence, we need to establish a seller that meets our criteria of being organic, fair trade, and in good standing.
00:18:08 Next, we’ll create a yarn instance associated with that seller. The yarn must also meet the same organic and fair trade criteria. We run the method under test, invoking Yarn.get_sellers with the argument 'wool.' Our expectation is to receive one seller back that matches the seller we’ve established. Thankfully, this test passes, confirming our groundwork.
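The shape of this first characterization test—setup, exercise, verify—can be sketched in plain Ruby. The talk almost certainly uses RSpec against real models; the lambda stand-in for the method under test and the OpenStruct records below are assumptions made so the sketch is self-contained.

```ruby
require "ostruct"

# In-memory stand-in for the yarns table; a real test would create records.
yarns = []

# Simplified stand-in for the Yarn.get_sellers class method under test.
get_sellers = lambda do |name|
  yarns.select { |y| y.yarn_name == name }
       .map(&:seller)
       .select { |s| s.good_standing && s.organic && s.fair_trade }
end

# Setup: a seller meeting all three criteria.
seller = OpenStruct.new(name: "Ewe First", organic: true, fair_trade: true,
                        good_standing: true)

# A wool yarn associated with that seller, meeting the same criteria.
yarns << OpenStruct.new(yarn_name: "wool", organic: true, fair_trade: true,
                        seller: seller)

# Exercise: passing "wool" should return exactly the one seller we set up.
found = get_sellers.call("wool")
```

The same setup/exercise/verify rhythm carries through the rest of the tests in the talk.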
00:19:08 Let’s move beyond just preferred sellers; we also need to consider mid_sellers. In our previous example, we only focused on creating preferred sellers. Now, we must ensure that if we have preferred and mid sellers, the preferred sellers come first. Therefore, we're going to set up another test. When we have more preferred sellers than mid sellers, the expected return should present the preferred sellers before the mid sellers.
00:20:21 Once again, we need to prepare our setup—here we create two preferred sellers (both organic and fair trade) and one mid seller (organic but not fair trade). After building our yarn instances, we’ll call Yarn.get_sellers, passing in 'wool.' The expectation is that we receive three sellers in the returned array, with the mid seller last. Thankfully, this test passes too.
00:21:17 You may have noted that I’ve emphasized the importance of testing and refactoring several times. If you’re still not convinced about this process, that's understandable. You might feel apprehensive since dealing with legacy code can be complex. However, the goal is clear: we’re here to improve legacy code.
00:22:33 If you add tests to a section of the codebase, even if you can’t immediately refactor it, you can still make it better than you found it. By gradually improving legacy code through testing and refinement, we can transform it from a gnarly mess into a more manageable structure. As Martin Fowler states in his book Refactoring, 'Whenever I do refactoring, the first step is always to build a solid set of tests for that section of code.' Tests are vital because, despite following structured refactoring, humans are imperfect and make mistakes.
00:23:18 Having established our tests to preserve existing behavior, we can now incorporate the new feature our users want: to search online sellers for alpaca yarn that is fair trade and organic. Let's add a test. When we pass in 'alpaca' as the yarn name, the result should return sellers with only alpaca yarn that is fair trade and organic.
00:24:04 As with previous tests, we set up again: we’ll create one seller who is organic and fair trade and another who is not. This ensures that, when we run Yarn.get_sellers with 'alpaca,' we expect to get back only the one qualifying seller. However, upon execution, the test fails, which is good; it means we’ve captured the requirements in our test.
00:25:07 The error we encounter is 'undefined method `count` for nil:NilClass.' This indicates that found_sellers is nil, meaning no results were returned. This makes sense, as the new functionality hasn’t been implemented yet. To rectify this, we dive into the existing code where we've previously added tests and add a new branch to the case statement to handle 'alpaca.' This branch checks for both the correct yarn name and the relevant seller criteria.
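Extending the method with a new branch might look like the following case/when sketch. The method body, the hash shapes, and the empty-array fallback are assumptions for illustration, not the talk's actual code.

```ruby
# Simplified sketch of extending get_sellers with an "alpaca" branch while
# leaving the existing "wool" behavior untouched. Data shapes are assumed.
def get_sellers(yarn_name, yarns)
  case yarn_name
  when "wool"
    # Existing behavior: preferred sellers first, then mid sellers.
    matches   = yarns.select { |y| y[:name] == "wool" }.map { |y| y[:seller] }
    preferred = matches.select { |s| s[:organic] && s[:fair_trade] }
    mid       = matches.select { |s| s[:organic] && !s[:fair_trade] }
    preferred + mid
  when "alpaca"
    # New behavior: only fair-trade, organic alpaca yarn from qualifying sellers.
    yarns.select { |y| y[:name] == "alpaca" && y[:organic] && y[:fair_trade] }
         .map { |y| y[:seller] }
         .select { |s| s[:organic] && s[:fair_trade] }
  else
    [] # returning an empty array avoids the nil the failing test surfaced
  end
end
```

Note the `else` clause: returning an empty array instead of nil is one way to head off the `undefined method for nil` error at its source.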
00:26:05 While making this change, we must ensure that our solution addresses all areas and logic. After successfully implementing this, we expect the tests to pass—indeed, they do! However, the complexity remains; the logical flow in the current structure can be overwhelming. Addressing only one aspect of the issue might leave other areas in disarray.
00:27:10 To manage this effectively, we might explore refactoring opportunities to streamline further. As you refine segments of your code, consider where you might enhance readability or simplify logic—potentially by breaking out complex structures into separate methods or objects. Doing so can provide more confidence as the code is no longer so tightly woven into a single method!
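One such refactoring is extracting the seller-classification rules into a small, intention-revealing method that can be tested on its own. This is a generic extract-method sketch under assumed data shapes and tier names, not the talk's code.

```ruby
# Extract-method sketch: the nested classification conditions move into a
# small helper that can be tested independently of the query logic.
# Seller hashes and tier names (:preferred, :mid, :skip) are assumptions.
def seller_tier(seller)
  return :skip unless seller[:good_standing]

  if seller[:organic] && seller[:fair_trade]
    :preferred
  elsif seller[:organic]
    :mid
  else
    :skip
  end
end

# The ordering rule also becomes one readable line instead of nested branches.
def ordered_sellers(sellers)
  tiers = sellers.group_by { |s| seller_tier(s) }
  tiers.fetch(:preferred, []) + tiers.fetch(:mid, [])
end
```

Once the rules live in `seller_tier`, each condition can be unit-tested directly, and the surrounding method shrinks to orchestration.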
00:28:11 It’s important to be strategic with refactoring. While I encourage writing tests and subsequent refactoring, recognize that not every situation permits large-scale changes. You may encounter scenarios where tightly-bound processes generate significant revenue for your company, in which case, refactoring should be approached cautiously. It's often best executed alongside team members well-versed in that area of the codebase.
00:29:07 In conclusion, our journey has examined pain points in legacy code, how to add features, and methods to fix bugs. Now it’s crucial to share the lessons I hope you'll carry forward: add tests when you touch code that is currently untested. Your fellow developers will appreciate this effort. Moreover, you will learn a lot more about areas of code by testing them.
00:30:01 Next, refactor to improve design incrementally. In reality, we rarely have the luxury of time to refactor large segments. The key is to approach improvements in digestible portions. Lastly, do not hesitate to ask for help from teammates—they are invaluable sources of insight. Always add tests for new functionalities. Thank you for your time, and if you're eager for more resources on effective practices when handling legacy code, check out Michael Feathers' Working Effectively with Legacy Code and Martin Fowler's Refactoring.
00:31:01 I want to acknowledge the assistance of my colleagues who aided me in getting here today, including my conference buddy Megan, co-workers Tim, Kylie, Michael, and Chelsea, and lastly my wonderful husband, Gabriel. Thank you very much!