Talks
Summarized using AI

One Undo

by Katrina Owen

The video "One Undo" presented by Katrina Owen at RubyConf AU 2016 explores the complexities and strategies involved in refactoring code effectively. Refactoring often leads to compounding issues, especially when multiple changes are made to improve code quality while still maintaining functionality. Owen emphasizes that the primary challenge in refactoring is knowing where to start and how to proceed without breaking existing tests.

Key Points Discussed:

- The Refactoring Process: Owen describes a common scenario where attempting to consolidate similar code leads to a cascade of failing tests and challenges like edge cases and dependencies.

- Initial Considerations: Before starting a refactor, developers should assess whether a change is necessary, which is sometimes counterintuitive if the code seems to be functioning correctly.
- Testing Prior to Refactoring: The importance of establishing tests before making changes is highlighted. Testing helps ensure that the changes do not inadvertently break functionalities.

- Identifying Duplication: Owen discusses the importance of recognizing areas of code duplication and categorizes them into templates and data.

- Iterative Improvements: Through a systematic approach using a children's song as an example, Owen illustrates how to make small, incremental changes by isolating decision points and reducing complexity.
- Naming Conventions: The video underscores the significance of applying proper naming conventions during refactoring, which aids in clarifying the code’s purpose and structure.
- Parallel Implementations: Owen explains how creating parallel structures can help in understanding and refining code without breaking existing functionality.
- Conditional Logic: Owen discusses the role of conditionals during refactoring as a transitory tool, emphasizing the need for simplification.

- Final Structure and Clean-Up: The talk concludes with the importance of adopting clear structure and naming that represent the code's intention, focusing on patterns rather than refactoring for the sake of applying design patterns.
- Learning Through Practice: Owen advocates for regular practice with safe, small steps in refactoring to develop an intuition for recognizing effective abstractions and simplifying code.

In conclusion, Owen reinforces that effective refactoring requires a blend of strategic thinking, testing, and a focus on maintaining functionality while improving code quality. The process is systematic, advocating for clear naming, recognition of differences, and incremental improvements designed to avoid overwhelming failures in the test suite. By following these principles, developers can improve code quality without incurring significant overhead or lost time due to undoing extensive changes.

00:00:01.680 There's this thing that we do, and I'm pretty sure we all do it, where you realize that this bit of code and that bit of code are basically doing the same thing, except they're implemented differently. So you decide to switch everything over and use just one of them.
00:00:10.000 About 17 tests break, and you remember that there's this one edge case. You quickly patch a conditional, and now 24 tests are failing. Then, you realize that there's a missing dependency, and you pull that in, which leads to about 50 failing tests. It turns out there's a conflict between two of the temp dependencies, so you swap them around to load in the opposite order.
00:00:19.760 Now, almost 80 tests are failing. By the way, that encoding thing seems to have gone haywire again; you know, where it's assuming it's getting UTF-8, but actually, it's getting Latin-1—or is it the reverse? I don't know. The method you're trying to use needs to access some stuff that's not a part of the public API, and so you use 'send,' and the whole bunch of stuff blows up due to unexpected nils. It's been hours since the tests were last passing, and now you're considering monkey patching the middle class.
00:01:14.400 The most difficult part of refactoring isn't actually making the change; it's knowing where to begin and what to do next, and figuring out how to make that change safely. I'm going to use the children's song that goes 'I know an old lady who swallowed a fly' as an excuse to talk about decision making and refactoring.
00:01:24.320 Now, some of this is incredibly simple, but it has an algorithmic component, and so this gives it enough complexity so that we can talk about real-world refactoring principles without getting bogged down in real-world code. The entire example is all just text; it's hardcoded into a heredoc, wrapped in a method, and wrapped in a class.
00:01:50.560 The very first thing you need to ask yourself when you're about to refactor is whether or not you should do it at all. In this case, frankly, the answer is no. The song hasn't changed in generations, the code could be appalling, and it doesn't matter. If the code works, we're never going to touch it again—just put it in production and walk away.
00:02:22.080 So, this refactoring is going to be completely spurious, which under normal circumstances I would advise against. Now, once you've decided to refactor, the next question is whether you have tests. Again here, the answer is no because why on earth would you test a hard-coded string? Before we start refactoring, I'm going to copy the string into a test to ensure that the code still works once we start changing things.
00:02:45.760 Once you have a passing test, the next step is to just look at the code for a bit—notice stuff, see what jumps out at you. Look for things in the code that you don't like, and then pick one thing to focus on. I tend to pick either the thing that I hate the most or the thing that I understand the best.
00:03:13.200 What most people remark on here is all of the duplication in the strings. There's duplication between verses and duplication within verses, and of course, it's not exact duplication. The bits that change are embedded within the things that stay the same, and also, it's stuck in this large heredoc, which makes it hard to deal with a small piece at a time.
00:03:36.320 The first change I'm going to make is to introduce a little bit of indirection. I need to be able to handle each individual verse separately, and for this, I'm going to use a case statement with one branch for each verse. This means we have to loop over the range of verses and join them to get the full lyrics, and you can do it all in the same method. However, we only care about the case statement, so I'm going to isolate it into its own method.
00:04:01.599 Finally, we can start touching individual strings. I want to tease apart the bits that vary from the bits that stay the same. In other words, some of this is template and some of this is data. We immediately run into the problem of naming things. Typically, you extract it and name it, but if you get the names wrong, it can make the code dramatically harder to understand, and code that's harder to understand is harder to change.
00:04:21.440 The easiest time to get the names wrong is at the beginning of refactoring, when you understand the least. Of course, even if you get the names right, it doesn't fix the problem of having everything mixed together. So when dealing with this sort of thing, I like to use C-style string formatting using the percent method on string. This provides a clear separation between the static part and the variations.
00:04:53.760 I like that it doesn't matter where in the string the placeholder is; all of the data ends up outside of the template, and I haven't had to name anything. If you apply this to each of the verses, the duplication between them has gone from being similar to being identical, and all the bits that vary are off to the side. The same goes for the duplication within the verses; each entire verse now has all the template-like stuff on the left and all the data on the right.
00:05:31.920 It's a tiny change, but it's noticeable. This is what we started off with, and now we have this. You might argue that I just shoved code around on the slide and pretended that I've refactored, like some people shove food around on their plate and pretend that they've eaten. It seems kind of pointless. However, some people can look at duplication and see what methods and objects to extract, and I can't do that—usually not at first.
00:06:02.800 I dissect it in place to get more information. In this code, separating the template from the data has made the algorithmic part a lot more obvious. Each verse grows systematically, adds repetitions, and stops abruptly. Most of the verses take phrases and then repeat them a certain number of times, passing in a variety of different creatures.
00:06:24.000 For the first time in this refactoring, I want to name something. There are three common strategies for naming things, and two of them are not very effective. The first is to name it using some fragment of the implementation, like 'swallow' or something equally concrete. This type of name quickly gets outdated—you end up with a method named 'blue' that returns the hex code for red.
00:06:48.000 The second strategy is to name it structurally, so this is a recurring line or it's an incremental sentence or it's the middle phrase. This kind of name is technically correct but perfectly unhelpful. The problem domain isn't really about phrases; it's about a little old lady who inexplicably swallows a fly and compounds the problem by swallowing larger and larger creatures. This part of the song attempts to explain the reasoning behind her behavior—it's trying to explain why she would ever do such a thing.
00:07:12.320 In other words, we need to call 'motivation' some number of times, each time with different data. Ignoring the fact that the cow and goat are vegan, the predator in one verse becomes the prey in the next. What we really want is a single list, so that we're not writing everything twice. Ruby offers an enumerable method that conveniently yielding each consecutive pair given a list of critters; we loop over them, pass each pair to the motivation method, and join it into a single string.
00:07:47.680 This chunk of code seems like a complete thought, and I'd like to name it. Here we're talking about the sequence or chain of events—it's a bizarre food chain. Two things just happened: we identified and extracted the algorithmic piece, and we named it. One thing explicitly did not happen—we didn't use the new code anywhere.
00:08:06.400 Technically, this is called a parallel implementation, but in reality, it just means I wasn't sure what I was doing and wanted to figure it out separately without breaking the code. After you create a new implementation in parallel, you should be able to swap it in, and it should just work—or not, like here. Sometimes it fails, indicating there's something you haven't understood yet. Here, one part of the chain is not like the rest.
00:08:30.080 What you'll see in most codebases when you have a rule with an exception is a conditional. These can be problematic in a 'foot-in-the-door' kind of way—blink, and you have 12 conditionals. Eventually, someone is going to get hurt, but within the context of a refactoring, conditionals can actually be a very useful transitory tool. It's two steps: first, you create the conditional, and then you name what it represents, providing more insight to make better decisions about what to do next.
00:09:06.400 The conditional itself should contain the smallest possible difference. This makes sense; if you have the same thing in the 'if' and the 'else,' then it's not conditional on anything. That gives us the tangled web of the spider, but there's nothing to put in the 'else' branch, and you could drop it. However, that might lead you to try to name just the exception. The conditional doesn't represent the exception to a rule; it represents two variations on the same idea.
00:09:44.560 It's not two different things; it's one idea, and that idea is what you have to name. In the song, we're talking about the critter's distinctive features or some sort of qualifier. Just some critters aren't all that distinctive or special. If you had left only the bit about the spider, you would probably end up naming it in terms of the spider, which would complicate your refactoring. The method needs to take an argument; 'prey' is about hunting—not just one animal killing another but about the critter itself.
00:10:52.800 Here's the payoff: notice how the method has completely isolated this conditional. The result of this method relies entirely on the argument you pass in. The method could live anywhere; it could be a global function for all we care. It doesn't depend on anything related to the song class. In other words, we have this critter string and we're trying to decorate it with more behavior. Instead of passing the critter to the qualifier method, we should invert it and send the qualifier message to the critter, which means critter has to be an object.
00:11:30.560 The data can be extracted from the verse, so we create all the critters and add a handy method to get just the ones we need for whatever verse we're on. Now, 'predator' and 'prey' are no longer just strings; they're objects to which we can send messages. This chunk seems a bit dirty; it's not really dirty—it's just in the wrong place. It belongs inside the critter, which means we have to name it.
00:12:00.480 Finally, we can slip the chain back into the case statement without causing a single test to fail—our only test, by the way. There's this abstraction that wasn't at all obvious at the start, but it was there, just buried beneath the surface. Focusing on the differences isolated meaningful ideas. All these ideas culminated in building this tiny object: the critter.
00:12:33.040 For all this time, we've focused on differences and similar similarities. We started out with fragments of plausible duplication throughout the song, and now there are whole chunks of unmistakable duplication. Even the data has transitioned from being mostly a jumble to having recognizable patterns. It's tempting to focus on the sameness—to extract it or name it, but resist that temptation for as long as possible. Identify the smallest difference and see if you can make it go away.
00:12:55.920 Then ignore it and look at the remaining differences. The first difference here is easy; it's the critter's name. Once we get that fixed, we are left with just one difference: the critter object needs to be a little bit smarter, and we're stuck trying to name this concept. It's somewhat like commentary or maybe an aside. We will also need to pass more data when creating critters and then use the abstraction.
00:13:24.960 Now, this is identical, and we can ignore it. Take a moment to squint at it—notice that the red stuff is strings, and the black stuff is stuff we've named. The template is exactly the same in each verse, and it's all in red, while the data is the same in each verse, and it's almost all black. The asymmetry sticks out; it's the aside for the fly. If you have an abstraction for something, you should use it.
00:13:53.200 Now something interesting has happened: all that sameness we've been ignoring can now mostly go away. Sameness is a trap; it's a distraction. Differences are the parts that are interesting because they're fragments of a larger idea. Once you've named the larger idea, the sameness either evaporates or condenses into something obvious. This all started with identifying duplication as being the ugliest, most obvious, and most understood problem.
00:14:34.560 We encapsulated the algorithmic portion of the song, discovered and extracted the critter, and now most of the duplication has collapsed. We need to look at what's left. The song class is shorter than it was, but it's doing a lot more. There are four methods in there; three of them are private, and all three of those are about the verse. There’s still some duplication in that case statement—it's here and here.
00:15:08.800 All the private verse-related stuff seems like it could be its own thing. The verse depends on 'i'; we should pass it into the initialize method and expose a reader for it. A class named 'verse' shouldn't have a method named 'verse.' This would be better as '2s.' Finally, we can swap out the call to the old verse method with the parallel implementation, and then delete the old implementation.
00:15:43.040 I'm not saying that you should always split things up into new methods and classes, but it can be useful to ask yourself, 'If this were two things, what would they be?' If you come up with a plausible answer, something that's not too far-fetched, then it might be worth exploring. Just be ready to inline it back if you don't like it. In this case, encapsulating the verse makes it incredibly easy to fix the remaining duplication issues.
00:16:47.680 Last, critters can be assigned to an instance variable with a reader. The same goes for the first and last critters, and as before, the duplication draws the eye. Ignore it and instead find the smallest difference. Isolate that somewhere, name it. In the context of the song, this is everything that happens after the little old lady swallows the critter—it's the aftermath of the incident that kicked off the entire chain of events.
00:17:28.960 If this is the aftermath, then the first part is the incident. Now '2s' tells a story with all of the details relegated to private methods. Lastly, since we have a critter's method on the instance, we don't have to pass it around as an argument; we can just rely on the reader. The verse structure is decent, relying on incident and aftermath, which in turn relies on the chain.
00:18:10.160 This is pretty good, but I still have one complaint left about this: the case statement. I just added it at the beginning to get the refactoring started, thinking it would be temporary, and now it's still hanging around. That's kind of annoying, so the final refactoring is one of the classics from Martin Fowler's book called 'replace conditional with polymorphism.' It's a step-by-step recipe.
00:19:06.880 Refactoring is not about reaching maximum design pattern density; it's about balancing simplicity, readability, and changeability. It can sometimes feel a bit like a dark art—and it kind of is, in the sense that there's a bit of gut feeling involved in recognizing the stuff you don't like. However, that gut feeling and intuition can be learned.
00:19:41.680 Read about code smells, look at code, do code reviews, practice refactorings, and try things, then back them out. Try something else. It should feel like you're discovering abstractions, not inventing them. A good abstraction feels obvious in hindsight—it was already there; you just needed to clear away some of the noise around it.
00:20:13.760 Don't name the noise; don't name the fragments and slivers and shards of ideas—name the whole concepts. This clearing away is a controlled and systematic process. It’s meticulous; it means taking safe steps, and safe steps don't break your test suite.
00:20:25.600 Notice how different this is from TDD. In TDD, a failing test is a precondition; you make it stop failing by moving forward. In refactoring, a failing test is feedback that you did something wrong; you fix it by backing out that change and trying something else. Safe steps are deliberate, careful, and small.
00:20:46.240 When I say small, I mean much smaller than you'd probably be used to. I'd encourage you to make your steps so small that if you get a failing test, you can get back to a green test suite with one single undo. I'm not suggesting you take steps this small always or forever, but practice it long enough to gain an intuitive understanding of how to make changes safely. I promise you will never again get reset hard away hours of your life.
00:21:16.160 Thank you.
Explore all talks recorded at RubyConf AU 2016
+11