Better Code Through Boring(er) Tests

by Betsy Haibel

In the RubyConf 2016 talk "Better Code Through Boring(er) Tests", speaker Betsy Haibel addresses the common frustrations developers face with testing and offers practical solutions. She emphasizes the need for tests that are simple and easy to manage, echoing developers' desire for 'boring' code that is easy to understand over time.

Key points discussed throughout the talk include:

- Understanding Developer Frustration: Haibel asserts that despite the general agreement on the value of testing, many developers find their tests annoying and troublesome, partly because they often reflect our imperfect understanding during development.

- Test Smells: She identifies three specific "test smells" that can complicate testing:

  - Testing Private Methods: Instead of directly testing private methods, developers should consider making these methods public or extracting them to a new class, thereby simplifying tests.

  - Test Duplication: Haibel cautions against mirroring application code in tests, which can lead to conflicts during refactoring. She proposes capturing shared functionality in application code rather than in test code.

  - Inventing the Universe: She warns against excessive setup in tests that leads to unnecessary complexity. Developers should wait until they have multiple test cases before creating abstractions within their tests.

- Complex Test Setups: Haibel discusses how using tools like FactoryGirl can mask complexities in testing setups. She encourages developers to pull essential logic into application code to maintain clarity and control.

- Boring Tests as a Strategy: Overall, the main argument is that boring tests lead to more maintainable and comprehensible code. By striving for simplicity and avoiding clever or complex constructs, developers can ensure that their tests remain useful as the application evolves.

The talk concludes with the notion that listening to your tests can highlight opportunities for refactoring both the tests and the application code, ultimately reducing the frustration associated with both. Better understanding and structuring of tests can lead to better application development practices, aligning both the tests and the code for future enhancements.

00:00:15.299 Welcome to 'Better Code Through Boring(er) Tests.' If you're in this room or one of the few stragglers still filing in, don't worry, you're welcome. We got started late, and I'm assuming two things: first, that you value testing, and second, that you hate your tests.

00:00:20.890 In part, this is because these are the problems I told you I would solve in my abstract, but it's mostly because I think these two things are the default state of developers. Everyone agrees that testing is important. Even people who may not find as much value in test-driven development or in other specific testing practices still agree that you need some kind of testing.

00:00:37.299 There’s a blog post about why things that are not TDD are still awesome. But we fight about testing, even though we collectively value it and all know that we need some form of it, because no one, literally no one, loves testing all the time. Even die-hard little TDD fanatics like me. It's okay to admit it.

00:01:07.479 The testing police are not coming for you today! I'm going to help you fix this mess. Today, we are going to learn why we hate our tests. We will explore why we fight about them and also learn about some ways we accidentally make our tests worse, sometimes by getting fancy and making them too interesting. But all is not lost; we will also discover ways we can fix these problems while making our application code better at the same time.

00:01:43.409 Because at the end of the day, all I want as a developer is boring code that I know I will understand in four months. Empathy for my future self, and connections through building between me and my coworkers. To do this, first we're going to talk generally about some underlying causes of test hate and test fights. Then we're going to talk through three specific ways that our tests can get annoying and learn how to fix the root causes.

00:02:09.269 So, why do we hate our tests? The simple answer is that the test code is terrible, and therefore the tests are terrible. But that's cheating. The version without cheating is that we write our tests with imperfect knowledge. We write all of our code, including application code, with imperfect knowledge. We can make educated guesses about what we might need in the future, but all they are is guesses.

00:02:31.080 We can never entirely separate ourselves from our assumptions or our current mental model when we write application code, and of course, our tests run in the same context. Whenever we've made some incorrect assumptions in our codebase, it typically makes our tests or our application code, or more usually both, kind of painful to work with.

00:03:06.250 There’s always going to be some die-hard TDD fanatic like me who insists on making a jerk of themselves and says, 'Oh, it's not a problem; you should just try listening to your tests.' I know this because I have been this jerk before, and I am so sorry.

00:03:42.750 Why is saying 'listen to your tests' such an annoying and jerky thing to say? It's because you are presenting your listener with this useless, hole-ridden process and calling it a map; you are telling them they're stupid if they don't magically understand what the question marks are and how to fill them in. If you are the person who is annoyed by this, your annoyance is legitimate. I ask that you empathize with the person who's being annoying because I can tell you from intense personal experience that people aren't saying we need to listen to our tests because they want to be arrogant know-it-alls. They see a thing that is wrong and want to fix it, but they don't quite know how.

00:04:23.580 In this talk, I want to help fill in the question marks so that no one here has to feel stupid or sound like a jerk again. Before we get started, everyone take a deep breath: bad code happens, bad tests happen. This is fine. At one point, there was a 'this is fine' dog slide right after this, and I deleted it because it's actually fine. Code is terrible because the world is complex.

00:04:59.069 It’s good for us to be honest with ourselves about whether that complexity is useful or necessary, right? And this takes discipline. Complexity happens; that’s why we are paid a lot of money. Complexity happening to you does not mean that you're a bad developer or a bad human.

00:05:27.269 Next, we invented the more esoteric testing techniques for a reason. I'm going to call these test smells, not mistakes, because sometimes you actually need them. Be honest with yourself about whether you actually do need them, but none of the techniques I'm about to talk about, and say you maybe shouldn't use, are inherently bad.

00:05:54.149 Finally, and most importantly, I have done everything I am about to show you in production. Everything I am about to call out as a mistake is a mistake I have made.

00:06:08.069 The first test smell we’re going to look at today is testing private methods. When you see this line of code in the test, when you see that `send` method, chances are someone is using it to try to test a private method. Gasp! I've heard a lot of different justifications for testing private methods directly instead of indirectly through public methods. These boil down to three things.

00:06:44.799 First, the method might have weird indirect results that are hard to introspect in the public method. Second, the public methods that you'd use to test it might be really expensive to test. And third, someone might just want to isolate the test from weird side effects elsewhere in the code. The thing about these reasons is that they’re hard to argue with, even if you're feeling uncomfortable about testing a private method.

00:07:06.129 So what I suggest is that instead of trying to confront these arguments, we can sidestep them. Instead of testing a private method, we can first just make the method public. This is an underrated idea! If you're feeling an urge to hit something directly and test it because it's important, maybe it's important enough that in your application code later, you'll want to hit it.

00:07:35.200 Second, you can build more introspection logic. It's hard to check whether a process completed properly in tests. So do your future self a favor: if the logic is hard to check while you're debugging, it will make it hard to check while testing as well. Lastly, you can extract a simple class. I'm going to show you a really quick and dirty way of starting to do this.

00:08:10.300 You can use this shim: replace the contents of the private method with a call to a new object, passing the object that owns the private method into the new class. Then we just copy and paste the code from the private method into that new class’s call method. Once you've done this, you no longer need to test a private method directly, and it's super easy to refactor the new class down later. Sometimes you pull on a thread and just get a bigger tangle; that’s okay. If it doesn't work out, feel free to abandon it.
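
The shim described above might look roughly like the following sketch. The slides are not part of this transcript, so the `Invoice` class, its private `late_fee_cents` method, and the `LateFeeCalculator` name are all hypothetical stand-ins chosen for illustration.

```ruby
# Before (hypothetical): the logic lived in a private method that tests
# could only reach via `send`.
#
#   class Invoice
#     private
#
#     def late_fee_cents
#       days_overdue > 30 ? amount_cents * 5 / 100 : 0
#     end
#   end

# After: the method body is pasted into a new class's `call` method.
# The new class receives the original object, so the pasted code only
# needs an `@invoice.` prefix on each attribute it reads.
class LateFeeCalculator
  def initialize(invoice)
    @invoice = invoice
  end

  def call
    @invoice.days_overdue > 30 ? @invoice.amount_cents * 5 / 100 : 0
  end
end

class Invoice
  attr_reader :amount_cents, :days_overdue

  def initialize(amount_cents:, days_overdue:)
    @amount_cents = amount_cents
    @days_overdue = days_overdue
  end

  def total_due_cents
    amount_cents + late_fee_cents
  end

  private

  # The private method is now a one-line shim that delegates to the new object.
  def late_fee_cents
    LateFeeCalculator.new(self).call
  end
end
```

With this in place, a test can call `LateFeeCalculator.new(invoice).call` through a public interface instead of reaching into `Invoice` with `send`.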

00:08:44.600 If you pull on a thread, sometimes it gets tangled, and I'm not referring to knitting. This is something I think about a lot. That's why things like `dash dash dot` were invented. It’s probably still a smell that you want to test a private method, but sometimes it is okay to fix things later. We call these test smells because it's not the end of the world if we cannot untangle them right away.

00:09:21.820 We don’t refer to them as test errors. In fact, if you don’t know how to unstick something in the time you have, it is usually better—and not just okay—to wait until you have more time and knowledge. In the meantime, practice harm reduction. You can isolate the weird part, make it dead obvious that's weird. Maybe that means leaving the direct private method test in, or it means running the slower direct tests; either way, it means commenting on it.

00:09:49.120 Once you've communicated that this is ugly, that someone needs to fix it, and you’ll know more later, you can feel free to move on.
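
As a sketch of what that harm reduction could look like in practice, here is a direct private-method test that is left in place but loudly flagged. The `Importer` class and its `normalize` method are hypothetical; the only thing taken from the talk is the idea of isolating the weird part and commenting on it.

```ruby
require "minitest/autorun"

# Hypothetical application class with a private helper.
class Importer
  def import(rows)
    rows.map { |row| normalize(row) }
  end

  private

  def normalize(row)
    row.strip.downcase
  end
end

class ImporterTest < Minitest::Test
  # SMELL: this test reaches into a private method with `send`.
  # Assumed reason for this example: exercising it through `import`
  # is slow or awkward in the real system.
  # TODO: make `normalize` public or extract it into its own class
  # once we understand the import pipeline better.
  def test_normalize_strips_and_downcases
    assert_equal "apple", Importer.new.send(:normalize, "  Apple \n")
  end
end
```

The comment does the communicating: the next reader knows the test is ugly on purpose and what the intended fix is.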

00:10:06.440 The next smell I’m going to cover is test duplication. Let's pretend we work for a service that coordinates donations to nonprofits or political groups, perhaps similar to my previous employer. Our clients, the nonprofits we serve, each have a primary contact phone, and we look up whether the phones are mobile or not.

00:10:34.570 Here are some tests that check on that mobile lookup process. We set the client's phone number to a predetermined phone that we know is a mobile phone, and then we check the result of that mobile phone method. Then we do the same for a known landline phone. Now one thing that should be clear is I'm using MiniTest as a least common denominator for all my test examples, but everything I’m saying here applies to RSpec too.
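
The slide itself is not in the transcript, but tests of this shape could be sketched as follows. The `Client` class, the `mobile?` method name, the `KNOWN_MOBILE_NUMBERS` list standing in for a real lookup, and the phone numbers are all assumptions made for illustration.

```ruby
require "minitest/autorun"

# Hypothetical stand-in for the real lookup, which would presumably
# consult some external source. The numbers are made up.
KNOWN_MOBILE_NUMBERS = ["202-555-0101"].freeze

class Client
  attr_accessor :phone

  def mobile?
    KNOWN_MOBILE_NUMBERS.include?(phone)
  end
end

class ClientTest < Minitest::Test
  def test_mobile_phone_is_detected
    client = Client.new
    client.phone = "202-555-0101" # a number we know is mobile
    assert client.mobile?
  end

  def test_landline_is_not_mobile
    client = Client.new
    client.phone = "202-555-0199" # a number we know is a landline
    refute client.mobile?
  end
end
```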

00:11:06.160 If you look at these two test cases, the syntax is different, but the content is the same. Both are bounded test cases where we set a client phone number, and in both, we verify the mobile method returns the thing we expect. That’s because the actual activity you’re doing is the same, regardless of the library you choose to express it with.

00:11:42.230 And so, we look at these clients, and we need to know whether they're using a mobile phone or landline phone. Later on, we get donors, who are distinct from clients but also have phone numbers, and we test them in exactly the same way. You may have noticed that only one thing changed as I flipped back and forth between the slides, so we’ve got some test duplication.

00:12:19.920 I don’t know about you, but anytime I see code duplication, it makes my brain feel weird. It makes me itch and wonder if there’s complexity that I am not yet managing to organize. So how should we deal with this duplication? One wrong way to deal with it is shared example groups.

00:12:46.130 There’s a special DSL for these in RSpec, which I’m not going to cover. In MiniTest, we would just share code between the two test classes with a module, just like we would in our application code. With a shared example group, we put the shared code into a module. Note how little has changed when we include this module in our test classes. While this is a good testing technique in some circumstances, I’m calling it out as a mistake here.
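
A minimal sketch of a module-based shared example group in Minitest, assuming the application code shares its lookup through a hypothetical `MobileLookup` module. The `build_subject` hook, the class names, and the phone numbers are invented for this example.

```ruby
require "minitest/autorun"

KNOWN_MOBILE_NUMBERS = ["202-555-0101"].freeze # made-up lookup data

# Hypothetical application module shared by Client and Donor.
module MobileLookup
  def mobile?
    KNOWN_MOBILE_NUMBERS.include?(phone)
  end
end

class Client
  include MobileLookup
  attr_accessor :phone
end

class Donor
  include MobileLookup
  attr_accessor :phone
end

# The "shared example group": test methods defined in a module run in
# every test class that includes it.
module SharedMobileLookupTests
  def test_mobile_phone_is_detected
    subject = build_subject
    subject.phone = "202-555-0101"
    assert subject.mobile?
  end

  def test_landline_is_not_mobile
    subject = build_subject
    subject.phone = "202-555-0199"
    refute subject.mobile?
  end
end

class ClientTest < Minitest::Test
  include SharedMobileLookupTests

  def build_subject
    Client.new
  end
end

class DonorTest < Minitest::Test
  include SharedMobileLookupTests

  def build_subject
    Donor.new
  end
end
```

Notice that the test module has the same shape as the application module: one module mixed into two classes. That one-to-one mirroring is what the talk goes on to criticize.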

00:13:17.830 That’s because it should almost never be your first go-to when you see test duplication. When we look at the application code we’re testing, we see why our tests are basically just a one-to-one mirror of our application code. This means that our tests are making the exact same assumptions that our application code is making.

00:13:59.060 Over time, this will lock us into these assumptions. When we go to refactor this code later, we may find ourselves changing the application code but not the tests, leading to an awkward fight between our application code and our tests. We may be frustrated by conflicts with future requests that are at odds with the way we've overrefactored.

00:14:39.580 Because new requirements conflict with the assumptions we’re making in our tests, we often have to change a lot of test code before we can even think about adding new application code for those features. One alternative we could try is to reduce duplication by testing the module directly. In this technique, you create a fake mini class.

00:15:22.050 You include the module in it and then run your module tests against that fake class. This again can be a great technique, particularly when you are convinced that the code needs to be in a module, but it ties you hard to the assumption that your code should be in a module, and that assumption rarely holds true.
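
As a rough illustration of that technique, the sketch below tests a hypothetical `MobileLookup` module through a throwaway class. The module, the `FakePhoneOwner` class, and the phone numbers are assumptions for this example, not the talk's actual code.

```ruby
require "minitest/autorun"

KNOWN_MOBILE_NUMBERS = ["202-555-0101"].freeze # made-up lookup data

# Hypothetical application module under test.
module MobileLookup
  def mobile?
    KNOWN_MOBILE_NUMBERS.include?(phone)
  end
end

class MobileLookupTest < Minitest::Test
  # Fake mini class that exists only to host the module in tests.
  class FakePhoneOwner
    include MobileLookup
    attr_accessor :phone
  end

  def test_mobile_phone_is_detected
    owner = FakePhoneOwner.new
    owner.phone = "202-555-0101"
    assert owner.mobile?
  end

  def test_landline_is_not_mobile
    owner = FakePhoneOwner.new
    owner.phone = "202-555-0199"
    refute owner.mobile?
  end
end
```

The duplication is gone, but the test now hard-codes the assumption that the lookup lives in a module that expects its host to provide `phone`.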

00:15:58.750 If we turn that module into a class instead, then we can test that class directly. We can reduce duplication in the same way as when we extracted the shared example group or tested the module directly, but unlike those cases, we would be attacking the application code first.

00:16:36.820 Looking at our tests and seeing what we can learn by listening to this awkward duplication in our tests allows us to identify shared functionalities. Hence, we can improve our application code by properly encapsulating the single responsibility of phone number lookup. This is a much more sustainable solution than the way the shared example group doubled down on the module architecture and locked us into long-term decisions.
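
One way the module might grow up into a class is sketched below. The `PhoneNumber` name, the delegation from `Client` and `Donor`, and the lookup data are hypothetical; the point is only that the lookup responsibility gets a single home that can be tested directly.

```ruby
require "minitest/autorun"

KNOWN_MOBILE_NUMBERS = ["202-555-0101"].freeze # made-up lookup data

# The phone number lookup now lives in one small class.
class PhoneNumber
  def initialize(number)
    @number = number
  end

  def mobile?
    KNOWN_MOBILE_NUMBERS.include?(@number)
  end
end

# Client and Donor delegate instead of mixing in a module.
class Client
  attr_accessor :phone

  def mobile?
    PhoneNumber.new(phone).mobile?
  end
end

class Donor
  attr_accessor :phone

  def mobile?
    PhoneNumber.new(phone).mobile?
  end
end

# The behavior is tested once, against the class that owns it.
class PhoneNumberTest < Minitest::Test
  def test_mobile_phone_is_detected
    assert PhoneNumber.new("202-555-0101").mobile?
  end

  def test_landline_is_not_mobile
    refute PhoneNumber.new("202-555-0199").mobile?
  end
end
```

No fake host class or shared test module is needed, because the application code itself now expresses the shared concept.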

00:17:14.750 As a side note, I modeled this example after a lot of times I’ve seen or have made API clients that are inserted as modules, but it can apply anytime you have a module that really wants to grow up into a real class.

00:17:52.420 The last smell I’m going to show you is ‘inventing the universe.’ There’s a particular kind of test where, to paraphrase Carl Sagan, you are trying to bake an apple pie from scratch. Don’t even bother reading that code; there’s too much of it to reason about. Before you can test anything, you need to invent the universe. You need to plant some trees, water them, pick the apples, harvest the flour, and so forth before you can finally test if the pie came out tasty.

00:18:36.009 The first mistake we make when faced with tests like this is to address the iffy setup bit immediately. This again undermines our core job as developers, which is organizing complexity. When we see unorganized complexity, it can feel overwhelming, especially when it’s just a mess of code in our tests. But boring tests lead to boring code.

00:19:24.139 If we make any new test abstractions now, we'd be pretty doomed. This is the same mirroring nonsense that we just learned how to refactor away from, just like in the module example. This is because we only have a little bit of knowledge about our domain so far.

00:20:01.510 Specifically, we understand one possible path for creating one possible kind of pie. We know that this one possible path is strictly required for the application, yet we know nothing else. This means that anything we do to abstract this further would be nothing more than guesswork.

00:20:38.150 We would be using our current understanding and assumptions about the problem to guess what abstractions we might need. Guess where else we are using those assumptions? In the application code. When we try to create new test abstractions in advance of test duplication, we risk creating abstractions that parallel our application code, just like we saw in the module extraction example.

00:21:22.200 The first important thing you need to do here is to wait until you have two tests or three tests. Once we have multiple test cases, we have enough information to start formulating our abstractions and can begin thinking about techniques to encode them.

00:22:01.199 In this section, I will show you two ways that I’ve seen people make mistakes by trying to encapsulate abstractions within test-only code. Then I will show you what I recommend as an alternative. Things in this section are a little less clear-cut. The techniques I’m calling out as mistakes are indeed valid techniques in specific contexts.

00:22:58.360 One of the first test approaches I’m going to tell you about is shared contexts. RSpec has a DSL for this, but in MiniTest, all you need to do is write a method and include it in your setup block. You can even include it in a module to share between test classes. While using shared contexts can be benign, reinventing the universe from scratch every time you write a test can be painful.
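
A minimal sketch of a Minitest shared context along those lines: a setup helper defined in a module and called from `setup`. The `Pie` struct, the `PieContext` module, and the ingredient values are hypothetical and only echo the talk's pie metaphor.

```ruby
require "minitest/autorun"

# Hypothetical domain object for the pie example.
Pie = Struct.new(:crust, :filling) do
  def tasty?
    !crust.nil? && !filling.nil?
  end
end

# The shared context: a plain module holding a setup helper.
module PieContext
  def build_apple_pie
    @crust = "shortcrust"
    @filling = "apple"
    @pie = Pie.new(@crust, @filling)
  end
end

class PieTest < Minitest::Test
  include PieContext

  # Calling the helper from `setup` runs it before every test method.
  def setup
    build_apple_pie
  end

  def test_pie_is_tasty
    assert @pie.tasty?
  end

  def test_pie_has_apple_filling
    assert_equal "apple", @pie.filling
  end
end
```

Any other test class can `include PieContext` and call `build_apple_pie` in its own `setup`, which is convenient but also moves the setup out of sight of the test that depends on it.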

00:23:35.970 Sometimes, it’s not particularly useful pain. For example, if your application has a database—perhaps you are using a popular application framework—it is okay to extract that database cleaning logic into a shared context, as that popular framework does by default.

00:24:13.620 The difficulty lies in telling what’s test-specific and what’s not. In the case of cleaning your database, one way to model this is to consider: Are we ever going to want to drop our database under normal circumstances? Probably not; therefore, it’s safe to assume it’s a test-only concern.

00:24:48.850 When we’ve stripped out all the incidental lines in the test or moved all the elements we’re sure are test-only outward and still have an extensive test setup phase with a lot of code, it’s a sign that an object has too many dependencies. It’s always wise to examine what those dependencies are. Sometimes they are necessary.

00:25:23.920 A pie without filling is super sad, and while I will flag shared context here as a warning sign that perhaps there are too many dependencies, I won’t downplay it. When people discourage this, they imply that the domain complexities leading to too many dependencies are unnecessary or easy to resolve, which is rarely true.

00:26:02.390 Fortunately, mitigations for necessary domain setup complexities and unnecessary ones have a lot in common, especially in complicated legacy code or when deadlines prevent immediate removal. We're going to get to those shortly, but first, I want to address one more false trail.

00:26:39.620 I should emphasize that this is not a rant against FactoryGirl. I use it in many projects because it is a great DSL for describing test and seed data when used effectively. It has a few traps, though. FactoryGirl enables and encourages us to create factories, which are dedicated test setup helpers. For example, `FactoryGirl.create(:pie)` creates a pie with a filling, and `FactoryGirl.create(:filling)` creates a filling.

00:27:12.330 This is a problem, but not because it hides dependencies. If we say, 'Oh, we can’t see the direction of dependencies,' we revert to the notion of listening to our tests as merely a magical process. The real problem isn't that something hides dependencies; 'hidden' usually just means code you're not looking at right now. Ninety-five percent of the time, I do that on purpose.

00:27:51.479 I want to look at my dependencies when they’re wrong or when they misbehave; otherwise, I’d like to extract them to think about them less. The problem with `FactoryGirl.create(:pie)` versus `FactoryGirl.create(:filling)` is that whether a pie belongs to a filling or vice versa is really important domain logic.

00:28:36.070 As a maintenance developer, I want to know that we make fillings in the course of making pies, not the other way around. Thus, I suggest that when your test setups get complex, consider moving the important aspects your test setups are reflecting about your domain logic into your application code instead.

00:29:17.790 The first and simplest strategy is to pull more of the setup logic into the constructor. But be careful; this could lead to a complex constructor. Instead, consider making factory methods on a class. These methods usually offer shorthand for invoking the constructor with certain configurations. You can also build out genuine factory classes if necessary. There’s nothing wrong with going full Java if your application needs it.
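
A sketch of what factory methods on an application class could look like, continuing the pie metaphor. The `Pie` and `Filling` classes, their attributes, and the `Pie.apple` method are hypothetical; the talk's own slides are not in the transcript.

```ruby
class Filling
  attr_reader :fruit, :sugar_grams

  def initialize(fruit:, sugar_grams:)
    @fruit = fruit
    @sugar_grams = sugar_grams
  end
end

class Pie
  attr_reader :crust, :filling

  # The constructor stays simple: it only stores what it is given.
  def initialize(crust:, filling:)
    @crust = crust
    @filling = filling
  end

  # Factory method: shorthand for invoking the constructor with one
  # common configuration. It lives in application code, so the domain
  # rule "a pie is made by first making its filling" is visible here.
  def self.apple
    new(
      crust: :shortcrust,
      filling: Filling.new(fruit: :apple, sugar_grams: 100)
    )
  end
end
```

A test can then set up its subject with `pie = Pie.apple`, and a maintenance developer who wants to know how pies and fillings relate can read `Pie.apple` in the application code rather than hunting through test factories.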

00:30:00.710 Sometimes, when you've got a particularly complex piece of domain logic, it's important to understand how that complex logic functions. When we hide this logic in FactoryGirl, that can become a problem. When we try to dig it out, it feels akin to a startup giving an agency its prototype. Later on, when you have more money and you've proved your concept, you want to bring development in-house.

00:30:43.245 In many instances, the core function that makes your business tick is too important to give up control. If we delegate essential code and functionality to FactoryGirl, we lose the ability to control and understand it.

00:31:20.049 Let’s look at a simple before-and-after example; this is how our universe test appears before we start extracting some of its logic into factory methods. If we begin using application code factories judiciously, we can significantly enhance how readable and understandable this code is for future developers.

00:32:07.050 We have lost enough lines that the whole test now fits on the screen. I will not do a victory lap here, but we’ve covered everything we wanted: underlying causes, test smells, and so forth.

00:32:55.210 But we are not quite done yet. I did this because it gave me the opportunity to add a DC flag. I could not resist! Our statehood vote was probably futile, but I am so proud of it. Thank you for humoring me.

00:33:29.050 We’ve just performed a magic trick by insisting that our tests be as boring as humanly possible. We’ve isolated a domain concept that we can use to make our application code more readable to new developers. By insisting on boring test code, we have made our application code more boring.

00:34:05.410 Here’s the trick: as part of this magic trick, we need to acknowledge that this is a series of subjective value judgments. The value of boring code over clever code is a widely popular idea and is why I got you in this room. Because we can all agree with that concept, we sometimes miss that it's a value judgment.

00:34:40.800 If we don’t define what boring code is very specifically, we risk merely creating another way to label code as bad without explaining what makes it bad. When I say that `FactoryGirl.create(:pie)` is the opposite of boring, I mean it’s a line that should strike fear into the hearts of new developers and even seasoned ones.

00:35:09.370 What I’m reacting to is a history of seeing that one line imply thousands of lines of code that I suddenly need to know about but have no easy way of searching. However, the developer that wrote that line—the one who has worked on that code base for years—might disagree violently with me.

00:35:51.760 That developer has a great intuitive sense of the thousands of lines of implicit domain knowledge embedded in that line and doesn’t need pointers to refresh their memory on details. Therefore, they might see the Java-style factory version as an impediment rather than a velocity booster.

00:36:22.920 I'm pointing this out so you can figure out which parts of this talk are relevant to you and which are not. Regardless of what parts of this talk you keep or discard, what I want you to take away is: whenever you're writing tests, you are writing based on your current understanding of your code and application needs.

00:36:57.200 Duplication is cheaper than the wrong abstraction. That's a quote from Sandi Metz, who is wonderful, and it applies to test code too. When you introduce new test abstractions, they likely mirror the assumptions you are making in your application abstractions.

00:37:33.210 Application code influences test code, and test code, in turn, influences application code. Whatever boring means to us, sometimes the easiest way to make our tests more boring—to hate our tests less—is simply to make them more boring by sheer willful effort, glaring at our application code until it aligns.

00:38:09.740 The only thing that ‘listening to your tests’ means is that the places where you hate your tests are opportunities to refactor your application code, leading to a mutual decrease in hatred for both your tests and your application code.

00:38:51.210 This is Betsy Haibel. I tweet @BetsyHaibel. Most of my tweets are about feminism, cats, and science fiction—not so much code. But if you want to find me on the internet, I'm at BetsyHaibel.com, and on GitHub at BetsyHaibel.

00:39:27.080 I work for a lovely company called Roostify that tries to make mortgages less terrible. You may have heard of us. I'm here with my coworker Tara, who also recently spoke. We are in fact hiring! Thank you for your interest!

00:40:10.000 Before we get to the questions part of the talk, I want to do a brief advertisement: the other testing speakers, Justin Searls and Noel Rappin, as well as Sam Phippen, who organized the testing track, and I are all doing a special question-and-answer session.

00:40:45.000 Find us in the lunch line. Can you stand up, Sam, so that people can spot you? Sam is very easy to find; he's got great demeanor. We’d love to see you there, and if there are questions you'd rather ask now, please feel free to do so.

00:41:00.000 The first question was about techniques that can make you feel safer about a test refactor. The best advice I have is: boring is good, but I mean something very specific by that. Rather than offering that as a vague platitude, what I mean is to undo your abstractions and inline them.

00:41:40.000 Don't try to do anything interesting with it! One of the ways tests often become confusing is that you notice a duplicated process or something complex that you want to hide. You may abstract it too early.

00:42:24.000 So, to fix your tests and reduce the assumptions encoded within them, or to support more types of application code changes in the future, just inline. Take whatever helper method you have, and replace every call to that method with the contents of that method.

00:43:04.000 The same goes for unpacking a factory method into many boring test setup lines. When you engage in refactoring tests, the goal is never to make them clever or concise; it’s to make them more verbose so that you can see how your application code shapes your tests.
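
To make that concrete, here is a small before-and-after sketch of replacing calls to a test helper with the helper's contents. The `Pie` struct, the `make_pie` helper, and both test classes are invented for this example.

```ruby
require "minitest/autorun"

Pie = Struct.new(:crust, :filling) # hypothetical domain object

# Before: setup is hidden behind a helper method.
class PieTestBefore < Minitest::Test
  def make_pie
    Pie.new("shortcrust", "apple")
  end

  def test_filling
    assert_equal "apple", make_pie.filling
  end

  def test_crust
    assert_equal "shortcrust", make_pie.crust
  end
end

# After: every call to the helper is replaced with its contents.
# The tests are longer and repetitive, but each one shows exactly
# what it depends on.
class PieTestAfter < Minitest::Test
  def test_filling
    pie = Pie.new("shortcrust", "apple")
    assert_equal "apple", pie.filling
  end

  def test_crust
    pie = Pie.new("shortcrust", "apple")
    assert_equal "shortcrust", pie.crust
  end
end
```

Once the setup is verbose and visible again, any duplication that remains points at the application code as the place to introduce an abstraction.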

00:43:40.000 The next question was about the technique of moving setup code into a module and what the downsides are. It depends specifically on what that module is doing. If the module is focused on testing-related functions, there are likely few downsides.

00:44:19.000 However, the issue is that it can add more interaction complexity to your tests. I want my tests to be reliable and devoid of bugs. I want them to avoid doing anything interesting or unexpected.

00:44:54.000 One big trap is if you extract significant domain logic into a test dependency. Another issue could arise if you move too far into helper classes or use tools like RSpec shared metadata. This can create an artificial world that does not effectively simulate what's going on.

00:45:33.000 Do I recommend testing your test helpers? It depends on their complexity. If it’s a one-liner, and you’re pretty sure you know what it does, maybe not. But after a certain complexity threshold, yes—they are code, and can definitely fail.