RubyConf 2021

Fake Your Test Away: How To Abuse Your Test Doubles

by Jenny Shih

In this talk titled "Fake Your Test Away: How To Abuse Your Test Doubles" presented by Jenny Shih at RubyConf 2021, the speaker explores the complexities of unit testing, particularly the pitfalls of using test doubles. She emphasizes that while test doubles serve as substitutes for actual dependencies, their misuse can lead to unreliable and unmaintainable tests. Jenny draws parallels between the evolution of civilizations as described by Douglas Adams and her personal evolution in understanding testing, highlighting three phases: survival (how to use test doubles), inquiry (why testing is essential), and sophistication (deciding when to eliminate tests). Key points discussed include:

  • Definitions: She defines critical terms such as 'test double', 'subject', and 'dependency' to establish clarity for the audience.
  • Pyramid Model: The ideal testing structure resembles a pyramid with unit tests at the base, followed by integration tests, and lastly acceptance tests at the top. However, she notes that the reality often diverges from this ideal.
  • Integration Tests: Jenny explains a flawed integration test example involving restaurant service, showcasing confusion arising from inappropriate implementations of test doubles leading to random test failures.
  • Helicopter Unit Tests: She describes these tests as overly cautious, attempting to control every detail, which leads to tests that cannot effectively catch production errors.
  • Solution Strategies: Jenny suggests strategies such as simplifying dependencies by utilizing the principle of least knowledge, separating decisions and delegations, and making dependencies explicit.
  • Refactoring Tips: Jenny provides practical advice on enhancing test reliability by better structuring production code, thereby minimizing the need for random or awkward test doubles. She concludes that if writing tests feels convoluted, it is often indicative of underlying problems in the production code itself.

In conclusion, Jenny advocates for creating a resilient and enjoyable test suite by being mindful of test design and relying less on arbitrary test doubles. By addressing the structure of production code and its dependencies, developers can foster quality testing practices that contribute to overall software resilience and maintainability.

00:00:11.360 I want to start off with two things. First of all, there's a Discord channel for this talk, and I'll be checking messages as often as I can remember to. If you have anything to say, please feel free to use that option.
00:00:19.119 The second thing is that Emily Giurleo gave an excellent talk yesterday on how to find the right scenarios to use mocks. If you weren't there, I strongly recommend you check it out. This talk is about how to find the wrong scenarios to use mocks, so you are very well covered.
00:00:30.400 My name is Jenny, and I work at PicCollage. We build apps and strive to get people to use them. Two years ago, we were the main sponsor at RubyConf Taiwan. As a sponsor, we had a nice booth and a chance to display some swag. I took the opportunity to print out my very first Ruby joke sticker.
00:00:53.360 Thank you! Great crowd. Now, let's talk about Douglas Adams. Some of you might have heard of him because he wrote a little series called 'The Hitchhiker's Guide to the Galaxy'. In this series, we learn that each major civilization tends to pass through three distinct phases: survival, inquiry, and sophistication. We can easily memorize them with three words: how, why, and where.
00:01:09.520 Specifically, we can expand the questions to: how can we eat, why do we eat, and where should we have lunch? Let's talk about testing. Likewise, my relationship with testing has gone through three distinct phases. The survival phase is about how to use test doubles. The inquiry phase focuses on why I test, and sophistication came when I arrived at the question: where should I begin to delete my old tests?
00:01:22.640 It turns out that not every test deserves a place in your test suite. In this talk, I will try to answer all three of these questions. First, I'll provide a higher-level overview of the testing universe, and then we'll dive into some specific cases of tests that should be deleted. I'll wrap this up by sharing tips on how to have a stronger, more sustainable relationship with your test doubles.
00:01:35.680 Let's begin by defining some terms that I'm going to use frequently in this talk to avoid any ambiguity. 'Test double' is the catch-all term for all the fake objects that you use in tests to substitute for the real thing. 'Stub', 'mock', and 'spy' are the three most common categories of test doubles.
00:01:45.840 We won't go into the differences between the three today, but just know that they are somewhat interchangeable for the purpose of this talk. The 'subject' is the thing that you want to test, and a 'dependency' is another module or any external component that the subject knows about. For example, if module A calls module B, we say that module A depends on module B.
00:02:00.080 Now, imagine a perfect world where every piece of software is a joy to maintain, and JavaScript doesn't exist. Even better, the tests in your codebase look like a pyramid. At the bottom, we have unit tests, which should have the highest percentage in your test suite because they are cheap and fast. Integration tests ensure that when several independent components are put together, they actually work. At the top, we have acceptance tests, or end-to-end tests, which comprise a smaller portion because they are expensive to run and maintain.
00:02:16.000 That's the ideal situation. In reality, you probably won't achieve that. There will be tests where you can kind of see their intention, despite what they have become, and there will also be tests that are just as confused as you are. So we have integration test wannabes and helicopter unit tests, both of which I will explain in a minute. At the top, we have scream tests, which are actually very cheap to maintain because there are no tests; you only need a screaming manager to know when things have gone wrong.
00:02:32.239 Let's first look at integration tests. Let's talk about lunch. Imagine a restaurant service that has one method: serve french toast. This is the implementation. First, it retrieves a chef and a waiter from their respective classes. The chef makes the french toast, which the waiter serves. In the middle, there is a condition: if the waiter is angry, they will eat the french toast and end the operation by raising an error.
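The slide with this implementation is not reproduced in the transcript. The following is a minimal sketch of what such a service might look like. Every name in it (`RestaurantService`, `Chef.find_available`, `Waiter.find_available`, `WaiterAteTheFoodError`) is an assumption made for illustration, and plain Ruby classes stand in for the Active Record models mentioned later in the talk.

```ruby
# Hypothetical reconstruction: all class and method names are assumptions.
class WaiterAteTheFoodError < StandardError; end

class Chef
  # Stands in for a database lookup of an available chef.
  def self.find_available
    new
  end

  def make_french_toast
    "french toast"
  end
end

class Waiter
  # Stands in for a database lookup of an available waiter.
  def self.find_available
    new
  end

  def initialize(angry: false)
    @angry = angry
  end

  def angry?
    @angry
  end

  def eat(dish)
    "ate #{dish}"
  end

  def serve(dish)
    "served #{dish}"
  end
end

class RestaurantService
  def serve_french_toast
    chef = Chef.find_available
    waiter = Waiter.find_available
    french_toast = chef.make_french_toast

    # The condition in the middle: an angry waiter eats the food
    # and the operation ends with an error.
    if waiter.angry?
      waiter.eat(french_toast)
      raise WaiterAteTheFoodError, "the waiter ate the #{french_toast}"
    end

    waiter.serve(french_toast)
  end
end
```

Note that in this shape the service looks up its own collaborators, so a test has no direct way to control which waiter it gets.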
00:02:56.160 Now, we open the test file and immediately know that it is an integration test because there are no variable assignments in these two lines, so they must be changing some state of the system. We surmise that they are all Active Record objects, and that's fine. What’s not fine is this line: a test double for a method called 'floor'. We didn’t see that in our production code.
00:03:05.280 To find out what this test double is for, we need to look at each line thoroughly. First, we look at the chef, who sends a data query to the database. Not what we're looking for. The second line is getting a waiter. This is more interesting because it not only makes a database query but also executes a database command. At this point, you might be considering one thing: it's the callback.
00:03:24.639 We see that the update action triggers this piece of code in the before_save hook, and sure enough, we find the floor. The logic reads: unless the floor is scrubbed, the waiter has to scrub it and be angry. Remember, when the waiter is angry, they will eat the food and raise an exception. But because we are testing the happy path of the service, we don't want to trigger this.
00:03:38.720 To avoid that, the simplest solution is to use a test double so that the test will pass, and the world will be perfect. So you're feeling good. Except that you just witnessed an integration test that has been compromised by a random test double inserted without proper context. These are tests with randomly inserted test doubles, added because the real objects are too awkward to deal with. This is when you find yourself reaching for test doubles because it is too painful or just too awkward to set up the real thing.
00:04:00.960 The downside is that if you have random test doubles like this, you will have random test failures in the future. Fast-forward one month: in the waiter class, someone decided to change a method. Now your restaurant test fails because some distant implementation detail has changed—not because your production code is broken. This is really sad for people wanting to do refactoring.
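To make the failure mode concrete, here is a minimal plain-Ruby sketch. It assumes a hypothetical `Floor` module whose real checks reach outside the process, a `Waiter#save` method that imitates an Active Record `before_save` callback, and a `RefactoredWaiter` standing in for "the waiter class one month later". None of these names or method bodies come from the talk's slides.

```ruby
# Hypothetical names throughout; plain Ruby imitates the Active Record callback.
class OutsideWorldError < StandardError; end

module Floor
  # The real checks are assumed to reach outside the process,
  # which is why the test wants to avoid them.
  def self.scrubbed?
    raise OutsideWorldError, "cannot inspect the real floor in a test"
  end

  def self.clean?
    raise OutsideWorldError, "cannot inspect the real floor in a test"
  end
end

class Waiter
  def angry?
    @angry == true
  end

  # Imitates a before_save callback running on every save.
  def save
    scrub_floor_if_needed
    true
  end

  private

  def scrub_floor_if_needed
    return if Floor.scrubbed?
    @angry = true
  end
end

# One month later: someone switches the callback to a different Floor method.
class RefactoredWaiter < Waiter
  private

  def scrub_floor_if_needed
    return if Floor.clean?
    @angry = true
  end
end

# The "random" test double: the restaurant test replaces Floor.scrubbed?
# even though the restaurant service never mentions Floor.
Floor.define_singleton_method(:scrubbed?) { true }

Waiter.new.save             # passes today: the stub keeps the waiter calm
# RefactoredWaiter.new.save # raises OutsideWorldError: the stub no longer applies
```

The behavior of the restaurant service did not change between the two versions; only a detail two layers away did, which is what makes the resulting test failure feel random.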
00:04:29.679 Katrina Owen, founder of Exercism and co-author of a notable OOP book, confessed her love for refactoring in one of her brilliant talks. I completely agree with her that refactoring is therapeutic, but with one caveat: it can be detrimental if not handled properly. Another reason to avoid this is that it hurts productivity. Imagine another developer joins the team, sees a random test double in a test, and is immediately faced with two options: ignore it and feel guilty afterward for not checking it, or dive into the rabbit hole to confirm that it is really just a boring implementation detail.
00:04:53.760 Test doubles create a contract between the test subject and other objects. When we decide to use a test double in a test, we declare that contract, asserting that it won't change. Before we explore solutions, let's look at another category: the helicopter unit tests. Let's talk about lunch again. The restaurant service is expanding to add cheese paninis to the menu. Conceptually, it's similar, but it's slightly more complex.
00:05:18.560 The waiter must now obtain bread from a cupboard and cheese from a fridge. If they get angry, they will still have to eat the food. Before we write the test, let's review the dependency graph. This is what we know: restaurant service depends on the chef and waiter classes. The chef talks to the database and so does the waiter. The waiter also knows about the cupboard, fridge, and floor.
00:05:44.400 At this point, we remember that the floor class has multiple messy dependencies. Let's think about our testing strategy. If we write unit tests, we should isolate our restaurant service from dependencies, so we need to use test doubles somewhere. The most obvious targets are the database and the floor because they reach out to the outside world. Hence, we apply our test doubles here. However, with those replaced, we also need to replace the cupboard and fridge.
00:06:13.360 Here's the problem: we need to set up the cupboard to return bread and the fridge to return cheese for the waiter. Adding new methods to access these ingredients increases complexity. Eventually, we realize that this method involves too many implementation details, and we end up creating complex test setups for what should be a straightforward process.
00:06:32.080 In essence, when you try to control every implementation detail in your tests, you end up with helicopter unit tests. These tests become detached from reality, meaning they can pass even when the production code is broken. If your tests cannot catch errors in your production code, they have little reason to exist in the codebase. Mixing decisions and delegations also creates a severe burden in testing, as every decision requires a corresponding setup of delegations.
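A minimal sketch of how such a test drifts away from reality follows. It assumes a hypothetical `RestaurantService#serve_cheese_panini` that reaches through the waiter into the cupboard and fridge, hand-rolled test doubles for every collaborator, and a real `Cupboard` whose method has been renamed to `loaf`. All names are invented for illustration, and the angry-waiter branch is left out for brevity.

```ruby
# Hypothetical names throughout; the angry-waiter branch is omitted.
class RestaurantService
  def initialize(chef:, waiter:)
    @chef = chef
    @waiter = waiter
  end

  def serve_cheese_panini
    bread  = @waiter.cupboard.bread   # reaches through the waiter
    cheese = @waiter.fridge.cheese    # reaches through the waiter again
    panini = @chef.make_cheese_panini(bread, cheese)
    @waiter.serve(panini)
  end
end

# Hand-rolled test doubles: one per collaborator,
# plus one per collaborator's collaborator.
FakeCupboard = Struct.new(:bread)
FakeFridge   = Struct.new(:cheese)
FakeWaiter   = Struct.new(:cupboard, :fridge) do
  def serve(dish)
    "served #{dish}"
  end
end
FakeChef = Class.new do
  def make_cheese_panini(bread, cheese)
    "#{bread} and #{cheese} panini"
  end
end

# Meanwhile the real cupboard renamed its method, so production is broken.
class Cupboard
  def loaf
    "bread"
  end
end

# The helicopter unit test still passes, because every detail is faked.
fake_waiter = FakeWaiter.new(FakeCupboard.new("bread"), FakeFridge.new("cheese"))
service = RestaurantService.new(chef: FakeChef.new, waiter: fake_waiter)
service.serve_cheese_panini
```

Swapping the real `Cupboard` in for `FakeCupboard` makes the same call raise `NoMethodError`, which is the error the fully faked test can never see.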
00:06:53.760 If you change the logic in any method, you must alter the tests as well, otherwise they will fail. And that's especially brutal since for unit tests, we are usually required to validate every scenario. It isn’t just one test that becomes unbearable; it’s many tests that need to be revised. This paints a bleak picture of testing, but in the remaining parts of this talk, I'll share practical tips to help you navigate these situations.
00:07:12.480 If you forget everything else from this talk, which I sincerely hope you won’t, at least remember this: if you find it really hard to write a test without it feeling awkward, it may be time to consider changing your production code. Tests provide surprisingly good design feedback because they are the first consumers of your API. They can be the friend who will slap you in the face when you make a poor decision.
00:07:30.720 Tests reflect karma; if it feels impossible to escape a situation where you must choose between writing a helicopter unit test that cannot catch errors or an integration test with holes, it indicates a flaw in your production code. So, how do you avoid these scenarios? Here are three tips I've found useful. First, apply the principle of least knowledge. This principle can be briefly defined as: don't talk to strangers.
00:07:49.440 To illustrate, let's consider a hypothetical method that retrieves the answer to life, the universe, and everything, using three modules named 'life', 'universe', and 'everything'. At first glance, the dependency graph seems straightforward: the method depends on 'life'. However, it also relies on 'universe' and 'everything', as it needs to extract information from those modules. This leads to a structure where dependencies are tightly coupled.
00:08:04.720 If 'everything' changes, both 'life' and 'universe' must adapt too. This situation becomes exhausting and increases cognitive load for developers. The principle of least knowledge suggests we should strive for low coupling by only communicating with immediate neighbors. The example reveals a method chain, which often indicates a violation of this principle. Breaking it into separate lines won't improve the situation.
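The method chain described here can be sketched as follows. The `Life`, `Universe`, and `Everything` structs and the two helper methods are hypothetical stand-ins, since the original slide is not in the transcript.

```ruby
# Hypothetical stand-ins for the three modules in the example.
Everything = Struct.new(:answer)

Universe = Struct.new(:everything) do
  # Universe answers on behalf of its own neighbor.
  def answer
    everything.answer
  end
end

Life = Struct.new(:universe) do
  # Life answers on behalf of its own neighbor.
  def answer
    universe.answer
  end
end

# Before: the caller talks to strangers. It has to know that life has a
# universe, that the universe has an everything, and that everything has
# an answer.
def answer_via_method_chain(life)
  life.universe.everything.answer
end

# After: the caller only talks to its immediate neighbor.
def answer_via_neighbor(life)
  life.answer
end

life = Life.new(Universe.new(Everything.new(42)))
answer_via_method_chain(life)
answer_via_neighbor(life)
```

With the second version, a test needs a single double that responds to `answer`, rather than three nested ones.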
00:08:20.640 Returning to our hypothetical 'cheese panini' method, we identify two places where this code violates the principle of least knowledge. Ideally, we shouldn't need to know how the waiter fetches cheese and bread; we only need to trust that they obtain them. Hence, we encapsulate those methods within the waiter class.
00:08:39.440 By doing so, we eliminate mocking code that was unnecessary for our tests, as we no longer need those convoluted setups. The tests appear simpler, as we primarily mock the waiter and chef classes instead of the entire universe of dependencies.
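A minimal sketch of this encapsulation, assuming hypothetical `Waiter#bread` and `Waiter#cheese` methods that hide the cupboard and fridge from the service. The class shapes are invented for illustration and the angry-waiter branch is again omitted.

```ruby
# Hypothetical names; the waiter now hides where ingredients come from.
class Waiter
  def initialize(cupboard:, fridge:)
    @cupboard = cupboard
    @fridge = fridge
  end

  def bread
    @cupboard.bread
  end

  def cheese
    @fridge.cheese
  end

  def serve(dish)
    "served #{dish}"
  end
end

class RestaurantService
  def initialize(chef:, waiter:)
    @chef = chef
    @waiter = waiter
  end

  def serve_cheese_panini
    # The service only talks to its immediate neighbors.
    panini = @chef.make_cheese_panini(@waiter.bread, @waiter.cheese)
    @waiter.serve(panini)
  end
end

# The test setup shrinks to two flat doubles: no cupboard, no fridge.
FakeWaiter = Struct.new(:bread, :cheese) do
  def serve(dish)
    "served #{dish}"
  end
end

FakeChef = Class.new do
  def make_cheese_panini(bread, cheese)
    "#{bread} and #{cheese} panini"
  end
end

service = RestaurantService.new(chef: FakeChef.new,
                                waiter: FakeWaiter.new("bread", "cheese"))
service.serve_cheese_panini
```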
00:08:57.120 This leads us to the introduction of the second tip: separate decisions and delegations. Each piece of code can fall into two categories: decision—referring to business logic—and delegation, which pertains to the relationship between this code and other modules. Let's revisit the 'cheese panini' method and evaluate each line in terms of decision or delegation.
00:09:16.160 This method highlights a classic example of mixing delegations and decisions. The downside of this mix is that if you want to test it, you'll need to undertake a painful setup for every branching decision, and we would ideally want to isolate delegation by testing the contracts. This requires us to pay extra attention to those decision branches during testing.
00:09:38.320 To combat this, we should think clearly about boundaries and responsibilities. By dividing the responsibilities (getting resources, preparing ingredients, making dishes, and serving dishes), we can identify the places where delegation and decision logic are mixed together.
00:09:59.360 For example, we can encapsulate the decision-making logic into the waiter and chef classes. This results in straightforward code that reads: 'First, find an available chef and waiter. The waiter prepares the ingredients; the chef makes the cheese panini, and the waiter serves it.'
00:10:16.880 The tests for this new implementation are similarly straightforward, only requiring a single scenario since decisions no longer impact these tests. I would not even unit test this function; just use an integration test to cover the entire functionality. Let's explore the separate independent components.
00:10:37.680 The waiter would implement the methods to get the ingredients for cheese paninis, and serve the dish. The chef would be responsible for its 'make cheese panini' method. By integrating those specific implementations into their respective classes, testing each becomes easier as we won't even need to use test doubles.
00:10:57.760 Now that those functionalities are allocated to their respective classes, testing them becomes very straightforward. There are plenty of good unit tests you can write in scenarios like this without complicating the code.
00:11:15.440 Returning to our decision and delegation chart: it becomes evident that these components focus on decision logic, aligning nicely with unit tests. It remains cost-effective because we're primarily testing logic. Conversely, components emphasizing delegation, like our original service method, align well with integration tests since we wish to validate whether all components function effectively together without any test doubles.
00:11:35.760 In other words, the best way to avoid abusing your test doubles is to minimize your usage of them. This relates back to our concept of the testing pyramid; if we design our system in a certain way, we will naturally have fewer integration tests.
00:11:56.560 As we approach our final point, it's essential to make dependencies obvious. Until now, we haven't discussed dependencies in detail, specifically, how we perceive them. We know dependencies are critical: they are why test doubles exist in the first place. But there are two types of dependencies: those that don't have side effects and those that do.
00:12:16.080 The ones without side effects are preferable since we don't need to guess whether a remote state will change. On the other hand, those that do require careful management as they can unexpectedly alter the state of your system. We want to make these dependencies as obvious as possible; at least if they are unstable, we should anticipate them.
00:12:37.760 Let us return to the initial example of the french toast. Recall how hard it was to write the integration test due to the multitude of hidden dependencies within the floor class. The dependencies hidden within a before_save callback inside the waiter class add complexity, causing unforeseen issues.
00:12:57.040 It's crucial to recognize that the database is another entity possessing side effects, which require similar consideration. So, how do we render these dependencies apparent? Well, what we can see clearly is the restaurant service class and the 'serve french toast' method.
00:13:18.080 What if we relocated unstable dependencies within this module? We can slightly rearrange the code to enhance clarity, which allows us to establish a new module called 'restaurant repo'. This module can manage those messy dependencies with clean dependency injection, consequently reversing the direction of those dependencies.
00:13:36.160 To illustrate this in code, we would first identify dependencies with hidden instability—such as time-based functions—and segregate them into the restaurant repo via dependency injection. With that approach, our implementation of the 'serve french toast' method becomes much simpler as we only need to instantiate an object that implements the necessary methods and pass it as an argument.
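A minimal sketch of this kind of dependency injection follows. `InMemoryRestaurantRepo`, `find_available_chef`, and `find_available_waiter` are hypothetical names. A production repo is assumed to wrap the database queries and the floor lookup behind the same two methods, but it is not shown here.

```ruby
# Hypothetical names; only the in-memory repo used by tests is shown.
class WaiterAteTheFoodError < StandardError; end

Chef = Struct.new(:name) do
  def make_french_toast
    "french toast"
  end
end

Waiter = Struct.new(:name, :angry) do
  def angry?
    angry == true
  end

  def serve(dish)
    "#{name} served #{dish}"
  end
end

class RestaurantService
  # The unstable dependencies arrive through one explicit argument.
  def initialize(repo)
    @repo = repo
  end

  def serve_french_toast
    chef = @repo.find_available_chef
    waiter = @repo.find_available_waiter
    french_toast = chef.make_french_toast
    raise WaiterAteTheFoodError, "#{waiter.name} ate the #{french_toast}" if waiter.angry?
    waiter.serve(french_toast)
  end
end

# Any object implementing these two methods can be passed in.
class InMemoryRestaurantRepo
  def initialize(chef:, waiter:)
    @chef = chef
    @waiter = waiter
  end

  def find_available_chef
    @chef
  end

  def find_available_waiter
    @waiter
  end
end

repo = InMemoryRestaurantRepo.new(chef: Chef.new("Alex"), waiter: Waiter.new("Sam", false))
RestaurantService.new(repo).serve_french_toast
```

Because the repo is the only thing the test substitutes, the substitution is visible at the call site instead of being hidden behind a callback.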
00:13:54.720 Thus, we eliminate unexpected random test doubles, leading to much clearer unit tests and less convoluted integration tests, with the messy dependencies managed by the restaurant repo. That's the core takeaway for today. If you find yourself awkwardly using test doubles, reflect upon the design of your code.
00:14:11.520 Prioritize writing clean, maintainable code to prevent resorting to awkward dependencies or artificially faking them. Thank you for your time! One last thing before I let you go: I want to share some valuable resources regarding testing, software architecture, and refactoring.
00:14:29.600 These resources are certainly not comprehensive, but I’ve done my best to gather some highly recommended talks and relevant books that I believe will benefit you in your journey. Here’s how you can find me online. Please don't hesitate to reach out via the Discord channel if you have any questions or feedback for me.