Test Doubles are Not To Be Mocked

by Noel Rappin

In the talk "Test Doubles are Not To Be Mocked" presented by Noel Rappin at RubyConf 2016, the speaker explores the complexities surrounding the use of test doubles in software testing. Rappin emphasizes that test doubles, often referred to as mock objects, are commonly misunderstood and misused, and that their correct application can expose underlying assumptions in the code.

Key points discussed in the video include:

Ambivalence Toward Test Doubles: Rappin shares his fluctuating views on test doubles, revealing a history of ambivalence due to inconsistent experiences. He notes that while test doubles can simplify certain scenarios, they can also complicate the testing process.
SWIFT Framework for Testing: The speaker introduces his acronym, SWIFT (Straightforward, Well-defined, Independent, Fast, Truthful), as a set of criteria for evaluating tests, including how test doubles affect each criterion.
State Tests vs. Spy Tests: Rappin contrasts two testing approaches: a state-based test that relies on real objects and a spy-based test that uses test doubles to assert behavior. He discusses the implications of both on the clarity, independence, and speed of tests.
Importance of Test Isolation: The need for test isolation is highlighted as a crucial design goal, suggesting that tests should fail independently when a specific piece of code breaks, thereby simplifying debuggability.
Integration of Test Doubles: Rappin suggests that while using test doubles can lead to clearer tests, it requires careful design consideration to ensure tests remain reliable even when the code structure evolves.
Real-World Challenges: He acknowledges the challenges posed by third-party frameworks, such as Rails, which complicate the isolation of code and the use of test doubles, leading to a potential reliance on integration tests at the cost of unit tests.

In conclusion, Rappin encourages developers to consider their approach to testing carefully. He advocates for maintaining a balance between using test doubles effectively and acknowledging their limitations in ensuring comprehensive test coverage. The ultimate goal is to use tests to create well-structured, maintainable code while avoiding the pitfalls that can arise from misuse of test doubles. The underlying message is to adopt a thoughtful approach to testing, focusing on what makes tests fail and the implications of that failure for overall code design.

00:00:14.850 All right, I'm going to start because someone is possibly waving at me to start, or else giving me the finger. Excellent! I can't tell because of the lights.

00:00:20.740 So, how's the conference going for you all so far? Good? Yeah? All right, cool.

00:00:26.490 So, sometimes this happens to me at conferences. Thank you! You made the exact same sound that the audience made when this happened. This laptop survived, but it’s going to be put out to pasture in about two weeks. So, this is its last conference.

00:00:40.090 I'm very excited. This is a talk about test doubles. It is not the talk about test doubles that was before lunch; that was Justin. If you came here expecting to see Justin, you were about two hours late, and I'm sorry. Justin is not here; he couldn't stick around. So, I am allowed to mock him.

00:01:07.570 In keeping with the theme of this talk, I'm going to mock him in RSpec syntax. So, you can all expect Justin to receive this talk and return politely. Justin is actually tremendously polite to disagree with, so I recommend it. If you ever find yourself in that situation, I recommend disagreeing with Justin.

00:01:31.990 Unlike Justin, who made a joke about writing a book, I actually wrote a book. I've written a couple, and particularly this one, which you should buy. It’s great. It’s not the point of the talk, but the point is I wrote this book called "Take My Money: Accepting Payments on the Web," which you can get at PragProg. Not that this is the point here.

00:01:56.290 However, the thing about writing a book like this is you have to write example code. Code for a sample, like for a book-length sample project, is this sort of weird version of software engineering where some of your constraints go away, like the constraint that it actually has to be valid in a production environment. Yet, some of your other constraints are super strong; like you want the code to be extra clear and extra idiomatic because it’s a teaching or explanatory exercise. You have an extra burden. At least, I feel that way.

00:02:39.010 So, the relevant point for this talk is that pursuant to writing this book and creating this example, I wrote some tests. I am a Ruby engineer and known for writing tests. I felt like I needed to write tests to go along with my sample application, so I did.

00:03:04.120 I thought I'm going to do this in the super purest way. I have these middleware workflow kind of objects. They are not the Active Record objects, but they deal with the Active Record objects. I’m going to write tests that use a lot of test doubles. It started off writing a bunch of tests that kind of looked like this, and we'll talk about the details of what this actually does in about ten minutes.

00:03:31.990 Essentially, this is a rather simple structure where you have a couple of ticket objects and this discount object. Rather than them being Active Record objects, they are specified as test doubles. This worked fine, but eventually, as the sample application progressed, I wound up rewriting things.

00:03:46.510 If you look at later examples in the book—and I don’t really make a point of this in the book itself—but if you look through the code samples, those tests eventually wind up looking more like this, where I'm actually creating Active Record objects. These are simplified examples. So, I wind up creating the actual Active Record objects, and they are not doubles; they are actual Active Record objects.

00:04:10.420 I am doing what would be considered a more standard test, and this talk, to some extent, is about why I do that. Like, why I start off trying to test double all the things, and why I often, especially in a Rails context, wind up pulling back and replacing them all with real objects. I have been somewhat ambivalent about how to use test doubles for years.

00:04:39.849 Sometimes people call my writing about this stuff non-dogmatic, which is a polite way of saying ambivalent or unsure. But I can actually prove this. I have a paper trail here. This was published in 2011, written probably in 2010. It says, 'As much as I love using mocks and stubs to cover hard-to-reach objects and states, my own history with very strict behavior-based mock test structures hasn't been great. My experience was that writing all the mocks for a given object tended to be a drag on the test process, but I'm open to the possibilities that the method works better for others or that I'm not doing it right.'

00:05:02.949 When I proposed this talk, I did not intend it to be a tour through my entire library, but it is, and you are all just a captive audience. Three years later, in 2014, I rewrote the entire book basically top to bottom, but I said basically the same thing: My opinion about the best way to use mock objects changes every few months. I'll try some; they'll work well. I'll start using more mocks; they'll start getting in the way; I'll back off, and then I'll think, 'Let’s try some mocks.' This cycle has been going on for years, and I have no reason to think that it's going to change any time soon.

00:05:56.809 I suspect this describes at least some of you in the audience. Don't clap or anything. You can nod or at least look up from your laptops or something like that. So, why does this happen to me? Like, I'll watch a Justin Searls talk, or I will read something in a book like "Growing Object-Oriented Software," and I will think that mock objects, that test doubles, are really the way to go for this testing.

00:06:57.780 Then I will try them in practice, and I will roll back. So this is what I'm wrestling with. More to the point, what can I do about this? Is this a useful process? Should I pick a side and stay there? What should I do?

00:07:15.320 This talk is called 'Test Doubles Are Not to Be Mocked.' My name is Noel Rappin. I work for a consulting company in Chicago called Table XI, and you can find me on Twitter at @noelrappin. There are some other URLs at the bottom of the footer here that might be of interest to you.

00:07:57.200 So, to be clear about one thing, there’s one usage of test doubles that I consider not controversial, and that is not what I'm talking about. A really common use case is this: this is just a standard test, but I’m using a test double in the first line to wrap a Stripe charge, Stripe being the payment gateway.

00:08:08.840 This is wrapping an interaction with the third-party networking—basically a third-party API that would normally be called over the network. So, I'm doing this. I'm using this test double to avoid having to make this complicated network call or to put the system in a state that might be hard to replicate line by line.

00:08:20.750 I’m actually using it as a pure stub to replace something or to help me get in a certain state. That is not the controversial use I'm talking about.

00:08:29.510 So, the non-controversial piece here is to replace a heavyweight object or to specify failure states. That’s great! I do that all the time, I expect to continue to do that, and that’s not really the focus here.
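A minimal sketch of that non-controversial use, written in plain Ruby rather than RSpec so it runs without any gems. `FailingGateway`, `PurchaseWorkflow`, and the `charge` signature are hypothetical names for illustration; they are not Stripe's API or the code from the book.

```ruby
# Hypothetical stand-in for a payment gateway client. A real client would
# make a network call; this stub just returns a canned failure.
class FailingGateway
  def charge(amount_cents:)
    { success: false, error: "card_declined" }
  end
end

# Hypothetical workflow object that receives its gateway as a dependency,
# which is what makes it possible to substitute the stub.
class PurchaseWorkflow
  attr_reader :status

  def initialize(gateway:)
    @gateway = gateway
    @status = :pending
  end

  def run(amount_cents)
    result = @gateway.charge(amount_cents: amount_cents)
    @status = result[:success] ? :paid : :failed
  end
end

# The failure state is specified directly, with no network involved.
workflow = PurchaseWorkflow.new(gateway: FailingGateway.new)
workflow.run(3000)
puts "status after declined charge: #{workflow.status}"
```

The stub exists only to put the system into a state that would be hard to reach otherwise; nothing is asserted about how the workflow talks to the gateway.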

00:08:41.180 But I wanted to back up before I show a couple of examples to establish a sort of framework for this discussion. What we are talking about is tests being useful versus tests not being useful. What makes me want to pull these tests back? What makes tests good? I have a somewhat tortured acronym for this: SWIFT—Straightforward, Well-defined, Independent, Fast, and Truthful.

00:09:22.430 Straightforward means that you can tell what the test does just by looking at it; it is easy to read. Well-defined means that it will replicate itself, always yielding the same result no matter how often it’s run. Independent means it doesn't depend on other tests or other state of the code. Fast, I think, is self-evident; and Truthful means it is an accurate representation of the code. If the code is broken, the test fails, and if the code works, the test passes.

00:09:49.710 So that’s how I’m going to evaluate the test samples I show. In the long term, the priorities for tests are important. One of the things about tests is that they are an aid immediately in development, but then they stay in your test suite forever.

00:10:02.550 So something like 'fast' might be a much more important priority later in the test suite than it is when you’re actually developing code. Ultimately, though, a good test leads to well-designed code, which is a whole other rabbit hole about what well-designed code is.

00:10:41.230 I'm not going to get into that very much, other than to say I would consider well-designed code to be clear and easy to change. Tests enable us to get to well-designed code in three ways, more or less.

00:10:51.940 Tests let you engage in domain discovery. This is the idea behind Test-Driven Development; you write your tests first, allowing you to learn something about your domain as you write the test so that you understand it. The act of writing the tests causes you to understand the code that you’re about to write.

00:11:04.030 Also, obviously, tests validate behavior, and then they act as a safety net when you're trying to change your code so you know you can change it without breaking anything.

00:11:38.890 Now, I want to talk about two tests that do the same thing. The code under test here calculates the total price of multiple tickets given a discount. This is example code for a small theater that’s selling tickets. They sometimes give out discount codes, so there’s a calculator that needs to put that all together into a number.

00:12:12.570 One way to write this test is to have what I hate to call a standard state-based test. You create two tickets, you create a discount code, and you pass all of those off to your price calculator. You tell the calculator to go, and at the end of it, I expect the calculator to come out with, in this case, thirty dollars for two tickets with a twenty-five dollar discount.
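The slide itself is not part of the transcript, so what follows is only a rough sketch of the state-based shape being described. It uses plain Ruby structs in place of Active Record models so it runs on its own. The names `Ticket`, `Discount`, `PriceCalculator`, `price_check`, and `total_cents`, along with the individual ticket prices, are assumptions chosen so the numbers match the thirty-dollar result mentioned in the talk.

```ruby
# Hypothetical stand-ins for the real model objects (amounts in cents).
Ticket = Struct.new(:price_cents) do
  # Derived value that the calculator asks each ticket for.
  def price_check
    price_cents
  end
end

Discount = Struct.new(:amount_cents)

# Hypothetical calculator: sums what the tickets report, then subtracts
# a flat discount.
class PriceCalculator
  def initialize(tickets, discount)
    @tickets = tickets
    @discount = discount
  end

  def total_cents
    @tickets.sum(&:price_check) - @discount.amount_cents
  end
end

# State-based check: build real objects, run the code, inspect the result.
tickets = [Ticket.new(2500), Ticket.new(3000)]
calculator = PriceCalculator.new(tickets, Discount.new(2500))
total = calculator.total_cents
puts "total in cents: #{total}"
```

The only assertion a state test makes is about the final value; how the calculator arrived at it is not examined.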

00:12:28.080 Most of us would not blink if we came upon this in a code review; this is a fairly standard way to write this test. Another way to write this test is to say that the price calculator should not depend on actually having tickets or having discount codes. We can use spies.

00:12:49.080 So this is using RSpec syntax. Instead of creating two tickets and a discount, I’m creating three spy objects. I’m passing them to a real price calculator as before, I’m doing the calculation, and I actually am making the same expectation here.

00:13:10.740 But more importantly, I have an expectation in the last two lines that I'm expecting the calculator to call this price check method on the tickets. There are two differences I want to point out here between the two tests.

00:13:46.960 First, in the first test, I'm setting a base attribute, the price of the ticket. In the second test, I'm setting the value of a derived attribute, the price check. The calculator doesn't actually care what the underlying cost of the ticket is; all it cares about is what comes out of what I call this price check method.

00:14:05.750 I could make this final line here, the price check line, another behavior expectation. It just got really convoluted, so to keep it a little bit clearer, this test is making the same check of the behavior of the price calculator. It also is checking that the calculator does certain things along the way.
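As a rough illustration of the spy version, here is a sketch with a hand-rolled spy in plain Ruby, so it runs without the RSpec gem. In RSpec itself the doubles would typically come from `spy` or `instance_spy`, and the last checks would use `have_received`. The `Spy` class and the names `price_check`, `amount_cents`, and `total_cents` are assumptions for illustration, not the code shown on the slide.

```ruby
# Hand-rolled spy: answers each configured message with a canned value
# and records that the message was received.
class Spy
  attr_reader :calls

  def initialize(canned)
    @calls = []
    canned.each do |name, value|
      define_singleton_method(name) do
        @calls << name
        value
      end
    end
  end
end

# Hypothetical calculator under test; it is the only real object here.
class PriceCalculator
  def initialize(tickets, discount)
    @tickets = tickets
    @discount = discount
  end

  def total_cents
    @tickets.sum(&:price_check) - @discount.amount_cents
  end
end

# Three spies replace the two tickets and the discount.
ticket_one = Spy.new(price_check: 2500)
ticket_two = Spy.new(price_check: 3000)
discount = Spy.new(amount_cents: 2500)

calculator = PriceCalculator.new([ticket_one, ticket_two], discount)
total = calculator.total_cents

# Same state expectation as before, plus behavior expectations:
# the calculator must have asked each ticket for its price check.
puts "total in cents: #{total}"
puts "ticket_one received: #{ticket_one.calls.inspect}"
puts "ticket_two received: #{ticket_two.calls.inspect}"
```

Note that no `Ticket` class exists anywhere in this sketch; the test depends only on the message the calculator sends.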

00:14:25.990 So, the state test is using real objects, and it is making an expectation on the state of the world at the end of those tests. The spy test is using test doubles, and it is making expectations on the behavior of the calculator along the way. So that is a difference of philosophy, but does it actually affect how these tests live in our test suite over time and interact with the rest of our code?

00:15:10.960 I want to go through those five SWIFT checks really quickly. Which test is more straightforward? The state test is. I think most people will have an easier time understanding the state test than the test double test, unless they’ve done a lot of testing with test doubles. That’s generally been my experience.

00:15:47.210 Well-defined? They’re both equally well-defined; they will both equally replicate themselves over time. The spy test is more independent because the state test depends on the behavior of the ticket object, and the spy test does not depend on that. The spy test is also faster. As I have it written, the state test is creating database objects, which makes the spy test a lot faster.

00:16:24.210 If the state test wasn't doing that, it would still probably be a little bit slower. Truthful is interesting. One of the things I like to do when I think about what tests I'm writing next or what tests I should be writing is to think about the circumstance in which the test fails.

00:17:02.710 When I write a test, I think about what will cause this test to fail, and in the case of these two tests, they will fail under different situations. If the actual price calculator has a bug after it gets the information from the tickets and before it converts that into a final price, both tests will fail.

00:17:37.860 But if the ticket has a bug—if there's a flaw in that price check method that the ticket is using—the state test will still fail, but the spy test will pass. The spy test doesn't actually touch that part. Even if the ticket class doesn't exist at all, if that code hasn't been written, the state test will fail, but the spy test will pass.
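The difference in failure conditions can be sketched by planting a bug in a hypothetical ticket class and running both styles of check against the same calculator. All names here (`Ticket`, `TicketDouble`, `Discount`, `PriceCalculator`, `price_check`) are assumptions for illustration.

```ruby
# Hypothetical ticket with a deliberate bug in price_check.
class Ticket
  def initialize(price_cents)
    @price_cents = price_cents
  end

  def price_check
    0 # bug: should return @price_cents
  end
end

# Double with a canned price_check; it never touches Ticket.
TicketDouble = Struct.new(:price_check)
Discount = Struct.new(:amount_cents)

class PriceCalculator
  def initialize(tickets, discount)
    @tickets = tickets
    @discount = discount
  end

  def total_cents
    @tickets.sum(&:price_check) - @discount.amount_cents
  end
end

def total_for(tickets)
  PriceCalculator.new(tickets, Discount.new(2500)).total_cents
end

# State style: real (buggy) tickets flow through the calculator.
state_result = total_for([Ticket.new(2500), Ticket.new(3000)])
# Double style: the calculator sees only canned values.
double_result = total_for([TicketDouble.new(2500), TicketDouble.new(3000)])

puts "state test passes?  #{state_result == 3000}"
puts "double test passes? #{double_result == 3000}"
```

The double-based check stays green because the bug is outside the calculator. Catching it would be the job of a separate unit test written directly against `Ticket#price_check`.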

00:18:13.570 This is where a lot of people jump off the mock object bandwagon. Many people say this is a terrible situation: the code can be wrong, and the test will still pass. This is a huge problem, and in my experience, many people give up on test double testing at that point.

00:18:49.090 I want to argue here that saying the code can be wrong and the double test will still pass is actually kind of a matter of perspective because the double test implies the existence of other tests in a way that the state test does not. So if I’m saying that this test uses test doubles to test this calculator object but not to test the ticket object, I’m implying that somewhere else I’m going to write tests that do touch that ticket object and test that behavior.

00:19:22.250 To put it another way, consider the truthfulness of these two tests if the discount logic were to get more complicated. Let’s say they introduced double coupon Wednesdays, or something like that. You'd get a situation where the ticket API would change, and the double test would still pass while the state test would fail.

00:20:02.830 This becomes a problem because, nominally, there is breakage in the code that’s not in the thing being tested, but the test fails. This can be a real issue in larger test suites. If the tests are not independent, you can write some code, and you get a bunch of test failures on the other side of the planet, making it very hard to track down where they are.

00:20:49.570 This leads me to a design goal I have, which pushes me toward using more test doubles—the idea that a failure state causes exactly one test to fail. This is impossible in practice in part because you probably have integration tests and unit tests that cover the same ground.

00:21:21.830 However, it’s a good way to think about isolating tests. If this test fails, I know to look here for the remedy. The double version of the test tests both the behavior and also the design of the code in that I’m making strong claims about the design of the code that need to be true for the test to pass.

00:21:49.100 Sometimes one of the complaints about the heavy use of test doubles is that they make it harder to refactor because changing the design breaks the tests. I would argue that this is at least potentially a design consideration that is somewhat intentional. That’s actually part of what you’re getting when you use mock objects.

00:22:13.320 If you make good claims about the structure of the code, then your refactoring won’t necessarily have problems. However, if you make poor choices about what you are asserting in the design, then you're going to feel pain, much like the pain you feel if you make poor choices about which parts of the state in the code you depend on.

00:22:51.440 The double version of the test encourages the creation of additional tests; it prompts more isolated unit tests. As I said before, if you only utilize the state tests, you might end up putting a lot of logic into the ticket class without writing new tests for it because it’s nominally already covered by the existing state test. This can become a problem.

00:23:54.090 It leads to slower tests and a lot of logic that is not well unit tested, because those units just happen to get called by another piece of code. Creating additional tests is a good thing because they can encourage good design practices. I think what I’m saying here is that, much like Betsy discussed this morning, boring code can lead to better tests.

00:24:30.530 If you write your tests in such a manner that encourages the creation of smaller units, you will indeed obtain smaller units. Conversely, if you only have integration tests, you won’t have design pressure pushing you to create smaller units.

00:25:02.810 That said, writing a lot of extra unit tests in this case is not ideal, since the design might change. Many people struggle to drive the next failing test.

00:25:34.630 When they write that test with the test doubles and it passes, but the code is incomplete, they often have trouble understanding what the next step is. The things that are stubbed out by that double test need to be themselves written and tested.

00:26:07.880 One of the advantages of what is called outside-in testing, where you start with an end-to-end integration test and then fill it in piece by piece, is that you have that failing integration test that helps drive the next piece of unit testing that you need to write.

00:26:46.280 One of the things about writing a lot of tests with test doubles is that it implies a lot of tests and code isolation. I believe that many people are often reluctant at the beginning of a project to commit to a really elaborate object-oriented structure, even when it seems necessary later on.

00:27:38.930 It feels like overkill in a small project. Still, there’s an intuitive sense that getting those things in at the beginning of a project will be beneficial over the long term. A test structure built on test doubles can serve as a design canary. If you need to list all of the collaborators of a given method, and you need to stub them one by one, it makes you very sensitive to the number of collaborators you have.

00:29:04.650 Having extra collaborators and dependencies will often cause noticeable pain during test writing. The test writing will become longer, there will be more setup, and it will be evident where the setup has become complex. Often, people will respond by stopping the use of test doubles.
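To make the "design canary" idea concrete, here is a small sketch of a hypothetical `Checkout` object with four collaborators, tested in isolation with hand-rolled doubles in plain Ruby. Every class and method name is invented for illustration. The point is only how much setup a three-line method demands once each collaborator has to be replaced.

```ruby
# Hypothetical object under test with four collaborators.
class Checkout
  def initialize(cart:, pricing:, gateway:, mailer:)
    @cart = cart
    @pricing = pricing
    @gateway = gateway
    @mailer = mailer
  end

  def run
    total = @pricing.total_for(@cart.items)
    receipt = @gateway.charge(total)
    @mailer.deliver(receipt)
    receipt
  end
end

# Setup: one double per collaborator, each stubbed by hand.
cart = Struct.new(:items).new([:ticket_a, :ticket_b])

pricing = Object.new
def pricing.total_for(_items)
  3000
end

gateway = Object.new
def gateway.charge(total)
  "receipt for #{total}"
end

# The mailer double also records what it was asked to deliver.
mailer = Object.new
def mailer.deliver(receipt)
  (@delivered ||= []) << receipt
end

def mailer.delivered
  @delivered || []
end

result = Checkout.new(cart: cart, pricing: pricing,
                      gateway: gateway, mailer: mailer).run
puts result
puts "delivered: #{mailer.delivered.inspect}"
```

The setup is several times longer than the method it exercises, which is the kind of pain that signals the method may have too many dependencies.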

00:29:41.380 They may choose to reduce complexity. I think this can be problematic, as I have walked into projects that have large block scripts that are heavily duplicated. I may start to break down this duplication to make it easier to test, modifying the classes and methods, but end users might find that the original structure matches their mental model.

00:30:27.960 And once I split it into something that most might consider better coded—it’s less clear. Design can be subjective. Being the only person on a team that cares about a specific level of isolation and testing can lead to tough situations.

00:31:03.480 You may feel pressure to adhere to a style that is not shared by others. This creates discontinuity in the design efforts. Third-party frameworks can also complicate the use of test doubles.

00:31:23.370 Many people in this room have experience with such frameworks, including Rails, which is designed for developer simplicity, making it harder to isolate code for test doubles. For instance, association proxies in Active Record are nearly impossible to cover with test doubles without compromising the functionality.

00:32:07.230 With that said, I wrote those spy tests, and at some point, either the Rails complexity or the complexity of the underlying logic made code isolation difficult. I threw my hands up; I was unwilling or unable to make the changes necessary to isolate the code while maintaining clarity.

00:32:53.880 This leads to a design tension. I generally love encapsulation, but it does come with its own costs. A couple things to take away from this: I have no idea where I am on time, so I’ll assume that time has continued to flow.

00:33:39.610 Remember that Conway’s law applies to tests as well; the structure of your code will match the structure of your organization. This implies that the way you approach testing and the structures you maintain in testing have implications for the way you design your code.

00:34:36.320 It’s crucial to consider what will cause tests to fail rather than just what will make them pass. So, thinking about what will make the tests fail can be a different perspective than focusing solely on success. In conclusion, I appreciate everyone for their time. Thank you.